Integration of Digital Twin and Generative Models in
Model-Based Systems Upgrade Methodology
By
Shatad Purohit
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfilment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ASTRONAUTICAL ENGINEERING)
August 2024
Copyright 2024 Shatad Purohit
Acknowledgments
As I conclude my PhD journey, I am filled with immense gratitude. Coming to the United
States and joining USC opened a world of opportunities that I could not have imagined. This
transformative experience has shaped me profoundly, both as a researcher and an individual.
First and foremost, I extend my deepest gratitude to my advisor and mentor, Prof. Azad M.
Madni. His exceptional expertise and mentorship were crucial in shaping my research. Prof.
Madni's support went far beyond the typical advisory role - from advocating for my PhD admission
to offering invaluable backing during the challenging times I faced. His unwavering dedication to
my academic growth has been truly transformative, and I am profoundly grateful for his
extraordinary guidance and support throughout this journey.
I extend my sincere gratitude to Prof. Daniel Erwin and Prof. James Moore, esteemed
members of my dissertation committee. Their insightful guidance and expertise were instrumental
in shaping my research. I am truly thankful for their significant contributions to my academic
journey at USC.
I am grateful for the invaluable guidance of Dr. Michael Sievers and Gen. Ellen
Pawlikowski throughout my doctoral studies. Additionally, I extend my appreciation to Dr. Ayesha
Madni, Dr. Edwin Ordoukhanian, Dr. Parisa Pouya, Kenneth Cureton, Dell Cuason, Linda Ly, Luis
Saballos, and Marlyn Lat for their consistent support during my time at USC.
My deepest gratitude goes to my family, whose unwavering support has been the
cornerstone of this journey. To my parents, Sneha and Kolin Purohit, and my brother Shanal: thank
you for fostering my curiosity and providing steadfast emotional and practical support, especially
during challenging times. I dedicate a special tribute to my late grandfather, Prof. Kanhaiyalal
Hemraj Purohit, whose wisdom continues to inspire me. His profound belief in education and
constant encouragement were a driving force throughout my academic pursuits.
To my better half, Poonam, your endless patience, understanding, and love, especially
during the most demanding phases of the work, have been my anchor. Your unwavering belief in
me has been a constant source of motivation.
Table of Contents
Acknowledgments........................................................................................................................... ii
List of Figures.............................................................................................................................. viii
Abbreviations................................................................................................................ ix
Abstract.......................................................................................................................................... xi
Chapter 1: Introduction................................................................................................................... 1
1.1 The Systems Upgrade Problem............................................................................................. 1
1.2 Ad Hoc and Unstructured Nature of Current Processes........................................................ 2
1.2.1 Time-Consuming and Resource-Intensive Procedures................................................... 2
1.2.2 Difficulties in Assessing Cross-Domain Impacts........................................................... 2
1.2.3 Challenges in Testing the Correctness of Upgrades ....................................................... 3
1.3 Research Objectives.............................................................................................................. 4
1.4 Research Scope Definition.................................................................................................... 7
Chapter 2: Literature Review.......................................................................................................... 9
2.1 Overview ............................................................................................................................... 9
2.2 Systems Upgrades............................................................................................................... 10
2.2.1 Enhancing Functionality and Performance................................................................... 10
2.2.2 Reducing Operational Costs..........................................................................................11
2.2.3 Gaps in System Upgrades............................................................................................. 12
2.3 MBSE.................................................................................................................................. 14
2.3.1 State of the Art.............................................................................................................. 14
2.3.2. MBSE Capabilities...................................................................................................... 15
2.3.3 Potential MBSE Benefits.............................................................................................. 16
2.3.4 Models Used in Systems Engineering.......................................................................... 19
2.3.5 Roles of Models in MBSE............................................................................................ 25
2.3.6 MBSE Methodologies .................................................................................................. 31
2.4 Digital Twin Technology..................................................................................................... 33
2.4.1 Digital Twin Mathematical Models.............................................................................. 36
2.4.2 Data-Driven Model Update Methods........................................................................... 38
2.5 Generative Models .............................................................................................................. 38
2.5.1 Agent-Based AI............................................................................................................. 39
2.5.2 Tool Use........................................................................................................................ 40
2.5.3 Planning and Collaboration .......................................................................................... 41
2.6 Summary of Existing Gaps ................................................................................................. 42
Chapter 3: Methodology ............................................................................................................... 44
3.1 Introduction ......................................................................................................................... 44
3.2 Integrated MBSE Methodology .......................................................................................... 45
3.2.1 Integration of MBSE, Digital Twin, and Generative Models....................................... 46
3.2.2 Methodology Overview................................................................................................ 48
3.3 Knowledge Acquisition: Active Information Gathering ..................................................... 52
3.3.1 Overview ...................................................................................................................... 52
3.3.2 Problems Addressed ..................................................................................................... 53
3.3.3 Multi-Modal Generative AI Agents for Rapid Data Extraction ................................... 54
3.3.4 Automatic Conversion of Gathered Information into Formal Logic............................ 58
3.3.5 Unique Contributions and Detailed Approach.............................................................. 64
3.4 Formal Knowledge Representation..................................................................................... 66
3.4.1 Overview ...................................................................................................................... 66
3.4.2 Problem Addressed....................................................................................................... 67
3.4.3 Layered Ontology Approach ........................................................................................ 69
3.4.4 Advantages of the Layered Ontology Approach........................................................... 73
3.4.5 Formal Logic and Ontologies for Precise System Description .................................... 74
3.4.6 Generative AI Agents for Consistency Checking......................................................... 74
3.5 Analysis: Automated Reasoning and Human Interaction.................................................... 77
3.5.1 Overview ...................................................................................................................... 77
3.5.2. Problem Addressed...................................................................................................... 77
3.5.3 Ontology-Based Model with Temporal Embeddings and Probabilistic Reasoning ..... 78
3.5.4 Reasoning Agents: The Intelligent Core....................................................................... 82
3.5.5 Graph Model for Traceability....................................................................................... 84
3.6 Action: Agent-Based System with Planning and Coordination .......................................... 85
3.6.1 Overview ...................................................................................................................... 85
3.6.2 Problem Addressed....................................................................................................... 85
3.6.3 Multi-Modal Actuation Agents..................................................................................... 86
3.6.4 Interacting with the Physical Twin ............................................................................... 87
3.6.5 Planning and Coordination Agents............................................................................... 88
3.6.6 Utilizing Multi-Agent Collaboration............................................................................ 90
3.7 Summary ............................................................................................................................. 91
Chapter 4: Testbed Implementation: Illustrative Example and Results........................................ 93
4.1 Overview ............................................................................................................................. 93
4.2 Purpose of the Testbed ........................................................................................................ 94
4.3 Overview of the Upgrade Scenario ..................................................................................... 95
4.4 Initial System Configuration: Dual UAV Navigation ......................................................... 98
4.4.1 Description of Physical System Components............................................................... 98
4.4.2 Software and Infrastructure Components..................................................................... 99
4.4.3 Operational Capabilities............................................................................................. 100
4.5 Upgraded System: Multi-UAV Autonomous Search......................................................... 100
4.5.1 Physical System Enhancements.................................................................................. 101
4.5.2 Software and Infrastructure Upgrades........................................................................ 102
4.5.3 Digital Twin and AI Integration.................................................................................. 102
4.5.4 New Operational Capabilities..................................................................................... 103
4.6 Upgrade Process Implementation...................................................................................... 104
4.6.1 Knowledge Acquisition and Ontology Development................................................. 104
4.6.2 Cross-Domain Integration .......................................................................................... 105
4.6.3 Digital Twin Modification and Validation .................................................................. 106
4.6.4 Generative Model Integration..................................................................................... 106
4.7. Experimental Results and Performance Comparison....................................................... 109
4.7.1 Operational Improvements......................................................................................... 109
4.7.2 System Integration and Efficiency...............................................................................110
4.7.3 Development and Testing Enhancements....................................................................114
4.7.4 Evaluation of Upgrade Process Efficiency ..................................................................115
4.7.5 Comparison with Traditional MBSE Approaches.......................................................116
Chapter 5: Conclusions................................................................................................................117
5.1 Methodology Overview .....................................................................................................117
5.2 Summary of Research Contributions.................................................................................119
5.2.1 Development of Automated Digital Twin Creation Method .......................................119
5.2.2 Enhanced Formal Analysis Techniques.......................................................................119
5.2.3 Implementation of Bidirectional Action Execution.................................................... 120
5.2.4 Integration of Generative AI Agents........................................................................... 120
5.3 Key Findings and Accomplishments................................................................................. 121
5.4 Limitations and Opportunities for Future Research.......................................................... 122
Bibliography ............................................................................................................................... 125
Appendix A ................................................................................................................................. 140
List of Figures
Figure 1: Thesis Structure and Chapter Interconnections............................................................... 8
Figure 2: Relationship between key terms used in MBSE ........................................................... 26
Figure 3: Application of digital twin across MBSE life cycle with associated investments and
gains [74] ...................................................................................................................................... 30
Figure 4: Overview of the Digital Twin System Model with Generative AI Agents.................... 47
Figure 5: Integrated Framework for System Upgrades Using Digital Twin and Generative AI
Agents........................................................................................................................................... 50
Figure 6: Conversion of a System Requirement into a Formal Logic and Graph Model............. 59
Figure 7: Integrated Representation of Control Model, Logical Statements, and Ontology Graph
....................................................................................................................................................... 62
Figure 8: Integrated Representation of Extended Model.............................................................. 63
Figure 9: Linking multi-domain knowledge ................................................................................. 69
Figure 10: Temporal embeddings on graph .................................................................................. 81
Figure 11: Transition from Fielded System to Upgraded System................................................. 96
Figure 12: Implementation of the Upgraded System with Graph Databases, Generative AI
Agents, UAVs, and Localization System.................................................................................... 109
Figure 13: Incident Comparison: Before and After Upgrade with Specific Data........................110
Figure 14: UAV moving in front of the Table Fan.......................................................................112
Figure 15: UAV operating in a Virtual Environment in the same region.....................................113
Figure 16: Tagged region highlighted in red in the virtual lab environment ...............................114
Figure 17: Methodology for Ontology Development from System Life Cycle Data ................. 140
Figure 18: Partitioning Unstructured Data.................................................................................. 145
Figure 19: Identification of stakeholders relevant to model development ................................. 148
Figure 20: Generating Decision Support Questions for Scoping the System Model ................. 150
Figure 21: Identifying key concepts for ontology development................................................. 152
Figure 22: Defining key concepts............................................................................................... 155
Figure 23: Data source identification for populating ontology................................................... 158
Figure 24: Building N2 Matrix for Ontology ............................................................................. 161
Figure 25: N2 Matrix Representation of Ontology..................................................................... 165
Figure 26: Graphical Representation of Ontology...................................................................... 166
Figure 27: Notional Representation of Ontology and System Model......................................... 172
Figure 28: Merging ontologies using N2 Matrix Algorithms..................................................... 181
Abbreviations
AI Artificial Intelligence
API Application Programming Interface
CAD Computer-Aided Design
COTS Commercial Off-The-Shelf
DFD Data Flow Diagram
DL Description Logic
DSM Dependency Structure Matrix
DSQ Decision Support Question
EDA Electronic Design Automation
EIS Electronic Image Stabilization
FFBD Functional Flow Block Diagram
FOV Field of View
GPU Graphics Processing Unit
ICAM Integrated Computer-Aided Manufacturing
IDEF ICAM Definition
IMU Inertial Measurement Unit
INCOSE International Council on Systems Engineering
IoT Internet of Things
JPL Jet Propulsion Laboratory
LLM Large Language Model
M&S Modeling & Simulation
MBSE Model-Based Systems Engineering
MDSD Model-Driven Systems Development
OMG Object Management Group
OPM Object-Process Methodology
OWL Web Ontology Language
PLM Product Lifecycle Management
RL Reinforcement Learning
ROI Return on Investment
RUP SE Rational Unified Process for Systems Engineering
SA State Analysis
SADT Structured Analysis and Design Technique
SDK Software Development Kit
SE Systems Engineering
SoS System of Systems
SWFTS Submarine Warfare Federated Tactical System
SWRL Semantic Web Rule Language
SysML Systems Modeling Language
TLA TimeLine Analysis
UAV Unmanned Aerial Vehicle
UML Unified Modeling Language
USAF United States Air Force
V&V Verification and Validation
Abstract
Fielded aerospace and defense systems often require unplanned system upgrades to cope
with unforeseen circumstances, such as rapid technological advancements and evolving
operational requirements. However, current upgrade processes tend to be ad hoc, often
resulting in lengthy upgrade cycles and non-rigorous testing.
This research developed a systematic approach to accelerate the upgrade process for fielded
systems and ensure the correctness of upgrades. Grounded in a Model-Based Systems Engineering
(MBSE) framework, this approach is supported by two key pillars: generative AI and digital twin
technologies.
One key achievement was creating a unified system model incorporating data from various
sources to provide a multi-domain system representation. Aerospace systems involve diverse data
from different lifecycle phases such as System Requirement Specifications, Test Scenarios, Design
Specification Documents, Software Code, Controls Models, Maintenance Logs, and Operational
Sensor Data. This heterogeneous data, often siloed, makes it challenging to predict upgrade
outcomes and understand cross-domain impacts. By systematically integrating data from these
domains, application of the MBSE framework facilitated a holistic view and cross-domain analysis,
supporting the correct implementation of upgrades.
The research developed an automated method to generate system models from lifecycle
data, which helped reduce the time and expertise required for model development. Traditionally,
creating system models for the MBSE framework is a manual and time-consuming process,
requiring extensive data gathering. By leveraging generative AI technology, this research
automated these tasks, converting large, heterogeneous datasets into structured, usable formats
more quickly. The research introduced a formal reasoning and analysis method that aimed to be
robust and transparent, enhancing the ability to predict the outcomes of upgrades.
Another important achievement was designing an agent-based system capable of
establishing bidirectional connections between physical systems and digital twin system models.
Traditional upgrade processes are often time-consuming and resource-intensive, lacking the
capability to rapidly test multiple scenarios. Continuously updated models enabled advanced
analysis and simulation, allowing for more extensive testing and early issue identification in
system upgrades.
Finally, the methodology was validated through a software upgrade scenario for a
miniature multi-UAV operation in a lab environment. This demonstration addressed a high-level
operational requirement change and provided a systematic upgrade process. The MBSE
framework integrated heterogeneous data from various domains, generative AI automated system
model generation, and bidirectional communication with digital twins enabled advanced testing
and analysis. This methodology ensures the correct implementation of upgrades in less time and with fewer failures.
Chapter 1
Introduction
1.1 The Systems Upgrade Problem
The necessity for systems upgrades in the aerospace and defense sectors is driven by the
rapid advancement of technology and evolving operational requirements. Modern systems,
particularly in aerospace, must continuously improve to maintain competitiveness, enhance
functionality, and meet stringent regulatory standards. Upgrading systems addresses the challenges
posed by aging infrastructure, ensuring that operational capabilities are not only preserved but
enhanced. Improvements in areas such as propulsion systems, avionics, and materials technology
are critical to achieving goals like increased fuel efficiency, reduced emissions, and enhanced
performance. These targeted upgrades significantly enhance system reliability and efficiency,
thereby supporting modern missions that demand higher performance and reliability.
Furthermore, system upgrades are essential to reducing operational costs and mitigating
the risks associated with system obsolescence. As systems age, the cost of maintenance and
operations inevitably increases due to inefficiencies and the rising difficulty of sourcing parts for
outdated technologies. By integrating modern technologies and optimizing processes, upgrades
can lead to substantial cost savings and operational efficiencies [1], [2].
1.2 Ad Hoc and Unstructured Nature of Current Processes
Despite the critical need for systems upgrades, the current processes employed in the
aerospace sector are often ad hoc and unstructured [3], [4].
1.2.1 Time-Consuming and Resource-Intensive Procedures
Upgrading aerospace systems is inherently time-consuming and resource-intensive [5].
Several factors contribute to this challenge. Firstly, aerospace systems are highly complex, with
numerous interdependent components. Upgrading one component often requires careful
consideration of its impact on others, adding to the time and effort required [6], [7]. This
interconnectedness necessitates a meticulous approach to ensure that changes in one area do not
adversely affect the overall system performance. Secondly, many aspects of the upgrade process
are still performed manually, including data collection, analysis, and validation. These manual
processes are not only time-consuming but also prone to human error, which can further delay the
upgrade timeline and compromise the quality of the outcome [8]. Lastly, upgrading aerospace
systems requires specialized skills and knowledge, which may not always be readily available.
This can lead to bottlenecks and delays as teams wait for the necessary expertise [9], [10]. The
scarcity of skilled personnel exacerbates the time and resource demands of the upgrade process,
making it even more challenging to complete upgrades efficiently and effectively.
1.2.2 Difficulties in Assessing Cross-Domain Impacts
Aerospace systems encompass multiple interconnected domains, including mechanical,
electrical, software, and more. Assessing the impact of upgrades across these domains is
challenging for several reasons. The interdependencies between different domains mean that a
change in one area can have far-reaching effects on others. Understanding these interdependencies
and predicting the impacts of changes requires comprehensive analysis and modeling. However,
existing tools and methodologies often lack the capability to integrate data and insights across
different domains. This siloed approach makes it difficult to perform holistic impact assessments,
as the interactions and dependencies between domains are not fully considered. Furthermore, the
interactions between different components and subsystems are complex and not always well-documented. This adds to the difficulty of predicting how changes will propagate through the
system, potentially leading to unforeseen consequences and system disruptions. Effective
assessment of cross-domain impacts is crucial for ensuring that upgrades do not inadvertently
compromise system performance or reliability [11], [12].
1.2.3 Challenges in Testing the Correctness of Upgrades
Ensuring that upgrades are correctly implemented and function as intended is a critical
aspect of the upgrade process. However, this is fraught with challenges. Comprehensive testing of
upgraded systems is essential but difficult to achieve [13]. Aerospace systems must operate
reliably under a wide range of conditions, requiring extensive testing and validation. Accurately
simulating the real-world conditions under which the system will operate is challenging, as test
environments may not fully replicate the complexities of actual operation, leading to gaps in testing
coverage. Establishing an effective feedback loop to capture issues and refine upgrades is vital,
but capturing and analyzing feedback from operational systems in real-time can be difficult.
Additionally, testing often requires significant resources, including time, personnel, and
specialized equipment. These resource constraints can limit the extent and thoroughness of testing,
potentially compromising the reliability and effectiveness of the upgrades [14].
1.3 Research Objectives
The primary objective of this research was to develop a systematic approach to accelerate
the upgrade process for fielded systems and ensure the correctness of upgrades. This approach was
grounded in a Model-Based Systems Engineering framework, supported by two key pillars:
generative AI and digital twin technologies. The following sections provide the rationale for these
elements and how they contributed to the overall research objectives.
Develop an integrated framework for multi-domain knowledge representation:
Objective: Create a unified system model that incorporates data from various sources, providing
a multi-domain system representation.
Aerospace systems comprise data from diverse and interconnected domains. Information
about fielded systems could be found in various formats and from different lifecycle phases, such
as System Requirement Specifications, Test Scenarios, Design Specification Documents, Software
Code, Controls Models, Maintenance Logs, and Operational Sensor Data. Often, this data is
heterogeneous and exists in silos. When an upgrade is planned for a fielded system, it is difficult
to predict the outcome and understand its impact across domains due to the fragmented nature of
this knowledge about the fielded system. MBSE provided a framework to systematically integrate
data from these different domains [15], [16]. By consolidating diverse data sources, it was possible
to achieve a holistic view and cross-domain analysis. This approach ensured that upgrades were implemented correctly.
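To illustrate the kind of unified representation this objective calls for, the following minimal Python sketch (not the dissertation's implementation; all node kinds, domain names, and relation labels are assumed for illustration) treats the system model as a typed graph whose nodes carry a domain tag, so that cross-domain dependencies can be queried directly.

from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    domain: str   # e.g., "requirements", "software", "maintenance"
    kind: str     # e.g., "Requirement", "Component", "TestScenario"
    attrs: dict = field(default_factory=dict)

@dataclass
class SystemModel:
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    edges: list = field(default_factory=list)   # (source_id, relation, target_id)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def relate(self, source_id: str, relation: str, target_id: str) -> None:
        self.edges.append((source_id, relation, target_id))

    def cross_domain_impacts(self, node_id: str) -> list:
        """Return directly related nodes that live in a different domain."""
        origin = self.nodes[node_id]
        related = [t for s, _, t in self.edges if s == node_id]
        related += [s for s, _, t in self.edges if t == node_id]
        return [self.nodes[r] for r in related if self.nodes[r].domain != origin.domain]

# Usage: link a requirement to the software component that satisfies it and the
# test scenario that verifies it, then ask what a software change may touch.
model = SystemModel()
model.add_node(Node("REQ-001", "requirements", "Requirement",
                    {"text": "UAV shall hold position within 0.5 m"}))
model.add_node(Node("SW-NAV", "software", "Component", {"name": "nav_controller"}))
model.add_node(Node("TS-12", "test", "TestScenario", {"name": "hover accuracy"}))
model.relate("REQ-001", "satisfied_by", "SW-NAV")
model.relate("REQ-001", "verified_by", "TS-12")
print([n.node_id for n in model.cross_domain_impacts("SW-NAV")])   # ['REQ-001']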
Automate the creation and updating of system models:
Objective: Develop a method to automatically generate system models from lifecycle data,
reducing the time and expertise required for model development.
Currently, the process of developing system models for the MBSE framework involves
manual, time-consuming data gathering from diverse sources [17]. It is difficult to convert large
amounts of heterogeneous data into a structured, usable format. Generative AI technology offers
the potential to automate these time-consuming and repetitive data processing tasks involving
heterogeneous data sets [18], [19], [20]. This automation proved crucial for developing models more quickly and for effectively supporting a model-based systems upgrade approach.
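As a hedged sketch of the automation described above (and not the author's code), the snippet below shows one plausible pattern: a generative model is prompted to turn free-text lifecycle data into structured (subject, relation, object) triples that can be loaded into a system model. The complete argument stands in for whichever LLM interface is actually used, and the prompt format and repair strategy are assumptions.

import json

PROMPT_TEMPLATE = (
    "Extract (subject, relation, object) triples from the following system "
    "lifecycle text. Respond with a JSON list of 3-element lists.\n\nText: {text}"
)

def extract_triples(text: str, complete) -> list:
    """`complete` is any callable mapping a prompt string to a model response string."""
    response = complete(PROMPT_TEMPLATE.format(text=text))
    try:
        triples = json.loads(response)
    except json.JSONDecodeError:
        return []   # in practice: retry with a repair prompt or constrained decoding
    # Keep only well-formed triples before loading them into the system model.
    return [tuple(t) for t in triples
            if isinstance(t, list) and len(t) == 3 and all(isinstance(x, str) for x in t)]

# Example with a canned response instead of a live model call:
def fake_complete(prompt: str) -> str:
    return '[["GPS receiver", "provides", "position estimate"]]'

print(extract_triples("The GPS receiver provides the position estimate.", fake_complete))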
Enhance formal analysis techniques for system upgrades:
Objective: Develop a formal reasoning and analysis method that is robust and transparent to better
predict the outcomes of upgrades.
Ensuring the correctness of system upgrades earlier and faster is critical. Currently, there
are predictive limitations in assessing the outcomes of system upgrades, leading to potential risks
[20]. Formal knowledge representation in the form of ontologies and formal reasoning provide the
capability for robust analysis and ensure that the analysis process is transparent. Additionally,
generative models offer the capability to interface with external systems and functionalities [21],
[22]. These generative AI agents can take structured inputs and produce structured outputs, use
software tools, and have the potential to be used in combination with formal reasoning to expedite
and advance the analysis process. This approach enhanced the ability to predict system behavior,
assess the impact of upgrades, and identify potential issues early in the upgrade process.
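The transparency argument can be made concrete with a small, assumed example of the kind of rule a formal-reasoning layer might apply over the ontology-based model; the rule (every requirement must be verified by at least one test) and the data layout are illustrative, not taken from the dissertation.

def unverified_requirements(nodes: dict, edges: list) -> list:
    """nodes maps node_id -> kind; edges are (source_id, relation, target_id) triples."""
    verified = {source for source, relation, _ in edges if relation == "verified_by"}
    return [node_id for node_id, kind in nodes.items()
            if kind == "Requirement" and node_id not in verified]

nodes = {"REQ-001": "Requirement", "REQ-002": "Requirement", "TS-12": "TestScenario"}
edges = [("REQ-001", "verified_by", "TS-12")]
# Because the rule runs over explicit nodes and edges, its verdict traces back to
# named model elements, which is the white-box property emphasized in this research.
print(unverified_requirements(nodes, edges))   # ['REQ-002']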
Implement and evaluate a bidirectional digital twin framework:
Objective: Design an agent-based system capable of executing bidirectional connections between
physical systems and digital twin system models, enabling the use of digital twins for faster testing.
Existing processes are often time-consuming and resource-intensive, lacking the capability
to rapidly test multiple upgrade scenarios in a cost-effective manner. Static models have limited
utility once the physical system is built, as they cannot accommodate changes or reflect the current
state of the system. Models continuously updated from their physical counterparts enable
advanced analysis and simulation [23], [24], allowing for extensive testing and early identification
of issues before they become critical. This approach enhanced the efficiency of the upgrade
process. However, establishing bidirectional communication was challenging due to standardization issues and the need for tailored software solutions. Custom-developed generative AI
agents offered the promise of acting as an interface to overcome these challenges. Overall, this
capability contributed to faster upgrades.
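A minimal sketch of such a bidirectional link is shown below, assuming hypothetical callables in place of vendor SDKs or middleware: one loop mirrors telemetry from the physical system into the digital twin and pushes validated commands back the other way.

import time

def sync_loop(read_telemetry, apply_state, plan_commands, send_command,
              cycles: int = 10, period_s: float = 1.0) -> None:
    for _ in range(cycles):
        # Physical -> twin: keep the model current with the fielded system.
        state = read_telemetry()
        apply_state(state)
        # Twin -> physical: commands validated against the twin flow back.
        for command in plan_commands(state):
            send_command(command)
        time.sleep(period_s)

# Example wiring with stand-in functions (no hardware required):
sync_loop(read_telemetry=lambda: {"x": 0.0, "y": 0.0, "battery": 0.15},
          apply_state=lambda state: None,
          plan_commands=lambda state: [{"cmd": "return_to_base"}] if state["battery"] < 0.2 else [],
          send_command=print,
          cycles=1, period_s=0.0)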
Validate the proposed methodology through an Illustrative Example:
Objective: Showcase the model-based systems upgrade methodology using a software upgrade
scenario for a miniature UAV swarm in a lab environment, focusing on addressing a high-level
operational requirement change.
The objective was to validate the practical application of the methodology, learn from the
implementation, and update the approach as necessary. The effectiveness of the methodology was
demonstrated using a software upgrade scenario for a miniature UAV swarm in a lab environment.
This demonstration addressed a high-level operational requirement change and provided a
systematic upgrade process. The MBSE framework integrated heterogeneous data from various
domains, generative AI agents automated system model generation, and bidirectional
communication with digital twins enabled advanced testing and analysis. This process ensured that
upgrades were correctly implemented, reducing risks and enhancing efficiency.
1.4 Research Scope Definition
The scope of this research focused on enhancing the upgrade process for aerospace systems
through a systematic approach. The primary problem addressed was the lengthy and cumbersome
system upgrade process. This research aimed to make incremental improvements and provide
methods to accelerate the process by developing faster testing methods and identifying issues early
in the upgrade process. Additionally, it focused on improving the ability to predict and understand
the outcomes of upgrades. While there are several other problems associated with system upgrades,
this research specifically aimed to create a systematic methodology for upgrades. The contribution
lay in developing and refining this approach, utilizing technologies like generative AI, rather than
contributing to the development of generative AI models themselves.
Instead of using generative models as black-box solutions for directly querying life cycle
data, this research took a different approach. Generative models were employed on life cycle data
to extract knowledge and develop an ontology-based multi-domain system model. This system
model was then used for reasoning. The rationale behind this approach was its transparency; the
white-box nature of the ontology-based system model allowed for explainable decisions, which is
not possible when relying solely on black-box generative models. Additionally, this approach was
chosen because system models based on formal ontology can be connected to mathematical and
physics-based models, which is not feasible with generative models alone. In the aerospace
domain, having transparent models and the ability to integrate with advanced computational or
physics models is crucial. This ensures that decision-making processes are clear, understandable,
and grounded in robust scientific principles, aligning with the specific needs and complexities of
aerospace systems. Figure 1 illustrates the logical flow and interconnections between the chapters
of this thesis.
Figure 1: Thesis Structure and Chapter Interconnections
Chapter 2
Literature Review
2.1 Overview
This chapter provides a comprehensive review of the literature relevant to the subjects
covered in this research. The primary purposes of this literature review are to understand the
current state of the art for each topic and identify the gaps where this research can contribute. As
outlined in the introductory chapter, the key subjects of interest are system upgrades, Model-Based
Systems Engineering (MBSE), digital twin technology, and generative models. Table 1 presents
the topics and sub-topics addressed in this literature review. Through this review, areas for
improvement are identified, guiding the direction of the research presented in subsequent chapters.
Table 1: Literature Review Topics
Topic: Sub-Topics
System Upgrades: Purposes; Gaps and Challenges
MBSE: MBSE; Models Used in Systems Engineering; MBSE Methodologies
Digital Twin: MBSE and Digital Twin; Digital Twin Operationalization; Digital Twin Mathematical Models; Data-Driven Model Update Methods
Generative Models: Agent-Based AI; Tool Use; Planning and Collaboration
Upon examining the issues related to system upgrades, it became evident that the current
processes are often ad hoc, lengthy, and resource-intensive. This realization prompted a deeper
investigation into MBSE. While MBSE has significant benefits in addressing knowledge silos and
facilitating early identification of issues, it is also time-consuming and requires substantial
expertise. Recognizing that new technologies are not being fully leveraged within the framework
of MBSE, this research explores the potential of integrating Digital Twin technology and
Generative AI models. These technologies offer promising opportunities to develop a systematic
and efficient methodology for system upgrades.
2.2 Systems Upgrades
2.2.1 Enhancing Functionality and Performance
System upgrades are crucial for maintaining and enhancing the operational capabilities of
various systems, especially in the military and defense sectors. The literature consistently
highlights the need to upgrade systems to enhance their functionality and performance. This need
is driven by the emergence of new technologies and evolving operational requirements,
necessitating organizations to stay competitive and meet new demands.
One primary motivation for system upgrades is to meet evolving customer requirements,
regulatory standards, and market conditions. For example, in the aerospace industry, there is a
growing emphasis on improving fuel efficiency, reducing noise, and lowering emissions.
Upgrading propulsion systems with advanced materials and innovative technologies is often
necessary to achieve these goals. A notable instance is the upgrade of the Craft X-15 propulsion
system, which incorporated advanced combustion chambers, high-performance turbopumps, and
regenerative cooling systems to support modern space missions requiring higher thrust efficiency
and reliability [25].
Similarly, the US Air Force's implementation of the Pacer Comet 4 (PC4) system in jet
engine test cells created a more flexible, scalable, and efficient testing environment by integrating
various subsystems into a cohesive network. This upgrade significantly improved diagnostics,
prognostics, and maintenance capabilities [26]. Upgrades in rocket propulsion systems also
involved new materials and designs that improved thrust efficiency and reliability while reducing
mission costs [27]. Furthermore, upgrading naval ship systems with advanced radar, sonar, and
communication technologies significantly enhanced operational capabilities and performance,
which is essential for maintaining combat readiness and situational awareness in modern naval
operations [28].
2.2.2 Reducing Operational Costs
The literature identifies cost reduction as a significant motivator for system upgrades. As
systems age, maintenance and operational costs increase due to obsolescence and inefficiencies.
Upgrading systems can address these issues by leveraging modern technologies and optimizing
processes, leading to substantial cost savings. Advances in materials science, manufacturing
techniques, and digital technologies can reduce energy consumption, minimize downtime, and
extend maintenance intervals. For instance, upgrading the US Army's utility helicopter fleet,
particularly the UH-60 Black Hawk, with modern technologies extended their lifespan, improved
performance, and reduced the need for costly new acquisitions by leveraging existing
infrastructure and minimizing the need for new training programs [28].
2.2.3 Gaps in System Upgrades
Despite the numerous benefits and advancements achieved through system upgrades, the
literature identifies several gaps and challenges that need to be addressed to optimize these
processes further. These gaps include issues related to technology integration, budget constraints,
methodological clarity, synchronization, and human resources.
One significant gap in system upgrades is the difficulty in integrating new technologies
with existing systems. This challenge is particularly evident in systems with legacy components
that were not designed to accommodate modern technologies. For example, the modernization of
avionics systems often faces hurdles due to the complexity of integrating new software and
hardware with old infrastructure [29]. This can lead to increased costs and extended project
timelines as extensive modifications and custom solutions are often required. Budget constraints
pose a significant challenge to system upgrades. Rising prices and shrinking defense budgets can
impede the ability to implement necessary upgrades [21]. This issue is compounded by the high
costs associated with advanced technologies and the need for specialized training and
infrastructure to support these upgrades [28]. Consequently, organizations must often prioritize
certain upgrades over others, potentially leaving critical systems without necessary enhancements.
2.2.3.1 Lack of Methodological Clarity and Consistency
A significant gap identified is the lack of clarity in the methodologies used for system
upgrades. Inconsistent and unclear methodologies can hinder the effectiveness of upgrades. For
instance, Swietochowski and Rewak (2019) [27] noted that a lack of standardized procedures and
clear guidelines can lead to inefficiencies and inconsistencies in the modernization of missile
forces and artillery. This lack of methodological clarity can result in miscommunication, errors,
and delays in the upgrade process. The synchronization of various components during system
upgrades is also a significant challenge. Coordinating the upgrade of multiple subsystems to ensure
seamless integration and functionality is complex and often fraught with issues. For example, in
the case of the US Air Force's implementation of the Pacer Comet 4 (PC4) system, integrating
various subsystems into a cohesive network to improve diagnostics and maintenance capabilities
required careful coordination and precise timing [26]. Any misalignment in this process can lead
to operational disruptions and reduced system performance. Effective system upgrades require
adequately trained personnel and sufficient human resources. Many organizations struggle to
provide the necessary training and allocate enough personnel to manage and execute upgrades
effectively. For instance, upgrading the UH-60 Black Hawk helicopters required extensive training
programs to familiarize the staff with new technologies and systems [28]. Insufficient training and
human resources can lead to improper use of upgraded systems and underutilization of new
capabilities. The F-35 Joint Strike Fighter program illustrates several of these gaps in a practical
context. The program encountered significant issues, including delays, cost growth, and
transparency challenges. For example, delays in testing the Technology Refresh 3 (TR-3) suite
were caused by software stability issues that were not identified until flight tests, leading to
compressed timeframes for resolving these issues [21]. Additionally, the program faced
transparency challenges as the cost reporting mechanisms did not fully explain the reasons for cost
growth, complicating oversight and accountability. These issues were further exacerbated by
incomplete cost estimates and the incorporation of immature technologies, which led to
underestimations and additional risks [21]. Addressing these specific gaps is essential to enhance
the effectiveness of modernization efforts in such complex programs.
2.3 MBSE
2.3.1 State of the Art
The state of the art in MBSE is predominantly reflected in the MBSE tools available today. While MBSE tools have advanced significantly, their integration with simulation environments that support simulation-based experimentation remains costly, complex, and limited. Several
simulation tools (e.g., Simulink, ModelCenter, Dymola) work with MBSE tools. However, these
simulations have limits. For example, Ptolemy can make significant contributions when dealing
with heterogeneous systems, multiple time domains, and other complex scenarios. Nevertheless,
Ptolemy is difficult to use [30], [31].
In recent years, the use of simulation within MBSE has increased for system model
verification and testing and for acquiring new insights [32]. This advance departs from traditional
brute-force standalone simulations that tend to be limited in the range of operational conditions
they can test. More recently, the MBSE community has turned to the formal representation of
semantically rich ontologies and metamodels to facilitate the assessment of model completeness,
syntactic correctness, semantic consistency, and requirements traceability [33]. SysML-based
models are metamodel-based and support requirements traceability with normative relationships
between requirements and other elements in a model. For example, it is possible to generate maps and tables that show requirement-to-requirement, requirement-to-use case, requirement-to-V&V, requirement-to-logical and physical entity, and requirement-to-activity
relationships. However, some problems require the system representation to be adaptable so that it can learn from new information; these problems call for a more flexible representation and a semantically richer language.
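As a small illustration of the traceability maps described above (independent of any particular SysML tool, with invented example data), the following snippet turns a set of normative relationships into a requirement-to-element table.

from collections import defaultdict

relationships = [   # (requirement, relation, element): assumed example data
    ("REQ-001", "verify", "TestCase-7"),
    ("REQ-001", "satisfy", "Block-NavComputer"),
    ("REQ-002", "refine", "UseCase-Search"),
]

trace = defaultdict(list)
for requirement, relation, element in relationships:
    trace[requirement].append(f"{element} ({relation})")

for requirement in sorted(trace):
    print(f"{requirement}: " + ", ".join(trace[requirement]))
# REQ-001: TestCase-7 (verify), Block-NavComputer (satisfy)
# REQ-002: UseCase-Search (refine)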
2.3.2. MBSE Capabilities
MBSE has achieved a positive Return on Investment (ROI) for the US Submarine Fleet
[34]. The Submarine Warfare Federated Tactical System (SWFTS) is a rapidly evolving, COTS-based combat system of systems (SoS) acquired by the US Navy for deployment across the fleet.
The SWFTS program integrates the combat system from component systems produced by multiple
programs of record [35]. The Program Executive Office for Submarines practices continuous
process improvement in an unending quest to improve efficiency and deliver greater capability to
the fleet despite increasing budget pressures. In 2011, the SWFTS SoS engineering team started a
three-year program to transition from the traditional document-based systems engineering process
to a model-based systems engineering (MBSE) process utilizing UML and SysML. That transition
was completed and is now producing a positive return on investment (ROI) for the customer.
SWFTS is a level-of-effort contract, so the ROI manifests in a combination of the increased
number of baselines developed, improved quality of systems engineering products, and expanded
scope of work within stable funding. Since the transition to SysML-based MBSE, the average
number of baselines produced on a monthly basis has increased by approximately 30%. The
number of subsystems and combat system variants integrated in these baselines has increased by
60%. Over the same time period, the complexity of an average baseline, as measured by numbers
of requirements and defined interfaces, has grown at 7.5% per annum. The efficiencies achieved
through the transition to MBSE have also enabled the SWFTS SoS engineering team to accomplish
additional tasks that had previously fallen below the funding line. One measure of this growing
engineering scope is that the number of discrete systems engineering products produced by the
team has increased by over 60%, with most generated directly out of the SoS model [34].
2.3.3 Potential MBSE Benefits
While digital models have been in use in engineering since the 1960s, the early models
were disparate and based on different assumptions, modeling methods, and semantics. With the
advent of MBSE, models were placed at the center of systems engineering and became the
authoritative source of truth [15], [36]. Notably, they replaced document-centric systems
engineering. Models can include a variety of sub-models such as M-CAD, EDA, SysML, UML
models, and physics models. The move to MBSE allows engineering teams to readily understand
design change impacts, communicate design intent, and analyze a system design in terms of its
properties before building the system. The primary focus of most MBSE efforts in industry is to integrate data through models, realize an integrated end-to-end modeling environment that supports understanding of all elements that impact a design, and uncover and resolve undesirable
outcomes. MBSE introduces and integrates diverse models and views to create a centralized model
that facilitates a greater understanding of an evolving system (i.e., a system under development).
Importantly, MBSE provides a foundation for a Model-Based Engineering Enterprise, where
multiple enterprise views and functions are facilitated using a centralized digital repository. As
important, MBSE provides timely and early insights during systems engineering [37].
MBSE also supports automation and optimization, allowing systems engineers to focus on
generating value and making practical tradeoffs among competing objectives and system
attributes. The key to a successful model-based initiative is scoping the problem and proactively
managing the modeling process with the end in mind. MBSE goes well beyond traditional system
specifications and Interface Control Documents by creating an integrated system model from
which multiple views can be extracted. The system architecture provides a data integration and
transformation construct across the system’s life cycle. The system/product dataset contains
several MBSE artifacts: system architecture, system specification, 3D views, system attributes,
design requirements, supplementary notes, and Bill of Material.
A critical activity in systems engineering is upfront tradeoff analysis that ensures development of the best-value system that satisfies mission/customer needs. As mission complexity
increases, it becomes increasingly difficult to understand the factors that impact system
performance. Integrating analytics within the rubric of a formal architecture can provide data-driven insights into system characteristics that typically do not surface in traditional analysis. In
addition, integrated tools allow systems engineers to analyze many more configurations in relation
to mission scenarios, helping to identify key requirements “drivers” and the lowest cost
alternatives for system design.
System architecture models can serve as the focal point for integrated analytics comprising
simulation tools, analysis results capture tools, context managers, requirements managers, and key
tradeoffs among architectural parameters. For example, analysis context specifies the scope of the
analysis, while parametric views inform the necessary sensitivity analysis. Requirements diagrams
capture stakeholder needs, thresholds, and “drivers” that define the trade space. This model-centric
framework provides a consistent and managed computational environment for analysis.
By placing analytics in the hands of system architects, insights into requirements and
architectural features that drive performance and cost can be gleaned. Following this, multidimensional analysis can provide architects with valuable perspectives, helping to identify the
“knee in the curve” between cost and performance within an n-dimensional trade space.
Integrated modeling and analysis can provide the basis for decision support to systems
architects and engineers. Decision-makers will potentially have more information to draw on and
more options to consider before reaching conclusions. Integrated analytics increase the amount of
available information while also helping decision-makers make sense of the data. Finally, MBSE
tools can help to explore, visualize, and understand a complex trade space and can potentially
provide early insights into the impact of decisions related to both technical solutions and public
policies.
Today there is growing acceptance of MBSE in defense, aerospace, automotive, and
consumer products industries. In addition, the expansion of MBSE to include analytics and tradeoff
analysis is dramatically increasing the MBSE adoption rate within multiple industries. Finally,
increased collaboration among MBSE practitioners and distributed research teams is helping to
rapidly advance model-based approaches.
The potential benefits of MBSE within the system life cycle include:
• Consistency – maintenance of a centralized digital repository; avoiding duplication and rework; seamless configuration management of systems engineering artifacts; avoiding defect propagation within the system model; enabling coordination among teams from multiple disciplines [38].
• Completeness – Advanced traceability of requirements and analysis. Ensuring allocation of all
the entities in the model to entities at subsequent layers [39].
• Concept exploration – An MBSE approach allowed an organization to advance mission concept capture and analysis for a complex space mission, Europa Clipper. In addition, the organization utilized its existing MBSE infrastructure to rapidly develop an architecture of the system and explore a broader trade space [40].
• Design reuse – A rail manufacturing organization found MBSE methods efficient for product line
definition and model reuse, saving upfront cost. The method allowed a safer, more reliable, and
more rapid mechanism for product development. Additionally, MBSE methods enabled
centralized reuse strategy definition [41].
• Communication – The improved quality of communication among team members facilitated a
better understanding of information within the model, efficient communication among a diverse
set of stakeholders, and relatively rapid knowledge transfer [30], [32].
• Test and evaluation – [42] reported that utilizing an MBSE process allowed test planning earlier
in a program. MBSE framework and Monte Carlo simulation facilitated the rapid development of
test strategies and designs. MBSE allowed the capture of test data for evaluation in structured
format, improving the traceability between detailed test plans and system requirements.
• Verification and validation (V&V) – An MBSE approach was used by an organization to generate a consistent architecture across all system elements, which allowed system requirements to be managed centrally at one location despite a geographically dispersed team. The MBSE
approach facilitated planning activities for verification and validation early in the system life cycle
phase. The MBSE method also facilitated document generation for verification and validation
based on developed models. As a result, requirements, interface, and verification documents were
generated with ease [43].
• System analyses – MBSE model analysis allowed high-fidelity, mission-level modeling and
simulation for a complex space mission. The organization used a detailed representation of the
system that enabled analysis of weight, budget, and energy requirements [40].
2.3.4 Models Used in Systems Engineering
The field of systems engineering has been significantly enriched by adopting and adapting
modeling techniques from various disciplines. These include software engineering, electronic
circuit theory, operations research, supply chain management, constraint theory, statistics, time
series analysis, project management, probability theory, design automation, robotics, and
manufacturing [35], [44], [45], [46], [47], [48]. The assimilation of these varied modeling
approaches into systems engineering reflects the field's inherent adaptability and its continuous
quest for more robust methodologies to tackle increasingly complex systems. This convergence of
disciplines has not only enriched the systems engineering toolkit but has also fostered a more
holistic approach to problem-solving, inspiring engineers to address multifaceted challenges that
span traditional domain boundaries.
Dependency Structure Matrices (DSMs) were introduced by Don Steward in 1981 to analyze
systems of mathematical equations. The systems engineering community adopted DSM modeling
fairly quickly to perform system analysis, project planning, and organization design [49]. Machine
learning models, by contrast, were embraced late in the systems engineering domain. Reinforcement
learning, in particular, was adopted only recently, once it became apparent that complex systems
operate in uncertain environments and that reinforcement learning methods can help deal with
those uncertainties [50].
Functional analysis, a cornerstone of systems engineering, has been greatly enhanced by
the use of Functional Flow Block Diagrams (FFBDs). These diagrams serve to define system
functions and illustrate the temporal sequence of functional events. The concept of structured
process documentation dates back to 1921 when Frank Gilbreth introduced the flow process chart.
This laid the groundwork for the development of FFBDs, which have become instrumental in
capturing system behavior dynamics through multi-tiered, time-sequenced flow diagrams [36].
When presented with a system design problem, the systems engineer’s first task is to truly
understand the problem. That means understanding the context in which the problem is set.
A context diagram is a valuable tool for grasping the system to be built, the external domains
relevant to that system, and the interfaces to the system. System context diagrams show the system
as a whole, including its inputs and outputs to and from external entities. These context diagrams
typically use a block diagram template.
The late 1960s and early 1970s saw the development of the Structured Analysis and Design
Technique (SADT) within both systems and software engineering. This methodology aimed to
describe systems as functional hierarchies [51], [52], [53]. The concept was further formalized in
1981 with the introduction of the Integrated Computer-Aided Manufacturing (ICAM) Definition,
or IDEF0 [54], [55], [56]. The USAF primarily championed the IDEF0 representation as a viable
way to model systems. IDEF is a family of modeling languages in systems and software
engineering. They cover a wide range of uses, such as IDEF0 for Function modeling, IDEF1 for
Information modeling, IDEF1X for Data modeling, IDEF2 for Simulation model design, IDEF3
for Process description capture, IDEF4 for Object-oriented design, IDEF5 for Ontology
description capture, IDEF6 for Design rationale capture, IDEF7 for Information system auditing,
IDEF8 for User interface modeling, IDEF9 for Business constraint discovery, IDEF10 for
Implementation architecture modeling, IDEF11 for Information artifact modeling, IDEF12 for
Organization modeling, IDEF13 for Three schema mapping design, and IDEF14 for Network
design [55], [56]. Several structured approaches emerged, including structured programming,
design, and analysis.
N2 diagrams (or N x N interaction matrices) have been used to identify interactions or interfaces
between major factors from a systems perspective. An N-squared (N2) diagram is a matrix
representation of functional and physical interfaces between system elements at a particular
hierarchical level [57]. The N2 diagram has been used extensively to develop hardware, software,
and human systems data interfaces. Later, the Dependency Structure Matrix (DSM) gained
popularity as an efficient way to represent a system along with its interactions and dependencies.
The DSM is a network modeling tool used to represent the elements comprising a system and their
interactions, thereby highlighting the system’s architecture (or designed structure). DSM is
particularly well suited to applications in the development of complex, engineered systems and
has primarily been used in engineering management. In addition, DSM can be used to model the
structure of systems, process architecture, and organizational architecture.
Systems architecting used to be considered more art than science. However, in recent years
there has been a concerted effort to incorporate science into systems architecting [24], [36], [58].
Matrix-based methods such as DSM provide a formal construct for analyzing system architectures.
In particular, matrix-based analysis offers a means to capture and evaluate architectures in
quantitative terms. Moreover, data in matrix format is machine interpretable, and matrix-based
architectural methods lend themselves to exploiting advances in machine learning and data
analytics. Furthermore, the compact matrix form lends itself to representing arbitrarily large
systems and to system-of-systems analysis. In matrix-based approaches, the system architecture is
represented as an N x N square matrix in which the rows and columns represent the N entities while
the body of the matrix represents the interactions between entities [57]. The matrix-based
representation can describe architectures using multiple layers that represent perspectives such as
operational, functional, physical, implementation, and organizational. The matrix representation
can be customized to capture interface specifications, temporality, as well as technical, social, and
economic characteristics of system entities. To investigate human interaction with subsystems,
relevant portions of the matrix can be converted into graph form for graph-based visualization and
analysis. This mapping allows humans to contribute to the analysis of system architectures [49],
[59], [60].
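To make the matrix-based representation concrete, the following minimal sketch (the element names and
dependencies are illustrative and not drawn from any specific system) stores a small DSM as an N x N
array and converts it into a directed graph so that, for example, the interfaces touching a human
element can be inspected.

```python
# Minimal illustrative sketch: a toy N x N DSM and its conversion to a graph.
import numpy as np
import networkx as nx

elements = ["Sensor", "Controller", "Actuator", "Power", "Operator"]  # illustrative names

# Convention assumed here: dsm[i, j] = 1 means element i provides an output to element j.
dsm = np.array([
    [0, 1, 0, 0, 0],  # Sensor feeds the Controller
    [0, 0, 1, 0, 1],  # Controller drives the Actuator and reports to the Operator
    [0, 0, 0, 0, 0],  # Actuator has no outgoing interfaces in this toy example
    [1, 1, 1, 0, 0],  # Power supplies Sensor, Controller, and Actuator
    [0, 1, 0, 0, 0],  # Operator commands the Controller
])

# Convert the matrix (or any sub-block of interest) into a directed graph
# for graph-based visualization and path analysis of human-system interaction.
graph = nx.from_numpy_array(dsm, create_using=nx.DiGraph)
graph = nx.relabel_nodes(graph, dict(enumerate(elements)))

print("Interfaces touching the Operator:",
      list(graph.in_edges("Operator")) + list(graph.out_edges("Operator")))
print("Dependency path from Operator to Actuator?",
      nx.has_path(graph, "Operator", "Actuator"))
```

In practice, the rows and columns would correspond to the operational, functional, physical, or
organizational entities of whichever architectural layer is being analyzed.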
To visualize complex timing relationships in systems, several methods have been
developed. Two notable approaches are timing diagrams and state transition diagrams. Timeline
Analysis (TLA) models are employed to depict the time sequence of critical functions. Timing
diagrams provide a visual representation of objects changing states and interacting over time,
proving particularly useful for defining the behavior of hardware-driven, software-driven, and
human-driven components. The recognition of the importance of system states led to the adoption
of state machines (or state transition diagrams) across various engineering disciplines. These
methods were quickly applied by the systems engineering community to model system modes and
states. State diagrams simplify the understanding of complex systems by breaking down complex
reactions into smaller, known responses. However, as system complexity increased, state machines
faced scalability issues due to combinatorial explosion in their state space. To address this
challenge, the systems engineering community turned to alternative approaches such as heuristics,
meta-rules, Petri nets, and Petri net variants [61], [62].
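As a simple illustration of the state-transition view, a state machine can be expressed as a
transition table mapping (state, event) pairs to next states. The states and events below are
hypothetical and not taken from any particular system; the closing comment notes the combinatorial
growth that motivates Petri nets and related formalisms.

```python
# Minimal illustrative sketch: a state machine for a hypothetical subsystem.
transitions = {
    ("Standby",      "power_on"):       "Initializing",
    ("Initializing", "self_test_ok"):   "Operational",
    ("Initializing", "self_test_fail"): "Fault",
    ("Operational",  "anomaly"):        "Fault",
    ("Operational",  "power_off"):      "Standby",
    ("Fault",        "reset"):          "Standby",
}

def step(state: str, event: str) -> str:
    """Return the next state; events with no defined transition leave the state unchanged."""
    return transitions.get((state, event), state)

state = "Standby"
for event in ["power_on", "self_test_ok", "anomaly", "reset"]:
    state = step(state, event)
    print(event, "->", state)

# Scaling note: with k independent subsystems of n states each, an explicit composite
# machine has n**k states, the combinatorial explosion that motivates Petri nets.
```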
The Systems Modeling Language (SysML) has emerged as a pivotal development in the
field of systems engineering, offering a standardized visual modeling language that bridges the
gap between various engineering disciplines. Developed through a collaborative effort between
the International Council on Systems Engineering (INCOSE) and the Object Management Group
(OMG), SysML provides a framework for representing complex systems across multiple domains.
SysML's architecture is built upon a subset of UML (Unified Modeling Language), extending it
with systems-specific modeling constructs. This heritage allows SysML to leverage the
widespread adoption and tooling support of UML while tailoring its capabilities to the unique
needs of systems engineering [63]. The language's structure revolves around four fundamental
pillars:
Structure: Defining the hierarchical composition of systems and their interconnections.
Behavior: Capturing the dynamic aspects of system operations and interactions.
Requirements: Formalizing system specifications and tracing their implementation.
Parametric: Representing constraints and equations for performance and engineering
analysis.
These pillars are realized through a set of diagram types, each serving a specific purpose
in the modeling process. For instance, the Block Definition Diagram (BDD) and Internal Block
Diagram (IBD) address structural aspects, while Sequence Diagrams and State Machine Diagrams
tackle behavioral modeling. The Requirements Diagram, a unique feature of SysML, provides a
visual means of requirements management, fostering better traceability between requirements and
design elements. One of SysML's contributions is its ability to serve as a common language across
multidisciplinary teams. In complex projects involving mechanical, electrical, software, and
systems engineers, SysML provides a shared platform for communication and collaboration.
However, the adoption of SysML has not been without challenges. The learning curve
associated with mastering the language and its underlying concepts can be steep, particularly for
teams transitioning from traditional document-centric approaches. Additionally, while SysML
provides a standardized notation, the interpretation and application of these models can vary
between organizations and even between projects within the same organization. To address these
challenges, the systems engineering community has been developing best practices,
methodologies, and tool support for SysML. Initiatives such as MBSE (Model-Based Systems
Engineering) methodologies like OOSEM (Object-Oriented Systems Engineering Method) and
RUP SE (Rational Unified Process for Systems Engineering) have emerged to provide structured
approaches to applying SysML in real-world projects [37]. Looking forward, the evolution of
SysML continues with the development of SysML v2, aimed at addressing limitations in the
current version and incorporating feedback from years of application.
2.3.5 Roles of Models in MBSE
A digital model is digital data connected in an intelligent way that provides the ability to
portray, understand, or predict the properties or characteristics of the system under conditions or
situations of interest. The following terms are associated with the definition of a digital model.
Figure 2 represents the relationships between the model and these terms.
System: Assemblage or combination of functionally related elements or parts forming a unitary
whole.
Architecture: Fundamental concepts or properties of a system in its environment embodied in its
elements, relationships, and in the principles of its design and evolution.
System Description: Work product used to express the architecture, behavior, or structure of a system.
An architecture description can be document-centric, model-based, or repository-based.
Stakeholder: Individual, team, organization, or classes thereof, having an interest in a system.
Concern: Matter of interest or importance to a stakeholder. A concern pertains to any influence on
a system in its environment, including developmental, technological, business, operational,
organizational, political, economic, legal, regulatory, ecological and social influences.
View: Work product expressing the architecture of a system from the perspective of specific system
concerns.
Figure 2: Relationship between key terms used in MBSE
2.3.5.1 MBSE Adoption for Implementation
As discussed in the previous section, systems engineering initiatives that employ an MBSE
approach require greater investment in the earlier stages of the system life cycle than traditional
systems engineering. Such initiatives can be expected to produce gains in the later
stages of the system life cycle. Therefore, an analysis of factors related to early investment in
MBSE and factors related to later gains from MBSE can provide additional insights for economic
justification of the transition to MBSE.
2.3.5.2 Factors Related to MBSE Investment
MBSE investment covers a number of costs including the cost of MBSE process definition,
infrastructure cost, training cost, and model-related costs. Each cost is described next.
MBSE process definition cost is a key consideration in calculating the cost of adopting an
MBSE methodology for organization-wide implementation. MBSE process definition depends on
the MBSE methodology selected. There are several MBSE methodologies, such as the INCOSE
Object-Oriented Systems Engineering Method (OOSEM), Object-Process Methodology (OPM),
IBM Rational Unified Process for Systems Engineering (RUP SE) for Model-Driven Systems
Development (MDSD), the Vitech Model-Based System Engineering (MBSE) Methodology, and JPL
State Analysis (SA) [37]. Implementing a particular methodology and generating models using
that methodology have different costs associated with them (i.e., not all MBSE methodologies cost
the same).
Infrastructure cost is another cost category. Infrastructure cost consists of licenses,
equipment, environments, processing, and collaboration. Infrastructure cost for this study is
assumed to be specific to projects. In general, infrastructure cost is incurred at the enterprise level,
but that can be shared among projects to quantify costs specific to each project.
Training cost is a distinct category. It involves training on tools, training on modeling
languages, and training employees in systems engineering and MBSE. Training cost also covers the
costs associated with learning curves and organization-level resistance to change [11], [35].
Model-related costs are a major cost category. Model development efforts include
identifying the goal, purpose, and scope of the model, improving model capabilities, defining the
intent of use, and configuring the model for the intended number of users. Building federated,
centralized, or hybrid models has different costs [64]. The scale of the model is also a cost
modifier, and unifying the model format has its own cost [12], [65]. Maintaining model consistency
is a further source of cost. Building models is not enough; model verification needs to be performed at
each stage of model development to ensure model correctness and credibility [11], [66].
Identifying criteria for model verification and determining the right level of model abstraction have
their own costs [67]. Identifying opportunities for model trading and executing them in the right context
also requires effort. In the case of MBSE, an increase in the number of stakeholders working on
central digital models makes the models more prone to errors and requires model curation efforts.
Model curation consists of gauging model characteristics, selecting models for implementation,
and devising model policies [68]. Configuration management efforts include maintaining
ownership and managing data rights, distribution rights, reuse, and copyrights.
2.3.5.3 Factors Related to MBSE Gains
The factors that pertain to gains from MBSE include early defect detection, reuse, product
line definition, risk reduction, improved communication, usage in the supply chain, and standards
conformance [69], [70], [71], [72]. It is well-known that the later in the system life cycle that
defects are detected, the greater the cost of correcting the defects. The ability of MBSE to identify
defects early in the system life cycle can contribute to significant cost savings and thereby provide
significant gains [31]. Another key factor is the ability to use legacy data during later phases of
the life cycle to reduce rework. The ability of MBSE to exploit legacy data also contributes to
economic justification. Since individual projects may not have the luxury of time to work on data
reuse, additional measures are needed at the enterprise management level to incentivize projects
to achieve data reuse. In particular, commercial aircraft and automobiles, which have product lines
with variants, can benefit from model and data reuse. MBSE can be expected to provide significant
gains in defining product line characteristics because of single, centralized, configurable system
models [64]. The adoption of MBSE can be expected to increase confidence while reducing risks
related to both processes and products. Also, with increasing system complexity, it is becoming
increasingly difficult to account for emergence. Therefore, risk reduction in complex systems is a
significant gain [73]. With centralized digital repositories and data analysis capability, MBSE
promises to increase communication efficiency and reduce feedback loops. These characteristics
can provide important gains during the system life cycle. MBSE also provides opportunities to
share data and knowledge in a timely fashion, which translates into increased supply chain
efficiency. Additionally, MBSE provides cost savings through quick traceability mechanisms that
are built into today’s modeling environments. However, the ever-increasing complexity of systems
in multiple domains is making it increasingly difficult to conform to standards.
From the analysis of factors related to both investments and returns/gains, it became
apparent that MBSE needs to derive value from a variety of systems engineering activities across
the system life cycle for convincing economic justification. In other words, using a model-based
approach solely for requirements engineering is not enough. The use of integrated digital models
for data analysis, trade studies, decision support, and communication is necessary throughout the
detailed design, implementation, production, maintenance, retrofit, operation, upgrade and
retirement stages of the system life cycle.
In MBSE, initial assumptions can be incorrect because initial models are created with
limited data and incomplete knowledge of the system. Furthermore, assumptions may not be valid
because they tend to be based on inadequate understanding and limited evidence [74]. Failure to
revisit initial assumptions is one of the most significant issues that needs to be addressed in
MBSE. It is in this area that digital twin technology can play an important role.
Figure 3: Application of digital twin across MBSE life cycle with associated investments and
gains [75]
Digital twin technology potentially offers an effective means to replace assumptions with
actual data or revise assumptions, thereby increasing model fidelity [75], [76], [77]. Specifically,
new evidence from the physical system (twin) can be acquired and used to update the digital twin
model [78], [79], [80]. The data captured in digital twins can serve to enhance traceability between
system requirements and real-world behaviors.
With digital twin enabled MBSE, the model's accuracy increases with new data. Inference
and analysis can complement MBSE, thereby producing data and insights that can be used to
increase model accuracy leading to superior decisions and predictions [78]. Data acquired from
the physical system (i.e., physical twin) in the real-world can be used to update system models in
the digital twin. Subsequently, analysis and inference can lead to new insights that can be used to
update centralized digital models throughout the system’s life cycle. Figure 3 depicts the notional
cost curves for MBSE, traditional SE, and Digital Twin-Enabled MBSE across the system life
cycle. Color codes are used to distinguish traditional systems engineering, MBSE, and digital twin
enabled MBSE curves. The costs and gains associated with Digital Twin-Enabled MBSE
implementation are presented.
As shown in Figure 3, for Digital Twin-Enabled MBSE, additional costs are incurred in
the initial phase of the system life cycle. These costs are related to ontology and metamodel
definition, development, and integration, sensor infrastructure implementation, data processing,
data management, and configuration management. The metamodel provides the structure for the
ontology. Thus, defining the terms (i.e., the ontology) that are entered in a database is usually
associated with defining the database schema (metamodel). Consolidating information from the
enterprise and organizational level and continuously updating the system model in the digital twin
facilitates superior decision-making across the system life cycle.
2.3.6 MBSE Methodologies
The MBSE methodologies include IBM Telelogic Harmony-SE, INCOSE Object-Oriented
Systems Engineering Method (OOSEM), IBM Rational Unified Process for Systems Engineering
(RUP SE) for Model-Driven Systems Development (MDSD), Vitech Model-Based System
Engineering (MBSE) Methodology, JPL State Analysis (SA), and Dori Object-Process
Methodology (OPM).
The IBM Telelogic Harmony-SE methodology focuses on requirement analysis, functional
analysis, design synthesis, and recursive subsystem development for lower levels. The approach
is based on the Object Management Group's Systems Modeling Language (SysML). The methodology
applies dual principles from systems and software engineering [81], [82].
The INCOSE Object-Oriented Systems Engineering Method (OOSEM) utilizes SysML with
object-oriented modeling constructs for needs analysis, requirements analysis, architecture, trade-off
analysis, design specification, and verification. The methodology includes a top-down, scenario-
driven approach and exploits modeling constructs such as causal analysis, logical decomposition,
partitioning criteria, clustering, node distribution analysis, control strategies, and parametrics [63].
The IBM Rational Unified Process for Systems Engineering (RUP SE) for Model-Driven Systems
Development (MDSD) method focuses on business modeling and process frameworks within
systems engineering. The methodology implements the spiral model of iterative and
incremental development, utilizes a SysML use-case-driven approach, and is
primarily used in government and industry to manage software development projects [83]. The
methodology provides the highly configurable workflow templates required to identify the
hardware, software, and worker-role components [37].
The Vitech Model-Based System Engineering (MBSE) Methodology, STRATA, refers to
the principle of designing a system in strategic layers. This methodology has four central systems
engineering activities: Source Requirements Analysis, Functional/Behavioral Analysis,
Architecture/Synthesis, and Verification and Validation (V&V) [84]. The methodology supports
Top-down, Bottom-up/Reverse Engineering, and Middle-out system development approaches
[64]. These activities are linked and maintained through a central digital repository. Methods
supporting the design and implementation of V&V top-level activity include test plan development
and test planning [84]. These V&V activities represent a small portion of the overall V&V system
life cycle phase.
JPL State Analysis (SA) methodology leverages a state-based control architecture. State
analysis methodology provides requirements on system and software design in the form of explicit
models of system behavior. State Analysis provides an approach for discovering, describing,
defining, and recording the states of a system. It facilitates behavior modeling of state variables
and provides means to trace the system constraints [33], [40], [85], [86].
Object-Process Methodology (OPM) provides a standard paradigm for systems
development, life cycle support, and evolution. Fundamental constructs used in OPM are Object,
Process, and State. In OPM, an Object is defined as a thing that exists or has the potential of
existence, physically or mentally. A Process is defined as a pattern of transformation that an
object undergoes. Finally, a State is the situation of an object at a particular moment in time. The
methodology has three main stages: requirement specification, analysis, and development and
implementation. The OPM paradigm integrates the object-oriented, process-oriented, and state
transition approaches into a single frame of reference [87].
2.4 Digital Twin Technology
Digital Twins, a key concept in Digital Engineering today but with roots that go as far back
as 2003, hold significant potential for enhancing MBSE. Digital Engineering focuses on
communication and data sharing within an organization, whereas MBSE focuses on the exploitation
of data/models for decision making. The term “digital twin” was initially coined in the context of
product life cycle management (PLM) in 2003. At the time, the technology to implement digital
twins was not sufficiently mature. Digital twins today are computational models that have bidirectional communication and evolve synchronously with their real-world counterparts [78].
Therefore, digital twins reflect the state and status of the real-world counterpart. The real-world
counterpart can be a system, a component, a process, or a human. Digital twins can be employed
in MBSE in different ways:
• Prototype and iteratively test and evolve the design of a real-world system/product
• Monitor the performance of the physical counterpart and intervene in its operation, if
needed
• Collect data from a team of physical twins (aggregate twins) to approximate their behavior
and use the resultant model to support predictive maintenance
The concept of a digital twin is inextricably tied to modeling and simulation technology,
which has been around for decades. With digital twins, the initial models created tend to be incomplete
and potentially incorrect due to limited knowledge about the system at the beginning of the
modeling activity. On the other hand, models created using traditional Modeling & Simulation
(M&S) technology tend to be generic models of the system of interest. With the advent of Internet
of Things (IoT), Industry 4.0, and applied analytics, the generic system model can be transformed
into a digital twin, a computational model and virtual replica of a particular real-world system,
process, or person. This transformation becomes possible because the generic computational
model can be updated to reflect the state, status, and maintenance history of a specific physical
system, and then evolve in synchronization with the physical system. The digital twin can then be
used to make better decisions about the real-world system than previously possible. Physical
systems usually have onboard sensors that support two-way interaction between the physical world
and the virtual environment of the digital twin.
The range of possibilities for employing a digital twin of various levels of sophistication is
extensive. Digital twins can support a variety of use cases in different industries [77], [88], [89].
For example, in healthcare, medical devices (e.g., an insulin pump) can have digital twins; patients
can have digital twins to monitor and record how they would respond to different treatments;
and processes can have digital twins to analyze whether or not intervention is needed.
More sophisticated twins are being employed to model cities while accounting for traffic transit,
power consumption, and pollution considerations [89].
Until recently, hardware associated with a physical system and operational analysis were
not an integral part of MBSE methodologies. With the advent of digital twin technology,
integrating physical system hardware into MBSE provides ample opportunities to learn from real-world data.
Taking the plunge into investing in digital twin technology has to be tempered with caution.
Several questions need answering to conclude that pursuing a digital twin approach is worthwhile
for an organization. It begins with identifying the problem being addressed and determining
whether digital twin is suitable for that problem. Beyond that, it is necessary to answer whether
the contemplated digital twin is sustainable, and if so, who will be assigned to ensure its
sustainability. A fundamental and obvious question that needs to be answered is whether the digital
twin is no more than a costly version of a solution that could have been readily satisfied with a
digital document or electronic report. Is the digital twin a real-time system? The critical point here
is that a sound business case needs to be made. This business case should be driven by use cases
that clearly convey the value proposition of a digital twin for the business.
A real-world use case that can be considered is a Formula One race car and the
business case that can be made for it [75]. It begins with creating a simulation that reflects the real
racetrack experience, i.e., it serves as a digital twin and enables extraction of maximum benefit
from limited on-track testing time. The purpose of the digital twin simulator, in this case, is to:
• Help the team set up the car to run faster
• Rapidly advance car development
• Increase team’s ability and speed to fine-tune car enhancements
• Determine where the team needs to improve
• Identify where its competition is the strongest
• Pinpoint where the team has weaknesses
• Identify performance areas to enhance
• Test new design features and understand their impact on performance before going on the
track
The next step is the “data to decisions” mapping, i.e., selecting the data that needs to be
captured from the physical twin to extract business value and/or operational success by enabling
informed decision-making. In such a data-rich environment, vast amounts of information can be
gathered in a short period and run through machine learning algorithms to derive insights and
uncover patterns that enable teams to make more informed decisions faster. Appropriately
constructed digital twins can offer insights that help with operational optimization.
2.4.1 Digital Twin Mathematical Models
Much of the work with digital twins thus far has been confined to individual digital twins
supporting specific applications. In 2021, researchers at MIT came up with a probabilistic model
foundation for facilitating predictive digital twins at scale [90], [91]. These researchers
operationalized a model that allows the deployment of digital twins at scale, e.g., creating digital
twins for a fleet of drones. A mathematical representation that takes the form of a probabilistic
graphical model provides the foundation for "predictive digital twins." This model was applied to
an Unmanned Aerial Vehicle (UAV) in a mission scenario in which the UAV suffers minor wing
damage in flight and has to decide whether to land, continue, or reroute to a new destination. In
this case, the digital twin experiences the same damage in a virtual world and faces the same
decision as the physical UAV experiences in the real world. These researchers discovered that
custom implementations require significant resources, a barrier to real-world deployment. This
problem is further compounded because digital twins are most useful in situations where multiple
similar assets are being managed. To this end, these researchers created a mathematical
representation of the relationship between a digital twin and the corresponding physical asset.
However, this representation was not specific to a particular application or use. The resulting
model mathematically defined a pair of physical and digital systems, coupled through two-way
data streams that enable synchronized evolution over time. The digital twin parameters were first
calibrated with data collected from the physical UAV so that the digital twin was an accurate
reflection of the physical twin from the start. Then, as the physical UAV's state changed over time
through wear and tear from flight time logged, these changes were observed and used to update
the state of the digital twin, so it reflected the state of the physical UAV. The resultant digital twin
then became capable of predicting how the UAV was likely to change and used that information
to direct the physical UAV in the future optimally.
The graphical model allowed each digital twin to be based on an identical system model.
However, each physical asset maintained a unique digital state that defined a unique configuration
of the model. This approach made it easier to create digital twins for a large collection of physically
similar assets.
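The following minimal sketch illustrates, in simplified form, the kind of asset-specific digital-state
update such a probabilistic formulation enables. It is not the cited researchers' implementation; the
damage states, prior, and likelihoods are invented for illustration, and a single discrete Bayes update
stands in for the full probabilistic graphical model.

```python
# Illustrative sketch: updating a digital twin's damage-state belief from noisy sensor data.
damage_states = ["none", "minor", "severe"]

# Shared system model: assumed likelihood of a "high strain" reading given each damage state.
p_high_strain_given_state = {"none": 0.05, "minor": 0.40, "severe": 0.90}

def update_belief(belief, observed_high_strain: bool):
    """One Bayes update of the asset-specific digital state."""
    posterior = {}
    for s, prior in belief.items():
        like = p_high_strain_given_state[s]
        posterior[s] = prior * (like if observed_high_strain else 1.0 - like)
    total = sum(posterior.values())
    return {s: p / total for s, p in posterior.items()}

# Each physical asset keeps its own digital state over the same underlying model.
belief = {"none": 0.90, "minor": 0.08, "severe": 0.02}   # calibrated prior for one asset
for reading in [True, True, False, True]:                # in-flight sensor observations
    belief = update_belief(belief, reading)
print(belief)  # posterior belief used to recommend, e.g., a maneuver or routing change
```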
In their experiment, the UAV was the testbed platform for all activities – from collaboration
experiments to simulated light damage events. The digital twin was able to analyze sensor data to
extract damage information, predict the impact of UAV structural health in future activities of the
UAV, and recommend changes in maneuvers to accommodate the changes.
The key idea behind this advance is preserving a steady set of computational models that
are frequently updated and matured alongside the physical twin over its entire life cycle. The
probabilistic graphical model approach helps cover the different phases of the physical twin's life
cycle. In their problem, this property is seen in the graphical model, which seamlessly stretches
from the calibration phase to the operational, in-flight phase. The latter is where the digital twin
becomes a decision aid for the physical twin.
2.4.2 Data-Driven Model Update Methods
Digital twin operationalization requires system models to be continually updated with the
physical twins' operational and maintenance data. The system model receives data from its
physical counterpart, making it a digital twin; otherwise, it remains a virtual prototype.
Furthermore, the digital twin architecture involves closed-loop modeling. In this regard, multiple
data-driven model update methods are reviewed and analyzed for this research.
2.4.2.1 Reinforcement Learning
As an emerging type of machine learning, reinforcement learning (RL) provides decision-making
capability in the absence of initial environment information [88]. RL has a closed-loop
model architecture, in which inputs received from the environment are used to modify the model.
Reinforcement learning problems involve learning what to do, that is, how to map situations to
actions, so as to maximize a numerical reward signal [92]. RL agents have no requirement for initial
knowledge; they pursue optimal actions by interacting with the environment, which may be
challenging to attain in practice. RL can be flexibly applied to various applications through offline
and online training while accounting for uncertainties. RL is comparatively straightforward to
implement in real-life scenarios, and because the learned action values can be stored in a look-up
table, its computational efficiency can be relatively high.
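A minimal sketch of tabular Q-learning illustrates the look-up-table form of RL referred to above.
The toy chain environment and the learning parameters are invented for illustration only.

```python
# Illustrative sketch: tabular (look-up table) Q-learning on a toy chain environment.
import random

N_STATES, GOAL = 5, 4            # states 0..4, reward obtained on reaching state 4
actions = [-1, +1]               # move left / move right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in actions}   # the look-up table
alpha, gamma, epsilon = 0.1, 0.9, 0.2

def env_step(state, action):
    """Toy environment: deterministic move, reward 1.0 on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

for _ in range(500):                         # episodes of interaction, no prior knowledge needed
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        nxt, reward, done = env_step(state, action)
        # Q-learning update toward reward plus discounted best next-state value
        best_next = max(Q[(nxt, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = nxt

print({s: max(actions, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})  # learned policy
```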
2.5 Generative Models
Generative AI agents have the potential to accelerate the development of system models
by automating the knowledge acquisition process and integrating diverse data sources. These
agents can be used to create services that build interoperability mechanisms between different
models. Given these capabilities, this section of the literature review focuses on the current
advancements in generative AI, with a particular emphasis on their potential applications in the
context of system upgrades and MBSE.
2.5.1 Agent-Based AI
Agent-based AI focuses on developing systems with autonomy and decision-making
capabilities. Recent advancements in generative models have significantly contributed to this field,
emphasizing self-improvement, external feedback, and interactive learning. One notable
contribution is the CRITIC framework, which enhances large language models (LLMs) through
iterative self-improvement and external feedback mechanisms. This approach fosters autonomy
and enables continuous refinement of outputs, leading to more accurate and reliable decisionmaking processes [93]. Similarly, the Reflexion framework introduces verbal reinforcement
techniques to enhance agent learning. By incorporating verbal feedback, Reflexion facilitates a
more natural learning process, mimicking human learning behaviors. This framework has shown
promise in improving the performance and adaptability of language agents, making them more
proficient in executing complex tasks autonomously [18].
Additionally, the SELF-REFINE approach leverages iterative feedback to improve LLM
outputs. This method allows models to learn from mistakes and refine strategies in a cyclical
manner, crucial for developing robust Agent-Based capabilities. The iterative nature of SELF-REFINE ensures incremental improvements and effective adaptation to new challenges, enhancing
overall planning and coordination skills [94]. These frameworks highlight advancements in Agent-Based AI, emphasizing the integration of self-improvement, external feedback, and iterative
learning. The continuous evolution of these models indicates a future where AI systems can
operate with greater autonomy, making informed decisions and adapting to dynamic environments.
As research progresses, it is expected that Agent-Based AI will continue to develop, leading to
more capable AI agents.
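The following minimal sketch illustrates the general shape of such an iterative refinement loop. The
functions standing in for model calls (generate_draft, critique, revise) are hypothetical placeholders;
no specific model API is implied, and the sketch is not the implementation described in the cited works.

```python
# Illustrative sketch of an iterative self-refinement loop in the spirit of SELF-REFINE-style approaches.
def generate_draft(task: str) -> str:
    return f"Initial draft for: {task}"                      # placeholder for a model call

def critique(draft: str) -> str:
    return "Clarify interface assumptions."                  # placeholder for a feedback step

def revise(draft: str, feedback: str) -> str:
    return draft + f" [revised per feedback: {feedback}]"    # placeholder for a revision step

def self_refine(task: str, max_iters: int = 3) -> str:
    draft = generate_draft(task)
    for _ in range(max_iters):
        feedback = critique(draft)
        if not feedback:          # stop when the critic has nothing further to add
            break
        draft = revise(draft, feedback)
    return draft

print(self_refine("summarize the avionics upgrade impact"))
```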
2.5.2 Tool Use
Tool use in generative AI is an important aspect of enhancing the capabilities of large
language models (LLMs) by integrating external resources and functionalities. This integration
allows models to perform tasks beyond their intrinsic abilities, improving their performance and
utility in practical applications. One example is the Chain-of-Abstraction (CoA) reasoning method,
which enables LLMs to use tools efficiently by abstracting complex reasoning processes into
manageable steps. This method has been shown to improve models' problem-solving skills and
overall functionality by effectively applying specialized knowledge and tools [95].
The Gorilla model represents another advancement in tool use. By enhancing LLMs'
proficiency in making API calls, Gorilla allows these models to interact seamlessly with various
external systems and databases. This capability is crucial for tasks requiring real-time data retrieval
and manipulation, such as generating detailed reports or executing complex queries. The Gorilla
model, through its use of the APIBench dataset, has demonstrated improvement over GPT-4 in
writing API calls, indicating better utility and adaptability when interfacing with external tools
[96]. Additionally, the CRITIC framework emphasizes the role of tool use in the self-improvement
process of LLMs. By incorporating external critics, CRITIC enables models to refine their outputs
based on feedback from these tools, enhancing their performance iteratively. This integration not
only improves immediate outputs but also contributes to long-term learning and adaptation,
making models more adept at utilizing external resources effectively [95].
The ability to effectively use tools is a core aspect of modern AI systems aiming to perform
complex, real-world tasks. As research in this domain continues to advance, the integration of tool
use is expected to become more sophisticated, further bridging the gap between AI capabilities
and practical applications.
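The following minimal sketch illustrates the general pattern of tool use: a model proposal names a
registered tool and its arguments, and a harness dispatches the call. The tool names, the
llm_propose_call placeholder, and the returned data are invented for illustration and do not
correspond to Gorilla, CRITIC, or any specific API.

```python
# Illustrative sketch: dispatching a (hypothetical) model-proposed tool call.
import json

def lookup_component(component_id: str) -> dict:
    return {"id": component_id, "status": "in service"}   # stand-in data source

def run_load_query(limit_kw: float) -> dict:
    return {"margin_kw": 2.5, "limit_kw": limit_kw}        # stand-in analysis tool

TOOLS = {"lookup_component": lookup_component, "run_load_query": run_load_query}

def llm_propose_call(user_request: str) -> str:
    # Placeholder for a model proposing a structured tool call for the request.
    return json.dumps({"tool": "lookup_component", "args": {"component_id": "PSU-12"}})

def handle(user_request: str) -> dict:
    proposal = json.loads(llm_propose_call(user_request))
    tool = TOOLS[proposal["tool"]]          # dispatch only to registered tools
    return tool(**proposal["args"])

print(handle("What is the status of power supply unit 12?"))
```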
2.5.3 Planning and Collaboration
Planning and coordination are pivotal in generative AI, enabling systems to execute
complex actions and collaborate effectively. These capabilities are crucial for strategic foresight,
multi-step problem-solving, and synchronized efforts among multiple AI agents or with human
partners.
One significant advancement is the Chain-of-Abstraction (CoA) reasoning method, which
breaks down complex tasks into manageable steps. This hierarchical approach enhances task
execution efficiency by allowing models to manage components effectively and anticipate
potential obstacles [95]. The Reflexion framework uses verbal reinforcement to improve planning
and coordination. By incorporating real-time feedback, Reflexion allows agents to adjust their
actions dynamically, ensuring effective coordination in changing environments. This adaptability
helps models align with their objectives and coordinate better with other agents or human partners
[97]. The Gorilla model enhances LLMs' ability to make accurate API calls, crucial for integrating
external tools and resources into the AI's workflow. This capability facilitates smoother and more
efficient planning and coordination, enabling AI to leverage external systems effectively [96].
Multi-agent collaboration enhances planning and coordination by assigning specific roles
to different AI agents. This approach allows for a division of labor, with each agent focusing on
its area of expertise and dynamically adjusting roles based on task requirements. By working
together, agents tackle complex tasks more efficiently, optimizing the overall workflow [98], [99].
These advancements highlight the importance of planning and coordination in generative AI.
Incorporating hierarchical task decomposition, real-time adaptability, and multi-agent
collaboration enables AI systems to manage complex tasks with greater precision and coherence.
As research progresses, these capabilities will become more sophisticated, allowing AI to tackle
increasingly complex tasks effectively.
2.6 Summary of Existing Gaps
The literature reveals several gaps in the current systems upgrade processes and the
application of Model-Based Systems Engineering (MBSE). Firstly, the process of upgrading
fielded systems remains largely ad hoc and unstructured. It is often lengthy and resource-intensive,
resulting in extended downtime and increased operational costs. There is also a notable risk of
disruptions during the upgrade process, which can compromise system reliability and performance.
Additionally, existing methodologies frequently fail to provide a cohesive framework for assessing
cross-domain impacts and handling the heterogeneous data that modern aerospace systems
generate.
In the context of MBSE, despite its potential to streamline system upgrades by addressing
knowledge silos and facilitating early identification of issues, it is often time-consuming and labor-intensive. Developing comprehensive system models can be a protracted process, hindered by the
scarcity of expertise in the field. Moreover, while MBSE offers significant benefits, leveraging
new technologies like digital twins and generative models is not yet fully realized. The literature
lacks robust frameworks for integrating these advanced technologies into MBSE practices, which
could potentially accelerate system upgrades and enhance the accuracy and efficiency of these
processes.
Furthermore, there is a critical gap in the application of real-time data synchronization and
bidirectional communication capabilities in digital twins, essential for maintaining the accuracy
and consistency of system models during upgrades. The integration of probabilistic reasoning and
advanced analytical techniques into MBSE remains underexplored, limiting the ability to perform
dynamic updates and real-time analyses. Finally, the literature does not adequately address the role
of human-AI collaboration in system upgrades, missing opportunities to combine human expertise
with AI capabilities for better decision-making and reduced error rates.
To address these gaps, the methodology, described in the next chapter, will utilize Model-Based Systems Engineering (MBSE) to integrate different domains. It will leverage generative AI
agents to automate knowledge acquisition from diverse data sources and rapidly create system
models. Additionally, it will utilize the construct of digital twins to develop continuously updated,
closed-loop system models and employ them for comprehensive analysis.
Chapter 3
Methodology
3.1 Introduction
Upgrading complex aerospace systems, particularly those operating in dynamic
environments, is fraught with challenges. Unplanned upgrades due to unforeseen circumstances
like component failures or changes in operational requirements exacerbate these difficulties.
Traditional upgrade processes lack agility and predictive capabilities, often resulting in reduced
operational effectiveness and increased costs. Key challenges include limited knowledge of change
impacts, where fragmented and siloed information makes it difficult to understand the full impact
of changes. Another significant issue is the difficulty in testing the correctness of upgrades. Given
the complexity of aerospace systems, it is challenging to ensure that upgrades function correctly
across all components and their interactions, leading to potential system failures. Additionally,
unplanned changes increase the risk of incompatibility and performance issues, potentially leading
to system failures. Resource intensiveness is also a significant challenge, as traditional upgrade
processes are time-consuming, labor-intensive, and costly. Traditional methods for upgrading
systems are often inadequate for managing the complexity of aerospace systems [100]. One
significant limitation is the lack of predictive capabilities, which leads to uncertainties and risks
during implementation. These conventional methods are also characterized by efficiency
constraints, as they are slow and resource-intensive, lacking the ability to rapidly test multiple
upgrade scenarios in a cost-effective manner. Additionally, data fragmentation is a major issue,
with information siloed across different domains, hindering comprehensive system understanding.
Furthermore, manual processes are prone to errors, omissions, and misinterpretations, which
compromise the overall effectiveness of the upgrades. Moreover, traditional methods are often ad
hoc and unstructured, making the upgrade processes time-consuming and inefficient. This lack of
structure exacerbates the challenges, resulting in prolonged upgrade timelines and increased
operational costs.
3.2 Integrated MBSE Methodology
This chapter outlines a novel, systematic model-based systems upgrade methodology. The
key advancements include integrating different domains, automating knowledge acquisition from
diverse data sources, rapidly creating system models, and employing continuously updated,
closed-loop system models for comprehensive analysis. This approach leverages Model-Based
Systems Engineering (MBSE), digital twin technology, and generative models to enhance system
upgrade processes. The MBSE framework provides a structured approach to system design,
development, and lifecycle management. MBSE ensures coherent representation and systematic
propagation of changes across different system views, reducing errors, improving traceability, and
enhancing communication among stakeholders. Digital twin technology offers virtual
representations of physical assets that are continuously synchronized with real-world data. This
enables continuous monitoring, simulation, and predictive analysis of system behavior, which is
crucial for understanding the potential impacts of upgrades and making informed decisions.
Generative AI agents, such as transformer-based models, generate outputs based on large datasets
and neural networks. These agents automate knowledge acquisition, facilitate actions, and enhance
human interaction and collaboration through an approach called Agent-Based AI. Teams of such
agents can take input from one another, with some agents having access to specialized tools,
functions, and visual data, enabling a coordinated and efficient execution of advanced tasks.
3.2.1 Integration of MBSE, Digital Twin, and Generative Models
• MBSE for Domain Integration: Helps address knowledge silos, ensuring a unified and
coherent system model. It facilitates early identification of issues and systematic
propagation of changes [15], [69].
• Digital Twin for Continuous Monitoring and Analysis: Enables dynamic representation
and real-time mirroring of the physical system, allowing for continuous monitoring,
simulation, and predictive analysis [23], [75]
• Generative Models for Automation, Analysis, Action, and Human Collaboration:
Enhance the speed and efficiency of the upgrade process by automating knowledge
acquisition, performing advanced reasoning, facilitating bidirectional communication
between system components, and improving human interaction and collaboration through
intelligent interfaces and support systems [101], [102].
Figure 4 illustrates the Digital Twin System Model, depicting the interaction between
generative AI agents and various components of the methodology.
Figure 4: Overview of the Digital Twin System Model with Generative AI Agents
Generative AI agents actively acquire knowledge from the fielded system, creating a
comprehensive data set for analysis. This data is formalized into structured knowledge
representations, ensuring consistency and coherence across domains. Generative AI agents
perform advanced reasoning, utilizing temporal embeddings and probabilistic models to analyze
the system and predict outcomes. The agents then facilitate actions, implementing upgrades
through bidirectional communication with physical systems, digital assets, and software tools,
thereby closing the loop with the upgraded system.
3.2.2 Methodology Overview
The methodology for upgrading complex aerospace systems integrates Model-Based
Systems Engineering (MBSE), digital twin technology, and generative AI models within a
cohesive framework. This integration aims to streamline the upgrade process, improve predictive
accuracy, and reduce the time and resources required for implementing system upgrades. The
methodology dynamically gathers, represents, analyzes, and acts on system information to ensure
efficient and effective upgrades.
Knowledge Acquisition: Active Information Gathering
The first component, Knowledge Acquisition, focuses on the automated and intelligent
gathering of system-related information from diverse sources using multi-modal generative AI
agents. These agents are capable of processing and interpreting data from various modalities,
including text, images, audio, and structured data. The objective of this component is to create a
comprehensive and up-to-date knowledge base of the system, thereby reducing the time and effort
required for manual data gathering and integration.
Knowledge Representation: Formal Ontology-based
The second component, Knowledge Representation, involves developing and using layered ontologies to
create a unified, structured representation of multi-domain knowledge. This ontology-based
approach ensures consistency and coherence across different domains, facilitating effective
integration and analysis. The primary objective is to address the challenge of disconnected and
inconsistent system models, providing a holistic view of the system and improving
interoperability.
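As an illustration of what a layered, ontology-based representation can look like in practice, the
following minimal sketch uses the open-source rdflib library to define an upper layer of generic
concepts, a domain-specific specialization, and an instance layer, and then runs a simple cross-layer
query. The namespace, classes, and instances are invented for illustration; they are not the ontology
developed in this research.

```python
# Illustrative sketch: a layered, ontology-based knowledge representation with rdflib.
from rdflib import Graph, Namespace, RDF, RDFS, Literal

SYS = Namespace("http://example.org/upgrade#")   # hypothetical namespace
g = Graph()
g.bind("sys", SYS)

# Upper layer: generic systems-engineering concepts.
g.add((SYS.Component, RDF.type, RDFS.Class))
g.add((SYS.Requirement, RDF.type, RDFS.Class))
g.add((SYS.satisfies, RDF.type, RDF.Property))

# Domain layer: an aerospace-specific specialization.
g.add((SYS.PowerSupplyUnit, RDFS.subClassOf, SYS.Component))

# Instance layer: facts gathered about the fielded system.
g.add((SYS.PSU12, RDF.type, SYS.PowerSupplyUnit))
g.add((SYS.REQ_POWER_01, RDF.type, SYS.Requirement))
g.add((SYS.REQ_POWER_01, RDFS.label, Literal("Provide 28 V DC bus power")))
g.add((SYS.PSU12, SYS.satisfies, SYS.REQ_POWER_01))

# A simple cross-layer query: which components satisfy which requirements?
for comp, req in g.subject_objects(SYS.satisfies):
    print(comp, "satisfies", req)
```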
Analysis: Automated Reasoning and Human Interaction
The third component, Analysis, enhances cross-domain impact analysis and manages the
dynamics, uncertainties, and correctness of the system model. This component utilizes advanced
reasoning techniques, including temporal embeddings and probabilistic reasoning, and interfaces
with formal logic and advanced computational solvers. The objective is to ensure comprehensive
analysis and accurate decision-making, incorporating human-in-the-loop validation to maintain
oversight and reliability. Generative AI agents enhance human interaction and collaboration by
providing intelligent support systems and interfaces.
Action: Agent-Based System with Planning and Coordination
The fourth component, Action, focuses on executing bidirectional connections between
physical systems and digital assets. Multi-modal actuation agents carry out tasks across various
tools and systems, while planning and coordination agents manage multi-agent interactions and
integrate human oversight. The objective is to enable efficient and accurate execution of system
upgrades, reducing error rates and improving the overall effectiveness of the upgrade process.
Generative AI agents facilitate bidirectional communication and coordination, ensuring seamless
integration of actions across system components.
Figure 5: Integrated Framework for System Upgrades Using Digital Twin and Generative AI
Agents
Figure 5 provides a visualization of the methodology for system upgrades, illustrating the
flow of information and the interaction between various components of the system. The
components of the diagram include physical systems, digital assets, and software tools, which
represent the actual operational environment of the systems. These elements provide real-world
data and interact with the digital twin model through various generative AI agents. Multi-modal
agents are responsible for the active knowledge acquisition process, gathering data from multiple
sources, including physical systems, digital assets, and software tools. By processing and
interpreting data from various modalities such as text, images, audio, and structured data, they
create an up-to-date knowledge base of the system.
The central component, formal knowledge representation, involves the formalization of the
gathered data into structured knowledge representations using layered ontologies. This ensures
consistency and coherence across different domains, facilitating effective integration and analysis.
It serves as the core repository of knowledge that the system relies on for reasoning and decision-
making. Reasoning agents perform advanced analysis and reasoning on the formal knowledge
representations. Utilizing techniques such as temporal embeddings, probabilistic reasoning, and
inference, they analyze the system’s data to predict outcomes and identify potential issues. These
agents also act as interfaces between formal logic and advanced computational solvers to enhance
reasoning capabilities.
Generative AI agents support human interaction by providing intelligent interfaces and
support systems. They facilitate human-in-the-loop validation, ensuring that the decision-making
process is both comprehensive and reliable. This interaction enhances collaboration and ensures
that the analysis is validated by human expertise. Planning and coordination agents manage the
planning and coordination of multi-agent interactions. They integrate human oversight into the
planning and execution of upgrades, ensuring coherent and effective system actions.
Multi-modal action agents are responsible for executing actions based on the analysis and
plans developed by the system. They facilitate bidirectional communication between physical
systems, digital assets, and software tools, implementing upgrades and ensuring real-time
synchronization with the digital twin model.
The flow of information starts with data acquisition, where multi-modal agents gather data
from physical systems, digital assets, and software tools. The gathered data is then formalized into
structured knowledge representations within the formal knowledge repository. Reasoning agents
analyze the formal knowledge, providing insights and predictions, and facilitate human interaction
for validation and decision-making. Planning and coordination agents develop plans based on the
analysis, ensuring that all actions are coherent and aligned with the overall upgrade strategy.
Finally, multi-modal action agents execute the planned actions, communicating with physical
systems, digital assets, and software tools to implement the upgrades.
This integrated approach ensures a dynamic and responsive system for upgrading complex
aerospace systems, addressing traditional process limitations. By leveraging the strengths of
MBSE, digital twin technology, and generative AI, the methodology provides a robust framework
for efficient and correct system upgrades.
3.3 Knowledge Acquisition: Active Information Gathering
3.3.1 Overview
Effective system upgrades are predicated on comprehensive and up-to-date knowledge
of the system in question. This knowledge enables informed decision-making, accurate impact
assessments, and efficient implementation of upgrades. Traditional methods of knowledge
acquisition, however, often fall short due to their reliance on manual data gathering and integration
processes. These methods are typically time-consuming, prone to human error, and limited by the
diverse formats and silos in which data is stored. As a result, they struggle to capture the
complexity and dynamism of modern systems, particularly in high-stakes sectors such as aerospace
and defense [103]. To address these limitations, this research introduces an innovative approach
leveraging multi-modal generative AI agents. These advanced AI agents are designed to actively
and intelligently gather information from a wide range of sources, integrating and formalizing this
data into a cohesive, machine-readable format. Furthermore, these AI agents collaborate with
human experts to acquire and validate knowledge, helping to accelerate the system model
development process. This approach not only speeds up the knowledge acquisition process but
also ensures greater accuracy and consistency, providing a robust foundation for effective system
upgrades.
3.3.2 Problems Addressed
3.3.2.1 Manual and Time-Consuming Data Gathering
One of the primary challenges in upgrading aerospace systems is the labor-intensive and
time-consuming process of data gathering. This process is often hindered by data silos, where
information is compartmentalized within different departments or teams, making it difficult to
obtain a holistic view of the system. The diversity of data formats, including text documents,
software code, simulation results, and sensor data, adds another layer of complexity, as each format
requires different methods of extraction and interpretation. The manual integration of this diverse
data is not only time-intensive, often taking weeks or months for complex systems, but also prone
to human error, leading to potential omissions and misinterpretations. Furthermore, by the time
the data-gathering process is complete, some of the information may already be outdated,
especially in dynamic systems. Traditional methods also heavily depend on domain experts to
interpret and integrate data, creating bottlenecks and potential single points of failure. This reliance
on expertise can lead to inconsistencies in how data is interpreted and integrated across different
parts of the system model. These challenges collectively slow down the upgrade process and
increase the risk of errors and oversights, which could compromise the effectiveness of system
upgrades.
3.3.2.2 Difficulty in Converting Heterogeneous Data into a Structured, Usable Format
Another significant challenge in system upgrades is the difficulty of converting
heterogeneous data into a structured, usable format [100]. This challenge is primarily driven by
format diversity, as data comes in various forms, including numerical data from sensors, textual
data from maintenance logs, SysML models from system architecture, and more. Each data format
requires distinct methods for extraction and integration, complicating the process of creating a
unified system model. Furthermore, semantic inconsistencies arise when the same data element is
represented differently across various sources, leading to potential misinterpretations and errors in
data integration. For instance, a component identifier might differ between systems, causing
confusion and inaccuracies. Data quality issues also pose a significant problem, as inconsistent
data formats, missing values, and inaccuracies can significantly impact the reliability of the
integrated data. Ensuring high-quality data is crucial for effective analysis and decision-making,
yet traditional methods often struggle to address these quality issues comprehensively.
Collectively, these factors make it challenging to convert diverse and often unstructured data into
a coherent, structured format that can be readily used for system modeling and analysis, thereby
hampering the efficiency and accuracy of the upgrade process.
3.3.3 Multi-Modal Generative AI Agents for Rapid Data Extraction
3.3.3.1 Key Features of AI Agents
The developed methodology leverages advanced multi-modal generative AI agents
designed to enhance the data-gathering process. These agents exhibit several key features that
enable them to effectively and efficiently extract relevant information from diverse data sources.
A critical capability of these AI agents is their multi-modal knowledge acquisition [104].
They can process and interpret various types of data, including text, numerical data, images, and
combinations thereof. This capability allows the agents to gather comprehensive information from
a wide range of sources. This multi-modal approach supports the integration of diverse data types,
providing a more complete understanding of the system. Unlike traditional passive data collection
methods, these AI agents employ active learning strategies. They do not merely wait for data to be
provided; instead, they actively seek out relevant information. These agents ensure a better
understanding of the system by asking questions, exploring data sources, and filling knowledge
gaps. This proactive approach accelerates the data-gathering process and improves the quality of
the collected data. The AI agents are equipped with advanced contextual understanding
capabilities. They are trained to comprehend the context of the information they gather, allowing
them to interpret data within the broader framework of the system. This contextual awareness
enables the agents to make sense of complex data sets, identify relationships between different
data points, and ensure that the information they provide is relevant and accurate. Contextual
understanding is crucial for accurate data integration and analysis, as it helps maintain the
relevance and accuracy of the extracted information. Scalability is another essential feature of
these AI agents. They can handle vast amounts of data rapidly and scale their operations to meet
the demands of the task. This scalability ensures that the data-gathering process remains efficient,
regardless of the system's size or complexity. Scalable solutions are particularly important in
dynamic and large-scale environments, where data volume and variety can be substantial.
3.3.3.2 Bidirectional Interaction with System Components
The developed methodology includes multi-modal generative AI agents capable of bidirectional interaction with various system components. This bidirectional interaction ensures that the AI agents gather information from, interact with, and update the system components, maintaining an accurate and current system model.
Physical Systems Interaction
The agents can directly interface with sensors on physical systems, collecting data on
system performance, environmental conditions, and operational parameters. This sensor
integration allows the agents to monitor the physical systems continuously, ensuring that the digital
model accurately reflects the system's current state. In addition to collecting data, the agents can
send commands or queries to control systems, enabling active probing and adjustment of system
behavior. This capability allows for more dynamic interaction with the system, providing feedback
and facilitating adjustments to improve system operation. Continuous interaction with physical
systems enables the AI agents to update the digital twin [89]. This continuous monitoring ensures
that the system model remains current, providing a reliable foundation for analysis and decision-making. The ongoing synchronization between the physical system and its digital representation
is critical for maintaining system accuracy and performance.
Digital Assets Interaction
The agents can read and interpret various digital documents, including design
specifications, operational manuals, and maintenance logs. By extracting relevant information
from these documents, the agents can integrate it into the system model, ensuring that pertinent
data is considered in the analysis and upgrade processes. In addition to document analysis, the
agents can interact with databases, formulating and executing queries to extract relevant data. This
capability allows the agents to access structured data stored in various databases, further enriching
the system model with comprehensive and up-to-date information. Beyond querying databases,
the AI agents can also manipulate the data within these databases. They can add new information
and edit existing entries, ensuring that the database remains accurate and current. This functionality
allows the agents to maintain the integrity of the data and ensure that all relevant information is
accurately reflected in the system model.
Software Tools Interaction
One of the innovative aspects of these agents, developed as part of this research, is their
ability to navigate and interact with the user interfaces of various software tools used in system
design and analysis. In industry, many software tools require custom plugins to access the information within them, and data-sharing standards are often inconsistent. By understanding and interacting with software UIs, the agents can automate tasks such as data entry, retrieval, and processing, enhancing efficiency and reducing the risk of human error. This agent-based access approach helps address problems related to custom plugins and data-sharing standards.
The agents can use the user interfaces to enter data or manipulate the software tool to
execute actions within the software. This capability enables efficient evaluation of potential system upgrades and modifications, providing valuable insights into their impacts before implementation.
Additionally, this capability makes it possible for these agents to serve as an interface between
formal representations and advanced computational solvers. This is particularly important because
many computations are not feasible with formal logic solvers alone. In such cases, the
computations can be represented in formal logic, but they are solved by connecting them with
advanced solvers using the generative agent. This approach, developed in this research, ensures
that complex computations can be effectively managed and integrated into the system model.
By facilitating bidirectional interaction with physical systems, digital assets, and software
tools, the AI agents developed in this research ensure that the knowledge acquisition process is
not just a one-time data extraction but an ongoing, dynamic process. This continuous interaction
keeps the system model updated and relevant, providing a robust foundation for effective system
upgrades.
3.3.4 Automatic Conversion of Gathered Information into Formal Logic
A critical aspect of this methodology is converting heterogeneous data into formal logic.
This complex process requires a combination of generative AI agents and human input to ensure
accuracy and consistency. The approach developed in this research involves several steps, each
supported by fine-tuned generative AI models.
The AI agents process various types of data, including text, numerical data, and images
[104], [105]. This multi-modal capability allows the agents to extract and interpret information
from a wide range of sources, ensuring that the data is comprehensive and relevant. Once the data
is processed, the agents perform semantic analysis to understand the meaning and context of the
extracted information. This step is crucial for ensuring that the data is interpreted correctly and
integrated into the system model accurately. Semantic analysis helps in identifying key concepts,
relationships, and constraints within the data. After semantic analysis, the extracted information is
mapped to the system's ontology [106]. This step ensures consistency in terminology and
relationships across different data sources. The ontology mapping process involves organizing the
data into a structured format that aligns with the core, mid-level, and domain-specific ontologies
developed in this research. The interpreted and mapped information is then translated into formal
logic statements. This involves creating representations in description logic to capture the various
aspects of the system. Formal logic translation ensures that the data is machine-readable and can
be used for automated reasoning and analysis. Consistency checks are performed as the
information is converted into formal logic to identify and resolve any contradictions or
inconsistencies. This step is essential to maintain the integrity of the system model and ensure that
the data is reliable and accurate.
Figure 6: Conversion of a System Requirement into a Formal Logic and Graph Model
Figure 6 demonstrates the process of converting a requirement into formal logic and then
representing it as a graph model. The example given is a requirement stating, "The Mini Drone
Search System shall consist of exactly 3 mini drones operating in a lab environment."
• Requirement Extraction:
o The requirement is extracted from a system composition table, as shown in the figure.
• Formal Logic Conversion:
o This requirement is converted into formal logic notation:
MiniDroneSearchSystem ⊑ (hasContinuantPart exactly 3 MiniDrone) ⊓
(locatedIn some LabEnvironment).
o Here, "MiniDroneSearchSystem" is a class that has exactly three parts of the type
"MiniDrone" and is located in some "LabEnvironment".
• Graph Model Representation:
o The formal logic is then represented in a graph model. In the graph:
▪ "MiniDroneSearchSystem" is connected to "MiniDrone" with a relationship
"hasContinuantPart (exactly 3)".
▪ "MiniDroneSearchSystem" is also connected to "LabEnvironment" with a
relationship "locatedIn (some)".
o This visual representation helps in understanding the relationships and constraints
defined by the requirement.
This method of converting requirements into formal logic and subsequently into a graph
model ensures a precise and unambiguous representation of system requirements. It aids in
automated reasoning, validation, and analysis, providing a clear and structured way to manage
complex system specifications. This approach was developed and tested as part of this research,
incorporating both AI-generated models and human expertise to ensure accuracy and relevance.
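To make this conversion concrete, the following minimal sketch shows how the two restrictions from the Figure 6 example could be stored and traversed as a labeled directed graph in Python using the networkx library. The class and relationship names are taken from the example; the code itself is purely illustrative and is not the implementation developed in this research.

    import networkx as nx

    # Build a small directed graph mirroring the Figure 6 example.
    G = nx.DiGraph()

    # Nodes correspond to ontology classes.
    G.add_node("MiniDroneSearchSystem")
    G.add_node("MiniDrone")
    G.add_node("LabEnvironment")

    # Edges carry the description-logic restrictions as attributes.
    G.add_edge("MiniDroneSearchSystem", "MiniDrone",
               predicate="hasContinuantPart", restriction="exactly", cardinality=3)
    G.add_edge("MiniDroneSearchSystem", "LabEnvironment",
               predicate="locatedIn", restriction="some")

    # Simple traversal: list every restriction attached to the system class.
    for _, target, data in G.out_edges("MiniDroneSearchSystem", data=True):
        print(f"MiniDroneSearchSystem --{data['predicate']} ({data['restriction']})--> {target}")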
The description logic (DL) representation offers several benefits for knowledge
representation within the digital twin framework [107]:
● Machine-readable format: Automated reasoning systems can easily process and interpret
DL expressions, enabling efficient inference and knowledge-based decision-making.
● Expressiveness and clarity: DL provides a rich set of constructs for representing complex
relationships and constraints between entities, ensuring that the knowledge is captured
accurately and comprehensively.
● Standardization and interoperability: DL is a well-established formalism with standardized syntax and semantics. It facilitates interoperability with other knowledge-based systems and enables knowledge sharing across different domains.
By utilizing description logic, the knowledge representation process transforms raw
information into a structured and machine-readable format, laying the foundation for advanced
reasoning and analysis within the digital twin.
Building on the earlier work where I developed a methodology to convert system
requirements into domain ontology graphs, this research extends that approach to a variety of artifacts, including control models, SysML models, Unity simulation models, and web applications. The methodology for converting these artifacts into ontology graphs involves an intermediate step where
the data is transformed into logical statements using description logic. The conversion process is
managed by a series of specialized agents, each designed to handle specific aspects of the
transformation. These agents analyze the model—whether it be a Simulink control model, SysML
diagram, Unity simulation, or web application—by identifying key concepts, relationships, and
properties inherent within the model. Additionally, the agents take into account the context, core
ontology, and mid-level ontology, including the scope and purpose of domain ontology. This
contextual awareness ensures that the conversion process aligns with the intended use and
objectives of the models. As an example, this methodology was applied to a UAV flight simulation
model. Figures 7 and 8 depict the initial and extended models, respectively. Figure 7 shows the
initial model within a software tool, including signal and flight control subsystems and their
interfaces. The model's information is captured through the software UI by generative AI agents,
which then convert it into logical statements that form the basis for generating the ontology graph.
Figure 8 illustrates the extended control model, where additional logical statements are
incorporated, leading to an expanded graph. This approach facilitates advanced querying,
reasoning, and data interoperability.
Figure 7: Integrated Representation of Control Model, Logical Statements, and Ontology Graph
Figure 8: Integrated Representation of Extended Model
3.3.5 Unique Contributions and Detailed Approach
The conversion of heterogeneous data into formal logic is not straightforward. This research developed an approach that combines generative AI agents and human input, structured into a 14-step process. For each step, fine-tuned generative AI models were developed and tested.
This approach was used to develop domain ontologies and system models based on those
ontologies, which were later connected through mid-level and core ontologies.
Active Knowledge Acquisition Approach for Model Development from Life Cycle Data
Stage 1: Identify Key Concepts and Categorize
1. Input Portion of Data: Process and transform unstructured data from various system
lifecycle phases into manageable chunks.
2. Identify Stakeholders: Use LLMs to recognize relevant stakeholders and curate the list
with human input.
3. Identify Decision Support Questions (DSQs): Extract and refine DSQs with LLMs and
human validation.
4. List Key Concepts: Extract key concepts using LLMs and validate with human input.
5. Categorization into Core Ontology Concepts: Organize key concepts into core ontology-compatible classes.
6. Create Two-Part Genus Species Definitions: Define terms systematically and validate
with human input.
7. Data Source Identification: Identify potential data sources using LLMs and assess
feasibility with human expertise.
Stage 2: Create Connections and Populate Data
8. Create Ontology Representation: Populate an N2 Matrix with relationships between key
concepts using LLMs.
9. Identify Predicates: Identify predicates for each subject-object pair in the N2 matrix.
10. Triplet Creation: Extract subject-predicate-object triplets from the N2 matrix and validate
with human input.
11. Automated Population of Graph Database: Translate ontology into a graph
representation and populate it with real-world data.
12. Decision Support Question Validation: Validate the ontology’s ability to answer DSQs
using formal logic queries and human examination.
Stage 3: Repeat and Refine
13. Repeat the Process: Continuously expand the ontology by iterating the process with
additional data portions.
14. Check Repetition of Key Concepts: Ensure no redundancy in the ontology by applying
matrix merging algorithms and resolving conflicts.
Appendix A provides details on implementing this 3-stage approach. This structured
approach ensures that the ontology remains comprehensive, up-to-date, and aligned with the
evolving knowledge landscape, providing a robust foundation for effective system upgrades and
decision-making processes.
The methodology developed in this research addresses a critical challenge in complex
systems engineering: the need to effectively capture, integrate, and utilize the vast amount of
knowledge spread across an organization. In large-scale projects, such as those in the aerospace
and defense sectors, knowledge is often distributed across various domains, stored in different
formats, and siloed within separate departments or teams. This fragmentation of knowledge can
hinder the development of a comprehensive system model, which is essential for understanding
the system's behavior, making informed decisions, and planning successful upgrades.
3.4 Formal Knowledge Representation
3.4.1 Overview
Effective knowledge representation is critical for managing complex systems, particularly
in the context of system upgrades. Complex systems, such as those in aerospace and defense
sectors, involve multiple interconnected domains, each with its own specialized models,
terminologies, and data representations [50], [58]. Managing and upgrading these systems
necessitates a comprehensive and precise representation of the system's knowledge base to ensure
coherence, interoperability, and efficiency [108], [109]. A primary challenge in managing complex
systems is the fragmentation of system knowledge across various domains [110], [111]. This
fragmentation arises because each domain—such as mechanical, electrical, and software
engineering—typically develops its own models and representations, often using incompatible
formats and semantics. This lack of a unified and consistent knowledge representation can
significantly hamper the upgrade process, leading to prolonged timelines, increased costs, and
elevated risks of errors. To address these challenges, a structured and integrated approach to
knowledge representation is essential. This approach should ensure that all relevant information is
accurately captured and seamlessly integrated across domains, facilitating better communication,
improved decision-making, and more efficient and reliable system upgrades. By overcoming the
issues of fragmented knowledge, it is possible to significantly enhance the overall performance
and operational efficiency of complex systems.
3.4.2 Problem Addressed
In complex systems engineering, fragmented system knowledge presents significant
challenges that impede the efficiency and effectiveness of system upgrades. These issues stem
from the lack of interoperability, inconsistent terminology, incomplete system views, and
difficulty in tracing dependencies [29]. Firstly, the lack of interoperability is a major issue.
Different domains often use models with incompatible formats and semantics [112]. This lack of
commonality makes it difficult to integrate information across the entire system, leading to siloed
data that cannot be easily shared or utilized in a cohesive manner [21]. For instance, mechanical
engineers might use CAD models, while software engineers rely on UML diagrams. Integrating
these disparate models into a unified system is a complex and time-consuming task. Inconsistent
terminology across domains further complicates the situation. Different fields may use varying
terms to describe similar concepts, leading to misunderstandings and misinterpretations. For
example, a term like "node" might have different meanings in electrical and network engineering.
This inconsistency not only hampers communication between teams but also leads to errors in
system interpretation, making it difficult to ensure that upgrades are correctly implemented [113].
Another significant issue is the incomplete system view that arises from fragmented
knowledge [114]. When knowledge is siloed within specific domains, it becomes challenging to
obtain a holistic understanding of the entire system. This fragmented perspective can lead to
incomplete or biased assessments of system performance and potential upgrade impacts. Without
a comprehensive view, engineers may overlook critical interactions and dependencies, resulting in
suboptimal upgrade decisions [115]. Another critical challenge is tracing dependencies across
domains. Complex systems have numerous interdependencies between components and
subsystems. When knowledge is fragmented, these dependencies are often poorly represented or
entirely missing. This lack of visibility makes it hard to predict how changes in one part of the
system will affect others. As a result, engineers may miss potential issues that could arise during
the upgrade process, leading to unexpected failures or degraded performance. Inefficient
knowledge transfer is another consequence of fragmented system knowledge. Without a unified
representation, sharing information between teams and stakeholders becomes cumbersome. Each
team might have its own methods and tools for documenting and managing knowledge, leading to
duplication of effort and inconsistencies. This inefficiency slows down the upgrade process, as
more time is spent reconciling disparate sources of information rather than focusing on the upgrade
itself.
Finally, the increased risk of errors is a significant concern. Inconsistencies between
domain models can lead to design flaws and integration issues, ultimately resulting in system
failures. For example, an upgrade designed based on outdated or incorrect models may not function
as intended, causing operational disruptions and potentially endangering safety.
The impact of these issues on the upgrade process is profound. Fragmented knowledge
leads to longer upgrade timelines, higher costs, and increased risks [116]. The lack of
interoperability and inconsistent terminology makes it difficult to integrate and communicate
effectively, while incomplete system views and difficulty in tracing dependencies hinder accurate
impact analysis. Inefficient knowledge transfer and versioning challenges further delay the
process, and the increased risk of errors can result in costly rework and system failures. Addressing
these challenges is crucial to improving the efficiency and reliability of system upgrades, ensuring
that they are completed on time, within budget, and with minimal risk.
3.4.3 Layered Ontology Approach
3.4.3.1 Introduction to the Layered Ontology Methodology
To effectively manage the complexity and fragmentation of system knowledge, this
methodology employs a layered ontology approach. This approach structures the representation of
system knowledge into multiple interconnected layers, each serving a specific purpose and level
of abstraction. The layered ontology methodology facilitates the integration of diverse domain-specific knowledge into a unified, coherent system model, ensuring consistency, interoperability,
and scalability.
Figure 9: Linking multi-domain knowledge
Figure 9 illustrates the layered ontology approach. The Core Ontology at the center defines
fundamental concepts and relationships applicable across all domains. The Mid-Level Ontology
bridges the core ontology and detailed Domain Ontologies. On the left, System Requirement
Specifications feed into the Requirements Engineering Domain Ontology, and Maintenance Logs
feed into the Maintenance Domain Ontology. These domain ontologies integrate with the mid-level and core ontologies, ensuring seamless communication and interoperability across domains. It is important to note that Requirements Engineering and Maintenance are just examples; in practice, many more such domains can be connected across the enterprise.
Core Ontology
The core ontology forms the foundation of the layered ontology approach, defining
fundamental, universal concepts applicable across all system domains [117]. This foundational
layer provides the basic building blocks for constructing more specialized ontologies, ensuring a
consistent and unified representation of knowledge throughout the system. The core ontology
establishes fundamental concepts and relationships universally applicable across different
domains, supporting interoperability and integration between various domain-specific models
[118]. This foundational layer helps eliminate inconsistencies and ensures that all subsequent
ontologies are built upon a shared conceptual framework.
Critical concepts in the core ontology include 'Entity,' 'Process,' 'Property,' and 'Relationship' [119]. These concepts are abstract and domain-independent, representing essential
elements relevant to any system. For example, 'Entity' can represent any physical or abstract
component, 'Process' refers to any action or series of actions performed by or on an entity, and
'Property' describes attributes or characteristics of entities. Characterized by its high level of
abstraction and domain independence, the core ontology is designed to apply to all other ontologies
within the system. Its generalized framework can be adapted and extended to suit the specific needs
of various domains. It serves as a common reference point that facilitates communication and data
exchange between different domain-specific models. The core ontology's role is crucial for
maintaining consistency across the entire system. Defining a common set of basic concepts and
relationships ensures that all subsequent ontologies adhere to the same conceptual framework. This
consistency is essential for effective integration and interoperability, allowing different domain-specific models to combine seamlessly into a coherent system model.
Mid-Level Ontology
The mid-level ontology bridges the highly abstract core ontology and detailed domain-specific ontologies, providing a concrete representation of system knowledge applicable across
multiple domains [120], [121]. It incorporates more specific concepts and relationships than the
core ontology while maintaining broad applicability. This layer facilitates the integration of
domain-specific knowledge into a cohesive system model, promoting interoperability and
consistency across domains.
Critical concepts in the mid-level ontology include 'System,' 'Component,' 'Function,'
'Requirement,' 'Interface,' and 'Behavior.' These concepts provide a detailed framework for
representing system knowledge. For example, 'System' represents an organized set of components
and their interactions, 'Component' refers to individual parts of a system, 'Function' describes
intended operations, 'Requirement' captures specific needs or conditions, 'Interface' denotes points
of interaction, and 'Behavior' outlines dynamic aspects of the system. These concepts allow for
comprehensive and nuanced system representations, facilitating detailed analysis and decision-making.
The mid-level ontology is characterized by its increased detail compared to the core
ontology. It captures the complexities of various domains while remaining general enough for
broad application. This balance makes it an essential bridge between the core and domain-specific
ontologies. By providing a common set of concepts and relationships, the mid-level ontology
ensures seamless integration of different domain-specific models into a unified system model,
maintaining consistency and facilitating effective communication and data exchange between
domains.
Domain Ontologies
Domain ontologies capture specialized knowledge specific to individual domains within
the system, such as mechanical, electrical, or software engineering. They provide detailed
representations of each domain's unique concepts, relationships, and constraints, enabling experts
to develop accurate models reflecting the complexities of their fields.
For example, the Design Ontology covers system design, including components,
relationships, and properties, integrating information from CAD models and design documents.
The Manufacturing Ontology focuses on materials, equipment, processes, and quality control in
manufacturing. The Operation Ontology includes sensor data, control signals, operational states,
and performance metrics, while the Maintenance Ontology deals with schedules, fault diagnosis,
and repair procedures. The Simulation Ontology represents knowledge about system simulations,
including models and results.
These ontologies integrate seamlessly with mid-level and core ontologies through shared
concepts and relationships, ensuring they fit into the broader system model. Multiple domain
ontologies can coexist within a single system model, each contributing specialized knowledge
while remaining part of a unified, coherent representation. This structured approach ensures
domain-specific knowledge is accurately captured and utilized within the broader system model,
enhancing the performance and reliability of system upgrades.
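The layering can be pictured as a simple subclass hierarchy in which domain concepts trace back to mid-level and ultimately core concepts. The sketch below uses Python classes purely as an illustration of this structure; the class names are illustrative examples and are not the full ontologies developed in this research.

    # Minimal sketch of the layered structure: core concepts are specialized by
    # mid-level concepts, which are in turn specialized by domain concepts.
    class Entity:              # core ontology
        pass

    class Process(Entity):     # core ontology
        pass

    class System(Entity):      # mid-level ontology
        pass

    class Component(Entity):   # mid-level ontology
        pass

    class MaintenanceAction(Process):   # maintenance domain ontology
        pass

    class GPSModule(Component):         # design domain ontology
        pass

    # Because every domain class ultimately derives from a core concept, queries
    # phrased at the core level also cover domain-specific elements.
    print(issubclass(GPSModule, Entity))           # True
    print(issubclass(MaintenanceAction, Entity))   # True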
3.4.4 Advantages of the Layered Ontology Approach
The structured nature of the layered ontology approach allows for the easy integration of
new domains [122]. As new knowledge areas or technologies emerge, they can be incorporated
into the system model by developing additional domain ontologies that link to the existing core
and mid-level ontologies. This scalability ensures the system model can grow and evolve without
losing its coherence or integrity. Separating knowledge into different layers provides the flexibility
to develop detailed and specialized models for specific domains without losing connection to the
broader system model. Each layer can be modified or extended independently, allowing for
adaptive and responsive updates to the system knowledge base. The layered approach ensures
consistency across different domains by establishing a common foundation through the core and
mid-level ontologies. Shared concepts and relationships provide a uniform framework that all
domain-specific models adhere to, reducing the risk of inconsistencies and errors. Using a common
set of concepts and relationships across all layers facilitates effective communication and data
exchange between domain-specific models. This interoperability is crucial for integrating diverse
types of knowledge and ensuring that information can be seamlessly shared and utilized across the
entire system. Core and mid-level ontologies, due to their generality and broad applicability, can
be reused across different projects and systems. This reusability reduces the time and effort
required to develop new system models, as foundational elements do not need to be recreated for
each new application. Instead, they can be adapted and extended to meet the specific needs of each
new project. The layered ontology approach provides a robust framework for representing and
integrating complex system knowledge. It ensures that all relevant information is accurately
captured and seamlessly integrated, facilitating better communication, improved decision-making,
and more efficient and reliable system upgrades.
3.4.5 Formal Logic and Ontologies for Precise System Description
Integrating formal logic with ontologies provides a rigorous foundation for precise system
description, which is essential for managing complex aerospace and defense systems. Description
logic and OWL2 DL, the standard for expressing ontologies, ensure that knowledge is represented
clearly and unambiguously. This approach enables advanced reasoning capabilities, including
subsumption reasoning for automatic classification of concepts, consistency checking to maintain
logical coherence, and inference reasoning to derive new knowledge. Reasoning engines like Pellet
and Hermit support these capabilities by ensuring logical consistency, handling complex
dependencies, and probabilistic reasoning.
To leverage structured knowledge within ontologies, powerful query and rule languages
such as Cypher and SWRL are employed. Cypher facilitates complex queries and exploration of
relationships within the knowledge base, while SWRL defines intricate rules and constraints that
extend OWL2 DL's expressiveness. This integration ensures that system descriptions are both
human-readable and machine-processable, making them intuitive for human experts and suitable
for automated reasoning systems. Overall, this methodology provides a structured approach to
system description, enhancing analysis, decision-making, and system upgrades.
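The kind of Cypher query described above can be issued from Python through the official neo4j driver, as in the minimal sketch below. The connection details, node labels, and the CONSTRAINS relationship type are illustrative assumptions and would differ in an actual deployment of the methodology.

    from neo4j import GraphDatabase

    # Connection details are placeholders for illustration only.
    URI = "bolt://localhost:7687"
    AUTH = ("neo4j", "password")

    # Example query: find every component constrained by a given requirement,
    # following a hypothetical CONSTRAINS relationship in the ontology graph.
    CYPHER_QUERY = """
    MATCH (r:Requirement {id: $req_id})-[:CONSTRAINS]->(c:Component)
    RETURN c.name AS component
    """

    def components_constrained_by(requirement_id):
        driver = GraphDatabase.driver(URI, auth=AUTH)
        try:
            with driver.session() as session:
                result = session.run(CYPHER_QUERY, req_id=requirement_id)
                return [record["component"] for record in result]
        finally:
            driver.close()

    if __name__ == "__main__":
        print(components_constrained_by("REQ-001"))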
3.4.6 Generative AI Agents for Consistency Checking
Using generative AI agents is crucial for maintaining the integrity of the ontology-based
system model. These agents play a significant role in ensuring that the knowledge representation
remains consistent, up-to-date, and capable of handling new information and evolving patterns.
Agents continuously monitor and update the system model, identifying potential inconsistencies
or outdated information that could impact the upgrade process. This dynamic approach allows for
real-time adjustments and improvements, ensuring that the system model accurately reflects the
system's current state and provides a reliable foundation for planning and implementing upgrades.
3.4.6.1 Consistency Checking Agents
The primary function of consistency-checking agents is to monitor the ontology for logical
consistency and adherence to predefined rules and constraints. These agents are designed to detect
and address inconsistencies within the system model, ensuring that the knowledge base remains
coherent. Consistency-checking agents possess several key capabilities that enable them to manage
the integrity of the ontology effectively. These agents ensure that there are no contradictions or
conflicts within the knowledge representation. They verify that all relationships and properties
adhere to the logical rules defined in the ontology. Consistency-checking agents maintain the
structural integrity of the ontology by ensuring that all entities and relationships are correctly and
consistently represented. This includes verifying that the ontology's hierarchical and relational
structures are intact and logically sound. These agents also ensure that information is consistent
across different domain ontologies. By checking for alignment and coherence between domain
ontologies, mid-level ontology, and core ontology, they facilitate seamless integration and
interoperability within the system model.
The process of maintaining consistency involves several steps. Consistency-checking
agents periodically scan the entire ontology to detect any inconsistencies. These regular scans help
identify potential issues early, allowing for timely resolution. In addition to periodic scans, these
agents perform real-time checks whenever new information is added, or existing information is
modified. This ensures that any changes to the ontology are immediately verified for consistency.
When inconsistencies are detected, the agents can either automatically resolve simple issues or
flag more complex problems for human review. This dual approach ensures that the ontology
remains accurate and reliable while leveraging human expertise for more nuanced issues.
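A highly simplified sketch of this dual resolution strategy is given below. The facts, the duplicate-detection heuristic, and the escalation criterion are illustrative stand-ins for the ontology-level reasoning actually performed by the consistency-checking agents.

    # Minimal sketch of a consistency-checking pass: simple issues are resolved
    # automatically, anything else is flagged for human review.
    triples = [
        ("MiniDrone", "hasMass", "1.2"),
        ("MiniDrone", "hasMass", "1.2 kg"),              # same value, different form
        ("MiniDrone", "locatedIn", "LabEnvironment"),
        ("MiniDrone", "locatedIn", "FieldEnvironment"),  # conflicting assertion
    ]

    def check_consistency(facts):
        auto_fixed, flagged = [], []
        seen = {}
        for subject, predicate, value in facts:
            key = (subject, predicate)
            if key not in seen:
                seen[key] = value
            elif value.split()[0] == seen[key].split()[0]:
                # Same underlying value written differently: resolve automatically.
                auto_fixed.append((key, value, seen[key]))
            else:
                # Genuine contradiction: escalate to a human expert.
                flagged.append((key, seen[key], value))
        return auto_fixed, flagged

    fixed, for_review = check_consistency(triples)
    print("Auto-resolved:", fixed)
    print("Needs human review:", for_review)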
3.4.6.2 Knowledge Refinement Agents
Generative AI agents for knowledge refinement focus on continuously improving the
ontology based on new data and detected patterns. This dynamic process ensures that the
knowledge representation evolves in response to new information and changing requirements.
Knowledge refinement agents possess several critical capabilities. These agents identify
unusual patterns or outliers in the knowledge base that might indicate errors or new insights.
Agents detect areas where the ontology lacks sufficient detail or coverage. By identifying these
gaps, they guide the development of more comprehensive and detailed knowledge representations.
The ontology is continuously refined based on new data, user feedback, and system behavior
observations. This iterative process ensures that the knowledge base remains current and relevant.
The integration of generative AI agents for consistency checking and knowledge
refinement offers several significant benefits. Agents ensure that the ontology remains logically
consistent and structurally sound. By automating consistency checks, they reduce the risk of errors
and improve the reliability of the system model. The dynamic nature of refinement agents allows
the ontology to adapt to new information and evolving patterns. This adaptability is crucial for
maintaining an accurate and relevant knowledge base in the face of changing requirements. By
leveraging both automated processes and human expertise, this approach ensures that the
knowledge representation evolves in response to new insights and developments.
3.5 Analysis: Automated Reasoning and Human Interaction
3.5.1 Overview
The integration of digital twin technology and generative models in the context of model-based systems upgrade methodology aims to tackle several critical challenges that impede the efficiency and effectiveness of system upgrades. This section highlights the primary challenges, including limitations in cross-domain impact analysis, difficulties in handling dynamics and uncertainty, and challenges in ensuring the correctness of upgrades.
3.5.2 Problem Addressed
The traditional methodologies for system analysis during upgrades face numerous
problems that can lead to suboptimal decisions and increased risks. System dynamics pose
significant challenges in the analysis of upgrades. Many traditional analysis methods rely on static
models that fail to capture the dynamic behavior of systems over time, resulting in inaccurate
assessments. Changes in system behavior or configuration over time are often not adequately
represented, leading to outdated analyses that do not reflect the system's current state. The inability
to effectively model and analyze system dynamics hampers the ability to predict the long-term
impacts of upgrades, making it difficult to make informed decisions. Managing uncertainty is
crucial for reliable system analysis. Traditional methods often use deterministic models that do not
account for inherent uncertainties in complex systems, leading to potentially flawed conclusions.
Working with incomplete or imperfect information is usually a reality, but traditional methods do
not adequately address this issue. The failure to properly quantify and propagate uncertainties can
lead to underestimating the risks associated with upgrades, resulting in unforeseen issues. In the
context of system upgrades, cross-domain impact analysis is often hampered by several issues.
Each domain typically conducts its analysis independently without adequately considering the
impacts on other domains. This siloed approach can lead to overlooked interactions and unintended
consequences. The complex interactions between different system components across domains are
frequently misunderstood or ignored, leading to incomplete analysis. The lack of a comprehensive
cross-domain perspective results in incomplete risk assessments, leaving potential risks
unmitigated and causing unexpected problems post-upgrade.
Ensuring the correctness of analysis and upgrade decisions involves overcoming several
challenges. Reliance on manual verification processes is time-consuming and susceptible to human
error. The application of formal verification methods is often limited, which can lead to potential
oversights in ensuring system correctness. As systems become more complex, ensuring the
correctness of upgrade decisions becomes increasingly challenging, and traditional methods
struggle to scale effectively. The integration of human expertise in the analysis process is often
inadequate: some approaches rely too heavily on automation, neglecting the valuable insights and
intuition that human experts can provide. Black-box analysis methods make it difficult for human
experts to understand and trust the results, hindering effective collaboration and decision-making.
3.5.3 Ontology-Based Model with Temporal Embeddings and Probabilistic Reasoning
The methodology addresses the limitations of traditional system analysis approaches by
leveraging an ontology-based model enhanced with temporal embeddings and probabilistic
reasoning. This solution provides a framework for handling system dynamics and uncertainties,
ensuring more effective upgrade decisions.
3.5.3.1 Temporal Embeddings
The methodology addresses the dynamic nature of complex systems through temporal
embedding techniques. Temporal embedding allows for the representation and reasoning about the
temporal aspects of the system and its behavior. By incorporating temporal information into the
knowledge representation, the methodology enables the capture and analysis of the system's
evolution over time. Each piece of knowledge is associated with a temporal context, indicating
when it was acquired or considered valid. This allows for tracking changes in the system's state,
behavior, and requirements over its lifecycle. Temporal embedding also supports representing
dynamic relationships and dependencies between system components. By capturing the temporal
aspects of these relationships, the methodology enables the analysis of how changes in one part of
the system can propagate and affect other components over time. The temporal embedding
techniques ensure that the system model remains current and relevant as new knowledge is
acquired and existing knowledge is updated. As the system evolves, the methodology allows for
the integration of new information into the knowledge base, maintaining an up-to-date
representation of the system.
Figure 10 demonstrates the concept of temporal embeddings within a knowledge graph.
The left side of the figure presents a graph structure, depicting interconnected nodes (entities) and
edges (relationships) that form the knowledge graph. On the right side, two columns of arrays are
shown, representing the temporal embeddings for an edge and a node in the graph. The red-bordered column represents the temporal embedding array for an edge, while the green-bordered
column represents the temporal embedding array for a node. Each row in these arrays (denoted as
a1, a2, ..., an for the edge; b1, b2, ..., bn for the node) corresponds to the embedding values at
different time steps. This structure captures the temporal evolution of the edge's and node's
properties or states, respectively. It is important to note that for more complex data representations,
these embeddings could take the form of multi-dimensional vector embeddings instead of simple
arrays. This allows for richer representations of temporal dynamics in complex systems. This
representation enables the knowledge graph to capture and reason about dynamic, time-dependent
information. By incorporating temporal embeddings, it becomes possible to model how
relationships and entities evolve over time, enhancing its capability to handle complex, evolving
systems and support sophisticated temporal inference tasks.
Figure 10: Temporal embeddings on graph
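The structure shown in Figure 10 can be approximated in code as time-indexed attribute arrays attached to graph elements. The sketch below uses the networkx library with illustrative node names and embedding values; it is a simplified stand-in for the richer multi-dimensional embeddings discussed above.

    import networkx as nx

    # Knowledge graph in which nodes and edges carry temporal embedding arrays:
    # index t of each array holds the embedding value at time step t.
    G = nx.DiGraph()

    G.add_node("Battery", temporal_embedding=[0.95, 0.91, 0.87, 0.82])   # e.g., state over time
    G.add_node("MiniDrone", temporal_embedding=[1.0, 1.0, 0.98, 0.97])
    G.add_edge("MiniDrone", "Battery",
               predicate="hasContinuantPart",
               temporal_embedding=[1.0, 1.0, 1.0, 1.0])                  # relationship validity over time

    def value_at(element_attrs, t):
        """Return the embedding value of a node or edge at time step t."""
        return element_attrs["temporal_embedding"][t]

    print(value_at(G.nodes["Battery"], 3))                 # latest battery state
    print(value_at(G.edges["MiniDrone", "Battery"], 0))    # relationship at t = 0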
3.5.3.2 Probabilistic Reasoning
Knowledge acquired from various sources may have different reliability levels, leading to
gaps and inconsistencies. To address these challenges, the methodology incorporates probabilistic
modeling techniques into formal knowledge representation, associating each piece of knowledge
with a probability value that reflects its certainty. The uncertainty management framework
integrates with the ontology graph, allowing for a powerful, flexible knowledge base. Each node
and edge in the ontology graph is annotated with a probability value, reflecting the belief in their
validity. This is achieved by extending the graph embedding to include a "probability" attribute,
facilitating efficient access during reasoning and query processing. Probabilistic annotation
considers the distinction between axioms and assertions and the nature of knowledge sources.
Axioms, representing general rules, are assigned conservative probabilities, while assertions,
supported by reliable evidence, receive higher probabilities. Initial probabilities are assessed based
on generative model confidence scores, source reliability, and domain knowledge encoded in the
ontology [123], [124]. Probabilistic reasoning algorithms navigate and infer from the knowledge
base, considering uncertainties associated with each element. When inconsistencies are detected,
reasoning agents trigger human intervention for expert review and resolution. Traceability agents
maintain records of knowledge origins, ensuring easy identification of discrepancies. This human
feedback loop refines uncertainty measures, keeping the system model aligned with the best
available knowledge. By merging probabilities with the ontology graph, the digital twin
methodology creates a robust knowledge base that enables effective reasoning and decision-making.
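A minimal sketch of this probabilistic annotation is given below, again using networkx with illustrative probabilities and a simple product rule for chained inference; actual propagation in the methodology relies on the probabilistic reasoning algorithms and human feedback loop described above.

    import networkx as nx

    # Ontology graph in which every node and edge carries a probability attribute.
    G = nx.DiGraph()
    G.add_node("MiniDrone", probability=0.99)          # assertion backed by strong evidence
    G.add_node("FaultyGPSUnit", probability=0.70)      # less certain assertion
    G.add_edge("MiniDrone", "FaultyGPSUnit",
               predicate="hasContinuantPart", probability=0.80)

    def path_confidence(graph, path):
        """Naive confidence of a chained inference: product of the probabilities
        of all nodes and edges along the path (illustrative only)."""
        conf = 1.0
        for node in path:
            conf *= graph.nodes[node]["probability"]
        for u, v in zip(path, path[1:]):
            conf *= graph.edges[u, v]["probability"]
        return conf

    print(path_confidence(G, ["MiniDrone", "FaultyGPSUnit"]))   # 0.99 * 0.70 * 0.80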
3.5.4 Reasoning Agents: The Intelligent Core
Ensuring the correctness of analysis requires rigorous validation methods and tools to
detect and correct errors [125], [126]. Reasoning agents, as the intelligent core of the system
model, provide the necessary guidance and insights to translate knowledge into action effectively.
These agents leverage various reasoning techniques, drawing from the formalized knowledge base
and interacting with external tools to analyze situations, generate hypotheses, draw conclusions,
and inform decision-making processes.
3.5.4.1 Deductive Reasoning Agents
Deductive reasoning agents employ formal logic and rigorous inference rules to analyze
the interconnected knowledge within the ontology graph, generating actionable insights. The
implementation of deductive reasoning in the system model involves several key components. The
knowledge base, represented as an ontology graph with description logic expressions, forms the
foundation for deductive reasoning. Axioms and assertions are formally encoded using description
logic syntax and semantics, enabling automated reasoning and inference [126]. A dedicated
reasoning engine performs deductive inference on the knowledge base, applying predefined
inference rules to derive new knowledge and conclusions. The reasoning engine can chain together
multiple inference steps to uncover indirect connections and derive conclusions, enhancing the
system model’s ability to assist in system upgrade analysis and decision-making.
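The chaining of inference steps can be illustrated with a small forward-chaining sketch over subject-predicate-object facts. This stands in for the description logic reasoning engine and applies a single illustrative transitivity rule over a hypothetical partOf relation.

    # Minimal forward-chaining sketch: a transitive rule over 'partOf' facts is
    # applied repeatedly until no new conclusions can be derived.
    facts = {
        ("GPSModule", "partOf", "NavigationSubsystem"),
        ("NavigationSubsystem", "partOf", "MiniDrone"),
    }

    def forward_chain(known):
        derived = set(known)
        changed = True
        while changed:
            changed = False
            new_facts = set()
            for (a, p1, b) in derived:
                for (c, p2, d) in derived:
                    # Rule: partOf is transitive.
                    if p1 == p2 == "partOf" and b == c and (a, "partOf", d) not in derived:
                        new_facts.add((a, "partOf", d))
            if new_facts:
                derived |= new_facts
                changed = True
        return derived

    for fact in sorted(forward_chain(facts) - facts):
        print("Derived:", fact)   # e.g., GPSModule partOf MiniDrone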
3.5.4.2 Generative AI Reasoning Agents Interfacing with Formal Logic and Computational
Solvers
The methodology leverages generative AI agents to enhance reasoning capabilities. These
agents interface with formal logic systems and computational solvers, providing a robust analysis.
To ensure rigorous reasoning, generative AI agents utilize description logic for ontological
classifications and relationships and description logic solvers to verify logical conclusions or prove
system properties, ensuring accurate analysis. In addition to formal logic systems, generative AI
agents interface with advanced computational solvers like Matlab for numerical analyses. The
integration process involves translating the formal logic representation of a problem into a specific
query suitable for these solvers, configuring solver parameters based on the system model, and
interpreting solver outputs to integrate them back into the ontology-based representation. This
ensures that results are incorporated into the overall system model.
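The general pattern of delegating a numerically intensive sub-problem to an external solver and folding the result back into the knowledge base can be sketched as follows. Here NumPy stands in for an advanced solver such as Matlab, and the quantities, property names, and thrust values are illustrative assumptions.

    import numpy as np

    # A sub-problem expressed over the formal model: given thrust contributions of
    # individual rotors (illustrative values), does total lift exceed system weight?
    formal_query = {
        "subject": "MiniDrone",
        "property": "hasSufficientLift",
        "rotor_thrusts_newtons": [4.1, 4.0, 4.2, 3.9],
        "weight_newtons": 14.7,
    }

    def solve_with_numerical_backend(query):
        # Delegate the arithmetic to the numerical backend (NumPy here).
        total_thrust = float(np.sum(query["rotor_thrusts_newtons"]))
        satisfied = total_thrust > query["weight_newtons"]
        # Translate the solver output back into an assertion for the ontology graph.
        return (query["subject"], query["property"], satisfied, total_thrust)

    subject, prop, value, thrust = solve_with_numerical_backend(formal_query)
    print(f"Assert: {subject} {prop} = {value}  (total thrust {thrust:.1f} N)")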
3.5.4.3 Decision Support Question-Driven Approach with Human-in-the-Loop Validation
The methodology uses a decision-support question-driven approach with human-in-the-loop validation to ensure the completeness of the system model. Stakeholders and AI agents
generate questions that traverse the system model across domains. Specialized AI agents interpret
these questions, translating them into formal queries. These queries are then executed against the
system model, leveraging advanced reasoning techniques to retrieve relevant information. Human
experts review these analyses to verify their accuracy and ensure the system model is sound. This
feedback loop refines the analysis, enhancing its credibility. This integration of AI and human
validation ensures reliable analysis for effective decision-making in upgrade scenarios.
3.5.5 Graph Model for Traceability
A key feature of the methodology is the use of an ontology as a comprehensive graph model
for traceability. This graph-based structure allows users to trace every element of the system
model, including discrepancies and uncertainties. The formal knowledge representation the
ontology provides enables detailed tracing of relationships and dependencies across various
domains. Additionally, it supports indirect relationship tracing through chaining and inference,
enhancing the ability to understand complex interconnections within the system. This approach
ensures transparency in the system upgrade analysis process.
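Such indirect relationship tracing through chaining can be pictured with a small path query over the graph. The sketch below uses networkx with illustrative requirement, component, and test identifiers; it is not the traceability implementation itself.

    import networkx as nx

    # Small traceability graph: a requirement is linked to a component, which is
    # linked to a test case; names are illustrative.
    G = nx.DiGraph()
    G.add_edge("REQ-001", "NavigationSubsystem", predicate="allocatedTo")
    G.add_edge("NavigationSubsystem", "GPSModule", predicate="hasContinuantPart")
    G.add_edge("GPSModule", "TEST-042", predicate="verifiedBy")

    # Indirect relationship tracing: chain the direct links between a requirement
    # and the test that ultimately verifies part of its implementation.
    path = nx.shortest_path(G, source="REQ-001", target="TEST-042")
    for u, v in zip(path, path[1:]):
        print(f"{u} -[{G.edges[u, v]['predicate']}]-> {v}")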
3.6 Action: Agent-Based System with Planning and Coordination
3.6.1 Overview
The "Action" component bridges the analytical power of digital twins with practical
execution in both digital and physical environments. It establishes bidirectional connections,
crucial for continuous data synchronization, ensuring digital models accurately reflect physical
systems. This integration allows for effective feedback loops, optimizing operations based on
digital insights. Continuous synchronization is vital for monitoring and informed decision-making.
By continuously updating digital models with current data, the system maintains the visibility essential for operational optimization [98].
3.6.2 Problem Addressed
A significant challenge in the methodology is the difficulty in executing bidirectional
connections between physical systems and digital assets without relying on custom plugins.
Continuous synchronization between digital twins and physical systems is inherently complex.
Ensuring that digital models accurately reflect the state of physical systems requires robust
mechanisms for data capture, processing, and updating. Achieving accurate data reflection and
maintaining data integrity is another critical issue. The data exchanged between physical systems
and digital twins must be consistent to be useful for decision-making and operational adjustments
[103].
The methodology introduces a unique solution to address the challenges of bidirectional
connections between physical systems and digital assets through the deployment of Multi-Modal
Actuation Agents and Planning and Coordination Agents.
3.6.3 Multi-Modal Actuation Agents
Architecture and AI Foundation: The architecture of the multi-modal actuation agents is
built on generative AI models, including transformer-based architectures like GPT-4o. These
models enable flexible and context-aware behavior, allowing the agents to process and understand
complex data from diverse sources, making informed decisions based on continuous analysis.
Furthermore, generative models such as GPT-4o incorporate function-calling mechanisms that can
take real-time inputs and generate structured outputs. These outputs, often in the form of JSON
objects, can be connected with external APIs, controllers, or other functions. This capability is
crucial for continuous control of systems, as it allows the models to perform specific actions based
on inputs. Thus, the agents not only process data but also generate actionable data that can be
utilized by other systems and agents, enhancing operational capabilities and integration efficiency.
In the realm of digital systems, actuation agents excel in integrating with software tools,
managing database operations, and performing file system tasks. They interact with APIs for
programmatic control and use UI automation techniques for tools lacking accessible APIs,
ensuring seamless integration and operational automation. These agents handle sensor data
processing and actuator control. The process begins with knowledge acquisition agents that ingest
continuous data from various sensors and convert it into formal representations using ontologies,
as mentioned in previous sections. This formalized data is then utilized by reasoning agents to
determine the best course of action. The resulting recommendations are passed to the actuation
agents, which generate precise control commands for different actuators. By processing feedback
signals to verify correct execution, the actuation agents enable dynamic adjustments to physical
systems. This approach ensures that the physical system control is updated and corrected. The
generative AI agents facilitate human interaction through natural language processing. They bridge
the communication gap between complex system data and human operators, enabling interactive
querying to clarify instructions or provide additional information.
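The function-calling pattern described above can be sketched with the OpenAI Python SDK as shown below. The model name is taken from the text, while the tool schema, actuator command, and the wiring to a downstream controller are illustrative assumptions rather than the agents implemented in this research.

    from openai import OpenAI
    import json

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Illustrative tool schema: a structured command the agent may emit.
    tools = [{
        "type": "function",
        "function": {
            "name": "set_rotor_speed",
            "description": "Set the speed of one rotor on the mini drone.",
            "parameters": {
                "type": "object",
                "properties": {
                    "rotor_id": {"type": "integer"},
                    "speed_rpm": {"type": "number"},
                },
                "required": ["rotor_id", "speed_rpm"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": "Rotor 2 vibration is high; reduce its speed to 4500 rpm."}],
        tools=tools,
    )

    # The structured JSON output can be forwarded to a controller API.
    for call in response.choices[0].message.tool_calls or []:
        command = json.loads(call.function.arguments)
        print("Dispatching actuator command:", call.function.name, command)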
3.6.4 Interacting with the Physical Twin
Action agents directly interact with the physical twin, creating a closed-loop feedback
system where the digital twin's insights influence the physical system's behavior and performance.
This enables real-time optimization, adaptation, and control.
Providing Commands and Adjustments
Action agents send commands to actuators, adjust control parameters, and modify settings
within the physical system based on the digital twin's knowledge and reasoning. This direct
intervention enables continuous monitoring and optimization of the physical system. Agents can
adjust parameters to improve efficiency or enhance performance metrics. As conditions change,
agents adapt control strategies, ensuring the system remains responsive. The digital twin can also
predict potential issues or failures, with action agents taking preventative measures to avoid
downtime.
Generating Recommendations:
Action agents analyze the system's state and behavior, generating recommendations for
human operators or automated control systems. These insights help improve system performance.
Agents might suggest adjustments to operating parameters to address flaws and enhance
functionality.
Simulation as a Catalyst for Knowledge Update:
To bridge the gap between formal logic and advanced reasoning, the system uses
simulation for knowledge update and refinement. Action agents trigger simulations in various
environments to model complex behavior, analyze parameters, and generate data for the
knowledge base. This feedback loop allows the digital twin system model to validate and refine
its understanding.
3.6.5 Planning and Coordination Agents
The planning and coordination agents employ a hierarchical planning system to manage
complex tasks. This system involves strategic planning, which decomposes high-level objectives into
manageable sub-goals and allocates resources. Tactical planning determines the sequence of tasks
and assigns them to appropriate agents, while operational planning specifies the exact actions and
timing coordination needed for execution. Effective coordination among multiple agents is
achieved through robust communication protocols and conflict resolution strategies. Agents
communicate via standardized protocols for data exchange, ensuring consistency and reducing
errors. Coordination mechanisms manage task dependencies and facilitate collaborative decision-making, ensuring agents work harmoniously towards common goals. The integration of human
oversight is crucial for validating agent actions and ensuring alignment with operational goals.
Interactive user interfaces provide real-time visibility into agent activities, while approval
workflows and escalation procedures allow human operators to intervene when necessary.
Continuous feedback mechanisms incorporate human input into the agents' decision-making
processes, enhancing overall system reliability and effectiveness.
This unique combination of multi-modal actuation agents and planning and coordination
agents ensures a flexible and efficient system for managing bidirectional connections between
physical systems and digital assets, addressing the complexities of continuous synchronization.
3.6.5.1 Multi-Step Plan Generation
Reasoning Engine: The planning process begins with a well-defined goal. This goal is
typically derived from a decision-support question posed by a stakeholder. Action agents then
leverage the generative models and the knowledge base to generate multi-step plans that detail the
sequence of actions required to achieve the desired goal [105]. The generative model functions as
a reasoning engine, analyzing the goal, the current state of the system, and the available knowledge
within the knowledge base to identify potential pathways to success. This process involves the
generative model’s ability to understand natural language, reason about relationships between
entities, and generate solutions.
Knowledge Base as a Resource: The knowledge base, structured as a probabilistic temporal
ontology graph, equips the generative model with comprehensive information about the system,
its components, their properties, and their temporal dynamics [128]. This resource includes axioms
that establish the fundamental rules and principles governing the system, assertions that detail
specific facts and observations, and time series data that capture the changes in dynamic attributes
over time.
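As a rough illustration of what such a knowledge base entry might look like, the sketch below represents temporal assertions with validity intervals and confidence values alongside a list of axioms; the field names and query logic are illustrative assumptions rather than the actual ontology schema.

```python
# Illustrative structure for a probabilistic temporal knowledge base (assumed schema).
from dataclasses import dataclass, field
from typing import List

@dataclass
class TemporalAssertion:
    subject: str                 # e.g., "UAV_2"
    predicate: str               # e.g., "hasBatteryLevel"
    value: float                 # asserted value of the dynamic attribute
    valid_from: float            # start of validity interval (epoch seconds)
    valid_to: float              # end of validity interval
    confidence: float = 1.0      # probability that the assertion holds

@dataclass
class KnowledgeBase:
    axioms: List[str] = field(default_factory=list)            # fundamental rules
    assertions: List[TemporalAssertion] = field(default_factory=list)

    def query(self, subject: str, predicate: str, at_time: float):
        # Return assertions valid at a given time, ordered by confidence.
        hits = [a for a in self.assertions
                if a.subject == subject and a.predicate == predicate
                and a.valid_from <= at_time <= a.valid_to]
        return sorted(hits, key=lambda a: a.confidence, reverse=True)
```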
Chain-of-Thought Reasoning: Drawing inspiration from the work of Wei et al. (2022) [99],
the planning process incorporates chain-of-thought reasoning, where the generative model
formulates a series of intermediate reasoning steps leading to the final plan. This approach
promotes a more transparent and interpretable planning process, offering insights into the
generative model’s decision-making and helping to identify potential errors or biases.
Plan Representation: The generated plan is articulated as a sequence of actions, each
accompanied by its parameters, preconditions, and expected outcomes. This structured format
ensures the plan can be efficiently executed and monitored by the action agents, facilitating clear
and systematic implementation of the planned actions [129].
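A plan structured this way might resemble the following sketch, in which each step carries its parameters, preconditions, and expected outcomes so that action agents can check and monitor execution; the specific field names and example actions are illustrative assumptions.

```python
# Illustrative plan representation (assumed fields and example actions).
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PlanStep:
    action: str                          # e.g., "navigate_to_waypoint"
    parameters: Dict[str, float]         # e.g., {"x": 1.2, "y": 0.4, "z": 1.0}
    preconditions: List[str]             # facts that must hold before execution
    expected_outcomes: List[str]         # facts expected to hold afterwards

@dataclass
class Plan:
    goal: str
    steps: List[PlanStep] = field(default_factory=list)

# Example: a two-step plan an action agent could execute and monitor.
plan = Plan(
    goal="inspect_object_in_zone_B",
    steps=[
        PlanStep("takeoff", {"altitude": 1.0},
                 ["battery_above_30pct"], ["uav_airborne"]),
        PlanStep("navigate_to_waypoint", {"x": 2.0, "y": 1.5, "z": 1.0},
                 ["uav_airborne"], ["uav_at_zone_B"]),
    ],
)
```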
3.6.6 Utilizing Multi-Agent Collaboration
When tackling complex challenges that require diverse skills and knowledge, the digital
twin system model leverages multi-agent collaboration. Drawing inspiration from human
teamwork and research on LLM-based systems like ChatDev and MetaGPT, this framework
enables specialized agents to work together seamlessly. Each agent contributes unique capabilities,
enhancing efficiency in action execution.
3.6.6.1 Communication and Coordination
Effective communication and coordination are crucial for multi-agent collaboration. The
digital twin system model uses message passing and a shared knowledge base to facilitate this.
Agents exchange information about tasks, progress, and observations through message-passing
systems, ensuring real-time coordination and awareness of the system's state. The shared
knowledge base, enriched with temporal and probabilistic data, provides a consistent and up-to-date understanding for all agents. To prevent conflicts, coordination protocols define agent
interactions, task sequences, communication channels, and decision-making processes.
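The sketch below shows one minimal way such message passing could be realized with an in-process queue; the message fields and topic names are assumptions for illustration, and a deployed system would more likely use a networked broker or ROS2 topics.

```python
# Minimal in-process message passing between agents (illustrative only).
import queue
from dataclasses import dataclass

@dataclass
class AgentMessage:
    sender: str       # e.g., "reasoning_agent"
    topic: str        # e.g., "task_status"
    payload: dict     # task progress, observations, etc.

bus: "queue.Queue[AgentMessage]" = queue.Queue()

def publish(sender: str, topic: str, payload: dict) -> None:
    bus.put(AgentMessage(sender, topic, payload))

def drain(topic: str) -> list:
    # Collect all pending messages on a topic so agents share a common view.
    matched, others = [], []
    while not bus.empty():
        msg = bus.get()
        (matched if msg.topic == topic else others).append(msg)
    for msg in others:          # re-queue messages intended for other topics
        bus.put(msg)
    return matched

publish("planning_agent", "task_status", {"task": "search_zone_A", "state": "assigned"})
print(drain("task_status"))
```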
3.6.6.2 Conflict Resolution
Conflicts in a multi-agent system can arise from differing objectives or priorities. The
digital twin system model resolves these issues through prioritization and using a central
coordinator agent. A central coordinator oversees agents' activities and resolves disputes, ensuring
coherent and effective collective action. Research on LLM-based multi-agent systems informs
this implementation. ChatDev's chat-based interface and role-playing for task management [130]
and MetaGPT's standardized operating procedures [131] guide structured communication and
workflows. These frameworks ensure consistency and efficiency in multi-agent interactions,
enabling the digital twin system model to optimize performance and solve problems through
coordinated teamwork.
3.7 Summary
This chapter presented a novel, systematic model-based systems upgrade methodology
designed to address the inherent challenges in upgrading complex aerospace systems. The
methodology integrates Model-Based Systems Engineering (MBSE), digital twin technology, and
generative AI models to create a cohesive framework to enhance system upgrade processes. Key
components of the methodology include:
• Knowledge Acquisition: Utilizing multi-modal generative AI agents for active knowledge
acquisition, reducing the time and effort required for manual model development and
integration.
• Ontology: Employing layered ontologies for formal knowledge representation, ensuring
consistency and coherence across different domains and facilitating effective integration
and analysis.
• Analysis: Leveraging formal reasoning as the core, enhanced by temporal embeddings and
probabilistic reasoning to manage system dynamics, uncertainties, and correctness. The
integration of formal reasoning with generative AI agents further enhances reasoning
capabilities and human interaction.
• Action: Implementing an agent-based system with planning and coordination capabilities
to execute bidirectional connections between physical systems and digital assets, ensuring
efficient and accurate execution of system upgrades.
The methodology can improve the efficiency of system upgrades by automating knowledge
acquisition, enhancing formal analysis techniques, and integrating generative AI agents for rapid
system model creation and analysis. The next chapter focuses on the implementation of the
methodology in a testbed environment, where it was applied to an exemplary system. It details
the steps involved in the upgrade process and evaluates the results, demonstrating the
methodology's effectiveness and providing insights into its real-world applicability and benefits.
Chapter 4
Testbed Implementation: Illustrative Example and Results
4.1 Overview
The implementation of a testbed and the subsequent analysis of results are critical steps in
validating the efficacy of the methodology for integrating Digital Twin and Generative Models in
Model-Based Systems Upgrade. Testbeds provide a controlled environment where theoretical
concepts can be rigorously tested, demonstrated, and refined. The significance of a testbed lies in
its ability to facilitate detailed observation, measurement, and analysis, which are essential for
ensuring the robustness of the system upgrade methodology. Through testbeds, it is possible to
simulate real-world conditions and identify potential issues early, thereby enhancing the overall
effectiveness and efficiency of the upgrade process.
This chapter provides an overview of the experimental setup, the upgrade scenario selected
for testing, and the results obtained from these experiments. The experimental setup overview
includes detailed descriptions of the hardware and software configurations used and the specific
scenarios designed to test the system's capabilities. The selected upgrade scenario involves
transitioning from a simple dual-UAV navigation system to a more complex multi-UAV
autonomous search system. This scenario was chosen for its relevance to current challenges in
autonomous systems and its potential to showcase the strengths of the approach. By presenting
results and analyses, this chapter aims to demonstrate the tangible benefits and improvements
achieved through this methodology, thereby providing concrete evidence of its impact on system
upgrade processes. The results highlight the methodology's ability to enhance system performance,
reduce operational risks, and improve overall upgrade efficiency.
4.2 Purpose of the Testbed
The primary purpose of the testbed is to create a controlled environment that accurately
represents the complexities and challenges of system upgrades. This controlled setting allows for
the validation of theoretical foundations, ensuring that the concepts and models in the methodology
are both sound and effective. By rigorously testing these theories, the testbed can confirm their
applicability in practical scenarios. In addition to validation, the testbed serves as a crucial platform
for demonstrating the practical application of the methodology. By implementing a scaled-down
version of a system, the testbed makes it possible to showcase the methodology's capabilities in a
tangible and accessible manner. This demonstration is essential for communicating the benefits
and potential applications of the methodology to stakeholders, including researchers, practitioners,
and decision-makers in the field. The testbed also facilitates extensive experimentation and data
collection. By allowing for the systematic manipulation of variables and observation of outcomes,
the testbed enables a wide range of experiments to be conducted in a controlled and repeatable
manner. This experimentation is vital for gathering detailed performance data, which can be used
for rigorous quantitative analysis of the methodology's impact. The ability to collect and analyze
this data is crucial for understanding the strengths and limitations of the approach and for making
evidence-based improvements. Finally, the testbed supports the iterative refinement of the
methodology. By providing a platform for rapid testing and feedback, the testbed allows for the
quick identification and resolution of potential issues. This iterative process ensures that the
methodology can be continuously improved and adapted to address new challenges and
incorporate emerging technologies. Through repeated cycles of testing, analysis, and refinement,
the testbed plays a key role in enhancing the overall robustness and effectiveness of the system
upgrade methodology.
4.3 Overview of the Upgrade Scenario
The upgrade scenario involves a transition from a basic dual-UAV navigation system to a
more complex multi-UAV autonomous search system. Initially, the system consisted of two
miniature UAVs that performed simple point-to-point navigation while avoiding static obstacles
in a lab environment. The upgraded system, however, expands to include three UAVs that are
capable of autonomously searching for specific objects in an environment containing both static
and dynamic obstacles. This increase in complexity demonstrates the methodology's ability to
manage multi-domain changes and integrate advanced technologies effectively.
Figure 10 shows photographs of these settings in different lab environments. On the left,
the fielded system is depicted with two UAVs operating in the initial lab setup. On the right, the
upgraded system is shown with three UAVs flying in a new, more complex lab environment. In the
upgraded system, obstacles in the form of white boxes are placed in the lab, obstructing the route
of the UAVs. Additionally, a fan is placed in the lab environment in a specific navigation zone.
This fan is connected to a control system that automatically adjusts its speed at specific time
intervals. The controller randomly varies the fan speed from zero to maximum, generating
intermittent and unpredictable air forces in the area in front of it. This setup creates dynamic and
unpredictable environmental challenges for the UAVs to navigate. Both photos capture snapshots
during operation and do not show the complete lab environments, but rather focus on the UAVs in
action to illustrate the transition from the initial to the upgraded system. The change in lab
environments, including the addition of the controllable fan, further highlights the increased
complexity and capabilities of the upgraded system in handling dynamic conditions.
Figure 10: Transition from Fielded System to Upgraded System
The selection criteria for this upgrade scenario were based on several factors. First, the
relevance to current challenges in autonomous systems, particularly in areas such as search and
rescue, environmental monitoring, and surveillance, was paramount. These applications require
sophisticated coordination, sensing, and decision-making capabilities, making them ideal for
demonstrating the strengths of the methodology. Additionally, a crucial consideration was the
scenario's potential to highlight key features of the methodology, such as real-time data integration,
autonomous decision-making, and enhanced operational efficiency. The ability to showcase these
features in a controlled and repeatable setting was essential for validating the methodology's
effectiveness.
Key aspects of the upgrade include several changes and enhancements:
1. Hardware Changes: Although the hardware changes were minimal, the addition of a third
UAV was an enhancement. This increase in the number of UAVs necessitated
improvements in coordination and control mechanisms. Each UAV was equipped with
additional sensors to improve environmental awareness and data collection capabilities,
enhancing the overall system performance.
2. Software Upgrades: The most substantial changes occurred on the software side,
particularly in path planning and strategy development. Advanced autonomous search
algorithms were implemented to enable the UAVs to perform complex search tasks
efficiently. The control system transitioned from a centralized architecture to a
decentralized one, allowing each UAV to make autonomous decisions while coordinating
with others. This change significantly enhanced the system's flexibility and robustness.
3. Infrastructure Modifications: The system's data management infrastructure was
upgraded from a centralized cloud-based system to a more sophisticated graph database
and distributed computing architecture. This modification improved data storage, retrieval,
and processing capabilities, allowing for more efficient handling of complex,
interconnected data. The graph database enabled better integration and querying of data
across different domains, facilitating more effective decision-making processes.
4. Operational Enhancements: The operational capabilities of the system were enhanced.
Previously, the UAVs could only navigate from point A to point B. With the upgrade, the
UAVs can now autonomously search for specific objects, adapt to dynamic environments,
and coordinate their movements in real-time. This involves complex decision-making
processes, advanced path planning, and the ability to react to unexpected changes in the
environment, transforming the basic navigation system into a sophisticated multi-UAV
autonomous search platform.
This upgrade scenario demonstrates the methodology's ability to manage substantial
changes across multiple domains, integrate new technologies, and enhance overall system
performance. By comparing the performance and capabilities of the system before and after the
upgrade, the tangible benefits and improvements achieved through this approach are clearly
illustrated.
4.4 Initial System Configuration: Dual UAV Navigation
The initial system configuration for dual UAV navigation encompasses several key
components, each critical to the system's overall functionality and performance. This section
provides a description of the physical system components, software and infrastructure elements,
and operational capabilities of the initial setup.
4.4.1 Description of Physical System Components
UAV Specifications: The initial system utilizes two miniature UAVs designed for basic
navigation tasks. These UAVs are lightweight, weighing approximately 80 grams each, including
propellers and batteries. Their compact dimensions are 98 mm x 92.5 mm x 41 mm, making them
suitable for indoor environments. The UAVs are equipped with 3-inch propellers and include
several built-in functions such as a range finder, barometer, LED, vision system, 2.4 GHz 802.11n
Wi-Fi, and a 720p live view camera. The drones also have a micro-USB charging port.
Flight Performance: The UAVs offer a maximum flight distance of 100 meters, a top
speed of 8 m/s, and a maximum flight time of 13 minutes. They can reach a maximum flight height
of 30 meters. The battery is detachable with a capacity of 1.1 Ah / 3.8 V and a charging time of
approximately 90 minutes.
Camera: The camera specifications include a photo resolution of 5 MP (2592 x 1936) and
a field of view (FOV) of 82.6°. The video resolution is HD 720p at 30 fps, and the formats
supported are JPG for photos and MP4 for videos. The camera also features electronic image
stabilization (EIS).
Safety Features: The UAVs are equipped with several safety features including failsafe
protection, which ensures safe landing if the connection is lost, and propeller guards to prevent
damage. They also support auto takeoff and landing to simplify flight operations.
4.4.2 Software and Infrastructure Components
Programming and SDK: The UAVs support an advanced software development kit
(SDK) for developing custom applications. This SDK provides functionalities for real-time image
transmission, camera and video recording, firmware upgrades, and UAV calibration.
Control System: The control system operates from a high-performance workstation
equipped with an Intel Core i9-13900H processor and 64 GB of RAM. Each UAV has a separate
controller, and there is an additional controller for executing autonomous operations. The
localization system includes four Intel depth cameras (D455) and a dedicated controller to process
their inputs.
Cloud Infrastructure: The initial system employs a centralized cloud infrastructure
hosted on Microsoft Azure. This setup includes a database for storing telemetry data, mission
parameters, and system logs. The cloud-based approach allows for scalable data management and
processing but can introduce challenges related to latency and connectivity.
Obstacle Avoidance Algorithm: The initial system features a basic obstacle avoidance
algorithm that uses the front-facing camera feed. Image processing techniques are employed to
detect obstacles and adjust the UAVs' trajectories accordingly. While this method is effective for
avoiding large, static obstacles, it has limitations in more complex or dynamic environments.
4.4.3 Operational Capabilities
Point-to-Point Navigation: The primary operational capability of the initial system is
point-to-point navigation. The UAVs can be assigned start and end points within the test arena,
and the central control system calculates and executes a flight path between these points. This
capability forms the foundation for more complex missions but is limited in its adaptability to
changing environments or objectives.
Static Obstacle Avoidance: The system also has the ability to avoid static obstacles. Using
the front-facing cameras and the obstacle avoidance algorithm, the UAVs can detect and navigate
around stationary objects in their path. However, this capability is limited to obstacles directly in
front of the UAVs and does not account for dynamic or complex obstacle arrangements.
In summary, the initial system configuration provides a solid baseline for demonstrating
the impact of the upgrade methodology. The system's limitations in sensing, processing, and
autonomous capabilities present clear opportunities for improvement, setting the stage for an
upgrade that will transform the primary navigation system into a sophisticated multi-UAV
autonomous search platform.
4.5 Upgraded System: Multi-UAV Autonomous Search
The upgraded system transforms the initial dual-UAV navigation setup into a more
sophisticated multi-UAV autonomous search system. This upgrade includes enhancements in the
physical system, software and infrastructure, digital twin and AI integration, and operational
capabilities.
4.5.1 Physical System Enhancements
Addition of Third UAV: The most notable physical enhancement is the addition of a third
UAV. This increase in the number of UAVs necessitates improved coordination and control
mechanisms, allowing for more complex missions and enhanced system functionality. The
addition of the third UAV significantly expands the system's operational scope, enabling more
efficient and effective search and exploration tasks.
Expanded Sensor Suite: Using the digital twin system model and its integrated agents to
query APIs and software code, the capability to utilize both front and downward-facing cameras
was identified and integrated into the algorithm. The UAVs now alternate between the front and
bottom cameras to enhance environmental awareness and data collection. This expanded sensor
suite improves obstacle detection and navigation, providing a more comprehensive view of the
surroundings and enhancing the overall performance of the UAVs during autonomous search tasks.
Enhanced Localization: The digital twin system model's graph-based nature allowed for
improved sensor fusion, enhancing the localization system. The initial system used four Intel depth
cameras (D455) positioned around the test arena. The upgraded system integrates these depth
cameras with Inertial Measurement Unit (IMU) sensors, using sensor fusion techniques to provide
more accurate and robust position tracking. This enhanced localization system ensures reliable
navigation and coordination among the UAVs, even in challenging environments.
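As a rough illustration of the fusion idea, the sketch below blends a drifting IMU-propagated estimate with absolute position fixes from the depth cameras using a one-dimensional complementary filter; the filter choice and gain are assumptions for illustration, not the testbed's actual estimator.

```python
# Simplified 1-D complementary filter fusing camera position fixes with IMU
# acceleration; illustrative of the fusion idea only.
class ComplementaryFusion:
    def __init__(self, alpha: float = 0.98):
        self.alpha = alpha          # weight on the IMU-propagated estimate
        self.position = 0.0
        self.velocity = 0.0

    def predict(self, accel: float, dt: float) -> None:
        # Propagate state with IMU acceleration (dead reckoning).
        self.velocity += accel * dt
        self.position += self.velocity * dt

    def correct(self, camera_position: float) -> float:
        # Blend the drifting IMU estimate with the absolute camera fix.
        self.position = (self.alpha * self.position
                         + (1.0 - self.alpha) * camera_position)
        return self.position
```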
4.5.2 Software and Infrastructure Upgrades
Transition to Neo4j Cloud Infrastructure: The system's data management infrastructure
has been upgraded from a centralized cloud-based system to a Neo4j cloud infrastructure. This
graph database allows for more efficient storage, retrieval, and processing of complex,
interconnected data. The transition to the Neo4j graph database enhances the system's ability to
manage and query data across multiple domains, facilitating better decision-making and
coordination.
Decentralized Control System: The control system has been transitioned from a
centralized architecture to a decentralized one. In the upgraded system, each UAV is capable of
making autonomous decisions while coordinating with others. This decentralized approach
improves the system's flexibility, scalability, and robustness, allowing for more complex and
adaptive mission planning and execution.
Implementation of Autonomous Search Algorithms: The digital twin system model,
connected to a virtual simulation environment, allowed for extensive testing of various
autonomous search algorithms. This integration enabled the testing of a wide range of algorithms
and search strategies without the need for physical UAV flights. Advanced autonomous search
algorithms were implemented to improve the UAVs' adaptive behavior and performance over time.
This capability enhances the system's ability to conduct complex search tasks in dynamic
environments.
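As a simple illustration of the kind of strategy that could be evaluated in this way, the sketch below partitions a search grid among three UAVs and sweeps each partition boustrophedon-style; the grid dimensions, partitioning scheme, and function name are illustrative assumptions rather than the algorithms actually deployed.

```python
# One simple search strategy of the kind the virtual environment could evaluate:
# partition a grid of cells among three UAVs and sweep each partition.
from typing import Dict, List, Tuple

def partition_search_grid(rows: int, cols: int,
                          uav_ids: List[str]) -> Dict[str, List[Tuple[int, int]]]:
    # Boustrophedon (lawn-mower) ordering over the whole grid.
    cells = [(r, c) for r in range(rows)
             for c in (range(cols) if r % 2 == 0 else reversed(range(cols)))]
    chunk = len(cells) // len(uav_ids)
    plan = {}
    for i, uav in enumerate(uav_ids):
        start = i * chunk
        end = None if i == len(uav_ids) - 1 else (i + 1) * chunk
        plan[uav] = cells[start:end]          # ordered sweep for this UAV
    return plan

routes = partition_search_grid(6, 8, ["UAV1", "UAV2", "UAV3"])
print({uav: cells[:3] for uav, cells in routes.items()})  # first waypoints per UAV
```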
4.5.3 Digital Twin and AI Integration
Continuous Updates: A digital twin system model of the entire system has been
implemented, receiving continuous updates from the physical UAVs. This digital twin allows for
an accurate representation of the physical system's state, enabling better predictive modeling and
decision-making. Continuous updates ensure that the digital twin remains an up-to-date
representation of the physical system.
Generative Models for Planning and Coordination: Generative AI models have been
integrated to handle complex tasks such as mission planning, inter-UAV coordination, and human-system interaction. These models can generate novel solutions to unforeseen challenges,
significantly enhancing the system's adaptability. The generative models are crucial for optimizing
search strategies and improving overall system performance.
Ontology-Based Data Integration: The upgraded system uses an ontology-based
approach for data integration, implemented through the Neo4j graph database. This approach
allows for flexible integration of diverse data types, supporting complex queries and reasoning
across different domains. Ontology-based data integration ensures that all relevant data is
accessible and usable for decision-making and planning for system upgrade processes.
4.5.4 New Operational Capabilities
Autonomous Object Search: One of the key new capabilities of the upgraded system is
autonomous object search. The UAVs are now capable of searching for and identifying specific
objects within their environment. This capability involves complex coordination between UAVs,
real-time image processing, and adaptive search strategies, significantly enhancing the system's
operational effectiveness.
Navigation in Environments with Static and Dynamic Obstacles: The upgraded system
can navigate through environments with both static and dynamic obstacles. This involves real-time
path planning, predictive modeling of obstacle movements, and coordinated maneuvers between
UAVs. The ability to navigate in such complex environments demonstrates the system's advanced
capabilities and its potential for a wide range of applications.
Overall, the upgraded system represents a leap in capabilities and performance. The
enhancements in physical components, software and infrastructure, digital twin and AI integration,
and operational capabilities collectively transform the initial dual-UAV navigation system into a
sophisticated multi-UAV autonomous search platform. This transformation showcases the
effectiveness of the methodology in managing complex system upgrades and integrating advanced
technologies.
4.6 Upgrade Process Implementation
The implementation of the upgrade process for transforming the dual-UAV navigation
system into a multi-UAV autonomous search system involves several critical steps. These steps
ensure a comprehensive and efficient upgrade, leveraging advanced technologies to enhance
system performance and capabilities.
4.6.1 Knowledge Acquisition and Ontology Development
Active Knowledge Acquisition Agents: The upgrade process begins with the deployment
of active knowledge acquisition agents. These agents actively gather information from various
sources, including technical specifications, operational manuals, sensor data logs, and expert
interviews. This automated process significantly accelerates the knowledge acquisition phase,
reducing the time required from weeks to days.
Formal Knowledge Representation: Once the information is gathered, it is converted
into a formal knowledge representation. The data is structured and organized using formal
ontologies that capture the relationships and hierarchies within the system. This formal
representation ensures consistency and facilitates the integration of diverse data types.
Ontology and System Model in a Graph Database: The formal knowledge is then
integrated into an ontology-based system model within a graph database. This approach leverages
the capabilities of graph databases to manage complex, interconnected data efficiently. The
ontology-based model provides a comprehensive and flexible framework for representing the
system, enabling sophisticated queries and analyses.
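For illustration, the following sketch shows how system elements and their part-of relationships might be written into such a graph database using the Neo4j Python driver; the node labels, relationship type, and connection placeholders are assumptions rather than the dissertation's actual schema.

```python
# Illustrative ontology-backed system model entries in Neo4j (assumed schema).
from neo4j import GraphDatabase

URI = "neo4j+s://<your-aura-instance>.databases.neo4j.io"   # placeholder
AUTH = ("neo4j", "<password>")                               # placeholder

def add_component(tx, name: str, domain: str, parent: str) -> None:
    # MERGE keeps the graph consistent if the component already exists.
    tx.run(
        """
        MERGE (c:Component {name: $name})
        SET c.domain = $domain
        MERGE (p:Component {name: $parent})
        MERGE (p)-[:HAS_PART]->(c)
        """,
        name=name, domain=domain, parent=parent,
    )

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    with driver.session() as session:
        session.execute_write(add_component, "FrontCamera", "hardware", "UAV1")
        session.execute_write(add_component, "PathPlanner", "software", "UAV1")
```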
4.6.2 Cross-Domain Integration
Layered Ontology Approach: To address the complexities of integrating knowledge
across different domains, a layered ontology approach is employed. This approach involves
creating core, mid-level, and domain-specific ontologies that capture the various aspects of the
system. The Industrial Ontology Foundry Core ontology and ISO/IEC 21838-2:2021 Information
technology — Top-level ontologies (TLO) are used as the core ontologies. Common Core
Ontology is utilized as a mid-level ontology, while domain ontologies are developed through
human-agent collaboration in an accelerated manner. The layered structure facilitates the
integration of multi-domain knowledge, ensuring that relationships and dependencies are
accurately represented.
Graph Database Technologies for Integration: Initially, relational databases were used
to manage the system data. However, these relational databases were found to be ineffective due
to the complexity and number of relationships among the elements. Therefore, for the upgraded
system, Azure Cloud was used with Neo4j AuraDB graph database and Azure serverless functions.
This setup, combined with generative AI agents based on GPT-4o and CrewAI, along with tools
like Python, Selenium, and AutoHotkey (AHK) for robotic process automation, significantly
enhanced the system's ability to manage, integrate, and query data across multiple domains.
Knowledge graph RAG (retrieval-augmented generation) agents, along with reasoning agents, are
employed to provide data to the ontology-based system model and to retrieve data from that model.
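A minimal sketch of the retrieval step behind such a knowledge graph RAG agent is given below: facts touching a component are pulled from the graph and folded into a prompt for the generative model. The Cypher pattern and the ask_llm callable are illustrative assumptions; the GPT-4o and CrewAI interfaces used in the testbed are not reproduced here.

```python
# Sketch of graph-retrieval-augmented prompting (assumed schema and LLM hook).
# `driver` is a neo4j GraphDatabase driver as in the previous sketch.

def retrieve_facts(session, component: str, limit: int = 20) -> list:
    # Pull relationships touching the component to ground the model's answer.
    result = session.run(
        """
        MATCH (c:Component {name: $name})-[r]-(n)
        RETURN c.name AS subject, type(r) AS relation, n.name AS object
        LIMIT $limit
        """,
        name=component, limit=limit,
    )
    return [f"{rec['subject']} {rec['relation']} {rec['object']}" for rec in result]

def build_prompt(question: str, facts: list) -> str:
    context = "\n".join(f"- {fact}" for fact in facts)
    return (f"Answer using only the facts below.\n"
            f"Facts:\n{context}\n\nQuestion: {question}")

def answer(driver, ask_llm, question: str, component: str) -> str:
    with driver.session() as session:
        facts = retrieve_facts(session, component)
    return ask_llm(build_prompt(question, facts))   # ask_llm: assumed LLM wrapper
```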
4.6.3 Digital Twin Modification and Validation
Continuous Updating and Validation: The digital twin of the system is continuously
updated with real-time data from the UAVs. This ongoing process ensures that the digital twin
remains an accurate and up-to-date representation of the physical system. Continuous validation
of the digital twin is performed by comparing its predictions with actual system performance,
identifying and correcting discrepancies as needed.
Virtual Testing: The digital twin is used for virtual testing. By conducting extensive
virtual tests, various scenarios and conditions can be evaluated without the risks and costs
associated with physical testing. This approach allows for thorough testing of new features and
algorithms, significantly accelerating the development and validation process.
4.6.4 Generative Model Integration
Development and Testing of Control Codes: Generative AI models are employed to
assist in the development and testing of control codes. These models use the comprehensive
knowledge base and digital twin to generate and evaluate control strategies, optimizing the
system's performance.
Refinement of Planning and Decision-Making Algorithms: The generative AI models
also play a crucial role in refining planning and decision-making algorithms. By leveraging the
digital twin and virtual simulation environment, these models can test and improve various
algorithms, ensuring that the system can adapt to changing conditions and mission requirements
effectively.
Interface with Virtual Simulation Environment: The digital twin system model
interfaces with the virtual simulation environment to facilitate extensive testing and evaluation.
This integration allows for the seamless transfer of data and control commands between the
physical and virtual systems. The virtual simulation environment, developed using Unity 3D and
C# APIs, provides a flexible and robust platform for testing different search strategies and path-planning algorithms.
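As an illustration of the kind of bridge such an interface requires, the sketch below exchanges JSON messages with a simulation process over a local socket; the endpoint, port, and message schema are hypothetical, since the actual Unity 3D/C# interface is not documented here.

```python
# Hypothetical JSON-over-TCP bridge between the digital twin and a simulation
# process; the port and message schema are illustrative assumptions.
import json
import socket

SIM_HOST, SIM_PORT = "127.0.0.1", 9000   # placeholder endpoint

def send_sim_command(command: dict) -> dict:
    """Send one command to the simulator and return its JSON reply."""
    with socket.create_connection((SIM_HOST, SIM_PORT), timeout=5) as sock:
        sock.sendall((json.dumps(command) + "\n").encode("utf-8"))
        reply = sock.makefile("r", encoding="utf-8").readline()
    return json.loads(reply)

# Example: ask the simulated UAV1 to fly to a waypoint and read back its state.
state = send_sim_command({
    "uav": "UAV1",
    "action": "goto",
    "target": {"x": 2.0, "y": 1.5, "z": 1.0},
})
print(state.get("position"), state.get("battery"))
```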
Overall, the upgrade process implementation leverages advanced knowledge acquisition
techniques, formal ontologies, graph database technologies, digital twin models, and generative
AI to transform the dual-UAV navigation system into a sophisticated multi-UAV autonomous
search platform. This comprehensive and systematic approach ensures that the upgraded system is
robust, efficient, and capable of meeting the complex demands of autonomous search operations.
Figure 11 illustrates the comprehensive architecture of the upgraded system's testbed
implementation. At the heart of this architecture is the digital twin system model, which is realized
through the combination of a central graph database and a network of generative AI agents. The
central graph database serves as the repository for the ontology and data, while the surrounding
agents perform various specialized functions such as knowledge acquisition (perception), action,
reasoning, human interaction, coordination, and planning. These agents work in concert with the
graph database, continuously exchanging information to maintain an up-to-date and accurate
digital representation of the system. Surrounding this digital twin are the physical and virtual
components of the system, including hardware elements, software tools, and digital assets. These
components are bidirectionally integrated with the digital twin, allowing for real-time
synchronization between the physical system and its digital counterpart. Figure 11 also depicts the
upgraded multi-UAV autonomous search system, which integrates advanced technologies for
enhanced performance. The system comprises UAVs (UAV1, UAV2, UAV3) equipped with
sensors and controlled by ROS2 and Python. Four lab cameras provide data to the localization
system, ensuring accurate navigation. A central controller coordinates UAV operations.
Knowledge acquisition and action agents, powered by GPT-4o and CrewAI, process data from
hardware specifications and system requirements. The Neo4j AuraDB graph database stores
integrated ontologies and system models, facilitating complex queries. Coordination and planning
agents, also powered by GPT-4o and CrewAI, utilize this data for efficient mission planning.
Graph DB reasoning and RAG agents enhance system performance through advanced reasoning
capabilities. A virtual simulation environment, developed in Unity 3D, allows extensive algorithm
testing. Finally, a human interaction agent provides intuitive control and real-time feedback,
ensuring seamless human-system interaction. This architecture exemplifies a sophisticated,
integrated approach to multi-UAV operations, leveraging graph databases, AI agents, and virtual
simulations.
Figure 11: Implementation of the Upgraded System with Graph Databases, Generative AI
Agents, UAVs, and Localization System
4.7 Experimental Results and Performance Comparison
The experimental results and performance comparison highlight the improvements and
benefits achieved through the upgrade of the dual-UAV navigation system to a multi-UAV
autonomous search platform. This section provides an analysis and discussion of operational
improvements, system integration and efficiency, development and testing enhancements, digital
twin and model improvements, evaluation of upgrade process efficiency, the impact of digital twin
and generative AI integration, and a comparison with traditional MBSE approaches.
4.7.1 Operational Improvements
The upgraded system demonstrated a reduction in failure rates and collisions. As presented
in Figure 12, the data shows a decrease in the number of crashes from 21 incidents before the
upgrade to 15 incidents after the upgrade. Similarly, the number of technical issues decreased from
18 incidents to 12 incidents post-upgrade. This reduction in incidents highlights the enhanced
robustness of the upgraded system, resulting in fewer collisions and errors during operation and
testing. The system's ability to navigate in dynamic environments also improved: in tests
involving dynamic obstacles, the upgraded system successfully avoided operational regions
flagged as having high uncertainty.
Figure 12: Incident Comparison: Before and After Upgrade with Specific Data
4.7.2 System Integration and Efficiency
The upgraded system successfully integrated data from multiple domains, including sensor
data from operational logs, controls code, UAV hardware API documentation, SysML model for
use cases, requirements specification, virtual simulation, and hardware and software design
specification. The integration of data from different domains facilitated a more comprehensive
understanding of the system's operations. The implementation of the graph database and ontology-based approach significantly enhanced cross-domain model searching and understanding. This
improvement facilitated better decision-making and more efficient system diagnostics. The ability
to perform cross-domain model searches enabled better system understanding and facilitated the
unlocking of previously inaccessible system features. The upgraded system uncovered and utilized
several previously inaccessible system features. For example, leveraging API documentation and
flight software code allowed for the successful integration of front and bottom cameras for
enhanced search operations.
In the upgraded system, UAVs were integrated with the system model and a virtual
environment. Figures 13 and 14 illustrate a scenario where a UAV moves in front of a table fan
blowing air horizontally across the UAV's path. Figure 13 is a photo of the real physical environment,
and Figure 14 is a snapshot of the virtual environment. The external force generated by the table
fan causes the UAV to deviate from its intended path points. The commanded path points and the
actual path points diverge. Upon observing this behavior repeatedly, the reasoning agents identify
the region near the fan in the virtual environment as an area of high uncertainty and update the
digital twin system model. This identification triggers the controller to avoid assigning path points
for other UAVs in this region. In this scenario, the digital twin system model assimilated
information from the real world, updated the model accordingly, and directed the physical system's
actions in the real world. Figure 15 highlights the region with high uncertainty values, marked in
red, within the virtual lab environment.
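The following sketch captures the essence of that uncertainty-tagging step: commanded and actual path points are compared, and grid cells whose mean deviation exceeds a threshold are flagged so the planner can keep other UAVs out of them; the cell size and threshold values are illustrative assumptions.

```python
# Sketch of tagging high-uncertainty regions from commanded vs. actual paths
# (cell size and threshold are assumed values).
import math
from collections import defaultdict
from typing import Dict, List, Tuple

def tag_uncertain_cells(commanded: List[Tuple[float, float]],
                        actual: List[Tuple[float, float]],
                        cell_size: float = 0.5,
                        threshold: float = 0.3) -> List[Tuple[int, int]]:
    deviations: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for (cx, cy), (ax, ay) in zip(commanded, actual):
        cell = (int(cx // cell_size), int(cy // cell_size))
        deviations[cell].append(math.hypot(ax - cx, ay - cy))
    # Flag cells whose average deviation exceeds the threshold.
    return [cell for cell, devs in deviations.items()
            if sum(devs) / len(devs) > threshold]

flagged = tag_uncertain_cells(
    commanded=[(1.0, 1.0), (1.5, 1.0), (2.0, 1.0)],
    actual=[(1.0, 1.05), (1.6, 1.4), (2.1, 1.6)],   # drift near the fan region
)
print(flagged)   # cells the planner should avoid when assigning path points
```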
Figure 13: UAV moving in front of the Table Fan
Figure 14: UAV operating in a Virtual Environment in the same region
Figure 15: Tagged region highlighted in red in the virtual lab environment
4.7.3 Development and Testing Enhancements
The integration of the digital twin and virtual simulation environment allowed for a
threefold increase in algorithm testing capacity. This improvement enabled the evaluation of 15
algorithm variations, significantly accelerating the development and optimization of flight control
and mission planning algorithms. The ability to test three times more algorithms in the same time
frame compared to the previous system demonstrates the effectiveness of the virtual testing
environment. The enhanced system diagnostics and cross-domain data integration dramatically
reduced issue resolution times. Complex issues that previously took an average of 3 days to resolve
were now being addressed in an average of 4 hours. Troubleshooting capabilities also improved
markedly: for instance, a UAV takeoff issue was quickly diagnosed using the digital twin system
model and a human interaction agent, which identified an incorrect propeller configuration rather
than a suspected software or motor issue. This diagnosis, which might have taken days of physical
inspections and tests, was completed in under an hour. The digital twin also enabled the
identification of specific regions in the
lab environment where model predictions had higher uncertainty. These areas were mapped and
quantified. This information was then used to adapt flight patterns and sensor processing
algorithms, improving overall system reliability.
4.7.4 Evaluation of Upgrade Process Efficiency
The overall time required for the upgrade process, from initial planning to final
implementation and testing, was reduced by 40% compared to the initial system. The number of
person-hours required for the upgrade decreased largely due to the automation of many analysis
and decision-making processes. Additionally, the cost of physical prototyping and testing was
reduced, as many iterations could be performed virtually using the digital twin. The digital twin
allowed for rapid testing and validation of new features and algorithms, significantly reducing the
time and risk associated with physical testing. The virtual simulation environment facilitated
extensive testing of various scenarios. Generative AI models provided novel solutions to complex
upgrade challenges. For example, they optimized the UAV swarm's search pattern for the new
three-UAV configuration.
4.7.5 Comparison with Traditional MBSE Approaches
The AI-driven approach significantly reduced the time required for model development
and updating, from weeks to days, with updates then occurring continuously. The layered ontology
and graph database approach excelled at integrating models from different domains, leading to the
identification of cross-domain impacts during the upgrade process. The generative AI components
allowed for rapid adaptation of models and plans in response to new information, a process that
can be time-consuming and error-prone in traditional MBSE frameworks.
Chapter 5
Conclusions
The focus of this dissertation has been on addressing the inefficiencies and challenges
associated with upgrading fielded aerospace systems. The current upgrade processes within the
aerospace sector are often ad hoc, unstructured, and time-consuming. These processes struggle
with assessing cross-domain impacts, handling heterogeneous data, and ensuring the correctness
of upgrades. This research sought to develop a systematic approach that accelerates the upgrade
process and ensures the correctness of upgrades. This approach was grounded in a Model-Based
Systems Engineering (MBSE) framework, supported by two key pillars: generative AI and digital
twin technologies. MBSE was utilized to integrate different domains and facilitate cross-domain
analysis, ensuring better predictions of upgrade outcomes. Generative AI agents were leveraged
to automate knowledge acquisition from diverse data sources and rapidly create system models,
accelerating processes and aiding in early defect detection in the upgrade process. The construct
of digital twins was employed to develop a continuously updated, closed-loop system model,
enabling faster and more effective testing to accelerate the upgrade process.
5.1 Methodology Overview
The methodology involves four key components: knowledge acquisition, ontology,
analysis, and action. Each component plays a critical role in enhancing the upgrade process:
1. Knowledge Acquisition: Active knowledge acquisition using multi-modal generative AI
agents to extract and formalize heterogeneous data from various sources. This ensures that
the data is comprehensive and up-to-date.
2. Ontology: Formal knowledge representation using layered ontologies to link multi-domain
knowledge and ensure consistency across the system.
3. Analysis: Utilizes automated reasoning and human interaction, leveraging ontology-based
models with formal reasoning and generative AI agents. This combination handles system
dynamics and uncertainty and provides advanced computational reasoning, enhancing the
ability to predict the outcomes of system upgrades accurately.
4. Action: Implements an agent-based system with planning and coordination capabilities,
ensuring bidirectional connections between physical systems and the digital twin system
model for rapidly testing multiple upgrade scenarios in a cost-effective manner.
An advantage of this methodology is the increase in testing efficiency achieved through
the use of digital twin system models. These digital twins can be used for testing in virtual
simulations, allowing for extensive scenarios to be examined. The bidirectional connectivity
between digital twins and physical systems ensures that data from the physical systems can be
continuously integrated into the simulations. This capability enabled a wide array of tests to be
conducted within the virtual environment, significantly accelerating the testing process. The
transparency of the model and the integration of a multi-domain model facilitate reductions in
troubleshooting time. The enhanced cross-domain search capabilities allowed for more effective
identification of issues, while human interaction agents assisted in translating natural language
questions into formal queries. This made it easier to locate and address problems. The significance
of this methodology lies in its ability to integrate advanced technologies to overcome traditional
challenges in system upgrades.
This research provides a framework for upgrading aerospace systems, addressing the
complexities and inefficiencies of current processes, and paving the way for more advanced and
efficient upgrade methodologies in the future.
5.2 Summary of Research Contributions
5.2.1 Development of Automated Digital Twin Creation Method
One notable contribution of this research is the development of an automated method to
create a digital twin system model from the lifecycle data of fielded systems. This method utilizes
multi-modal generative AI agents to interact bidirectionally with physical systems, digital assets,
and software tools and extract heterogeneous data. The agents automatically convert the gathered
information into formal logic, facilitating the timely and accurate creation of system models. This
approach aims to reduce the time for model development while ensuring that digital twins remain
continuously updated and synchronized with their physical counterparts, thus supporting a more
dynamic system representation.
5.2.2 Enhanced Formal Analysis Techniques
The research presents an improved approach to performing analysis on digital twins,
focusing on speed and formal methods. This approach employs ontology-based models with
temporal embeddings and probabilistic reasoning to address dynamics and uncertainty. Generative
AI reasoning agents are integrated with formal logic and computational solvers, providing
enhanced computational reasoning capabilities. Additionally, consistency-checking agents are
used to maintain model integrity. Multi-domain knowledge is linked through a layered ontology
approach (core, mid-level, domain), enabling cross-domain reasoning. This comprehensive
analysis framework seeks to improve the accuracy and efficiency of system analysis, enabling
early detection of potential issues in the upgrade process.
5.2.3 Implementation of Bidirectional Action Execution
Another important contribution is the development of a method to perform actions based
on the analysis of digital twins, with the goal of reducing errors. This involves implementing multi-modal actuation agents capable of executing actions across tools and systems, manipulating data,
running simulations, and interacting with physical systems. Planning and coordination agents are
employed to manage multi-agent interactions and integrate human oversight, striving to ensure
that actions are executed effectively. This bidirectional communication between digital twins and
physical systems aims to enable real-time synchronization and rapid testing of multiple upgrade
scenarios.
5.2.4 Integration of Generative AI Agents
The research also demonstrated the integration of generative AI agents in the system
upgrade process to enhance the speed and efficiency of model creation and analysis. These agents
are designed to access and interpret user interfaces of software tools, extract information, and
convert it into formal representations. They also have the capability to manipulate software tools
based on reasoning outcomes, bridging the gap between formal logical representations and
advanced computational solvers. Fine-tuned generative models are employed to execute each step
of a novel formal ontology and system model development process, incorporating human-in-the-loop
verification. This integration aims to streamline the upgrade process and improve the overall
capabilities of the system.
5.3 Key Findings and Accomplishments
Decreased Failure Rates
The implementation of the methodology has led to a reduction in system failure rates
during upgrade scenarios. This improvement is due to the use of digital twin system models, which
facilitate extensive testing and simulation in a virtual environment before applying upgrades to
physical systems. By conducting a wide array of test scenarios, potential issues can be identified
and addressed early in the process. This proactive approach helps ensure that upgrades are robust
and less prone to failure. Additionally, the ability to analyze the impact of changes across various
domains contributes to identifying unforeseen impacts, further preventing failures. This
comprehensive testing and analysis process helps ensure that the outcomes of upgrades align with
planned objectives.
Improved Troubleshooting
The research also found improvements in troubleshooting processes. The transparency of
the digital twin models, combined with enhanced cross-domain search capabilities, makes it easier
to identify and resolve issues. Generative AI agents assist by translating natural language questions
into formal queries, which streamlines the troubleshooting process. This integration of human
interaction agents and automated reasoning agents facilitates quicker and more accurate problemsolving. By enabling more efficient troubleshooting, the methodology reduces downtime during
system upgrades.
Enhanced Cross-Domain Model Searching
The methodology enhances the ability to perform cross-domain model searching, which is
critical for complex aerospace systems that involve multiple interconnected domains. The use of
layered ontologies to link multi-domain knowledge ensures that information is consistent and
accessible. This structured approach allows for more effective searches across different domains,
making it easier to find relevant data and insights. The involvement of generative AI agents with
formal solvers in this process helps maintain consistency and handle uncertainty, further improving
the quality of the search results. This capability supports better decision-making and more efficient
system upgrades.
Utilization of Previously Inaccessible System Features
One of the notable achievements of this research is the ability to utilize previously
inaccessible system features. The comprehensive data acquisition and analysis facilitated by
generative AI agents and digital twins enable the discovery of new capabilities within existing
systems. This enhanced system transparency allows for a deeper understanding of how different
elements interact across domains. By uncovering these hidden features, the methodology not only
improves current system performance but also opens up new possibilities for future enhancements
and innovations. This aspect of the research underscores the potential for continuous improvement
and optimization in systems.
5.4 Limitations and Opportunities for Future Research
Despite the advancements and contributions made by this research, several limitations were
encountered that highlight areas for future improvement. One significant limitation was the
performance issues faced while reasoning on graph databases where the ontology-based system
model was implemented. These performance bottlenecks indicate a need for optimizing the
reasoning process and exploring more efficient algorithms or systems to handle complex graph-based models. Future work should focus on enhancing the performance of reasoning agents on
these models.
Another notable limitation is related to the core and mid-level ontologies used in the
methodology. There is a need for a more engineering-focused core ontology, as the current core
ontology has limited support for temporal concepts and relations. Additionally, the
integration of temporal embeddings and probabilistic models in the formal model can be further
advanced. Future work should focus on developing a more comprehensive, engineering-oriented core ontology and improving the handling of temporal concepts within the
ontologies.
Furthermore, the methodology was tested in a controlled lab environment using illustrative
examples. This controlled setting may not fully capture the complexity and unpredictability of
real-world scenarios. Therefore, it is essential to validate and refine the methodology through real-world implementations to ensure its practical applicability and to uncover new challenges and
opportunities for improvement.
By addressing these limitations, future research can continue to evolve and improve the
methodology. Key areas for future exploration include the enhancement of reasoning agents by
improving their performance on graph-based models and integrating more advanced probabilistic
models with formal models. Another area of focus should be the development of a more
engineering-focused core ontology and enhancing the capabilities for handling temporal data
within ontologies. Lastly, validating and refining the methodology in real-world scenarios is
crucial to ensure its practical applicability and to address any new challenges that arise. These
future directions will help in overcoming the current limitations and advancing the methodology
for system upgrades.
Bibliography
[1] F. J. Romero Rojo, R. Roy, E. Shehab, K. Cheruvu, and P. Mason, “A cost estimating
framework for electronic, electrical and electromechanical (EEE) components
obsolescence within the use-oriented product–service systems contracts,” Proc Inst Mech
Eng B J Eng Manuf, vol. 226, no. 1, pp. 154–166, 2012, doi: 10.1177/0954405411406774.
[2] F. J. Romero Rojo, R. Roy, and E. Shehab, “Obsolescence management for long-life
contracts: State of the art and future trends,” International Journal of Advanced
Manufacturing Technology, vol. 49, no. 9–12, pp. 1235–1250, Aug. 2010, doi:
10.1007/s00170-009-2471-3.
[3] M. R. Emes, A. Smith, A. M. James, M. W. Whyndham, R. Leal, and S. C. Jackson,
“8.1.2 Principles of Systems Engineering Management: Reflections from 45 years of
spacecraft technology research and development at the Mullard Space Science
Laboratory,” INCOSE International Symposium, vol. 22, no. 1, pp. 1069–1084, 2012, doi:
https://doi.org/10.1002/j.2334-5837.2012.tb01389.x.
[4] E. Rebentisch, D. H. Rhodes, and E. Murman, "Lean Systems Engineering: Research
Initiatives in Support of a New Paradigm," presented at the Conference on Systems
Engineering Research, Los Angeles, CA, USA, Apr. 15-16, 2004, Paper no. 122. [Online].
Available: http://web.mit.edu/adamross/www/RHODES_CSER04.pdf
[5] P. Sandborn, Cost analysis of electronic systems, 1st ed., vol. 1, in WSPC
series in advanced integration and packaging, vol. 1. Singapore: World Scientific
Publishing Co. Pte. Ltd, 2012. doi: 10.1142/8353.
[6] E. Fricke and A. P. Schulz, “Design for changeability (DfC): Principles to enable changes
in systems throughout their entire lifecycle,” Systems Engineering, vol. 8, no. 4,
2005, doi: 10.1002/sys.20039.
[7] C. Eckert, P. J. Clarkson, and W. Zanker, “Change and customisation in complex
engineering domains,” Res Eng Des, vol. 15, no. 1, pp. 1–21, 2004, doi: 10.1007/s00163-
003-0031-7.
[8] S. A. Sheard and A. Mostashari, “Principles of complex systems for systems engineering,”
Systems Engineering, vol. 12, no. 4, pp. 295–311, Dec. 2009, doi: 10.1002/sys.20124.
[9] J. Estefan, "Survey of Model-Based Systems Engineering (MBSE) Methodologies,"
INCOSE MBSE Focus Group, vol. 25, Jan. 2008.
[10] C. Williams and M.-E. Derro, NASA Systems Engineering Behavior Study. NASA Office
of the Chief Engineer, Oct. 2008.
[11] A. M. Madni and M. Sievers, “System of systems integration: Key considerations and
challenges,” Systems Engineering, vol. 17, no. 3, pp. 330–347, 2014, doi:
10.1002/sys.21272.
[12] E. Crawley, B. Cameron, and D. Selva, System Architecture: Strategy and Product
Development for Complex Systems, 1st ed. USA: Prentice Hall Press, 2015.
[13] N. bin Ali, K. Petersen, and M. Mäntylä, “Testing highly complex system of systems: an
industrial case study,” in Proceedings of the ACM-IEEE International Symposium on
Empirical Software Engineering and Measurement, in ESEM ’12. New York, NY, USA:
Association for Computing Machinery, 2012, pp. 211–220. doi:
10.1145/2372251.2372290.
[14] H. M. Hastings, J. Davidsen, and H. Leung, “Challenges in the analysis of complex
systems: introduction and overview,” Dec. 01, 2017, Springer Verlag. doi:
10.1140/epjst/e2017-70094-x.
[15] A. M. Madni and M. Sievers, “Model-based systems engineering: Motivation, current
status, and research opportunities,” Systems Engineering, vol. 21, no. 3, pp. 172–190, May
2018, doi: 10.1002/sys.21438.
[16] A. M. Madni, D. Erwin, and M. Sievers, “Constructing models for systems resilience:
challenges, concepts, and formal methods,” Systems, vol. 8, no. 1, pp. 1–14, Mar. 2020,
doi: 10.3390/systems8010003.
[17] A. M. Madni, “Integrating humans with software and systems: Technical challenges and a
research agenda,” Systems Engineering, vol. 13, no. 3, pp. 232–245, Sep. 2010, doi:
10.1002/sys.20145.
127
[18] N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, and S. Yao, “Reflexion: Language
Agents with Verbal Reinforcement Learning,” Adv Neural Inf Process Syst, vol. 36, 2023.
[19] A. M. Madni, “Exploiting Augmented Intelligence in Systems Engineering and
Engineered Systems,” INSIGHT, vol. 23, no. 1, pp. 31–36, Mar. 2020, doi:
10.1002/inst.12282.
[20] A. M. Madni, “Augmented intelligence: A human productivity and performance amplifier
in systems engineering and engineered human-machine systems,” in Systems Engineering
for the Digital Age: Practitioner Perspectives, wiley, 2023, pp. 375–391. doi:
10.1002/9781394203314.ch17.
[21] “United States Government Accountability Office Weapon Systems Annual Assessment
Programs Are Not Consistently Implementing Practices That Can Help Accelerate
Acquisitions ARMY NAVY AND MARINE CORPS AIR FORCE AND SPACE FORCE
JOINT DOD,” 2023.
[22] A. M. Madni and C. C. Madni, “Architectural Framework for Exploring Adaptive HumanMachine Teaming Options in Simulated Dynamic Environments,” Systems, vol. 6, no. 4,
2018, doi: 10.3390/systems6040044.
[23] A. M. Madni, S. Purohit, and C. C. Madni, “Exploiting digital twins in MBSE to enhance
system modeling and life cycle coverage,” in Handbook of Model-Based Systems
Engineering, Springer International Publishing, 2023, pp. 527–548. doi: 10.1007/978-3-
030-93582-5_33.
[24] A. M. Madni, “Elegant systems design: Creative fusion of simplicity and power,” Systems
Engineering, vol. 15, no. 3, pp. 347–354, Sep. 2012, doi: 10.1002/sys.21209.
[25] A. S. Levenko, V. I. Kukushkin, and A. I. Konashkov, “Modernization of the Propulsion
System Scheme of the Craft X-15 with Liquid Propellant Jet Engine for the Airspace
Plane,” Frontiers in Aerospace Engineering, vol. 2, no. 4, p. 227, 2013, doi:
10.14355/fae.2013.0204.02.
[26] M. Wright and H. R. Peltier, “Applying distributed architecture to the modernization of
US Air Force Jet engine test cells,” in 2008 IEEE AUTOTESTCON, 2008, pp. 480–483.
doi: 10.1109/AUTEST.2008.4662663.
128
[27] N. Swietochowski and D. Rewak, “Modernization of the Missile Forces and Artillery,”
Scientific Journal of the Military University of Land Forces, vol. 191, no. 1, pp. 49–70,
Jan. 2019, doi: 10.5604/01.3001.0013.2398.
[28] T. F. Bentzel, J. W. Brzezinski, J. C. Calhoun, and M. T. Stiner, Modernizing the Army’s
Utility Helicopter Fleet to Meet Objective Force Requirements, MBA Professional Report,
Naval Postgraduate School, Monterey, CA, Mar. 2004.
[29] S. M. Boiko, V. H. Romanenko, V. Stushchanskyi Yu, M. O. Nozhnova, and V. M.
Doludariev, “Modern aspects of helicopters modernization,” Monograph/SM Boiko, VH
Romanenko, Yu. V. Stushchanskyi, MO Nozhnova, VM Doludariev, Ya. S. Doludarieva, IM
Koval, NA Koversun Warsaw: iScience Sp. zoo, 2020.
[30] W. Lyu, Y. Yang, J. Miao, S. Cao, and L. Kong, “Architecture Preliminary Design and
Trade-Off Optimization of Stratospheric Airship Based on MBSE,” Aerospace, vol. 11,
no. 7, pp. 582-, 2024, doi: 10.3390/aerospace11070582.
[31] E. R. Carroll and R. J. Malins, “Systematic Literature Review: How is Model-Based
Systems Engineering Justified?,” United States, 2016. doi: 10.2172/1561164.
[32] Y. Bijan, J. Yu, H. Graves, J. Stracener, and T. Woods, “6.6.1 Using MBSE with SysML
Parametrics to Perform Requirements Analysis,” INCOSE International Symposium, vol.
21, no. 1, pp. 769–782, 2011, doi: 10.1002/j.2334-5837.2011.tb01242.x.
[33] D. A. Wagner, M. Bennett, R. Karban, N. Rouquette, S. Jenkins, and M. D. Ingham, “An
ontology for State Analysis: Formalizing the mapping to SysML,” 2012 IEEE Aerospace
Conference, pp. 1–16, 2012, [Online]. Available:
https://api.semanticscholar.org/CorpusID:14031599
[34] S. W. Mitchell, “Transitioning the SWFTS program combat system product family from
traditional document-centric to model-based systems engineering,” Systems Engineering,
vol. 17, no. 3, pp. 313–329, 2014, doi: 10.1002/sys.21271.
[35] A. M. Madni and S. Purohit, “Economic analysis of model-based systems engineering,”
Systems, vol. 7, no. 1, Mar. 2019, doi: 10.3390/systems7010012.
129
[36] A. M. Madni, “Models in Systems Engineering: From Engineering Artifacts to Source of
Competitive Advantage,” in Recent Trends and Advances in Model Based Systems
Engineering, A. M. Madni, B. Boehm, D. Erwin, M. Moghaddam, M. Sievers, and M.
Wheaton, Eds., Cham: Springer International Publishing, 2022, pp. 567–578.
[37] T. Huldt and I. Stenius, “State‐of‐practice survey of model‐based systems engineering,”
Systems engineering, vol. 22, no. 2, pp. 134–145, 2019, doi: 10.1002/sys.21466.
[38] K. Vipavetz, D. Murphy, and S. Infeld, “Model-based systems engineering pilot program
at NASA langley,” in AIAA SPACE Conference and Exposition 2012, 2012.
[39] M. Alenazi, “Toward Improved Traceability of Safety Requirements and State-Based
Design Models,” ProQuest Dissertations & Theses, 2021.
[40] T. Bayer et al., “11.5.1 Early Formulation Model-Centric Engineering on NASA’s Europa
Mission Concept Study,” INCOSE International Symposium, vol. 22, no. 1, pp. 1695–
1710, 2012, doi: 10.1002/j.2334-5837.2012.tb01431.x.
[41] H. G. C. Góngora, M. Ferrogalini, and C. Moreau, “How to Boost Product Line
Engineering with MBSE - A Case Study of a Rolling Stock Product Line,” Cham:
Springer International Publishing, 2015, pp. 239–256. doi: 10.1007/978-3-319-11617-
4_17.
[42] E. A. Bjorkman, S. Sarkani, and T. A. Mazzuchi, “Using model-based systems engineering
as a framework for improving test and evaluation activities,” Systems Engineering, vol.
16, no. 3, pp. 346–362, Sep. 2013, doi: 10.1002/sys.21241.
[43] D. R. Wibben and R. Furfaro, “Model-Based Systems Engineering approach for the
development of the science processing and operations center of the NASA OSIRIS-REx
asteroid sample return mission,” Acta Astronaut, vol. 115, pp. 147–159, 2015, doi:
10.1016/j.actaastro.2015.05.016.
[44] A. Tarski, “Contributions to the Theory of Models,” Journal of Symbolic Logic, vol. 21,
no. 4, pp. 405–406, 1956, doi: 10.2307/2268420.
[45] G. J. Klir, Facets of Systems Science, 1st ed., vol. 7. in IFSR International Series in
Systems Science and Systems Engineering, vol. 7. New York, NY: Springer, 1991. doi:
10.1007/978-1-4899-0718-9.
130
[46] Y. K. Lin, General Systems Theory: A Mathematical Approach, 1st ed., vol. 12. in IFSR
International Series in Systems Science and Systems Engineering, vol. 12. Boston, MA:
Springer, 2002. doi: 10.1007/b116863.
[47] N. P. Suh, “Axiomatic Design Theory for Systems,” Res Eng Des, vol. 10, pp. 189–209,
1998, doi: 10.1007/s001639870000.
[48] E. Yourdon, Modern structured analysis. USA: Yourdon Press, 1989.
[49] S. Purohit and A. M. Madni, “A Model-Based Systems Architecting and Integration
Approach Using Interlevel and Intralevel Dependency Matrix,” IEEE Syst J, vol. 16, no.
1, pp. 747–754, 2022, doi: 10.1109/JSYST.2021.3077351.
[50] A. M. Madni, D. Erwin, A. Madni, E. Ordoukhanian, P. Pouya, and S. Purohit, “Next
Generation Adaptive Cyber Physical Human Systems,” Hoboken, NJ, USA, Sep. 2019.
doi: 10.21236/AD1099997.
[51] D. T. Ross, “Structured Analysis (SA): A Language for Communicating Ideas,” IEEE
Transactions on Software Engineering, vol. SE-3, no. 1, pp. 16–34, 1977, doi:
10.1109/TSE.1977.229900.
[52] W. S. Davis, Tools and Techniques for Structured Systems Analysis and Design. Reading,
MA: Addison-Wesley, 1983.
[53] D. Marca and C. L. McGowan, SADT: Structured Analysis and Design Technique. New
York: McGraw-Hill, 1988.
[54] P.-M. Spanidis, F. Pavloudakis, and C. Roumpos, “Introducing the IDEF0 Methodology in
the Strategic Planning of Projects for Reclamation and Repurposing of Surface Mines,”
MDPI AG, 2021, p. 26. doi: 10.3390/materproc2021005026.
131
[55] K. Jeong, L. Wu, J. Hong, K.-Y. Jeong, L. Wu, and J.-D. Hong, “IDEF method-based
simulation model design and development framework Journal of Industrial Engineering
and Management (JIEM) Provided in Cooperation with: : IDEF method-based simulation
model design and development framework Standard-Nutzungsbedingungen: IDEF
method-based simulation model design and development 337 IDEF method-based
simulation model design and development IDEF method-based simulation model design
and development 338,” Journal of Industrial Engineering and Management, vol. 2, no. 2,
pp. 337–359, 2009, doi: 10.3926/jiem.v2n2.p337-359.
[56] L. P. Decker and R. J. Mayer, Information System Constraint Language (ISyCL) Technical
Report, Knowledge Based Systems Laboratory, Dept. of Industrial Engineering, Texas
A&M University, College Station, TX, Air Force Materiel Command, Wright-Patterson
Air Force Base, OH, Final Technical Paper, Sept. 1992.
[57] S. D. Eppinger and T. R. Browning, Design Structure Matrix Methods and Applications,
1st ed., vol. 1. in Engineering Systems, vol. 1. Cambridge: MIT Press, 2012. doi:
10.7551/mitpress/8896.001.0001.
[58] A. M. Madni, "Formal Methods for Intelligent Systems Design and Control," in 2018
AIAA Information Systems-AIAA Infotech @ Aerospace, AIAA SciTech Forum,
American Institute of Aeronautics and Astronautics, 2018. doi: 10.2514/6.2018-1368.
[Online]. Available: https://doi.org/10.2514/6.2018-1368.
[59] A. M. Madni, S. Purohit, D. Erwin, and R. Minnichelli, “Analyzing Systems Architectures
using Inter-Level and Intra-Level Dependency Matrix (I2DM),” in 2019 IEEE
International Conference on Systems, Man and Cybernetics (SMC), 2019, pp. 735–740.
doi: 10.1109/SMC.2019.8913854.
[60] J. E. Bartolomei, D. E. Hastings, R. De Neufville, and D. H. Rhodes, “Engineering
Systems Multiple-Domain Matrix: An organizing framework for modeling large-scale
complex systems,” Systems Engineering, vol. 15, no. 1, pp. 41–61, Mar. 2012, doi:
10.1002/sys.20193.
[61] M. D. Zisman, “Use of Production Systems for Modelling Asynchronous, Concurrent
Processes,” Philadelphia, PA, Apr. 1977. doi: 10.21236/ADA062767.
132
[62] A. M. Madni, “Models in Systems Engineering: From Engineering Artifacts to Source of
Competitive Advantage,” in Recent Trends and Advances in Model Based Systems
Engineering, A. M. Madni, B. Boehm, D. Erwin, M. Moghaddam, M. Sievers, and M.
Wheaton, Eds., Cham: Springer International Publishing, 2022, pp. 567–578.
[63] Sanford. Friedenthal, Alan. Moore, and Rick. Steiner, A practical guide to SysML : the
systems modeling language, [2nd ed.]. in The MK/OMG Press. Waltham, MA: Morgan
Kaufmann, 2012.
[64] F. K. J, W. D. D, H. R. Douglas, R. G. J, and S. T. M, Systems Engineering Handbook - A
Guide for System Life Cycle Processes and Activities (4th Edition), Fourth edition. John
Wiley & Sons, 2015.
[65] Y. Menshenin, Y. Mordecai, E. F. Crawley, B. G. Cameron, M. W. Sievers, and J. Estefan,
“Model-Based System Architecting and Decision-Making,” in Handbook of Model-Based
Systems Engineering, Cham: Springer International Publishing, pp. 289–330. doi:
10.1007/978-3-030-93582-5_17.
[66] A. M. Madni, C. Paulson, M. Spraragen, M. C. Richey, M. L. Nance, and M. Vander Wel,
“Model-Based Optimization of Learning Curves: Implications for Business and
Government,” INCOSE International Symposium, vol. 25, no. s1, pp. 1070–1084, 2015,
doi: 10.1002/j.2334-5837.2015.00116.x.
[67] A. L. Ramos, J. V. Ferreira, and J. Barceló, “Model-Based Systems Engineering: An
Emerging Approach for Modern Systems,” IEEE Transactions on Systems, Man, and
Cybernetics, Part C (Applications and Reviews), vol. 42, no. 1, pp. 101–111, 2012, doi:
10.1109/TSMCC.2011.2106495.
[68] D. H. Rhodes, “Investigating Model Credibility Within a Model Curation Context,” in
Recent Trends and Advances in Model Based Systems Engineering, Cham: Springer
International Publishing, pp. 67–77. doi: 10.1007/978-3-030-82083-1_7.
[69] A. M. Madni, M. Sievers, and C. C. Madni, “Adaptive Cyber‐Physical‐Human Systems:
Exploiting Cognitive Modeling and Machine Learning in the Control Loop,” INSIGHT,
vol. 21, no. 3, pp. 87–93, Oct. 2018, doi: 10.1002/inst.12216.
[70] A. E. Trujillo and A. M. Madni, “MBSE methods for inheritance and design reuse,” in
Handbook of Model-Based Systems Engineering, Springer International Publishing, 2023,
pp. 783–814. doi: 10.1007/978-3-030-93582-5_47.
133
[71] M. Shahbakhti, J. Li, and J. K. Hedrick, “Early model-based verification of automotive
control system implementation,” in ACC, IEEE, 2012, pp. 3587–3592. doi:
10.1109/ACC.2012.6314852.
[72] M. Faugere, T. Bourbeau, R. De Simone, and S. Gerard, “MARTE: UML Profile for
Modeling AADL Applications,” in ICECCS, IEEE, 2007, pp. 359–364. doi:
10.1109/ICECCS.2007.29.
[73] O. L. de Weck, D. Roos, and C. L. Magee, Engineering Systems: Meeting Human Needs
in a Complex Technological World, 1st ed. in Engineering Systems. Cambridge: MIT
Press, 2011. doi: 10.7551/mitpress/8799.001.0001.
[74] T. D. West and A. Pyster, “Untangling the Digital Thread: The Challenge and Promise of
Model-Based Engineering in Defense Acquisition,” Insight (International Council on
Systems Engineering), vol. 18, no. 2, pp. 45–55, 2015, doi: 10.1002/inst.12022.
[75] A. M. Purohit Shatad and Madni, “Employing Digital Twins Within MBSE: Preliminary
Results and Findings,” in Recent Trends and Advances in Model Based Systems
Engineering, B. and E. D. and M. M. and S. M. and W. M. Madni Azad M. and Boehm,
Ed., Cham: Springer International Publishing, 2022, pp. 35–44.
[76] A. M. Madni and C. C. Madni, “Digital twin: Key enabler and complement to modelbased systems engineering,” in Handbook of Model-Based Systems Engineering, Springer
International Publishing, 2023, pp. 633–654. doi: 10.1007/978-3-030-93582-5_37.
[77] J.-F. Yao, Y. Yang, X.-C. Wang, and X.-P. Zhang, “Systematic review of digital twin
technology and applications,” Vis Comput Ind Biomed Art, vol. 6, no. 1, p. 10, 2023, doi:
10.1186/s42492-023-00137-4.
[78] A. M. Madni, C. C. Madni, and S. D. Lucero, “Leveraging Digital Twin Technology in
Model-Based Systems Engineering,” Systems, vol. 7, no. 1, 2019, doi:
10.3390/systems7010007.
[79] P. C. Chen, D. Baldelli, and J. Zeng, “Dynamic Flight Simulation (DFS) Tool for
Nonlinear Flight Dynamic Simulation Including Aeroelastic Effects,” in AIAA
Atmospheric Flight Mechanics Conference and Exhibit, in Guidance, Navigation, and
Control and Co-located Conferences. , American Institute of Aeronautics and
Astronautics, 2008. doi: doi:10.2514/6.2008-6376.
134
[80] E. H. Glaessgen and D. S. Stargel, "The Digital Twin Paradigm for Future NASA and U.S.
Air Force Vehicles," presented at the 53rd Structures, Structural Dynamics, and Materials
Conference: Special Session on the Digital Twin, American Institute of Aeronautics and
Astronautics, Hampton, VA, 2012.
[81] B. P. Douglass, Real-Time UML Workshop for Embedded Systems, 2nd edition. San
Diego: Elsevier Science, 2014.
[82] H.-P. Hoffmann, “Deploying model-based systems engineering with IBM® rational®
solutions for systems and software engineering,” in DASC, IEEE, 2012, pp. 1–8. doi:
10.1109/DASC.2012.6383084.
[83] M. Sudarma, S. Ariyani, and P. A. Wicaksana, “Implementation of the Rational Unified
Process (RUP) Model in Design Planning of Sales Order Management System,”
INTENSIF: Jurnal Ilmiah Penelitian dan Penerapan Teknologi Sistem Informasi, 2021,
[Online]. Available: https://api.semanticscholar.org/CorpusID:237950176
[84] M. Aboushama, “ViTech Model Based Systems Engineering: Methodology Assessment
Using the FEMMP Framework,” Oct. 2020, Ingolstadt, Germany.
[85] D. A. Wagner, M. Chodas, M. Elaasar, J. S. Jenkins, and N. Rouquette, “Semantic
modeling for power management using CAESAR,” in Handbook of Model-Based Systems
Engineering, Springer International Publishing, 2023, pp. 1135–1152. doi: 10.1007/978-3-
030-93582-5_81.
[86] D. A. Wagner, M. Chodas, M. Elaasar, J. S. Jenkins, and N. Rouquette, “Ontological
metamodeling and analysis using openCAESAR,” in Handbook of Model-Based Systems
Engineering, Springer International Publishing, 2023, pp. 925–954. doi: 10.1007/978-3-
030-93582-5_78.
[87] I. Dori Dov and Reinhartz-Berger, “An OPM-Based Metamodel of System Development
Process,” in Conceptual Modeling - ER 2003, S. W. and L. T.-W. and S. P. Song Il-Yeol
and Liddle, Ed., Berlin, Heidelberg: Springer Berlin Heidelberg, 2003, pp. 105–117.
[88] Y. Xu, Y. Sun, X. Liu, and Y. Zheng, “A Digital-Twin-Assisted Fault Diagnosis Using
Deep Transfer Learning,” IEEE Access, vol. 7, pp. 19990–19999, 2019, doi:
10.1109/ACCESS.2018.2890566.
135
[89] F. Tao, H. Zhang, A. Liu, and A. Y. C. Nee, “Digital Twin in Industry: State-of-the-Art,”
IEEE Trans Industr Inform, vol. 15, no. 4, pp. 2405–2415, 2019, doi:
10.1109/TII.2018.2873186.
[90] M. G. Kapteyn, D. J. Knezevic, D. B. P. Huynh, M. Tran, and K. E. Willcox, “Data-driven
physics-based digital twins via a library of component-based reduced-order models,” Int J
Numer Methods Eng, vol. 123, no. 13, pp. 2986–3003, 2022, doi:
https://doi.org/10.1002/nme.6423.
[91] M. G. Kapteyn, J. V. R. Pretorius, and K. E. Willcox, “A probabilistic graphical model
foundation for enabling predictive digital twins at scale,” Nature Computational Science,
vol. 1, no. 5, pp. 337–347, 2021, doi: 10.1038/s43588-021-00069-0.
[92] R. S. Sutton and A. G. Barto, Reinforcement learning : an introduction, Second edition.
Cambridge, Massachusetts ; The MIT Press, 2018.
[93] Z. Gou et al., “CRITIC: Large Language Models Can Self-Correct with Tool-Interactive
Critiquing,” arXiv.org, 2024, doi: 10.48550/arxiv.2305.11738.
[94] A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Gao, S. Wiegreffe, U. Alon, N. Dziri, S.
Prabhumoye, Y. Yang, and S. Gupta, "Self-refine: Iterative refinement with self-feedback,"
Advances in Neural Information Processing Systems, vol. 36, Feb. 2024.
[95] C. Qu, S. Dai, X. Wei, H. Cai, S. Wang, D. Yin, J. Xu, and J. R. Wen, "Tool Learning with
Large Language Models: A Survey," arXiv preprint arXiv:2405.17935, 2024.
[96] S. G. Patil, T. Zhang, X. Wang, and J. E. Gonzalez, “Gorilla: Large Language Model
Connected with Massive APIs,” arXiv.org, 2023, doi: 10.48550/arxiv.2305.15334.
[97] Y. Shen, K. Song, T. Xu, D. Li, W. Lu, and Y. Zhuang, “HuggingGPT: Solving AI Tasks
with ChatGPT and its Friends in Hugging Face,” arXiv.org, 2023, doi:
10.48550/arxiv.2303.17580.
[98] X. Huang et al., “Understanding the planning of LLM agents: A survey,” arXiv.org, 2024,
doi: 10.48550/arxiv.2402.02716.
[99] J. Wei et al., “Chain-of-Thought Prompting Elicits Reasoning in Large Language
Models,” arXiv.org, 2023, doi: 10.48550/arxiv.2201.11903.
136
[100] X. Zheng, M. Wang, and J. Ordieres-Meré, “Comparison of data preprocessing approaches
for applying deep learning to human activity recognition in the context of industry 4.0,”
Sensors (Switzerland), vol. 18, no. 7, Jul. 2018, doi: 10.3390/s18072146.
[101] Z. Wang, S. Cai, G. Chen, A. Liu, X. Ma, and Y. Liang, “Describe, explain, plan and
select: Interactive planning with large language models enables open-world multi-task
agents,” arXiv preprint arXiv:2302.01560, 2023.
[102] Q. Wu et al., “Autogen: Enabling next-gen llm applications via multi-agent conversation
framework,” arXiv preprint arXiv:2308.08155, 2023.
[103] J. Grieves Michael and Vickers, “Digital Twin: Mitigating Unpredictable, Undesirable
Emergent Behavior in Complex Systems,” in Transdisciplinary Perspectives on Complex
Systems: New Findings and Approaches, S. and A. A. Kahlen Franz-Josef and Flumerfelt,
Ed., Cham: Springer International Publishing, 2017, pp. 85–113. doi: 10.1007/978-3-319-
38756-7_4.
[104] M. Y. Lu et al., “A Multimodal Generative AI Copilot for Human Pathology,” Nature,
2024, doi: 10.1038/s41586-024-07618-3.
[105] T. Brown et al., “Language Models are Few-Shot Learners,” in Advances in Neural
Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and
H. Lin, Eds., Curran Associates, Inc., 2020, pp. 1877–1901. [Online]. Available:
https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac14
2f64a-Paper.pdf
[106] N. F. Noy and D. L. Mcguinness, “Ontology Development 101: AGuide to Creating Your
First Ontology.” [Online]. Available:
http://translate.googleusercontent.com/translate_c?anno=2&hl=…opment/ontology101.pd
f&usg=ALkJrhjVKWkRnOxyPTCMqp98iRSKKnZ9HQ
[107] F. Baader, I. Horrocks, and U. Sattler, “Chapter 3 Description Logics,” in Handbook of
Knowledge Representation, vol. 3, F. van Harmelen, V. Lifschitz, and B. Porter, Eds., in
Foundations of Artificial Intelligence, vol. 3. , Elsevier, 2008, pp. 135–179. doi:
https://doi.org/10.1016/S1574-6526(07)03003-9.
137
[108] A. M. Madni and S. Purohit, “Augmenting MBSE with Digital Twin Technology:
Implementation, Analysis, Preliminary Results, and Findings,” in 2021 IEEE International
Conference on Systems, Man, and Cybernetics (SMC), 2021, pp. 2340–2346. doi:
10.1109/SMC52423.2021.9658769.
[109] S. Purohit, A. Madni, A. Adiththan, and A. M. Madni, “Digital Twin Integration for
Software Defined Vehicles: Decoupling Hardware and Software in Automotive System
Development,” in 2023 IEEE International Conference on Systems, Man, and Cybernetics
(SMC), 2023, pp. 1259–1264. doi: 10.1109/SMC53992.2023.10394507.
[110] A. M. Madni, S. Purohit, D. Erwin, and R. Minnichelli, “Analyzing Systems Architectures
using Inter-Level and Intra-Level Dependency Matrix (I2DM),” in 2019 IEEE
International Conference on Systems, Man and Cybernetics (SMC), 2019, pp. 735–740.
doi: 10.1109/SMC.2019.8913854.
[111] S. and M. C. C. Madni A. M. and Purohit, “Exploiting Digital Twins in MBSE to Enhance
System Modeling and Life Cycle Coverage,” in Handbook of Model-Based Systems
Engineering, N. and S. M. Madni Azad M. and Augustine, Ed., Cham: Springer
International Publishing, 2020, pp. 1–22. doi: 10.1007/978-3-030-27486-3_33-1.
[112] J. Whittle, J. Hutchinson, and M. Rouncefield, “The State of Practice in Model-Driven
Engineering,” IEEE Softw, vol. 31, no. 3, pp. 79–85, 2014, doi: 10.1109/MS.2013.65.
[113] A. Ferrando, L. A. Dennis, D. Ancona, M. Fisher, and V. Mascardi, “Verifying and
Validating Autonomous Systems: Towards an Integrated Approach,” Cham: Springer
International Publishing, 2019, pp. 263–281. doi: 10.1007/978-3-030-03769-7_15.
[114] D. Sales and L. Becker, “Systematic Literature Review of System Engineering Design
Methods,” Aug. 2018, pp. 213–218. doi: 10.1109/SBESC.2018.00040.
[115] M. Sievers, “Semantics, Metamodels, and Ontologies,” in Handbook of Model-Based
Systems Engineering, N. and S. M. Madni Azad M. and Augustine, Ed., Cham: Springer
International Publishing, 2023, pp. 15–46. doi: 10.1007/978-3-030-93582-5_2.
[116] F-35 Joint Strike Fighter: More Actions Needed To Explain Cost Growth and Support
Engine Modernization Decision. Congressional Publications. 2023.
138
[117] R. Batres et al., “An upper ontology based on ISO 15926,” Comput Chem Eng, vol. 31,
no. 5, pp. 519–534, 2007, doi: https://doi.org/10.1016/j.compchemeng.2006.07.004.
[118] M. Doerr et al., “Towards a Core Ontology for Information Integration,” Journal of
Digital Information; Vol 4, No 1 (2003), Aug. 2003.
[119] R. Kumbhar, “8 - Modern knowledge organisation systems and interoperability,” in
Library Classification Trends in the 21st Century, R. Kumbhar, Ed., in Chandos
Information Professional Series. , Chandos Publishing, 2012, pp. 95–113. doi:
https://doi.org/10.1016/B978-1-84334-660-9.50008-4.
[120] E. Blomqvist, A. Gangemi, and V. Presutti, “Experiments on pattern-based ontology
design,” in K-CAP’09 - Proceedings of the 5th International Conference on Knowledge
Capture, Aug. 2009, pp. 41–48. doi: 10.1145/1597735.1597743.
[121] C. Batini, M. Lenzerini, and S. B. Navathe, “A comparative analysis of methodologies for
database schema integration,” ACM Comput. Surv., vol. 18, no. 4, pp. 323–364, Dec.
1986, doi: 10.1145/27633.27634.
[122] D. Thakker, V. Dimitrova, L. Lau, R. Denaux, S. Karanasios, and F. Yang-Turner, “A
priori ontology modularisation in ill-defined domains,” in ACM International Conference
Proceeding Series, Aug. 2011, pp. 167–170. doi: 10.1145/2063518.2063541.
[123] F. Del Carratore et al., “Integrated Probabilistic Annotation: A Bayesian-Based Annotation
Method for Metabolomic Profiles Integrating Biochemical Connections, Isotope Patterns,
and Adduct Relationships,” Anal Chem, vol. 91, no. 20, pp. 12799–12807, Oct. 2019, doi:
10.1021/acs.analchem.9b02354.
[124] Y. Zhou, W. He, W. Hou, and Y. Zhu, “Pianno: a probabilistic framework automating
semantic annotation for spatial transcriptomics,” Nat Commun, vol. 15, no. 1, p. 2848,
2024, doi: 10.1038/s41467-024-47152-4.
[125] J. CARETTE and J. Harrison, “Handbook of Practical Logic and Automated Reasoning,”
Journal of functional programming, vol. 21, no. 6, p. 663, 2011, doi:
10.1017/S0956796811000220.
[126] M. Huth and M. Ryan, Logic in computer science : modelling and reasoning about
systems, 2nd ed. Cambridge: Cambridge University Press, 2004.
139
[127] S. J. (Stuart J. Russell, P. Norvig, and E. Davis, Artificial intelligence : a modern
approach, Third edition. in Prentice Hall series in artificial intelligence. Upper Saddle
River, New Jersey: Prentice Hall, 2010.
[128] J. F. Allen, “Maintaining knowledge about temporal intervals,” Commun. ACM, vol. 26,
no. 11, pp. 832–843, Nov. 1983, doi: 10.1145/182.358434.
[129] M. Ghallab, D. Nau, and P. Traverso, Automated Planning and Acting. Cambridge:
Cambridge University Press, 2016. doi: DOI: 10.1017/CBO9781139583923.
[130] C. Qian, X. Cong, C. Yang, W. Chen, Y. Su, J. Xu, Z. Liu, and M. Sun, "Communicative
agents for software development," arXiv preprint arXiv:2307.07924, vol. 6, 2023.
[131] S. Hong, X. Zheng, J. Chen, Y. Cheng, J. Wang, C. Zhang, Z. Wang, et al., "MetaGPT:
Meta programming for multi-agent collaborative framework," arXiv preprint
arXiv:2308.00352, 2023.
140
Appendix A
A.1 Overview of the Methodology for Automated Ontology
Generation using LLMs
The methodology for automated ontology generation using generative models, specifically Large
Language Models (LLMs), consists of three main stages as described in Figure 16: 1) Identify Key
Concepts and Categorize, 2) Create Connections and Populate Data, and 3) Repeat and Refine.
Figure 16: Methodology for Ontology Development from System Life Cycle Data. The figure depicts the three stages and their constituent steps:
Stage 1, Identify Key Concepts and Categorize: (1) input portion of unstructured data; (2) identify stakeholders; (3) identify decision support questions (DSQs); (4) list key concepts; (5) categorization in core ontology concepts; (6) create two-part genus-species definitions; (7) data source identification.
Stage 2, Create Connections and Populate Data: (8) create ontology representation; (9) for each subject, identify predicates with all objects; (10) triplet creation with key concepts and relations; (11) automated population of graph database with ontology and data; (12) decision support question validation.
Stage 3, Repeat and Refine: (13) repeat the process for the next portion of unstructured data; (14) check repetition of key concepts in the merged ontology N2 matrix.
Stage 1: Identify Key Concepts and Categorize
1. Input Portion of Unstructured Data: Process the data for transformation by collecting all
available unstructured data from various phases of the system lifecycle and creating manageable
chunks or partitions. This step enables human validation, preserves context, and ensures effective
analysis.
2. Identify Stakeholders: Use generative models to recognize relevant stakeholders based
on the context of each portion of unstructured data. Fine-tune the generative models with industry-specific knowledge to ensure accurate stakeholder identification. Human users analyze and curate
the generated stakeholder list to align the ontology development with business objectives.
3. Identify Decision Support Questions (DSQs): Extract DSQs related to the unstructured
data using generative models fine-tuned with industry-specific knowledge and stakeholder needs.
Generate specific, unambiguous questions to define the purpose and scope of the ontology. Human
users analyze, curate, and finalize the DSQs to ensure alignment with decision-making needs.
4. List Key Concepts: Leverage generative models to find and list key concepts from the
unstructured data. Fine-tune the generative models with the context of ontology development,
DSQs, and core ontology principles. Human users validate the generated list of key concepts to
ensure relevance and completeness.
5. Categorization in Core Ontology Concepts: Organize key concepts into core ontology
compatible classes using a generative model fine-tuned with core ontology principles. This step
ensures consistency, interoperability, and reusability of the resulting ontology.
6. Create Two-Part Genus Species Definitions: Define terms systematically using a
generative model and the two-part genus-species definition method. Leverage the categorization
of key concepts into core ontology classes. Human users check and validate the created definitions
to establish a common vocabulary and understanding.
7. Data Source Identification: Using a generative model, find and predict potential sources
for populating identified concepts. Consider contextual information from previous steps and
leverage human expertise to assess the feasibility and appropriateness of the suggested data
sources. This step aligns ontology development with practical considerations of data availability.
Stage 2: Create Connections and Populate Data
8. Create Ontology Representation: Derive the ontology representation from the N2 Matrix
by systematically populating the matrix with relationships between key concepts using generative
models. The N2 matrix provides a structured, machine-readable, and comprehensive
representation of the ontology.
9. For Each Subject, Identify Predicates with All Objects: Use a column-by-column
approach with generative models to identify predicates for each subject-object pair in the N2
matrix. Fine-tune the generative model with ontology development principles, formal logic
constraints, and core ontology predicates. This step ensures a comprehensive and logically
consistent representation of knowledge.
10. Triplet Creation with Key Concepts and Relations: Structure the ontology effectively
by parsing the N2 matrix and extracting subject-predicate-object triplets or axioms. Establish
traceability to source data using generative models. Human users review and validate the triplets
to ensure semantic accuracy and relevance.
11. Automated Population of Graph Database with Ontology and Data: Translate the
ontology into a system model by transforming triplets into a graph representation. Populate the
graph with real-world data instances mapped to ontology nodes. Human users validate the data
mappings and relationships to ensure accuracy and reliability.
12. Decision Support Question Validation: Check if the ontology can answer the DSQs
using reasoners. Convert DSQs into formal logic queries using a specifically tuned generative
model and execute queries against the populated ontology. Human users examine the answers for
correctness and satisfaction. Refine the ontology based on the validation results to improve its
effectiveness.
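To illustrate steps 8 through 12 of Stage 2, the following minimal Python sketch walks an N2 matrix, extracts subject-predicate-object triplets, loads them into an RDF graph using rdflib, and runs one decision support question as a SPARQL query. The concept names, predicate labels, namespace URI, and the hand-written query are illustrative assumptions; in the methodology the DSQ-to-query translation would be produced by a tuned generative model and the results reviewed by human users.

from rdflib import Graph, Namespace

EX = Namespace("http://example.org/system-ontology#")   # placeholder namespace

# N2 matrix as a nested dict: rows are subjects, columns are objects,
# cell values are the predicates proposed by the generative model (None = no relation).
n2_matrix = {
    "Engine":          {"Engine": None,          "Sensor": None, "MaintenanceTask": None},
    "Sensor":          {"Engine": "measures",    "Sensor": None, "MaintenanceTask": "triggers"},
    "MaintenanceTask": {"Engine": "performedOn", "Sensor": None, "MaintenanceTask": None},
}

def extract_triplets(matrix):
    # Step 10: walk every (subject, object) cell and emit (subject, predicate, object) triplets.
    return [(s, p, o) for s, row in matrix.items() for o, p in row.items() if p and s != o]

def populate_graph(triplets):
    # Step 11: translate the validated triplets into a graph representation.
    g = Graph()
    g.bind("ex", EX)
    for s, p, o in triplets:
        g.add((EX[s], EX[p], EX[o]))
    return g

graph = populate_graph(extract_triplets(n2_matrix))

# Step 12: a DSQ such as "Which system elements are related to the Engine, and how?"
# expressed as a formal query and executed against the populated graph.
query = """
    PREFIX ex: <http://example.org/system-ontology#>
    SELECT ?subject ?predicate WHERE { ?subject ?predicate ex:Engine . }
"""
for subject, predicate in graph.query(query):
    print(subject, predicate)   # answers are then examined by human users

In a production setting the in-memory rdflib graph would typically be replaced by a dedicated graph database, but the flow from matrix to triplets to queryable graph is the same.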
Stage 3: Repeat and Refine
13. Repeat the Process for Next Portion of Unstructured Data: Continuously expand the
ontology by iteratively processing additional portions of unstructured data. Follow the steps from
stakeholder identification to DSQ validation for each new data chunk. This iterative approach
ensures the ontology remains comprehensive, up-to-date, and aligned with the evolving knowledge
landscape.
14. Check Repetition of Key Concepts in Merged Ontology N2 Matrix: Ensure no
redundancy in the ontology by applying matrix merging algorithms to identify and eliminate
duplicate key concepts and relations. Resolve conflicts and discrepancies to maintain logical
consistency and coherence. Human users validate the merged ontology to assess completeness,
consistency, and alignment with DSQs.
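As a sketch of step 14, the function below shows one way two N2 matrices from successive data chunks could be merged: duplicate concept pairs are collapsed and conflicting predicates are flagged for human resolution. The concept and predicate names are illustrative assumptions, not project data.

# Minimal sketch (illustrative): merge two N2 matrices and flag conflicts for human review.
def merge_n2(base, new):
    merged = {s: dict(row) for s, row in base.items()}
    conflicts = []
    for subject, row in new.items():
        merged.setdefault(subject, {})
        for obj, predicate in row.items():
            existing = merged[subject].get(obj)
            if existing is None:
                merged[subject][obj] = predicate                         # new concept pair or relation
            elif predicate is not None and existing != predicate:
                conflicts.append((subject, obj, existing, predicate))    # needs human resolution
    return merged, conflicts

base = {"Engine": {"Sensor": "monitoredBy"}}
new = {"Engine": {"Sensor": "monitoredBy", "Actuator": "controlledBy"},
       "Actuator": {"Engine": "actuates"}}
merged, conflicts = merge_n2(base, new)
print(merged)
print("Conflicts for human review:", conflicts)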
The methodology leverages the power of generative models, specifically LLMs, to
automate various tasks, such as stakeholder identification, DSQ generation, key concept
extraction, and relationship identification. Human expertise is incorporated throughout the process
to validate, refine, and ensure the quality and effectiveness of the generated ontology. The iterative
nature of the methodology allows for continuous expansion, refinement, and alignment of the
ontology with the evolving knowledge landscape of the domain.
The rationale behind each step is to ensure the ontology's integrity, consistency, efficiency,
and alignment with business objectives and decision-making needs. The methodology aims to
create a comprehensive, up-to-date, and functionally effective ontology that can support real-world
applications and decision-making processes.
The following sections provide detailed explanations, examples, and discussions of each
stage and its substages, highlighting their contributions and implications in the context of system
upgrades and ontology generation. The integration of generative models, ontologies, and graph
databases represents a novel and innovative approach to automating and enhancing the system
modeling process for system upgrades, offering benefits in terms of accuracy and scalability.
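As an orienting illustration before the detailed walkthrough, the skeleton below sketches how the three-stage loop could be orchestrated in Python. The helper functions call_llm and human_review are placeholders for the fine-tuned generative-model calls and human validation gates discussed in the following sections; they are assumptions for illustration, not part of any specific library.

def build_ontology(unstructured_chunks, call_llm, human_review):
    ontology = {"concepts": [], "dsqs": [], "triplets": []}
    for chunk in unstructured_chunks:
        # Stage 1: identify key concepts and categorize.
        stakeholders = human_review(call_llm(f"Identify stakeholders in:\n{chunk}"))
        dsqs = human_review(call_llm(
            f"Given stakeholders {stakeholders}, write decision support questions for:\n{chunk}"))
        concepts = human_review(call_llm(
            f"List key concepts (with core ontology class) needed to answer {dsqs} in:\n{chunk}"))
        # Stage 2: create connections and populate data.
        triplets = human_review(call_llm(
            f"Relate the concepts {concepts} as subject-predicate-object triplets."))
        ontology["concepts"].extend(concepts)
        ontology["dsqs"].extend(dsqs)
        ontology["triplets"].extend(triplets)
        # Stage 3: repeat and refine; merging, de-duplication, and DSQ validation
        # would be applied here before the next chunk is processed.
    return ontology

if __name__ == "__main__":
    stub_llm = lambda prompt: []        # stands in for a fine-tuned LLM call
    stub_review = lambda items: items   # stands in for human curation and validation
    print(build_ontology(["chunk of life cycle text"], stub_llm, stub_review))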
A.2 Methodology for Automated Ontology Generation using
Generative Models (LLMs)
Stage 1: Identify Key Concepts and Categorize
1. Input Portion of Unstructured Data: Process the data for transformation
The first step in the methodology is to collect all available unstructured data and divide it
into manageable chunks or partitions for further processing. This unstructured data can originate
from various phases of the system lifecycle, including requirements documents, specification
documents, interface documents, maintenance records, repair logs, supply chain data,
manufacturing data, planning and scheduling information, and operational data. Including
unstructured data from multiple phases of the system lifecycle is essential for building a
comprehensive ontology and system model that captures the complex relationships, dependencies,
and interactions within the system. Each phase contributes unique knowledge that is essential for
understanding the system as a whole.
Figure 17: Partitioning Unstructured Data
Once all unstructured data is collected, the methodology employs a model to create chunks
or partitions of the data, as depicted in Figure 17. This chunking process is designed to facilitate
human validation and maintain the context of the information. When dealing with data from a
specific document, the chunking process introduces overlaps between the partitions to ensure that
important contextual information is not lost during the transformation process. However, when
dealing with data from different documents, overlaps are not necessary, as the context may not be
directly related. The size of the chunks is determined with human capabilities in mind. The
methodology aims to create partitions that can be effectively handled and analyzed by human
experts, ensuring that each chunk is manageable and allows for thorough validation and
interpretation.
The chunking process is critical to the methodology for two main reasons:
1. It enables human validation and oversight. By breaking the unstructured data into smaller,
manageable portions, human experts can more easily review and validate the information.
This human-in-the-loop approach is essential for maintaining the quality, accuracy, and
relevance of the data used in the ontology and system model development process. The
size of the chunks is carefully considered to ensure that human experts can effectively
handle and analyze each partition. This approach balances the need for detailed analysis
with the cognitive limitations of human experts.
2. It helps preserve the context of the information within each partition. The overlaps
introduced between chunks from the same document ensure that important contextual
details are not lost during the transformation process. This is particularly important when
dealing with complex systems, where the relationships and dependencies between different
components and aspects of the system are crucial for understanding its overall behavior
and functionality.
The input portion of unstructured data is processed through a chunking model that creates
partitions of the data while maintaining context and considering human capabilities. This approach
enables human validation, preserves important contextual information, and ensures that each
chunk is manageable for human analysis. By involving human experts in the loop, the
methodology aims to enhance the quality, accuracy, and relevance of the data used in the ontology
and system model development process. The rationale behind this step is to leverage human
expertise and cognitive abilities to guide the transformation of unstructured data into a structured
and meaningful representation that can support the subsequent stages of the methodology.
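To make the chunking step concrete, the following minimal Python sketch partitions a document into overlapping, human-reviewable chunks and chunks separate documents independently. The chunk size and overlap values are placeholders; in practice they would be tuned to what a human expert can validate in one sitting.

def chunk_document(text, chunk_size=2000, overlap=200):
    # Overlapping partitions within a single document preserve context across boundaries.
    chunks = []
    step = chunk_size - overlap
    for start in range(0, max(len(text), 1), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

def chunk_corpus(documents, **kwargs):
    # Overlap is applied within a document only; unrelated documents are chunked independently.
    return {name: chunk_document(text, **kwargs) for name, text in documents.items()}

corpus = {"maintenance_log.txt": "Replaced fuel pump after vibration alert. " * 200}
partitions = chunk_corpus(corpus)
print({name: len(chunks) for name, chunks in partitions.items()})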
2. Identify Stakeholders: Use generative models (LLMs) to recognize stakeholders based on
context.
In the next step of the methodology, an LLM is used to identify relevant stakeholders for a
portion of the unstructured data corresponding to a specific system lifecycle phase. The LLM is
fine-tuned with the context of ontology and system model development. Identifying the relevant stakeholders is a crucial step in the ontology and system model development process. Stakeholders are vital in defining the requirements and in using system models during the system life cycle. Using an LLM for stakeholder identification offers several advantages: an LLM can process and understand large amounts of unstructured data, making it well-suited for analyzing the context and content of each data portion. By leveraging the contextual information present in the data, the LLM can identify the key entities, roles, and relationships that are most relevant to the specific industry and system lifecycle phase. Fine-tuning the LLM with the context of ontology and system model development for a specific industry tailors the stakeholder identification process to the domain's unique requirements and characteristics. This fine-tuning involves training the LLM on industry-specific terminology, concepts, and relationships, enabling it to accurately recognize and extract the most relevant stakeholders from the unstructured data. The involvement of human users in analyzing and curating the stakeholder list generated by the LLM is a critical aspect of the methodology. While the LLM can provide an initial set of relevant stakeholders, human expertise is essential for validating and refining the list. Human users bring domain knowledge, business
understanding, and strategic insights that can help ensure the stakeholder list is comprehensive,
accurate, and aligned with the overall objectives of the ontology and system model development
process.
Figure 18: Identification of stakeholders relevant to model development
The inclusion of the stakeholder identification step in the methodology, as illustrated in
Figure 18, is driven by the recognition that ontology and system model development is not merely
a technical exercise but also a business-driven endeavor. The ultimate goal is to create a knowledge
representation that delivers organizational value and supports decision-making. By identifying and
involving the relevant stakeholders, the methodology ensures that the ontology and system model
are developed with a clear purpose and are closely aligned with the needs and objectives of the
business.
The stakeholder identification step in the methodology leverages the power of an LLM to
recognize relevant stakeholders based on the context of each portion of unstructured data. Fine-
tuning the LLM with industry-specific knowledge ensures accurate and tailored stakeholder
identification. The involvement of human users in analyzing and curating the stakeholder list
brings domain expertise and strategic insights to the process. The rationale behind this step is to
align the ontology and system model development with the business objectives and ensure the
delivery of value to the organization. By identifying the right stakeholders, the methodology aims
to create a knowledge representation that is purpose-driven, industry-relevant, and closely tied to
the needs of the organization.
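As an illustration of this step, the sketch below shows how a stakeholder-identification prompt might be constructed and its output handed to a human curator. The function call_llm, the prompt wording, the example chunk, and the stub output are assumptions for illustration only.

import json

def identify_stakeholders(chunk, lifecycle_phase, call_llm):
    prompt = (
        "You support ontology and system model development for system upgrades.\n"
        f"System life cycle phase: {lifecycle_phase}\n"
        "From the text below, list the stakeholders (role names only) who create or use "
        "this information. Answer as a JSON list of strings.\n\n"
        f"{chunk}"
    )
    candidates = json.loads(call_llm(prompt))
    # Human users then curate this list before it feeds the DSQ generation step.
    return candidates

stub_llm = lambda prompt: '["Maintenance Engineer", "Supply Chain Analyst"]'
print(identify_stakeholders("Pump P-101 was replaced after repeated seal failures.",
                            "Operations and Maintenance", stub_llm))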
3. Identify DSQs: Extract Decision Support Questions related to the unstructured data.
After identifying the relevant stakeholders, the next step in the methodology focuses on
generating Decision Support Questions (DSQs) for the specific portion of unstructured data being
considered. Figure 19 presents the inputs and outputs of this step. DSQs play a crucial role in
defining the purpose and scope of the ontology and system model development process. In this
step, a fine-tuned generative model (LLM) is employed again, leveraging the context of the
ontology and system model development process for the specific industry. The LLM is fine-tuned
with industry-specific knowledge and an understanding of the stakeholders' needs and objectives.
Additionally, the LLMs utilize the information about the identified stakeholders from the previous
step to generate targeted and relevant DSQs. The generation of DSQs is a critical step in the
methodology because it helps define the purpose and scope of the ontology and system model.
DSQs serve as a guide for determining what information should be included in the knowledge
representation and what can be left out. By explicitly defining the decision support requirements,
the methodology ensures that the ontology and system model are developed with a clear focus on
supporting specific decision-making processes.
Figure 19: Generating Decision Support Questions for Scoping the System Model
Using an LLM to generate DSQs offers several benefits. An LLM can process and analyze large amounts of unstructured data, identifying the key themes, relationships, and decision points that are most relevant to the specific industry and stakeholders. By leveraging the contextual information present in the data and the industry-specific fine-tuning, the LLM can generate DSQs that
are targeted, relevant, and aligned with the needs of the stakeholders. The LLM is fine-tuned to
generate specific, unambiguous DSQs, each addressing a single decision. This approach ensures
that the DSQs are clear, focused, and actionable. By generating specific and unambiguous
questions, the methodology aims to provide a clear direction for the ontology and system model
development process, reducing the risk of scope creep and ensuring that the resulting knowledge
representation is directly applicable to the decision-making needs of the stakeholders.
The involvement of human users in analyzing, curating, and finalizing the DSQs generated
by the LLM is a critical aspect of the methodology. While the LLM can provide an initial set of relevant and specific DSQs, human expertise is essential for validating and refining the questions and for eliminating unnecessary DSQs. Human users bring domain knowledge, business understanding, and strategic insights that can help ensure the DSQs are comprehensive, relevant, and aligned with the overall objectives of the ontology and system model development process. Including the DSQ identification step in the methodology recognizes that a clear purpose and well-defined goals are essential for the success of any ontology and system model development project.
DSQs serve as a compass, guiding the development process and ensuring that the resulting
knowledge representation is directly applicable to the decision-making needs of the stakeholders.
This approach helps to focus the development efforts on the most critical aspects of the domain,
ensuring that the ontology and system model are concise, targeted, and directly applicable to the
decision-making processes they are intended to support. Furthermore, the involvement of human
users in the DSQ generation process ensures that the questions are not only technically sound but
also aligned with the business objectives and strategic goals of the organization. By generating
specific and unambiguous DSQs, the methodology aims to create a knowledge representation that
is directly applicable to the organization.
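A minimal sketch of DSQ generation is shown below: the curated stakeholder list and a data chunk are combined into a prompt that asks for specific, single-decision questions, whose output is then curated by human users. The prompt wording and stub output are assumptions for illustration only.

import json

def generate_dsqs(chunk, stakeholders, call_llm):
    prompt = (
        "You are helping scope an ontology for system upgrade decisions.\n"
        f"Stakeholders: {', '.join(stakeholders)}\n"
        "Write decision support questions for the text below. Each question must be "
        "specific, unambiguous, and address exactly one decision. "
        "Answer as a JSON list of strings.\n\n"
        f"{chunk}"
    )
    dsqs = json.loads(call_llm(prompt))
    # Human users curate, merge, and eliminate unnecessary questions before finalizing.
    return dsqs

stub_llm = lambda prompt: '["Which pump seal variants reduce unscheduled maintenance?"]'
print(generate_dsqs("Pump P-101 was replaced after repeated seal failures.",
                    ["Maintenance Engineer"], stub_llm))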
4. List Key Concepts: Leverage generative models (LLMs) to find and list key concepts
from the data.
In this step of the methodology, a separate fine-tuned LLM is employed to analyze the
portion of unstructured data and identify the key concepts that are essential for developing the
ontology. These key concepts form the foundation of the ontology and become the classes that
represent the main entities within the domain. The LLM used in this step is fine-tuned with the
specific context of developing an ontology and system model to address the Decision Support
Questions (DSQs) identified in the previous step. As shown in Figure 20, by providing the DSQs
as input to the LLM, the methodology ensures that the key concepts identified are directly relevant
to the decision-making needs of the stakeholders and aligned with the overall purpose and scope
of the ontology and system model. Identifying key concepts is crucial in the ontology development
process, as it lays the groundwork for creating a structured and meaningful representation of the
domain knowledge. By leveraging LLM to find and list key concepts from unstructured data, the
methodology takes advantage of the power of natural language processing to efficiently and
effectively identify the most important entities mentioned within the data.
Figure 20: Identifying key concepts for ontology development.
Fine-tuning the LLM with the context of ontology and system model development is a key
aspect of this step. By incorporating the specific requirements and constraints of the ontology
development process, such as the need to address the DSQs and adhere to core ontology principles,
the LLM is able to generate a list of key concepts that are directly applicable and relevant to the
task at hand. The inclusion of the DSQs as input to the LLM is particularly important, as it ensures
that the key concepts identified are not only relevant to the domain but also aligned with the
specific decision-making needs of the stakeholders. By focusing on the concepts that are most
pertinent to answering the DSQs, the methodology helps to create an ontology that is targeted in
supporting the desired decision-making processes. Another important aspect of this step is the
consideration of core ontology categories. The categorization of key concepts into core ontology-compatible classes is a critical step in the ontology development process. It ensures that the
resulting ontology adheres to the established principles and standards of the core ontology, which
promotes consistency, interoperability, and reusability across different domains and applications.
Using an LLM for this categorization task offers several advantages. The LLM is fine-tuned with the specific context of the core ontology principles and the details of the core ontology
model. This fine-tuning enables the LLM to accurately and efficiently assign each key concept to
the appropriate class category defined in the core ontology. The LLM's ability to process and
analyze large amounts of data quickly and consistently is particularly beneficial in this step.
Manually categorizing a large number of key concepts into core ontology classes would be a time-consuming and error-prone process. The LLM automates this task, reducing the effort required and
minimizing the risk of inconsistencies or misclassifications. Using an LLM for categorization
helps maintain the integrity and coherence of the resulting ontology. By consistently applying the
core ontology principles and class categories across all key concepts, the LLM ensures that the
ontology maintains a logical and structured organization. This consistency is essential for enabling
effective reasoning, querying, and inference over the ontology. This alignment facilitates the
integration and exchange of knowledge between different domain ontologies that adhere to the
same core ontology. It enables seamless communication, data sharing, and reasoning across
multiple domains, as the core ontology provides a common language and structure for representing
knowledge. Furthermore, standardizing class categories helps synchronize the ontology
development process across different domains and unstructured data sources. This standardization
reduces the effort required to map and align ontologies from different sources, as they all share a
common foundation.
Using an LLM to identify key concepts offers several advantages over manual approaches. An LLM
can process and analyze large volumes of unstructured data much more efficiently than human
experts. This efficiency is particularly important when dealing with complex domains and
extensive datasets, where manual analysis would be time-consuming and prone to errors and
inconsistencies. The fine-tuning of the LLM with the context of ontology and system model
development ensures that the key concepts identified are directly relevant to the task at hand. By
incorporating the specific requirements and constraints of the ontology development process, such
as the need to address the DSQs and adhere to core ontology principles, the LLM is able to generate
a list of key concepts that are tailored to the specific needs of the project. The involvement of
human users in validating the list of key concepts generated by the LLM is a critical aspect of the
methodology. While the LLM can provide an initial set of relevant and important concepts, human
expertise is essential for ensuring the quality and completeness of the final list. Human users bring
domain knowledge that can help refine the list of key concepts and ensure that it captures the most
important entities within the domain.
The key concept identification step in the methodology leverages the power of an LLM to
find and list the most important key concepts from the unstructured data. The LLM is fine-tuned
with the context of ontology and system model development, incorporating the DSQs and core
ontology principles to ensure the relevance and applicability of the identified concepts. The
involvement of human users in validating the list of key concepts brings domain expertise and
strategic insights to the process. The rationale behind this step is to create a foundation for the
ontology that is directly relevant to the decision-making needs of the stakeholders, aligned with
the core ontology principles and captures the most important aspects of the domain. By identifying
the key concepts, the methodology sets the stage for creating a structured and meaningful
representation of the domain knowledge that can support the desired decision-making processes.
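The sketch below illustrates one way key concept extraction and core-ontology categorization could be combined in a single structured call. The list of core classes, the prompt wording, and the stub output are simplified placeholders, not the actual core ontology used in this research.

import json

CORE_CLASSES = ["PhysicalEntity", "Process", "Requirement", "Stakeholder", "Event", "Property"]

def extract_key_concepts(chunk, dsqs, call_llm):
    prompt = (
        "Identify the key concepts needed to answer the decision support questions below, "
        f"and assign each concept to one of these core ontology classes: {CORE_CLASSES}.\n"
        f"Decision support questions: {dsqs}\n"
        "Answer as a JSON object mapping concept name to core class.\n\n"
        f"Text: {chunk}"
    )
    concepts = json.loads(call_llm(prompt))
    # Human users validate relevance and completeness before definitions are written.
    return concepts

stub_llm = lambda prompt: '{"Fuel Pump": "PhysicalEntity", "Seal Replacement": "Process"}'
print(extract_key_concepts("Pump P-101 was replaced after repeated seal failures.",
                           ["Which pump seal variants reduce unscheduled maintenance?"], stub_llm))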
5. Create Two-Part Genus Species Definitions: Define terms systematically.
In this step of the methodology, an additional LLM is employed to create definitions for
the identified key concepts from the unstructured data. Figure 21 presents the inputs and outputs
of this step. The LLM receives input from the previous step, which includes the list of key concepts
and their corresponding categorization into core ontology classes. The specific method used to
create these definitions is called the two-part genus-species definition, which is a well-established
and widely used approach in the domain of ontology development. The two-part genus-species
definition method involves defining a concept by specifying its genus (the broader category to
which it belongs) and its differentia (the specific characteristics that distinguish it from other
concepts within the same genus). This approach provides a systematic and structured way to define
terms, ensuring clarity, consistency, and precision in the resulting ontology.
Figure 21: Defining key concepts
Creating clear and precise definitions for the key concepts is a critical step in the ontology
development process. It establishes a common understanding of the terms used in the ontology,
reducing ambiguity and promoting effective communication among all stakeholders, including
humans and generative models. Using the two-part genus-species definition method offers several
advantages. It provides a structured and systematic approach to defining terms. By specifying the
genus and differentia of each concept, the LLM can create definitions that are clear, concise, and
unambiguous. This structure helps to ensure consistency in the way terms are defined across the
ontology, promoting a coherent and unified representation of knowledge. The two-part genus-species definition method leverages the categorization of key concepts into core ontology classes
from the previous step. By utilizing the genus information derived from the core ontology
categorization, the LLM can create definitions that are aligned with the established ontological
framework. This alignment reinforces the consistency and interoperability of the resulting
ontology, as the definitions are grounded in the principles and standards of the core ontology. The
involvement of human users in checking the created definitions is a crucial aspect of this step.
While the LLM can generate initial definitions based on the two-part genus species method, human
expertise is essential for validating and refining these definitions. Human users bring domain
knowledge, contextual understanding, and linguistic expertise that can help ensure the accuracy,
clarity, and appropriateness of the definitions. They can identify any inconsistencies, ambiguities,
or gaps in the definitions and provide necessary revisions.
Creating systematic and precise definitions for key concepts is essential for building a
robust and effective ontology, as it establishes a common vocabulary and understanding among all
stakeholders, facilitating effective communication and knowledge sharing. This approach
promotes clarity and reduces ambiguity, as each concept is defined in terms of its broader category
and specific distinguishing characteristics. The resulting definitions provide a solid foundation for
understanding the meaning and context of each concept within the ontology. Moreover, the
involvement of human users in checking the created definitions adds an essential layer of quality
control and validation. Human expertise can identify any issues or inconsistencies in the
definitions, ensuring that they accurately capture the intended meaning and are appropriate for the
specific domain and context. This human oversight helps to maintain the integrity of the ontology.
Establishing a common vocabulary and understanding through systematic definitions is crucial for
enabling effective collaboration and integration among different models and systems. When all
stakeholders, including humans and generative models, share a consistent understanding of the
terms used in the ontology, it facilitates seamless communication, data exchange, and knowledge
reasoning. This common ground is essential for promoting interoperability and reusability of the
ontology across different applications and domains.
The creation of two-part genus species definitions is a critical step in the ontology
development process. It involves using an LLM to systematically define the key concepts
identified from the unstructured data, leveraging the categorization into core ontology classes. The
two-part genus species method provides a structured and consistent approach to defining terms,
promoting clarity and alignment with the core ontology principles. Human users play a vital role
in checking and validating the created definitions, ensuring their accuracy and appropriateness.
The rationale behind this step is to establish a common vocabulary and understanding among all
stakeholders, facilitate effective communication and knowledge sharing, and enable
interoperability and reusability of the ontology. By creating precise and systematic definitions, the
methodology lays the foundation for building a robust and effective ontology that can support
various applications and domains.
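A minimal sketch of how a two-part genus-species definition could be represented is given below: the genus comes from the core-ontology categorization, and the differentia is proposed by the generative model and then checked by a human expert. All example content is hypothetical.

from dataclasses import dataclass

@dataclass
class GenusSpeciesDefinition:
    term: str
    genus: str          # broader core-ontology class the concept belongs to
    differentia: str    # characteristics distinguishing it from other members of the genus

    def text(self) -> str:
        return f"A {self.term} is a {self.genus} that {self.differentia}."

definition = GenusSpeciesDefinition(
    term="Fuel Pump",
    genus="PhysicalEntity",
    differentia="transfers fuel from the tank to the engine at a controlled pressure",
)
print(definition.text())   # reviewed and, if needed, revised by a human domain expert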
6. Data Source Identification: Find and predict potential sources for populating identified
concepts.
In this step of the methodology, a generative model (LLM) is employed to identify and
predict potential data sources that can be used to populate each identified key concept or
recognized class from the portion of the unstructured data. The LLM leverages its knowledge
and understanding of the system lifecycle phases to predict data sources that are likely to exist
and be relevant for populating the ontology. The LLM employed in this step is specifically
fine-tuned for the task of data source identification. As shown in Figure 22, this step takes into
account the inputs from previous steps, such as the identified stakeholders, decision support
questions, key concepts, and their corresponding classes. By considering this contextual
information, the LLM can make more accurate and relevant predictions about potential data
sources.
Figure 22: Data source identification for populating ontology
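As a rough sketch of how the contextual inputs from the previous steps could be combined into a data-source prediction request, consider the following; llm_complete is again a hypothetical client, and the prompt format (a JSON-list reply) is an assumption made for illustration only.

import json

def predict_data_sources(concept, onto_class, stakeholders, dsqs, llm_complete):
    # Ask a (fine-tuned) LLM for candidate lifecycle data sources for one concept.
    prompt = (
        "Suggest data sources across the system lifecycle (requirements, design, test, "
        "operations, maintenance) that could populate the following ontology concept.\n"
        f"Concept: {concept}\nCore ontology class: {onto_class}\n"
        f"Stakeholders: {', '.join(stakeholders)}\n"
        f"Decision support questions: {'; '.join(dsqs)}\n"
        "Answer as a JSON list of source names."
    )
    # Human users review, edit, or replace these candidates in the next part of the step.
    return json.loads(llm_complete(prompt))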
After the LLM generates a list of potential data sources for each class, human users review
and validate the identified sources, bringing their domain expertise and knowledge of available
data sources to assess the appropriateness and feasibility of the LLM's suggestions. They can make
necessary changes, add their own inputs, or replace the suggested data sources with more suitable
alternatives based on their understanding of the system and the availability of data. The
identification and prediction of potential data sources is a crucial step in the ontology and system
model development process. It addresses the important consideration of data availability and
feasibility early in the development cycle. When building an ontology, it is essential to ensure that
the identified key concepts and classes can be populated with real-world data. Without relevant
data sources, the ontology remains a theoretical construct and cannot be effectively transformed
into a functional system model that can answer questions and support decision-making.
Incorporating the data source identification step early in the methodology provides several
benefits. It allows for a proactive assessment of data availability and feasibility. By identifying
potential data sources upfront, the development team can determine whether the necessary data
exists and can be accessed to populate the ontology. This early assessment helps to identify any
gaps or challenges in data availability and enables the team to plan and prioritize data-gathering
efforts accordingly. The involvement of an LLM in predicting potential data sources leverages its
knowledge and understanding of the system lifecycle phases. The LLM can draw upon its training
and fine-tuning to suggest relevant data sources that may not be immediately apparent to human
users. This helps ensure that important data sources are not overlooked and that
the ontology is built with a comprehensive consideration of available data. The active involvement
of human users in reviewing and validating the identified data sources adds a critical layer of
expertise and practicality to the process. Human users can assess the feasibility and
appropriateness of the suggested data sources based on their domain knowledge and understanding
of the system. They can identify any limitations, constraints, or challenges associated with
accessing and utilizing the proposed data sources. This human input helps to refine the data source
identification and ensures that the ontology is built with realistic and achievable data population
in mind. The data source identification step is crucial for assessing the feasibility of
building a functional system model. It helps to align the ontology development with the practical
considerations of data availability and accessibility. It exposes any gaps or challenges in data
availability and allows for proactive planning and prioritization of data gathering efforts.
Furthermore, this step facilitates effective communication and collaboration among stakeholders.
By clearly identifying the potential data sources for each class, it provides a common
understanding of the data requirements and dependencies. This transparency helps to align
expectations, facilitate data sharing agreements, and foster collaboration among different teams or
departments responsible for data provision and management.
The data source identification step is a critical component of the ontology and system
model development process. It involves using an LLM to predict potential data sources for
populating the identified key concepts and classes, taking into account the contextual information
from previous steps. Human users play a vital role in reviewing and refining the identified data
sources based on their domain expertise and knowledge of available data. The rationale behind
this step is to assess the feasibility of building a functional system model, identify any data
availability gaps, and facilitate effective planning and collaboration. By incorporating this step
early in the methodology, it ensures that the ontology development is grounded in the practical
considerations of data availability and sets the foundation for a successful transformation into a
system model that can answer questions and support decision-making.
Stage 2: Create Connections and Populate Data
7. Create Ontology Representation: Derive from N2 Matrix
In this step of the methodology, the key concepts identified from the portion of data in the
first stage are used to create an ontology representation in the form of an N2 matrix, applying a
list-to-matrix conversion script. Figure 23 presents the concept of this step. The N2 matrix, also known as the
N-squared matrix or the design structure matrix, is a powerful tool for representing and analyzing
the relationships between entities in a system.
Figure 23: Building N2 Matrix for Ontology
To create the N2 matrix, the identified key concepts are placed in the first column and the
first row, forming a square matrix. Each key concept represents a node or an entity in the ontology.
The body of the matrix is then populated with the relationships between these key concepts. The
process of populating the N2 matrix involves systematically going through each cell of the matrix
and determining the relationship between the corresponding key concepts that intersect in that cell.
In the next steps, this task is performed by an LLM that has been trained and fine-tuned on the
domain knowledge and ontology principles. The LLM analyzes the semantic and contextual
information associated with each key concept pair and identifies the most appropriate relationship
or connection between them. It draws upon its understanding of the domain, the ontology structure,
and the specific context of the data to determine the nature and strength of the relationships.
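A minimal sketch of the list-to-matrix conversion, assuming the key concepts are available as a simple Python list and using illustrative concept names from a UAV upgrade context, might look like this:

def build_n2_matrix(key_concepts):
    # Square N2 matrix with key concepts on both axes; each cell holds a list of
    # predicates, left empty until relationships are identified in the next step.
    return {row: {col: [] for col in key_concepts} for row in key_concepts}

concepts = ["UAV", "FlightController", "SoftwareUpgrade", "TestScenario"]  # illustrative names
n2 = build_n2_matrix(concepts)
n2["SoftwareUpgrade"]["FlightController"].append("modifies")  # later filled in by the LLM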
The creation of an ontology representation using the N2 matrix offers several significant
advantages and benefits:
a. Machine Readability: The matrix representation is easily readable and processable by
machines, making it more efficient for computational tasks compared to other
representations like graphs. The structured format of the N2 matrix allows algorithms and
software tools to quickly access and analyze the relationships between entities, enabling
faster and more effective ontology processing and reasoning.
b. Emphasis on Relationships: The N2 matrix places a strong emphasis on the relationships
between entities, which is a critical aspect of ontology development. By explicitly
capturing the connections and dependencies between key concepts, the N2 matrix provides
a clear and comprehensive representation of the ontology structure. It helps to identify
patterns, clusters, and hierarchies within the ontology, facilitating a deeper understanding
of the domain knowledge.
c. Systematic and Comprehensive: The process of populating the N2 matrix cell by cell
ensures a systematic and comprehensive approach to capturing relationships. By
methodically examining each pair of key concepts, the LLM can identify and establish
connections that may not be immediately apparent or explicitly stated in the data. This
systematic approach helps to uncover hidden dependencies, indirect relationships, and
potential inconsistencies, leading to a more robust and complete ontology representation.
d. Scalability and Modularity: The N2 matrix representation is scalable and modular, allowing
for the incremental development and expansion of the ontology. As new key concepts are
identified or additional portions of data are processed, the matrix can be easily extended or
merged with existing matrices. This modularity enables the ontology to grow and evolve
over time, accommodating new knowledge and insights while maintaining its structural
integrity.
e. Integration with Other Tools: The matrix representation of the ontology can be seamlessly
integrated with various software tools and platforms commonly used in ontology
development and knowledge management. Many tools support the import and export of
matrix formats, enabling interoperability and facilitating the exchange of ontology
representations across different systems and applications.
The rationale behind using the N2 matrix for ontology representation is rooted in its ability
to capture and organize complex relationships in a structured and machine-readable format. By
leveraging the power of LLM to populate the matrix, the methodology can ensure that the ontology
is built upon a foundation of domain knowledge and semantic understanding.
The systematic approach of examining each cell in the matrix allows for a thorough and
comprehensive capture of relationships, minimizing the chances of overlooking important
connections or dependencies. The LLM's ability to analyze the semantic and contextual
information associated with each key concept pair enables the identification of meaningful and
relevant relationships, enhancing the quality and accuracy of the ontology representation.
Furthermore, the N2 matrix representation offers a visual and intuitive way to explore and
understand the ontology structure. It allows domain experts and stakeholders to easily grasp the
connections and dependencies between entities, facilitating communication, collaboration, and
decision-making.
The creation of an ontology representation using the N2 matrix is a crucial step in the
methodology. It leverages the power of LLMs to systematically populate the matrix with
relationships between key concepts, resulting in a structured, machine-readable, and
comprehensive representation of the ontology. The rationale behind this approach lies in its ability
to capture complex relationships, facilitate machine processing, enable scalability and modularity,
and promote a deeper understanding of the domain knowledge. By deriving the ontology
representation from the N2 matrix, the methodology ensures a robust and effective foundation for
further ontology development, reasoning, and application.
8. For Each Subject, Identify Predicates with All Objects: Apply a column-by-column
approach using LLM
In this step of the methodology, the focus is on identifying the relationships or predicates
between the key concepts in the ontology using the N2 matrix. The goal is to create meaningful
and logically consistent axioms that represent the knowledge in the ontology using formal logic.
The N2 matrix, populated with key concepts in the first row and column, serves as the foundation
for this process. Each cell in the matrix represents the intersection between a subject (row) and an
object (column), and the task is to identify the appropriate predicate or relationship between them.
To accomplish this, a systematic approach is employed, using a fine-tuned LLM to analyze each
cell of the matrix. The LLM is trained and fine-tuned with the context of ontology development,
taking into account the inputs from previous steps, such as decision support questions, a list of key
concepts, and core ontology categorization along with the list of standardized predicates and their
definitions from the core ontology. The LLM processes the portion of unstructured data associated
with each cell and attempts to identify the most suitable predicate or relationship between the
subject and object. It considers the semantic and contextual information present in the data, as well
as the formal logic constraints and the list of predicates from the core ontology. The approach
follows a column-by-column pattern, with the LLM examining each subject (row) and
systematically identifying predicates for each object (column) in that row. It progresses from left
to right, considering subject1-predicate1-object1, then subject1-predicate2-object2, and so on.
This systematic approach ensures a thorough and comprehensive identification of predicates for
each subject-object pair. Figures 24 and 25 illustrate two complementary representations of a
portion of the ontology. Figure 24 displays the N2 matrix that was produced by the methodology.
Figure 25 presents a graphical visualization of the N2 matrix.
Figure 24: N2 Matrix Representation of Ontology
Figure 25: Graphical Representation of Ontology
In some cases, the LLM may identify multiple predicates for a given cell based on the
information present in the unstructured data. These multiple predicates are captured and recorded
in the matrix in array form, providing a richer representation of the relationships between the
key concepts.
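A simplified sketch of this column-by-column pass is shown below; the prompt, the llm_complete helper, and the "none" convention are assumptions made for illustration, not the exact fine-tuned prompting scheme.

def identify_predicates(n2, core_predicates, source_text, llm_complete):
    # For each subject (row), ask the LLM for predicates linking it to every object
    # (column), constrained to the standardized predicate list from the core ontology.
    for subject in n2:
        for obj in n2[subject]:
            if subject == obj:
                continue  # skip the diagonal
            prompt = (
                f"From the text below, which of these predicates {core_predicates} relate "
                f"'{subject}' to '{obj}'? Reply with a comma-separated list or 'none'.\n"
                f"Text: {source_text}"
            )
            answer = llm_complete(prompt).strip()
            if answer.lower() != "none":
                n2[subject][obj] = [p.strip() for p in answer.split(",")]  # array of predicates
    return n2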
The LLM is specifically fine-tuned for this task, with a deep understanding of ontology
development principles and description logic. It ensures that the identified predicates are within
the formal logic domain and are computable with deductive reasoning. The LLM also utilizes the
list of predicates from the core ontology as a reference, selecting the most appropriate predicate
for each cell based on the context and the portion of input unstructured data. Through this process,
the N2 matrix is populated with key concepts and the relationships between them, creating formal
axioms that satisfy the requirements of formal logic. The rationale behind this approach is to create
a comprehensive and logically consistent representation of the knowledge in ontology using formal
axioms. By systematically identifying predicates between each subject and object, the
methodology ensures that the ontology captures the essential relationships and dependencies
among the key concepts.
The use of a fine-tuned LLM for this task offers several advantages. First, the LLM's ability
to understand and process natural language allows it to analyze the unstructured data and extract
meaningful relationships between the key concepts. It can identify predicates that may not be
explicitly stated in the data but can be inferred based on the context and semantic information.
Second, the LLM's fine-tuning with the context of ontology development and description logic
ensures that the identified predicates are logically consistent and computable. It takes into account
the formal logic constraints and the list of predicates from the core ontology, ensuring that the
resulting axioms are valid and can be used for deductive reasoning. Third, the systematic column-by-column approach guarantees a thorough and comprehensive identification of predicates for
each subject-object pair. It ensures that no potential relationships are overlooked and that the
ontology captures a complete representation of the knowledge.
The inclusion of multiple predicates for a given cell, when applicable, allows for a richer
and more nuanced representation of the relationships between key concepts. Furthermore, the
resulting formal axioms satisfy the requirements of completeness, consistency, and traceability.
Completeness is achieved by systematically examining each subject-object pair and identifying all
relevant predicates based on the available information. Consistency is ensured by the LLM's fine-
tuning with formal logic constraints, preventing logical contradictions or inconsistencies in the
axioms. Traceability is maintained by linking the identified predicates back to the specific portions
of unstructured data from which they were derived, allowing for transparency and auditability of
the ontology development process.
The process of identifying predicates for each subject-object pair using a column-by-column approach with LLMs is a crucial step in creating a comprehensive and logically consistent
ontology. The rationale behind this approach lies in its ability to capture essential relationships,
ensure logical consistency, and provide a complete and traceable representation of the knowledge.
By leveraging the power of fine-tuned LLMs and a systematic approach, the methodology ensures
that the resulting formal axioms satisfy the requirements of completeness, consistency, and
traceability, laying the foundation for a robust ontology.
9. Triplet Creation with Key Concepts and Relations: Structure the ontology effectively.
In this step of the methodology, the N2 matrix populated with key concepts and their
relationships is parsed to extract a list of subject-predicate-object triplets or axioms. These triplets
form the backbone of the ontology, representing the knowledge in a structured and machine-readable format. Creating triplets involves systematically traversing the N2 matrix and identifying
each valid subject-predicate-object combination. For each cell in the matrix that contains a
predicate, a corresponding triplet is created by combining the subject (row), predicate (cell), and
object (column). This step ensures that all the relationships captured in the matrix are explicitly
represented as triplets. To enhance the traceability and provenance of the triplets, a fine-tuned LLM
is employed to identify the specific portion of unstructured data from which each triplet or axiom
is derived. The LLM analyzes the content of the matrix and the associated unstructured data, using
its understanding of the context and semantics to establish the link between the triplets and their
corresponding data sources.
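A minimal sketch of this matrix traversal, with an optional lookup that records which portion of source data each axiom came from, could look like the following (the dictionary-based matrix and the source_lookup structure are illustrative assumptions):

def extract_triplets(n2, source_lookup=None):
    # Parse the populated N2 matrix into subject-predicate-object triplets.
    # source_lookup, if provided, maps (subject, predicate, object) to the data
    # excerpt from which the axiom was derived, preserving traceability.
    triplets = []
    for subject, row in n2.items():
        for obj, predicates in row.items():
            for predicate in predicates:
                provenance = source_lookup.get((subject, predicate, obj)) if source_lookup else None
                triplets.append({"s": subject, "p": predicate, "o": obj, "source": provenance})
    return triplets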
Traceability is crucial for several reasons:
a. It allows for transparency and auditability of the ontology development process. By linking
each triplet to its source data, it becomes possible to trace back the origin of the knowledge
represented in the ontology. This is particularly important for validation, verification, and
maintenance purposes, as it enables users to understand the basis and justification for each
axiom.
b. It supports the iterative refinement and expansion of the ontology. As new data becomes
available or changes occur in the underlying knowledge, the triplets can be easily updated
or modified by referring back to their source data. This helps maintain the consistency and
accuracy of the ontology over time.
After the triplets are generated and linked to their data sources, a human user reviews and
validates the list of triplets or axioms. This manual review process is essential to ensure the quality
and correctness of the ontology. The human user brings their domain expertise and knowledge to
assess the semantic validity and relevance of each triplet. They can identify any potential errors,
inconsistencies, or ambiguities in the triplets and make necessary adjustments or corrections.
Moreover, the human user can provide additional context or knowledge that may not be explicitly
present in the unstructured data. They can enrich the triplets with their understanding of the
domain, adding annotations or metadata to clarify the meaning or scope of certain axioms. This
human input complements the automated process and ensures that the ontology accurately
represents the intended knowledge. Reasoners are employed to validate the axioms and further
ensure the consistency and logical coherence of the ontology. Reasoners are software tools that
apply logical inference rules to the triplets, checking for any inconsistencies, contradictions, or
violations of the ontology's formal semantics. They help identify and resolve any logical conflicts
or errors in the ontology structure. Using reasoners is crucial for maintaining the integrity of the
ontology. By detecting and resolving inconsistencies, reasoners ensure that the ontology is
logically sound and can support accurate reasoning and inference. They help prevent the
introduction of contradictory or conflicting knowledge, which could lead to incorrect conclusions
or decisions based on the ontology.
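One way such a consistency check could be run, assuming the axioms have been exported to an OWL file and the owlready2 package (with a Java runtime for its bundled reasoner) is available, is sketched below; the file name is illustrative.

from owlready2 import get_ontology, sync_reasoner, OwlReadyInconsistentOntologyError

onto = get_ontology("file://system_upgrade_ontology.owl").load()  # illustrative file name
try:
    with onto:
        sync_reasoner()  # invokes the bundled HermiT reasoner
    print("Axioms are logically consistent.")
except OwlReadyInconsistentOntologyError:
    print("Inconsistency detected; flag the offending axioms for human review.")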
The rationale behind the triplet creation step is to transform the knowledge captured in the
N2 matrix into a structured and machine-readable format that can be effectively utilized by
ontology-based systems and applications. Triplets provide a clear and concise representation of the
relationships between key concepts, enabling efficient storage, retrieval, and reasoning over the
ontology. The traceability established by linking triplets to their source data is essential for
ensuring the transparency, auditability, and maintainability of the ontology. It allows users to
understand the provenance of each axiom and provides a mechanism for updating and refining the
ontology as new knowledge becomes available. Traceability enhances the trust and confidence in
the ontology, as it provides a clear trail of evidence supporting each triplet. The human user's
involvement in reviewing and validating the triplets is critical for ensuring the semantic accuracy
and relevance of the ontology. Human expertise complements the automated process, adding
contextual knowledge and correcting any potential errors or ambiguities. This collaborative
approach between the LLM and human users results in a high-quality ontology that accurately
represents the intended knowledge. Using reasoners to validate the consistency of the axioms is
essential for maintaining the logical integrity of the ontology. By ensuring logical consistency,
reasoners enable reliable reasoning and inference over the ontology, leading to accurate and
trustworthy results.
The triplet creation step is crucial for structuring the ontology effectively by transforming
the knowledge captured in the N2 matrix into a machine-readable format. The rationale behind
this step lies in its ability to provide a clear and concise representation of the relationships between
key concepts, establish traceability to source data, incorporate human expertise for semantic
validation, and ensure logical consistency through the use of reasoners. By following this
approach, the methodology ensures that the resulting ontology is transparent, auditable,
maintainable, and reliable, serving as a robust foundation for ontology-based systems and
applications.
10. Automated Population of Graph Database with Ontology and Data: Translate ontology
into system model.
In this step of the methodology, the triplets created in the previous step are transformed
into a graph representation, forming the ontology graph. This process involves converting the
subject and object entities from the triplets into nodes in the graph, while the predicates become
the edges connecting these nodes. The resulting graph structure captures the relationships and
interconnections between the key concepts in a visually intuitive and computationally efficient
manner. Automatically populating the graph representation with the ontology and data is a crucial
step in translating the ontology into a system model, as depicted in Figure 26. By representing the
ontology as a graph, it becomes possible to leverage the power of graph representations and their
associated query languages and algorithms for efficient storage, retrieval, and analysis of the
ontological knowledge.
Figure 26: Notional Representation of Ontology and System Model
Graph representations are particularly well-suited for handling complex and interconnected
data, making them ideal for representing ontologies. They offer several advantages over traditional
relational databases, such as flexibility in modeling complex relationships, efficient traversal and
querying of the data, and the ability to handle large-scale and evolving knowledge structures. Once
the ontology graph is generated, the next critical step is to populate the graph with actual data
instances. This data population process transforms the ontology from a conceptual framework into
a practical system model that can be used for real-world applications and decision support. To
populate the ontology graph with data, the methodology leverages the data sources identified in
the previous phase. For each key concept in the ontology, the corresponding data instances are
retrieved from the identified data sources and mapped onto the nodes in the graph database. This
mapping process establishes the connections between the ontological concepts and their real-world
data representations. The data population step involves creating mappings and relationships
between the data instances and the ontology nodes. This process requires careful consideration of
the data structure, format, and semantics to ensure accurate and meaningful integration with the
ontology. The mapping process involves data transformation, cleaning, and normalization to align
the data with the ontological representations.
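As a rough, in-memory stand-in for the graph population described above (a production system model would live in a graph database such as Neo4j), the mapping from triplets and data instances to nodes and edges could be sketched with networkx as follows; the attribute names are illustrative.

import networkx as nx

def populate_graph(triplets, instance_data):
    # Subjects and objects become nodes, predicates become labeled edges, and data
    # instances retrieved from the identified sources are attached as node attributes.
    g = nx.MultiDiGraph()
    for t in triplets:
        g.add_edge(t["s"], t["o"], label=t["p"], source=t.get("source"))
    for concept, records in instance_data.items():
        if concept in g:
            g.nodes[concept]["instances"] = records  # e.g., rows pulled from maintenance logs
    return g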
Automatically populating the graph representation with data is a critical step in converting
the ontology into a functional system model. By connecting real-world data to ontological
concepts, the system model becomes grounded in the actual domain knowledge and can provide
actionable insights and support data-driven decision-making. To ensure the accuracy and integrity
of the populated system model, a human user is involved in checking and validating the data
mappings and relationships. The human user brings their domain expertise and understanding of the
data to verify that the data instances are correctly associated with the appropriate ontology nodes
and that the relationships between them are semantically meaningful and consistent. This human
validation step is crucial for catching any potential errors, inconsistencies, or anomalies in the data
population process. It serves as a quality control measure to ensure that the resulting system model
is reliable, accurate, and fit for its intended purpose. The rationale behind automatically populating
the graph representation with ontology and data is to create a functional and actionable system
model that can support real-world applications and decision-making processes. By translating the
ontology into a graph representation and populating it with relevant data, the methodology bridges
the gap between the conceptual knowledge captured in the ontology and the practical needs of the
domain.
Graph representation provides a natural and intuitive way to represent ontologies and their
associated data. The graph structure allows for efficient traversal and querying of the knowledge,
enabling fast and flexible access to the relevant information. Graph representations with graph
databases also scale well to handle large and complex ontologies, making them suitable for
representing knowledge in diverse domains. The data population step is critical because it brings
the ontology to life by connecting it with real-world data instances. Without the data population,
the ontology remains a theoretical construct, lacking practical applicability. The human validation
of the data mappings and relationships is essential to ensure the accuracy and reliability of the
populated system model. Human expertise is invaluable in catching potential errors,
inconsistencies, or semantic discrepancies that may arise during the automated data population
process. By involving human users in the validation step, the methodology ensures that the
resulting system model is of high quality and aligns with the domain knowledge and requirements.
Furthermore, automatically populating the graph database with ontology and data enables efficient
maintenance and updates of the system model. As new data becomes available or changes occur
in the underlying knowledge, the graph database can be easily updated by modifying the data
mappings and relationships. This flexibility and adaptability are crucial for keeping the system
model up-to-date and relevant in evolving domains.
Automatically populating the graph database with ontology and data is a critical step in
translating the ontology into a functional system model. The rationale behind this step lies in its
ability to bridge the gap between the conceptual knowledge captured in the ontology and the
practical needs of the domain. By leveraging the power of graph databases and incorporating
human validation, the methodology ensures that the resulting system model is accurate, reliable,
and actionable, enabling efficient storage, retrieval, and analysis of ontological knowledge for real-world applications and decision support.
11. Decision Support Question Validation: Check if the ontology can answer the decision
support questions using reasoners.
In this crucial step of the methodology, the focus is on validating the effectiveness and
completeness of the ontology by evaluating its ability to answer the decision support questions
(DSQs) generated in the first phase. This validation process involves converting the DSQs into
formal logic queries and executing them against the populated ontology using reasoners. To
accomplish this, a specifically tuned LLM converts the natural language DSQs into formal logic
queries. The LLM is trained and fine-tuned to understand the structure and semantics of the
ontology, as well as the syntax and requirements of the formal logic query language. By leveraging
the power of natural language processing, the LLM can accurately translate the DSQs into precise
and executable queries. Once the DSQs are converted into formal logic queries, they are executed
against the populated ontology using reasoners. Reasoners are powerful tools that can infer new
knowledge and derive answers based on the logical rules and relationships defined in the ontology.
They use logical inference and deduction to traverse the ontology graph and retrieve the relevant
information needed to answer the queries.
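For illustration, a hand-written equivalent of this query pipeline using rdflib and the owlrl OWL-RL reasoner is sketched below; the namespace, file name, and example DSQ translation are assumptions, and in the methodology the query itself is produced by the tuned LLM.

import owlrl
from rdflib import Graph

g = Graph()
g.parse("system_model.ttl", format="turtle")  # triplets exported from the previous steps

# Materialize inferred triples with an OWL-RL reasoner before querying.
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

# A DSQ such as "Which components are affected by the software upgrade?" might be
# translated into the following SPARQL query (illustrative namespace and predicate):
query = """
PREFIX ex: <http://example.org/upgrade#>
SELECT ?component WHERE { ex:SoftwareUpgrade ex:modifies ?component . }
"""
for row in g.query(query):
    print(row.component)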
The execution of the formal logic queries against the ontology serves as a comprehensive
test of the ontology's ability to support decision-making processes. By running these queries, we
can assess whether the ontology contains the necessary knowledge and relationships to provide
accurate and meaningful answers to the DSQs. After executing the queries, a human user carefully
examines the answers returned by the reasoners to determine their correctness and satisfaction.
The human user brings their domain expertise and understanding of the expected results to validate
the quality and relevance of the answers. They compare the answers obtained from the ontology
with the expected outcomes based on their knowledge and the requirements of the decision-making
process. If the results are satisfactory and align with the expected answers, it indicates that the
ontology is well-structured, complete, and capable of supporting the desired decision-making
processes. However, if the results are not satisfactory or if there are discrepancies between the
expected and obtained answers, it suggests that the ontology may require refinement or
modification. In cases where the results are unsatisfactory, the methodology includes a feedback
loop to refine the ontology based on the findings from the DSQ validation process. The specific
DSQs that yield unsatisfactory results are flagged, and the corresponding answers and related
triplets in the ontology are extracted for further analysis.
The refinement process involves a thorough examination of the flagged triplets and
answers to identify the necessary modifications required to improve the ontology's ability to
answer the DSQs correctly. This involves adding missing triplets, deleting incorrect or redundant
ones, and modifying the predicates to better capture the intended relationships. The refinement
process aims to align the ontology with the expected answers and ensure that it provides accurate
and relevant information for decision support. After making the necessary modifications to the
ontology, the system model and its associated data are updated accordingly to maintain consistency
and integrity. The refined ontology is then retested against the DSQs to verify that the changes
have indeed improved its ability to provide satisfactory answers. This iterative process of
validation, refinement, and retesting continues until the ontology consistently produces accurate
and reliable results for the given DSQs. The rationale behind the DSQ validation step is to ensure
that the ontology is not only structurally sound but also functionally effective in supporting real-
world decision-making processes. By validating the ontology against the DSQs, we can assess its
practical utility and identify any gaps or limitations in its knowledge representation.
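Structurally, the validate-refine-retest cycle described above can be summarized by the following sketch, in which query execution, human judgment, and ontology refinement are abstracted as callbacks (all names are illustrative):

def validate_and_refine(dsqs, run_query, human_accepts, refine_ontology, max_rounds=5):
    # Execute each DSQ, let a domain expert judge the answer, and send failing
    # questions (with their answers) back for ontology refinement, then retest.
    for _ in range(max_rounds):
        failures = []
        for dsq in dsqs:
            answer = run_query(dsq)             # LLM-translated query run via the reasoner
            if not human_accepts(dsq, answer):  # expert judges correctness and relevance
                failures.append((dsq, answer))
        if not failures:
            return True                         # ontology answers all DSQs satisfactorily
        refine_ontology(failures)               # add, delete, or modify triplets
    return False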
The use of a tuned LLM to convert natural language DSQs into formal logic queries is a
significant advantage of this methodology. It automates the process of translating the questions
into a format that can be executed against the ontology, saving time and reducing the potential for
human errors. The LLM's ability to understand the nuances of natural language and map them to
the appropriate formal logic constructs ensures accurate and efficient query generation. The
involvement of human users in validating the answers obtained from the ontology is critical for
ensuring the reliability of the results. Human expertise is essential for assessing the correctness
and relevance of the answers in the context of the specific decision-making process. By
incorporating human judgment and domain knowledge, the methodology ensures that the ontology
is not only technically sound but also aligned with the real-world requirements and expectations.
The feedback loop for refining the ontology based on the DSQ validation results is a crucial aspect
of this methodology. It acknowledges that ontology development is an iterative process and that
the initial version of the ontology may not be perfect. By flagging unsatisfactory results and
analyzing the related triplets and answers, the methodology provides a systematic approach to
identify and address any shortcomings in the ontology. The refinement process allows for
continuous improvement and ensures that the ontology evolves to better support the decision-making needs of the domain. Furthermore, the iterative nature of the validation, refinement, and
retesting process ensures that the ontology undergoes rigorous quality control and verification. By
repeatedly testing the ontology against the DSQs and making necessary adjustments, the
methodology guarantees that the final ontology is robust and capable of providing accurate and
relevant answers for decision support.
Stage 3: Repeat and Refine
12. Repeat the Process for the Next Portion of Unstructured Data: Continuously create new
ontologies.
This step emphasizes the iterative and incremental nature of ontology development. After
successfully processing one portion of the unstructured data and refining the ontology based on
the decision support question (DSQ) validation, the process is repeated for the next chunk or
portion of the unstructured data. The repetition of the entire process for the next portion of data
ensures that the ontology is continuously expanded and enriched with new knowledge. By
processing additional data, the ontology grows in scope and depth, capturing a wider range of
concepts, relationships, and insights relevant to the domain. The process restarts with identifying
stakeholders for the next portion of data. The stakeholders are determined based on their relevance
and expertise in the specific context of the new data chunk. This step ensures that the ontology
development remains aligned with the needs and perspectives of the key stakeholders throughout
the iterative process. Next, the decision support questions (DSQs) are generated for the new
portion of data. These DSQs are carefully crafted to capture the essential information needs and
decision-making requirements specific to the new data chunk. The DSQs guide the subsequent
steps of the ontology development process, ensuring that the ontology is focused on providing
answers to the most relevant and pressing questions.
The process then moves on to identifying key concepts and relations from the new portion
of unstructured data. This step involves leveraging the power of LLMs and domain expertise to
extract the most significant entities and relationships from the data. The key concepts and relations
form the building blocks of the ontology, representing the core knowledge elements that need to
be captured and organized. Once the key concepts and relations are identified, they are converted
into the ontology representation using the N2 matrix approach described earlier. The next crucial
step is to populate the ontology with relevant data instances. This data population process involves
mapping the new concepts and relations to the corresponding data sources and integrating the data
into the ontology. The populated ontology becomes a rich knowledge base that combines the
conceptual structure with real-world data instances. Finally, the newly created and populated
ontology is subjected to the decision support question validation process. The relevant DSQs for
the new portion of data are converted into formal logic queries and executed against the ontology
using reasoners. The answers obtained from the ontology are validated by human experts to ensure
their accuracy, relevance, and alignment with the expected results. If the DSQ validation reveals
any gaps or inconsistencies in the ontology, the refinement process is triggered. The ontology is
fine-tuned based on the feedback from the validation, making necessary adjustments to the
concepts, relations, and data mappings. The refined ontology is then retested against the DSQs to
verify its improved ability to provide accurate and comprehensive answers.
The iterative process of stakeholder identification, DSQ generation, key concept and
relation extraction, ontology creation, data population, and DSQ validation continues for each
subsequent portion of unstructured data. This incremental approach allows for the gradual and
systematic creation of ontologies, ensuring that they capture the knowledge landscape of the
domain. By processing additional data chunks, the methodology ensures that the ontology captures
a broader range of concepts, relationships, and perspectives. The incremental approach to ontology
development promotes scalability and adaptability. As the volume and complexity of the
unstructured data grow, the methodology enables the ontologies to evolve. The iterative process
allows for the seamless integration of new knowledge elements into the existing structure, ensuring
that the newly developed ontologies remain cohesive and coherent. The involvement of
stakeholders at each iteration ensures that the ontology development remains aligned with the
needs and requirements of the domain experts and decision-makers. By continuously engaging
with stakeholders and incorporating their feedback, the methodology ensures that the ontologies
remain relevant and usable for the intended audience. Moreover, the repetition of the process for
each portion of unstructured data promotes the reusability and interoperability of the ontology. By
following a consistent and structured approach to ontology development, the methodology ensures
that the resulting ontologies are compatible with existing knowledge bases and can be easily
integrated with other systems and applications.
13. Check Repetition of Key Concepts in Merged Ontology N2 Matrix: Ensure no
redundancy in the ontology.
After processing multiple portions of unstructured data and creating separate ontologies
for each portion, the next crucial step is to merge these ontologies into a single, cohesive ontology.
This merging process is essential to ensure that the resulting ontology is comprehensive,
consistent, and free from redundancy. To facilitate the merging process, the methodology utilizes
the final N2 matrices of the validated ontologies created in the previous steps as demonstrated in
Figure 27. The N2 matrix representation of the ontologies provides a structured and standardized
format that enables efficient comparison and integration of the key concepts and relationships.
The merging process starts by applying a matrix merging algorithm to the N2 matrices of
the individual ontologies. This algorithm systematically compares the key concepts and
relationships present in each matrix, identifying any duplicates or overlaps. The goal is to create a
unified N2 matrix that combines the unique elements from each ontology while eliminating any
redundancy.
Figure 27: Merging ontologies using N2 Matrix Algorithms
During the merging process, the matrix representation proves advantageous as it
allows for easy identification of duplicate key concepts and relations. By comparing the rows and
columns of the matrices, the algorithm can detect any identical or semantically equivalent concepts
and relationships across the ontologies. This ensures that the merged ontology does not contain
any redundant or conflicting elements. In cases where duplicate key concepts or relations are
identified, the merging algorithm applies a set of predefined rules and heuristics to determine the
most appropriate representation in the merged ontology. These rules may consider factors such as
the frequency of occurrence, the context in which the concepts appear, and the relationships they
participate in. The algorithm aims to preserve the most meaningful and representative elements
while eliminating any redundancy.
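A simplified sketch of such a matrix merge, covering only the union of concepts and the removal of exact duplicate predicates (semantic near-duplicates and conflicts are left to the rule-based checks and human review described here), is shown below:

def merge_n2_matrices(m1, m2):
    # Union of key concepts from both matrices; predicate lists are combined
    # cell by cell, dropping exact duplicates.
    concepts = sorted(set(m1) | set(m2))
    merged = {row: {col: [] for col in concepts} for row in concepts}
    for source in (m1, m2):
        for row, cells in source.items():
            for col, predicates in cells.items():
                for predicate in predicates:
                    if predicate not in merged[row][col]:
                        merged[row][col].append(predicate)
    return merged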
The merging process also involves checking for any conflicts or discrepancies between the
ontologies being merged. This includes identifying any inconsistencies in the relationships
between concepts, conflicting definitions, or contradictory axioms. The matrix representation
facilitates the detection of such conflicts by enabling a systematic comparison of the cells in the
matrices. If any conflicts or discrepancies are identified during the merging process, the
methodology includes a resolution mechanism. This involves manual intervention by domain
experts and the application of predefined conflict resolution strategies. The goal is to ensure that
the merged ontology maintains logical consistency and coherence without any contradictory or
ambiguous elements. Once the merging process is complete, the resulting ontology represents a
cohesive and comprehensive knowledge structure that combines the key concepts and relationships
from the individual ontologies. This merged ontology serves as a unified representation of the
domain knowledge, capturing the essential information from multiple portions of unstructured
data. The merging process is iterative and incremental, aligning with the overall methodology of
ontology development. As new portions of unstructured data are processed and their corresponding
ontologies are created, they are continuously merged with the existing ontology. This iterative
approach ensures that the ontology grows and evolves over time, incorporating new knowledge
and insights as they become available.
The scope and extent of the merging process are guided by the decision support questions
and the purpose of the ontology development effort. The decision-support questions act as a
compass, directing the focus and boundaries of the ontology. The merging process aims to create
an ontology that is comprehensive enough to answer the relevant decision-support questions
effectively. Throughout the merging process, human users play a crucial role in validating the
merged ontology and ensuring that it meets the desired quality criteria. Domain experts review the
merged ontology to assess its completeness, consistency, and alignment with the decision support
questions. They provide feedback and guidance to refine the ontology, making necessary
adjustments to improve its accuracy and effectiveness. Finally, the merged ontology undergoes a
rigorous validation process to ensure that it can successfully answer the decision support questions.
The ontology is tested against a set of predefined queries and scenarios, evaluating its ability to
provide accurate and relevant answers. Any gaps or limitations identified during this validation
process are addressed through further refinement and iteration. The rationale behind checking for
repetition of key concepts in the merged ontology N2 matrix is to ensure the ontology's integrity
and consistency. Redundancy in the ontology can lead to several problems, such as ambiguity and
increased complexity, which can hinder its usability and effectiveness. By identifying and
eliminating duplicate key concepts and relations during the merging process, the methodology
ensures that the ontology maintains a clear and concise structure. Redundant elements can
introduce confusion and make the ontology harder to navigate and understand. By removing
duplicates, the ontology becomes more streamlined and easier to use, both for human users and
automated systems. If the same concept or relationship is represented in multiple ways or with
conflicting definitions, it can create ambiguity and undermine the reliability of the ontology. By
detecting and resolving such inconsistencies during the merging process, the methodology ensures
that the ontology maintains logical coherence and provides a trustworthy representation of the
domain knowledge.
The use of the N2 matrix representation during the merging process offers several
advantages. The matrix format provides a structured and standardized way to compare and
integrate the key concepts and relationships from different ontologies. It allows for efficient
identification of duplicates and overlaps, as well as the detection of conflicts and discrepancies.
The matrix representation also facilitates the application of merging algorithms and conflict
resolution strategies, enabling a systematic and automated approach to ontology integration. The
iterative and incremental nature of the merging process aligns with the overall methodology of
ontology development. As new portions of unstructured data are processed and their corresponding
ontologies are created, the merging process ensures that the ontology continuously grows and
evolves to incorporate new knowledge. This iterative approach allows for the gradual refinement
and expansion of the ontology, ensuring that it remains up-to-date and relevant as the domain
knowledge evolves. The involvement of human users in validating the merged ontology is crucial
to ensure its quality and effectiveness. Domain experts bring their knowledge and insights to assess
the completeness, consistency, and alignment of the ontology with the decision-support questions.
Their feedback and guidance help to refine the ontology, making necessary adjustments to improve
its accuracy and usability. Finally, the validation process, where the merged ontology is tested
against decision support questions, serves as a critical quality assurance step. By evaluating the
ontology's ability to provide accurate and relevant answers, the methodology ensures that the
ontology meets its intended purpose and can effectively support decision-making processes. Any
gaps or limitations identified during validation are addressed through further refinement and
iteration, ensuring that the ontology continuously improves and adapts to the changing needs of
the domain.
In summary, checking for repetition of key concepts in the merged ontology N2 matrix is
crucial to ensure the ontology's integrity, consistency, and efficiency. The rationale behind this step
is to eliminate redundancy, resolve inconsistencies, and maintain a clear and concise structure. The
N2 matrix representation facilitates efficient merging and conflict resolution, while the iterative
and incremental approach allows for continuous refinement and expansion of the ontology. Human
validation and testing against decision support questions ensure the quality and effectiveness of
the merged ontology in supporting decision-making processes.
Abstract
Fielded aerospace and defense systems often require unplanned system upgrades to cope with unforeseen circumstances, such as rapid technological advancements and evolving operational requirements. However, current upgrade processes tend to be ad hoc, often resulting in lengthy upgrade cycles and non-rigorous testing. This research developed a systematic approach to accelerate the upgrade process for fielded systems and ensure the correctness of upgrades. Grounded in a Model-Based Systems Engineering (MBSE) framework, this approach is supported by two key pillars: generative AI and digital twin technologies.
One key achievement was creating a unified system model incorporating data from various sources to provide a multi-domain system representation. Aerospace systems involve diverse data from different lifecycle phases, such as System Requirement Specifications, Test Scenarios, Design Specification Documents, Software Code, Controls Models, Maintenance Logs, and Operational Sensor Data. This heterogeneous data, often siloed, makes it challenging to predict upgrade outcomes and understand cross-domain impacts. By systematically integrating data from these domains, the application of the MBSE framework facilitated a holistic view and cross-domain analysis, supporting the correct implementation of upgrades.
The research developed an automated method to generate system models from lifecycle data, which helped reduce the time and expertise required for model development. Traditionally, creating system models for the MBSE framework is a manual and time-consuming process, requiring extensive data gathering. By leveraging generative AI technology, this research automated these tasks, converting large, heterogeneous datasets into structured, usable formats more quickly. The research introduced a formal reasoning and analysis method that aimed to be robust and transparent, enhancing the ability to predict the outcomes of upgrades.
Another important achievement was designing an agent-based system capable of establishing bidirectional connections between physical systems and digital twin system models. Traditional upgrade processes are often time-consuming and resource-intensive, lacking the capability to rapidly test multiple scenarios. Continuously updated models enabled advanced analysis and simulation, allowing for more extensive testing and early issue identification in system upgrades.
Finally, the methodology was validated through a software upgrade scenario for a miniature multi-UAV operation in a lab environment. This demonstration addressed a high-level operational requirement change and provided a systematic upgrade process. The MBSE framework integrated heterogeneous data from various domains, generative AI automated system model generation, and bidirectional communication with digital twins enabled advanced testing and analysis. This methodology ensures the correct implementation of upgrades with less time and reduced failures.