Process implications of executable domain models for microservices development
(USC Thesis Other)
PROCESS IMPLICATIONS OF EXECUTABLE DOMAIN MODELS FOR MICROSERVICES DEVELOPMENT
by
Bo Wang

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

August 2022

Copyright 2022 Bo Wang

Acknowledgements

First and foremost, I would cordially thank my advisor, Prof. Barry Boehm, for his tremendous support, invaluable advice and enduring patience throughout my entire PhD program.

I would like to thank my dissertation committee members, Prof. Sandeep Gupta, Prof. Aiichiro Nakano, Prof. William Halfond and Prof. Wei-min Shen, for their generous help and constructive feedback during my thesis research.

I would like to thank the CSSE family for their great support. Thanks to Julie Sanchez, Supannika Mobasser, Kamonphop Srisopha, Daniel Link, Pooyan Behnamghader, and Kan Qi for their warmest encouragement and all kinds of help in life and in research.

I would like to express my gratitude to Mr. Doug Rosenberg for his inspiration, knowledge and help in finding opportunities for case study projects.

Thanks also to the USC Computer Science department for everyone’s friendliness. Thanks to the students once enrolled in my directed research and in the USC Software Engineering course. I learned a lot from the teaching and mentoring experience.

Endless thanks to my family. I am indebted to my devoted mom and dad. You always care about my health and happiness. Your love gives me the strength to overcome anything. I miss you so much. Lastly, my wife, Xi Zhang, always gives me a hug that brightens my day during difficult times.

During my PhD journey, I have encountered numerous happy and sad moments. Thanks to those who cheered me up and those who made me stronger.

Table of Contents

Acknowledgements
List of Tables
List of Figures
Abbreviations
Abstract
Chapter 1: Introduction
  1.1 Microservices Development
  1.2 Plan-driven vs. Feedback-driven
  1.3 The Early Stages
  1.4 Motivation
    1.4.1 Domain Uncertainty
    1.4.2 Management Uncertainty
  1.5 Research Questions
  1.6 Research Contributions
  1.7 Organization of Dissertation
Chapter 2: Related Work and Background
  2.1 Source Code Generation (SCG)
    2.1.1 From Natural Language to Design
    2.1.2 From Design to Code
    2.1.3 Summary
  2.2 Process Models
    2.2.1 Requirements Elicitation (RE)
    2.2.2 Commercial-off-the-Shelf (COTS) Analysis
    2.2.3 Model-driven Engineering (MDE)
  2.3 Tools and Frameworks
    2.3.1 Domain-driven Design (DDD) Tools
    2.3.2 Service-oriented Framework
    2.3.3 Other State-of-the-arts
Chapter 3: Research Methodology
  3.1 Hypotheses
  3.2 Threats to Validity
Chapter 4: Domain Modeling
  4.1 The Early Stages - From the Perspective of Cost Estimation
    4.1.1 The Sweet Point
    4.1.2 Learning from COCOMO II
  4.2 Domain Modeling Activity
  4.3 Executable Domain Models - The Difference
Chapter 5: Early Domain Identification
  5.1 The Domain Identification Process
    5.1.1 Knowledge Sharing
    5.1.2 The Proposed Process
  5.2 User Stories
  5.3 NLP Structures
  5.4 User Story Processing Workflow
    5.4.1 Error Detection
    5.4.2 Preprocessing
    5.4.3 Sentence Structure Matching
    5.4.4 Transformation
    5.4.5 Visualization
  5.5 Proposed Domain Identification Tool
    5.5.1 Error Detector
    5.5.2 Preprocessor
    5.5.3 Sentence Structure Matcher
    5.5.4 Rule Transformer
    5.5.5 Visualizer
  5.6 Evaluation
    5.6.1 Required Skills and Process
    5.6.2 Evaluation Methods
    5.6.3 Analysis and Results
  5.7 Conclusion
Chapter 6: Early Microservices Generation
  6.1 Technical Strategies
    6.1.1 Intermediate Structure
    6.1.2 Type System
    6.1.3 Generation of RESTful APIs
      6.1.3.1 API Specifications
      6.1.3.2 Two-level API Constraint
    6.1.4 Generation of Database Schema
      6.1.4.1 Schema Transformation Rules for Multiplicity
      6.1.4.2 Three-level Data Schema Constraint
    6.1.5 Generation of API Reference and Configuration Files
    6.1.6 Reusable Features
    6.1.7 Consistency Analysis
  6.2 Proposed Service Generation Tool
    6.2.1 Code Templates
    6.2.2 Enabling Continuous Deployment
  6.3 Case Studies
    6.3.1 Required Infrastructures, Skills and Process
    6.3.2 The First Case Study - Project BDR
    6.3.3 The Second Case Study - Project TIKI
    6.3.4 The Third Case Study - Project PicShare
    6.3.5 The Fourth Case Study - Project MGD
  6.4 Conclusion
Chapter 7: Conclusions and Future Work
  7.1 General Conclusions
  7.2 Summary of Contributions
  7.3 Future Work
    7.3.1 Improvement of the Current Work
    7.3.2 Potential Capabilities
References
Appendices
  Appendix A: Sentence Structures for Domain Identification
  Appendix B: Transformation Rules for Domain Identification

List of Tables

1.1 Comparison of Plan-driven and Feedback-driven Methodologies
2.1 Mature RE Techniques Currently Used [90]
5.1 An Example of Sentence’s POS-tags
5.2 An Example of Sentence’s TDs
5.3 An Example of Sentence Structure
5.4 An Example of Transformation Rule
5.5 An Example of Multiple Matching
5.6 An Example of Overlapping Matching
5.7 Summary of Questions and Metrics in the Questionnaire
5.8 avgEC, avgRC, DC and DR of the Identified Domain Models
6.1 COTS Enables Continuous Deployment
6.2 Effort and Percentage of Issues on Domain Modeling in Project PicShare
A.1 Sentence Structures
B.1 Transformation Rules

List of Figures

1.1 Main Activities at the Early Stages
1.2 E-service Projects Tickets Distribution at the Early Stages
2.1 Code Generation Pipeline
2.2 General MDE Process
2.3 Dependency of Components Generated by DDD Tools
2.4 Dropwizard Dependencies
4.1 The Sweet Point: Just Enough Planning
4.2 Use Cases Decomposition, Domain Modeling and Microservices Generation
4.3 Overview of Domain Modeling Activity
5.1 An Example of Incrementally Developed Domain Models
5.2 Workflow of the Proposed Domain Model Identification Steps
5.3 Box Plots for the avgEC, avgRC, DC and DR of the Identified Domain Models
5.4 Degree of Familiarity of Participants
5.5 Impression for Usefulness of Early Domain Identification
5.6 Impression for Usefulness of the Proposed Tool
6.1 Simple UML Class Metamodel
6.2 Dictionary Metamodel
6.3 Intermediate Structure Metamodel
6.4 Example of RESTful APIs Generation
6.5 Example of Database Schema Generation
6.6 Overview of the Code Generator Components
6.7 Infrastructure for Continuous Deployment
6.8 Domain Model Evolves from Uncertainty to Certainty in Project BDR
6.9 Effort Distribution in Project PicShare
6.10 Ticket Distribution in Project PicShare
7.1 Summary of Executable Domain Model Framework

Abbreviations

SDLC: Software Development Life Cycles
COTS: Commercial-off-the-Shelf
DSMs: Daily Stand-up Meetings
SCG: Source Code Generation
UML: Unified Modeling Language
RE: Requirements Elicitation
MDE: Model-driven Engineering
CASE: Computer-aided Software Engineering
DDD: Domain-driven Design
TBCG: Template-based Code Generation
UI: User Interface
API: Application Programming Interface
SDK: Software Development Kit
ICSM: Incremental Commitment Spiral Model
COCOMO II: Constructive Cost Model
DDL: Data Definition Language
NLP: Natural Language Processing
TDs: Type Dependencies
POS-tags: Part-of-Speech Tags
EDM: Executable Domain Models
PA: Parallel Agile
KOTPV: Keeper of the Project Vision
USC: University of Southern California

Abstract

Microservices have been recognized as an important enabler for the continuous development of many cloud-based systems. A wide range of development methodologies and tools are available for delivering microservices at a fast pace. Microservices development has proven to be agile. However, common agile practices are not always the best solution for microservices development. This depends on the nature of microservices and on several technical and social factors during the project development life cycles. Applying agile methodology to microservices development without weighing these factors may result in undesired overhead.

A major concern arising from the adoption of agile methodology in microservices development is the uncertainty that exists at the early stages of each development iteration. This research identifies where improvement is possible and develops an approach to eliminate the delays caused by that uncertainty. The proposed approach, Executable Domain Models, mainly consists of a process elaboration and a toolkit that assists implementation teams in identifying application domain models and generating microservices at the early stages.
The approach aligns this toolkit with the development process and coordinates domain modeling activity over project life cycles.

Empirical studies have been conducted to assess the effectiveness and efficiency of Executable Domain Models. Experiments and several minimum viable products have been completed over the past years. The collected project data shows that the domain identification tool generates more correct domain models for knowledge sharing purposes. The service generation tool, bound with domain modeling activity, results in a 10% saving of effort and fewer issues at later stages. The effort saving increases to 30% under an extreme condition with a high rate of personnel turnover.

Chapter 1: Introduction

As a way to shed the heavy dependency on documentation and rigid planning found in traditional development methodologies, agile practices [51] and continuous software development [28] have become more attractive to today’s software industry [86, 102]. Meanwhile, the power of tooling is highlighted [34] over software development life cycles (SDLC). Under this circumstance, microservice, as one of the key enablers for continuous integration and delivery [46], is gaining popularity.

1.1 Microservices Development

Microservice was initially recognized as an architectural style, which splits an application into smaller cooperating components [113]. These components are typically deployed separately on a cloud infrastructure and interact over lightweight mechanisms [52], such as RESTful APIs. Innovation in tooling has made microservices architecture much more achievable for continuous development [86]. A mature toolchain includes code/configuration management tools such as Git [55], build integration tools such as Jenkins [68], deployment tools such as Docker [44], and monitoring tools such as Sensu [100]. These automation tools dramatically increase the delivery frequency with minimal effort, specifically at later stages of each development iteration.
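
To make the interaction style described above concrete, the following is a minimal sketch of one separately running component exposing a RESTful endpoint and a second component consuming it over HTTP. It is an illustration only, not code from this dissertation: the names `ITEMS`, `CatalogHandler`, `start_catalog_service`, `fetch_item` and the `/items/<id>` route are all hypothetical, and only the Python standard library is assumed.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data owned exclusively by the "catalog" service.
ITEMS = {"1": {"id": "1", "name": "notebook"}}


class CatalogHandler(BaseHTTPRequestHandler):
    """Serves GET /items/<id> as JSON; any other path yields a 404."""

    def do_GET(self):
        parts = self.path.strip("/").split("/")
        is_item_route = len(parts) == 2 and parts[0] == "items"
        item = ITEMS.get(parts[1]) if is_item_route else None
        payload = item if item is not None else {"error": "not found"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200 if item is not None else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the example quiet


def start_catalog_service():
    """Start the service on an ephemeral local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), CatalogHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_item(host, port, item_id):
    """A second component that knows the catalog only through its HTTP API."""
    connection = HTTPConnection(host, port, timeout=5)
    try:
        connection.request("GET", f"/items/{item_id}")
        response = connection.getresponse()
        return response.status, json.loads(response.read().decode("utf-8"))
    finally:
        connection.close()
```

The caller depends only on the route and the JSON shape, so under these assumptions the service behind the endpoint could be rebuilt, redeployed or rewritten in another language without the caller changing.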

Additionally, microservices ensure availability and scalability [47]. A change to, or scaling of, one service can be handled without affecting the status of other services, so a single point of failure can be avoided. This is quite different from the traditional development of a monolithic application, where components are highly coupled and follow a heavy class hierarchy: any change or failure in a small part requires the entire monolith to be stopped, rebuilt and redeployed, and scaling requires scaling the entire system rather than only the parts that need greater resources.

Moreover, microservices can also be viewed as a project management strategy under which a small but cohesive implementation team takes responsibility for part of the system. This tends to be less burdensome and more focused, allowing development tasks to be assigned to multiple implementation units in parallel. In addition, services can be written in different programming languages and set up with different frameworks and configurations, which is friendlier to team diversity.

The key benefits of microservices development can be summarized as follows:

• An advanced toolchain can reduce deployment time and difficulty, leading to continuous integration and delivery;
• A highly decoupled architecture can promote better availability and scalability;
• A small cohesive team and isolated management of services can contribute to better productivity and team environment.

As a corollary to these benefits, microservices development emphasizes not only a decision of system architectural style, but also a thoroughly considered development process equipped with sophisticated toolsets [31]. With rapid and continuous delivery, this process places greater attention on feedback-driven capabilities.

1.2 Plan-driven vs. Feedback-driven

Software development methodologies are often discussed and categorized along a spectrum from plan-driven to feedback-driven.
The plan-driven methodologies impose a working process as a sequential series of phases from requirements to deployment [5]. Documentation is enforced throughout the SDLC. It ensures that the implementation team creates shared and complete knowledge in the inception phase, and moves into the upcoming phases with well-planned steps. Change is discouraged due to its negative effect on budget and time [53]. The plan-driven methodologies are also regarded as heavyweight and traditional methodologies.

In contrast, the feedback-driven methodologies apply very light documentation and minimal iterative planning. People factors [35], such as skill and communication, are the primary drivers of project success [62]. Project knowledge is incrementally developed during frequent releases so that changes can be accepted and adopted at a quick pace. The feedback-driven methodologies are also regarded as lightweight and agile methodologies.

Both methodologies have strengths and weaknesses [5, 13, 18, 20, 21, 80]. Generally, the plan-driven methodologies promise predictability, stability, and high assurance, but are considered burdensome when rapid development and continuous change are required [19, 20]. The quick response to changes in the feedback-driven methodologies creates more business value for products. However, their flexibility and instability make on-cost success challenging [61]. A brief summary of the comparison is given in Table 1.1.

Table 1.1: Comparison of Plan-driven and Feedback-driven Methodologies

Description | Plan-driven | Feedback-driven
Primary Objective | Conformance to plan | Business value
Approach | Predictive | Adaptive
Emphasis | Process-oriented | People-oriented
Planning | Comprehensive | Minimal; flexible
Requirements | Knowable early; stable | Emergent incrementally; rapid change
Documentation | Heavy | Light
Architecture | Design for current and foreseeable | Design for current
Development | In sequence; limited cycles | Iterative; numerous cycles

1.3 The Early Stages

Traditional development methodologies require a design procedure to be rigorously carried out and the design to be formally represented before everything is ready for implementation. Agile development, by contrast, usually applies an incremental and iterative process model. This model slices a product into small but fully working pieces called increments [84]. Increments are built in a sequence of planning iterations. The first iteration addresses the basic requirements. Supplementary functionalities are then gradually delivered in each iteration until all designed functionalities are implemented [8]. A representative instance of this process model is the Incremental Commitment Spiral Model (ICSM) [24], which also laid the foundation of this study.

[Figure 1.1: Main Activities at the Early Stages]

In microservices development, activities in an iteration can be considered at many levels of detail. From the perspective of toolchain support, an iteration generally consists of the following stages [75]:

• Planning. This stage includes visiting or updating requirements, and synchronizing shared knowledge among the stakeholders.
• Development. In this stage, a partial system is built by an implementation unit.
• Integration and Testing. This stage integrates the work from multiple implementation units along with integration testing.
• Release.
In this stage, working functionalities are appended and deployed into the existing system.
• Operation. This stage includes monitoring the running system and reviewing users’ feedback so that potential requirement changes can be captured and appended to the upcoming iterations.

In this study, the author views the early stages as the combination of the planning stage and the early development stage. The effort spent at the early stages aims at a better requirements elicitation workflow, more specifically, finalizing the scope of the ongoing increment and getting everything ready for implementation (Figure 1.1). First, user specifications should be collected at the beginning of each iteration. This requires all stakeholders to participate in a series of meetings, among which win-win negotiation is the most important for producing initial user specifications and prioritization. An initial domain glossary and domain models are then identified from the user specifications to guarantee knowledge sharing among all stakeholders. The process moves forward as more detailed requirements are elicited. The actual workflows may vary from team to team, but generally follow the same strategy, further described in Section 2.2.1.

It usually takes a relatively long time until everything is ready for coding with a complete design. For this reason, a team may decide to start development once a specific level of detail is achieved. Although some issues such as scope creep and knowledge inconsistency remain troublesome and may become severe later, the trade-off between time and completeness depends on how people foresee and estimate potential risks.

1.4 Motivation

Despite the huge benefits, the disadvantages of microservices development can be seen in the questions persistently raised about agile practices in [5, 13, 97]. In order to satisfy the expectation of rapidness, the development process tends to follow a feedback-driven manner at the expense of predictability.
The loosely coupled implementation teams usually lack focus on the overall design, and spend too much time on too frequent releases of too poor quality [64]. As a result, too much rework has to be done due to the uncertainty accumulated from the early stages.

Besides, microservices development, to some extent, runs contrary to the agile manifesto [51], which values individuals and interactions over processes and tools. The value of the investment in its toolchain and process is at least as important as the values associated with individuals and interactions [86]. For this reason, it requires balancing the effort between early customer satisfaction and early analysis of implementation needs. This trade-off is harder for a team with a small number of developers.

Boehm identified "the cone of uncertainty" [16], in which the curve of uncertainty tends to decrease as a project moves on and more decisions are made based on a better understanding of the project. The nature of microservices development makes it harder for small distributed teams to squeeze this curve, especially at the early stage of each development iteration. The author views the uncertainty at the early stages from two aspects:

• Domain uncertainty refers to the lack of domain knowledge such as domain glossary, requirements and business logic.
• Management uncertainty refers to the lack of evidence for allocating responsibilities, tasks and processes.

[Figure 1.2: E-service Projects Tickets Distribution at the Early Stages]

The author draws on the experience from 12 recent software projects (i.e. real-customer e-service projects and directed research artifacts), which were built upon a service-oriented style (Figure 1.2). Different teams used different approaches to reduce the level of domain uncertainty. For example, some teams had frequent on-site visits or client interactions; others tactically started with a relatively simple domain.
Whatever the approach, each ended with a validation procedure with customers. Namely, implementation teams had to transform the design ideas into more tangible prototypes so that the customer could validate their correctness. This feedback-driven manner is commonly used in agile processes. It includes effort in learning existing technical solutions, such as commercial-off-the-shelf (COTS) products, and developing demonstrations of critical features. The ticket distribution of Development Relevant tasks in Figure 1.2 indicates that this procedure occupies the largest portion of all tasks at the early stages.

Nevertheless, a review of the project artifacts (e.g. operational concept, system architecture design), together with observations from code reviews and core capability drive-through sessions, still revealed mismatches between domain knowledge and implementation. A design decision may be easily reflected in a modification of the prototype, but is hard to track and update via documents or simple traceability metrics. With a compact schedule, a small implementation team often overlooks design updates and synchronization. The author started to question whether a feedback-driven manner should always be encouraged at the early stages.

1.4.1 Domain Uncertainty

In plan-driven development methodologies, it is easy to deal with domain uncertainty because software requirements are always the first order of business [22]. This guarantees that all domain elements have been fully described and that there are no missing or mismatched elements before the implementation team carries out the other activities. Small changes in requirements may occasionally happen but are addressable with straightforward change control procedures [67].

However, the above situation is not always achievable in a feedback-driven environment. The reason lies in the changes happening over the development life cycles, which make it difficult to finalize the scope of the application domain at the early stages.
As a result, requirements may be contracted without a consensual understanding among all stakeholders. This introduces a risk of requirement mismatching at later stages. In order to mitigate this risk, revisiting the application domain becomes a frequent task.

Most often, users who are asked to specify requirements claim, "I do not know how to tell you, but I will know it when I see it" (IKIWISI) [22], and their needs may change after they begin to operate the system. The highest-priority way to satisfy early customers’ needs is to prototype against changes. However, due to the lack of domain knowledge and skills, prototyping usually begins with insufficient understanding and incomplete technical solutions.

To summarize the dilemma of resolving domain uncertainty: on the one hand, prototyping is desired to achieve a better understanding of the domain knowledge. On the other hand, due to the lack of domain knowledge and skills, prototyping may perform poorly and be time-consuming. In microservices development, domain uncertainty drives too many development-relevant activities at the early stages, leading to design updates being overlooked.

1.4.2 Management Uncertainty

Software development is usually a team activity that involves several parties with structural dependencies. Although modern organizations have tools that can meet most management needs, some procedures can still be problematic and deficient. This is intrinsically related to how people perceive, learn and think about information [106]. It is not always possible to be aware of the root cause, but one can be alert to those factors that may influence the success of a project. Three concerns can often be foreseen from the nature of microservices development:

• Personnel Factors. According to the analysis by LinkedIn [14] in 2018, software had the highest job turnover rate (13.2%) of any industry.
Almost half of departing technical employees take another job within another technical sector. After switching organizations, uncertainty can arise from unfamiliarity with the environment, application domain, technical stack, etc. A novice may take more time than an expert to produce the same artifacts. Personnel factors may introduce extra training and re-baselining cost.

• Knowledge Synchronization. In a microservice hierarchy, synchronization within a team and across teams is always a topic to be improved. For example, a downstream development team usually struggles to obtain knowledge from the upstream team. The upstream team may already know what is going on. However, queries are processed slowly due to people’s task prioritization. Stray’s recent survey [107] shows that 87% of agile teams have daily stand-up meetings (DSMs). However, the larger the team, the lower the satisfaction with DSMs. Most agile methods drive frequent meetings or messages bouncing across teams. However, this turns out to be an unproductive daily interruption.

• Planning Factors. Many factors, such as a loosely coupled organization, schedule pressure and development team diversity, may together make project planning difficult. In a large project, different teams may apply different programming languages/frameworks; due to communication gaps and deadline pressure, each team tends to take a feature-focused manner at the early stages and worry about the overall design integration later. Hence technical debts and risks accumulate.

1.5 Research Questions

Research questions are developed in the context of microservices development, primarily focused on the early stages of each development iteration.

RQ1. How to achieve a better balance between plan-driven and feedback-driven in microservices development?
RQ2. What activities, responsibilities and artifacts can be introduced to mitigate the uncertainty?
RQ3.
What tools can be introduced at the early stages to mitigate the uncertainty?
RQ4. In what ways does the proposed solution improve the project outcomes?
RQ5. How could the proposed solution be improved?

1.6 Research Contributions

The contributions of this research can be summarized as follows:

• Elaboration of the development process for better addressing management uncertainty at the early stages. It highlights the necessity of domain modeling activity under a microservices hierarchical structure.

• Domain identification tool aligned to the proposed process for domain discovery. This automated tool helps collaborative implementation units to build and share domain knowledge at the early stages.

• Microservices generation tool aligned to the proposed process for prototyping and software building. This automated tool helps the implementation teams to generate executable, reusable and retargetable code at the early stages.

The overall idea is summarized as the concept of Executable Domain Models (EDM), which facilitates knowledge sharing and effort saving for microservices development.

1.7 Organization of Dissertation

The organization of this dissertation is as follows:

Chapter 2 presents the background information about the current processes, techniques, tools and other related work.

Chapter 3 explains the research methodology used to test the hypotheses.

Chapter 4 further introduces the concept of Executable Domain Models. It elaborates the workflow of the proposed development process and highlights the differences from common practices.

Chapter 5 introduces the technical solution for early domain identification and discusses the evaluation results from 13 projects.

Chapter 6 introduces the technical solution for early microservices generation and discusses the evaluation results from 4 case studies.

Chapter 7 summarizes the contributions and proposes future research work.
Chapter 2 Related Work and Background

2.1 Source Code Generation (SCG)

To obtain instant prototyping at the early stages, SCG approaches can be employed. Traditional code generation approaches [45, 99] suggest creating a large number of models from scratch. The models describe the desired system at a high level of abstraction. Figure 2.1 gives a general example that uses the Unified Modeling Language (UML) as the description: the level of abstraction decreases from left to right in the pipeline as more design and analysis work is carried out. The source code of the desired objects, with attributes and behaviors, is usually generated from a detailed design. This process requires a procedure to be rigorously performed and the design to be formally represented.

Figure 2.1: Code Generation Pipeline

Modern code generation approaches further identify variant description systems as the input to depict software objects [123]. The outputs also purposefully target different levels of abstraction. Based on the characteristics of the input and output, the related work on microservices generation can be categorized into two directions.

2.1.1 From Natural Language to Design

Considering SCG as a task of transformation, syntactic and semantic parsing [74] are used to transform meaning from a natural language description into various formal representations. Graphic descriptions such as [26, 119] and domain-specific descriptions such as [54, 56] are often used to describe the output. Given both available and expected artifacts at the early stages of a development iteration, there have been some efforts in the direction of constructing domain models from use case specifications. [60, 76, 89, 98] apply semi-automated approaches to assist the developers in deriving the domain models. Developers’ intervention is required to either specify more information or verify the results.
[12, 65, 77, 112, 124] apply automated approaches in which user specifications are expected to be well written, without any vagueness or inconsistency. As a result, the outputs are highly consistent with the actual problem domain.

2.1.2 From Design to Code

Considering domain-relevant requirements, some researchers try to introduce more expressive description patterns to customize design choices with more accurate specifications. The generation process is usually combined with program synthesis techniques [57] and template-based approaches [109]. The generated code snippet often depends on a popular domain and covers more complex usage scenarios.

In the field of microservices or service-oriented development, the availability of different service description methods gives developers a range of options to choose from, so that they can pick the description method that best fits their services [110]. For example, [17, 79] extend web service descriptions to address the reliability and process characteristics of a service. [30] introduces a new description language to describe tradable services which are advertised in electronic marketplaces. [71, 103] provide a more expressive way to specify the configuration of cloud infrastructures.

2.1.3 Summary

Generally, these SCG approaches assume that rich information is created before the generation. They may be the best fit at a certain stage of development. However, at the early stages, domain abstraction is always ahead of detailed system design. Before a more detailed level of abstraction is achieved, an application model will have been iterated many times. Finkelstein’s viewpoints framework [50] states that everyone has their own viewpoint of the models, relevant only to themselves. It allows for temporary inconsistencies rather than enforcing consistency at all times.
This insight fits the situation of the early stages well because model consistency cannot always be preserved among all developers at this point in time, especially in an incremental and often parallel development environment. The related research in this section highlights the solution background associated with the context of this study. However, a better process model remains to be explored.

2.2 Process Models

This section reviews the major tasks to be done at early stages based on software development processes.

Table 2.1: Mature RE Techniques Currently Used [90]

Technique | Description | Methodologies
Traditional | Gathering generic data in order to identify stakeholders’ needs and system limitations. | Interviews; Surveys; Task analysis; Questionnaires
Collaborative | Negotiating and promoting agreements among stakeholders. | Focus groups; Workshops; Brainstorming
Prototyping | Obtaining detailed information and promoting feedback among stakeholders. | Prototyping
Modeling | Providing a specific type of information for better understanding the context. | Scenarios; Goal-based approaches; Business process models; Use cases
Cognitive | Knowledge acquisition and creating a knowledge base. | Ontology; Card sorting; Repertory grid
Contextual | Collecting exhaustive data about stakeholders’ work environment. | Ethnography; Ethnomethodology
Agile | Standard tasks in agile development environment. | Mind mapping; User stories; Group storytelling

2.2.1 Requirements Elicitation (RE)

Requirements elicitation is a critical activity at early stages. This activity involves all stakeholders interacting to obtain quality requirements. It provides a base for the implementation team to construct the desired system [72]. There are many techniques for gathering software requirements. [90] classified the mature techniques currently in use into 7 categories, which are summarized in Table 2.1.
They also highlighted that more than one technique is often used in a project, and that some techniques are not frequently used due to the increased use of others, such as collaborative and agile techniques. [121] identified 6 types of contributors for a better RE process, including frameworks, models, methods, techniques, approaches and tools. They also state that the level of automation, knowledge reuse/transfer, human factors and collaboration are the main aspects of concern during RE.

2.2.2 Commercial-off-the-Shelf (COTS) Analysis

Although the definition of COTS varies by field, the author refers to the definition by [114] that a COTS product is a commercially available or open-source piece of software that other software projects can reuse and integrate into their own products. A COTS product is prebuilt; hence the use of COTS is beneficial under shrinking budgets, accelerating rates, and expanding system requirements [83]. COTS analysis helps the implementation team to decide whether a project should use any prebuilt products or create a new solution to fulfill a requirement. Traditionally, COTS analysis happens along with requirements gathering, with the emphasis on gathering external information [82]. It involves the steps of COTS identification, evaluation and selection.

Deciding whether to use COTS, and which product to use, is difficult. Deciding whether to use COTS requires a continuous trade-off among the system requirements, the COTS products on the market, and the software architecture [114]. Deciding which product to use often depends on expert evaluation against various criteria [27, 81]. Moreover, several stakeholders are often involved in this process, and they might hold conflicting viewpoints [120]. [81] also concludes that, if selected inappropriately, COTS can cost more to use than developing the needed software functionality from scratch.

Figure 2.2: General MDE Process
2.2.3 Model-driven Engineering (MDE)

MDE considers models as the primary development artifact and uses them as a basis for obtaining an executable system in different ways [105]. At early stages of a project, requirements are initialized with a high-level description. MDE usually involves an object-oriented design process in which developers represent the high-level description as a variety of models. Then code generation is applied to transform design models into an executable system.

Different approaches try to provide standard ways to represent the specification of models (i.e. the metamodel) and its transformation [58]. Several approaches have been integrated into CASE tools. For example, Xpand [73] allows developers to first define a metamodel in Ecore as the design-time input and then define the corresponding models and templates as runtime input. Its templates can be extended with a custom language (e.g. Xtend and Java). Other MDE tools like [2, 3] apply a similar strategy but differ in design syntax. These tools provide many advantages, including rapid development and high consistency between design and implementation.

However, conducting a model-driven process involves more than a generic agile development process: additional roles, tools and activities should be identified and taught. Specifically, as shown in Figure 2.2, a general MDE process requires domain experts, implementation experts and solution architects to work closely so that design-time input, design template and runtime input conform with each other.

2.3 Tools and Frameworks

2.3.1 Domain-driven Design (DDD) Tools

DDD is a design approach that encapsulates relevant domain objects within a bounded context and iteratively refines the domain concepts to address the domain problems [48].
Considering that a single microservice is a bounded context that presents a unit of functionality [25], DDD is highly compatible with MDE for generating functional prototypes of microservices. While MDE is more concerned with transforming design models into implementation for different programming languages and platforms, DDD focuses more on a better domain modeling practice.

A representative tool, Apache Isis [10], based on the Naked Objects [91] pattern, allows rapid prototyping by automatically generating/updating an object-oriented user interface (UI) and rendering UI-related behaviors from built models. Similar tools include [88, 92, 115] for different programming languages and platform support. Automated generation and updating generally alleviate the coding work and make a complex domain understandable at early stages. In spite of that, the generated components (Figure 2.3) are often overloaded with high dependency and poor readability. This adds difficulty to maintaining the code with special requirements at later stages, and complicates manual modification in the subsequent iteration.

Figure 2.3: Dependency of Components Generated by DDD Tools

2.3.2 Service-oriented Framework

The Dropwizard framework [41] provides Java-based web services with sophisticated configuration, application metrics and operational tools. It provides developers with stable and ready-to-use libraries to build back-end services. For example (Figure 2.4), it uses Jetty for creating the HTTP server, Jersey for building RESTful web applications and Jackson for conversion between the JSON format and Java entities. It also contains a number of useful libraries such as JDBI for relational databases, Hibernate Validator for input validation, etc.

Figure 2.4: Dropwizard Dependencies

Popular frameworks with similar technical strategies but different language support are [49, 104].
Those frameworks help developers to make architectural decisions by providing mature solutions for most development scenarios. Nevertheless, developers are expected to be proficient in those libraries and programming languages. Hence extra effort is usually spent on the required training and on feasibility analysis of these frameworks before implementation.

2.3.3 Other State-of-the-arts

Several template-based code generation (TBCG) approaches related to service development and configuration are worth exploring. XSLT [4] is used for transforming XML documents into other formats. It looks for each tag of the input document and applies the corresponding template by following a straightforward filtering and matching strategy. It can be used for presenting models in specific XMI formats. JET [69] allows developers to use a programming language (e.g. Java) to handle the dynamic part of the code generation. Combined with the templates, developers have more choices in implementing the target code [109].

It can be concluded that in the field of TBCG, researchers [73, 91, 109] consider different forms of representation as potential input sources, and a broad range of development artifacts as potential targets. Source and target usually conform to different metamodels. The purpose of generation varies across stages. A successful transformation between source and target should be accomplished with, at least but not limited to, a specification of conditions, transformation rules and the corresponding scheduling control [40].

To facilitate microservices generation, tools like Swagger [108] and APIMatic [11] try to generate both API interfaces and SDKs from specific API descriptions in standard formats, such as RAML, WADL, WSDL, etc. This feature significantly accelerates the development of services. It can be applied as a technical solution in most service-oriented projects.
However, developers still need to seek a strategy for API evolution that provides flexibility under domain uncertainty during the runtime of a project. Moreover, the solution to back-end construction remains to be determined because neither service infrastructure nor data storage is finalized.

It can be concluded from the tools above that a generation of services should include but not be limited to the following items:

• Database schema and manipulation functions, handling database initialization and data manipulations.

• Server-side scripts and API references, providing endpoints toward the application domain. The scripts usually target a mainstream web framework so that developers are able to further tune the code.

• Client-side SDKs, which can be adopted by different platforms and programming languages.

• Other supportive artifacts, such as configuration files for both the client-side and server-side scripts, enabling automated integration and deployment in the later phase of development.

Chapter 3 Research Methodology

3.1 Hypotheses

This research conducts case studies against several project instances to test:

Overall Hypothesis: Executable Domain Model efficiently helps eliminate domain uncertainty and mitigate risks caused by management uncertainty at the early stages.

The data collected in Chapter 5 are used to test:

Hypothesis 1.1: Early Domain identification mitigates risks caused by knowledge inconsistency.

Hypothesis 1.2: Early Domain identification mitigates risks caused by personnel factors.

To test these two hypotheses, 13 project instances were carried out and observed. To compare the domain models identified by the proposed tool with the domain models produced by developers, data were collected and analyzed during requirement elicitation. A questionnaire regarding usage of and satisfaction with the proposed approach was also collected and analyzed afterwards.
The data collected in Chapter 6 are used to test:

Hypothesis 2.1: Early services generation geared to domain modeling activity efficiently eliminates domain uncertainty.

Hypothesis 2.2: Early services generation geared to domain modeling activity mitigates risks of planning.

To test these two hypotheses, 4 project instances were carried out and observed. Different project instances focus on different aspects of assessment. The first case study explains how the proposed approach assists domain modeling activity and knowledge sharing in a project with a high personnel turnover rate. The second case study observes the performance in a project with complicated domain concepts. The third case study observes the performance and compares the outcomes with those of a control experiment. The fourth case study observes the performance in an industrial project containing a large number of domain entities.

3.2 Threats to Validity

Possible validity threats of this research and its evaluation procedure are as follows:

• Non-representative projects and participants: Not all project instances were conducted in an industrial environment. Instead, some projects were conducted in a master-level software engineering class. One may question the representativeness of classroom projects. Three points can be clarified for this question. First, these projects were all developed for real customers from different business domains. The whole development life cycles were fully guided and observed by working professionals. Second, to test Hypothesis 1.1 and 1.2, the analysis filters out participants without industrial experience. Third, to consider a trade-off between the level of control and degree of realism for testing Hypothesis 2.1 and 2.2, the classroom projects are more deterministic. Hence classroom projects were conducted to simulate an environment with a controllable level of uncertainty.

• Invalid data: The project activities and artifacts are fully observed and reviewed.
To test Hypothesis 1.1 and 1.2, the data were collected at early stages during requirement elicitation. To test Hypothesis 2.1 and 2.2, project data were collected and fixed every other week. In order to track the actual effort and risks, participants were asked to create Jira tickets. Ticket creation and update followed a well-defined workflow and labeling rules.

• Learning curve: The possibility of inexperienced developers and a learning curve for using the proposed approach and tools could be overcome by providing training sessions and user guide documents. One may argue that if any specific skills are required, extra effort is needed to learn them. The results show that 1-2 hours of training before launching project instances were required for those who have a basic software engineering background.

Some external validity of the proposed approach can be questioned as follows:

• Adaptability in different domains: This study highlights the necessity of domain modeling, along with the use-case-driven approach in microservices development. One may think it may not work in all ways of software development. It is agreed that no single approach can fit all cases. Corresponding project instances show its adaptability in several different domains. This study also holds the view that the proposed approach can bring a significant improvement, especially for systems involving service-oriented and data-centric features.

• Adaptability in different development processes: The same argument may apply to the adaptability to different development processes. Different organizations may apply various development processes. This study holds the view that one can always consider introducing domain modeling activity at early stages, especially when an organization has a complex development hierarchy.

Chapter 4 Domain Modeling

Software development can be a highly problematic procedure.
It can be affected by various situational factors [33] such as the nature of the application under development, team size, requirements volatility and personnel experience. Failure to take these factors into consideration may lead to delay, budget overrun, less capability than initially promised, etc. From another perspective, it can be inferred that these undesired consequences are generally due to underestimating the required effort. Hence a good cost estimation model (COCOMO II [23]) not only predicts effort, but also provides a comprehensive reference for people to be aware of these factors at a particular stage. Moreover, lessening the impact of these aggravating factors, thus reducing effort, should be one of our primary goals.

We focus on the early stages of each development iteration, where more uncertain factors exist than at the other stages. A better elaboration of the process model could drive a better solution to mitigate these uncertainties. This chapter does not propose a new process model, but highlights the necessity of domain modeling and discusses its adaptation, especially for implementation organizations involving a complex hierarchy (e.g. microservices development). The elaboration and proposed activity are based on the Incremental Commitment Spiral Model (ICSM) [24].

4.1 The Early Stages - From the Perspective of Cost Estimation

4.1.1 The Sweet Point

Software processes can be rated on a formality scale (Figure 4.1) ranging from feedback-driven to plan-driven [18]. The extremes on either side of this scale are expensive. The sweet point is the cost minimum somewhere in the middle. To achieve the sweet point, trials should be made for "just enough planning", namely, balancing the activities between feedback-driven and plan-driven.

Figure 4.1: The Sweet Point: Just Enough Planning

4.1.2 Learning from COCOMO II

We view the early stages as the combination of the planning and early development stages during the SDLC.
At the early stages, many potential high-risk problems need to be solved. Most problems, including but not limited to system interaction, performance and technical maturity, are due to the level of understanding regarding the nature of the desired system. Hence an Application Composition Model [23] is used to estimate the cost based on the interoperable components (Formula 4.1).

\[ PM = \frac{ObjectPoints \times (1 - \%reuse)}{PROD} \tag{4.1} \]

PM measures required effort in person-months. Object Points [15] adds the weighted object instances based on their complexity. %reuse estimates the level of re-usability to be achieved in a project. PROD represents productivity and can be determined by developer’s experience/ability and development environment maturity. The value for each component is customized and then attached to the whole body of the project differently with a different level of challenge [6].

An Early Design Model or a Post-Architecture Model [23] is used at a later phase (Formula 4.2).

\[ PM = A \times Size^{E} \times \prod_{i=1}^{N} EM_i \tag{4.2} \]

A is a constant derived from historical project data. Size is in KSLOC (thousand source lines of code), or converted from function points [7] or object points. E is an exponent depending on 5 scale factors. EM_i is the effort multiplier for the i-th cost driver. There are 17 cost drivers to adjust the effort in the Post-Architecture Model, while 7 of them are used in the Early Design Model due to the limited information that can be acquired in the early design phase.

For the purpose of this study, it can be learnt that at the early stages, in order to better estimate and plan:

• Due to the limited source to be analyzed for size and effort, the focus is put on a better understanding of the system components.

• Persons, tools and platforms to be employed significantly impact the final estimated effort.
• As a project proceeds, the system and the other environmental factors are better known, and more factors are identified for more precise estimation and planning.

4.2 Domain Modeling Activity

A domain model gathers the information of an application domain, as stakeholders need, to better understand and create shared knowledge on the requirements of an application [85]. The activity of domain modeling guides stakeholders in achieving specific goals at different project stages. It helps build shared knowledge and, incrementally, a better understanding of domain objects and components.

On the other hand, the microservices style can be viewed as a development strategy under which a system is decomposed into a number of separated components. The use-case-driven approach [93] is an excellent match for this strategy; that is, each implementation unit is assigned to a sub-domain of the system based on use case decomposition. All implementation units then model the problem domain and produce microservices in parallel (Figure 4.2).

Figure 4.2: Use Cases Decomposition, Domain Modeling and Microservices Generation

The detailed workflow of activities is shown in Figure 4.3:

Figure 4.3: Overview of Domain Modeling Activity

In the Inception Phase, while initializing the scope of a system, everything is uncertain. An initial abstraction of the problem domain is built from user stories, with a common vocabulary established as a part of shared knowledge.

In the Foundations and Development Phases, while creating operational concepts and system architecture, a problem domain is decomposed into sub-domains with different levels of detail. Meanwhile, prototyping is conducted against critical features to eliminate the domain uncertainty.

As the development process follows a spiral model, in which incremental design and re-baselining intertwine as a project moves forward, domain models get updated and integrated.
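As a worked illustration of Formulas 4.1 and 4.2 from Section 4.1.2, the following minimal sketch evaluates both estimates in Python. All input values (object points, reuse percentage, productivity, the constant A, the exponent E and the effort multipliers) are hypothetical numbers chosen only to show the arithmetic; they are not calibrated values from this study. The function names are likewise illustrative.

```python
from math import prod


def application_composition_pm(object_points, pct_reuse, productivity):
    """Formula 4.1: PM = ObjectPoints * (1 - %reuse) / PROD.

    pct_reuse is given as a percentage in [0, 100]; productivity (PROD)
    is in object points per person-month.
    """
    if not 0 <= pct_reuse <= 100:
        raise ValueError("pct_reuse must be between 0 and 100")
    if productivity <= 0:
        raise ValueError("productivity must be positive")
    new_object_points = object_points * (1 - pct_reuse / 100)
    return new_object_points / productivity


def post_architecture_pm(a, size_ksloc, exponent, effort_multipliers):
    """Formula 4.2: PM = A * Size^E * product of all EM_i."""
    return a * size_ksloc ** exponent * prod(effort_multipliers)


if __name__ == "__main__":
    # Hypothetical early-stage estimate: 120 object points, 25% reuse,
    # productivity of 15 object points per person-month.
    print(application_composition_pm(120, 25, 15))

    # Hypothetical later-phase estimate: A, E and the multipliers are
    # placeholders, not calibrated COCOMO II values.
    print(post_architecture_pm(a=2.94, size_ksloc=10, exponent=1.10,
                               effort_multipliers=[1.10, 0.90, 1.00]))
```

With these assumed inputs, the first estimate works out to 120 × 0.75 / 15 = 6 person-months. Raising %reuse or PROD lowers the estimate, which is consistent with the observation that the persons, tools and platforms employed strongly affect the estimated effort.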
Meanwhile, new sub-models may be introduced from additional functional requirements, leading to an update of shared knowledge and prototyping against new user expectations.

4.3 Executable Domain Models - The Difference

All knowledge-intensive processes tend to identify a sequence of activities in which goals are clear and tasks are not piled together. However, if the risks are negligible, we always expect to combine the activities into a more streamlined process to shorten the schedule [24]. Hence the proposed approach starts code generation from the initial domain model at a relatively early stage instead of from a detailed design at a later stage, so that it helps streamline the process from design to implementation.

In this way, the proposed process differs from traditional methods, in which code generation happens near the end of the Foundations Phase or at the beginning of the Development Phase. In this approach, the code can be generated whenever a domain model is created or updated. It uses early backend-agnostic services generation for prototyping to achieve feedback-driven development, and prototyping is used as a requirements discovery mechanism for better planning. This happens in the Inception Phase, with the purpose of eliminating domain uncertainty and mitigating the risks from management uncertainty.

Chapter 5 Early Domain Identification

Even though stakeholders may conceptually hold different views of an application domain, they can use domain models to represent and share their viewpoints. Code generation can then be applied immediately from the domain models, and prototyping with the generated code becomes a way to converge different viewpoints and achieve eventual consistency. Most development practices apply code generation from domain models at a relatively later stage. Undoubtedly, the requirements collected in the early stages are usually limited and incomplete.
From the perspective of cost, the activity above seems to be inhibited by the personnel and planning factors, because producing valid domain models in a short time and with high accuracy is a significant challenge. This chapter proposes a domain identification tool that extracts valid domain models from early project artifacts.

5.1 The Domain Identification Process

Current microservices development approaches target rapid evolution with good maintainability for continuous delivery and integration. The development of such software requires developers to have a good understanding of the shared knowledge and to keep tracking changes among different use cases, between development iterations, and across teams. Domain modeling provides a way to satisfy these requirements. It helps in making domain knowledge explicit, and in integrating the domain knowledge with the conceptual model of the application [63]. The models evolve during continuous development as more details from requirements, design and implementation trigger changes, resulting in a more explicit design model, including entities used for the design of classes and data schemas used for the design of databases.

To identify domain models, a formal method is available [48]. The following aspects are commonly considered:

• Access requirement specifications. Requirement specifications are a major artifact produced by the requirement elicitation process. The earliest artifact is usually a list of user stories collected from stakeholder interactions. For implementation purposes, the user stories are usually transformed into a more formal format such as use case specifications [66].

• Identify domain objects, relationships and behaviors. To initialize the modeling process, domain entities are first identified from requirement specifications, and used to build a common vocabulary for communication purposes.
Domain relations and behaviors are further identified and mapped to different domain entities.
• Visualize domain models. With all required domain elements identified, a representation such as a UML class diagram is often used for visualization. The visualization facilitates domain knowledge sharing and communication among all stakeholders.

5.1.1 Knowledge Sharing

Developers involved in the relevant use cases need to collaborate to achieve the consistency of domain knowledge. A formal presentation (e.g. a document of use case specifications, a class diagram, or a piece of well-formatted comment in the source code) needs to be created as the shared knowledge. Any subsequent change triggers a revisit and modification of the shared knowledge, which is further verified during subsequent activities such as code review and testing.

The environment of today’s agile projects makes this difficult. First, due to the time and cost pressure of rapid iteration, a detailed design cannot always be worked out before development; instead it is made during a development iteration and integrated at the end of it. It is not easy to precisely pre-determine a complete model. Second, changes in requirements, design and development happen more frequently than before. It is possible to work on requirements, design and implementation on the same day and to have issues from one stage impact issues from its prior stage. This brings uncertainty to domain definition and refinement. Third, people may hold different viewpoints on domain concepts, and manual domain identification at the early stages is error-prone. Fourth, less documentation makes it relatively difficult to trace the updated shared knowledge and the models’ evolution. Instead, more communication among developers is required.
Due to the above intricacies, a way to guarantee the consistency between conceptual models and a more explicit design model remains unexplored, leaving developers to either self-organize the collaboration or query the Keeper of the Project Vision (KOTPV) [37], who is sometimes inaccessible. Consequently, domain information becomes untraceable and challenging to maintain after several development iterations.

5.1.2 The Proposed Process

The proposed process suggests automatically identifying domain models or partial models from user stories whenever changes happen. It allows a representation of incrementally developed knowledge to be shared anytime. For example, at the very beginning of a project, an initial model (in the dashed rectangle of Figure 5.1) can be identified that contains only a few entities extracted from user stories. When this model is initialized, its correctness is unknown. However, the knowledge has been created and can be shared and verified by requirement specifiers. When more decisions are made and requirements are collected, 3 sub-models of separate use cases can be identified. More attributes are then added to entities, with several associations refined. However, this does not mean that the domain knowledge will remain unchanged in later iterations. The latest identified domain models can be a knowledge foundation for the upcoming requirement updates.

[Figure 5.1: An Example of Incrementally Developed Domain Models]

In contrast to the traditional way, the proposed process:
• Accesses domain models earlier. This happens at the very beginning, right after the initial user interactions instead of after the relatively detailed design.
• Automates domain identification with little developer intervention.
• Always keeps shared knowledge up to date. Domain models are used as the shared knowledge, and are consistent with what is mentioned in the requirement specifications.

5.2 User Stories

The growing popularity of agile practices leads to increasing adoption of user stories during requirement elicitation [70]. A user story usually follows a standard writing format [118] to capture a functional requirement:

As <a role>, I can <do an activity>, so that <to achieve a goal>.

This format comprises 3 fields:
• Role field indicates "Who": the actor who wants the functionality to be realized. This field is always required.
• Activity field indicates "What": the functionality to be realized in the desired system. This field is always required.
• Goal field indicates "Why": the benefit from the provided functionality. This field is optional.

There is another format to capture the non-functional requirements such as level of services, programming language/framework, etc. The format is shown below:

The system shall/should <...>.

Because this study aims to create application domain models at the early stages, it does not focus on non-functional requirements.

Buglione et al. [29] suggest INVEST criteria for improving the quality of user story writing. These criteria highlight the importance of writing independent and small user stories, which is also advocated by the proposed approach. In other words, a user story should be as independent as possible from another. Moreover, it should be sufficiently granular and not defined at too high a level.

5.3 NLP Structures

To automatically identify the domain elements from functional user stories, two NLP structures are used.

Parts-of-Speech tags (POS-tags) refer to the tags assigned to the words in a sentence to mark their parts of speech [38], such as nouns, verbs, etc. Given a sentence "As a user, I can view my profile.", the corresponding POS-tags are shown in Table 5.1.

Table 5.1: An Example of Sentence’s POS-tags

| Word | POS-tag | Description |
|---|---|---|
| As | IN | the word is a preposition or subordinating conjunction |
| a | DT | the word is a determiner |
| user | NN | the word is a singular noun |
| I | PRP | the word is a personal pronoun |
| can | MD | the word is a modal |
| view | VB | the word is a verb, base form |
| my | PRP$ | the word is a possessive pronoun |
| profile | NN | the word is a singular noun |

Type Dependencies (TDs) represent the semantic relationships between the words of a sentence [42]. Given a sentence "A user can view his profile.", the corresponding TDs are shown in Table 5.2.

Table 5.2: An Example of Sentence’s TDs

| TDs | Head | Dependent | Description |
|---|---|---|---|
| det(user, A) | user | A | "A" is determiner of "user" |
| nsubj(view, user) | view | user | "user" is subject of "view" |
| aux(view, can) | view | can | "can" is auxiliary of "view" |
| obj(view, profile) | view | profile | "profile" is object of "view" |
| nmod:poss(profile, his) | profile | his | "his" is possession modifier of "profile" |

5.4 User Story Processing Workflow

This section shows the overall workflow and concepts of processing functional user stories. The detailed configurations of each processing step are then described in Section 5.5.

[Figure 5.2: Workflow of the Proposed Domain Model Identification Steps]

5.4.1 Error Detection

The first step reads user stories, detects potential misspellings, and filters out illegal characters. Because user stories are usually collected hastily during stakeholder interactions, this step attempts to clean up errors caused by human mistakes, providing a better corpus for the subsequent steps. Developers’ intervention is then required to fix the user stories based on the output error messages and suggestions.

5.4.2 Preprocessing

Stemming and Lemmatization: This step is a standard procedure in NLP that reduces an inflected word to its root form. For example, given the following sentence as the input, words with plural forms are transformed into their corresponding root forms in the output.
[Sentence] "User can use WAT points to redeem items from a virtual store."
[Result] "User can use WAT point to redeem item from a virtual store."

Rewriting: This step rewrites functional requirements from a composite structure to a simple structure, because a composite structure increases the difficulty of the transformation step. The sentences are simplified in the 4 steps discussed in Section 5.5.

5.4.3 Sentence Structure Matching

Tokenization and Tagging: A sentence is parsed into tokens with POS-tags and TDs identified. For example, given a sentence "A user can upload bad driver report to the reporter account." as the input, the output is:

[Tokens] A/1 user/2 can/3 upload/4 bad/5 driver/6 report/7 to/8 the/9 reporter/10 account/11
[Type Dependency] det(2, 1), nsubj(4, 2), aux(4, 3), obj(4, 7), nmod(4, 11), amod(7, 5), compound(7, 6), case(11, 8), det(11, 9), compound(11, 10)
[POS-tags] 'DT': [(1, 1), (9, 9)], 'JJ': [(5, 5)], 'MD': [(3, 3)], 'NN': [(2, 2), (6, 6), (7, 7), (10, 10), (11, 11)], 'NP': [(1, 2), (5, 7), (9, 11)], 'TO': [(8, 8)], 'VB': [(4, 4)]

Structure Mapping: A sentence structure is defined as consisting of POS-tags and TDs. An example sentence structure is shown in Table 5.3. It describes a structure where A, B and C are words in a sentence: B is a subject of A, C is an object of A, B is a noun, C is a noun and A is a verb.

Table 5.3: An Example of Sentence Structure

| Structure | Notes | TDs | POS-tags |
|---|---|---|---|
| SVObj | Subject-Verb-Object | nsubj*(A,B), obj(A,C) | B==NN*, C==NN*, A==VB* |

There are 26 sentence structures predefined in the proposed tool, which are further discussed in Section 5.5 and Appendix A. A sentence can be matched to multiple sentence structures, addressing different parts of the transformation. If a sentence cannot be matched to any predefined structure, it is marked and developers are prompted to intervene. If no action is taken for an unmatched sentence, it is skipped by the transformation process.
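
As an illustration of how a structure such as SVObj could be matched against the TDs and POS-tags listed above, the following minimal Python sketch encodes Table 5.3 as a small dictionary and searches for variable bindings. The encoding (`SVOBJ`, `match_structure`, and the reading of a trailing `*` as a prefix wildcard) is an assumption made for illustration and is not the tool's actual implementation; the tokens, tags and dependencies are copied from the example sentence.

```python
from itertools import product

# Tokens (1-indexed), POS-tags and typed dependencies of the example sentence
# "A user can upload bad driver report to the reporter account."
TOKENS = {1: "A", 2: "user", 3: "can", 4: "upload", 5: "bad", 6: "driver",
          7: "report", 8: "to", 9: "the", 10: "reporter", 11: "account"}
POS = {1: "DT", 2: "NN", 3: "MD", 4: "VB", 5: "JJ", 6: "NN",
       7: "NN", 8: "TO", 9: "DT", 10: "NN", 11: "NN"}
# Each dependency is (relation, head index, dependent index).
DEPS = [("det", 2, 1), ("nsubj", 4, 2), ("aux", 4, 3), ("obj", 4, 7),
        ("nmod", 4, 11), ("amod", 7, 5), ("compound", 7, 6),
        ("case", 11, 8), ("det", 11, 9), ("compound", 11, 10)]

# Hypothetical encoding of the SVObj structure from Table 5.3.
SVOBJ = {
    "tds": [("nsubj*", "A", "B"), ("obj", "A", "C")],
    "pos": {"A": "VB*", "B": "NN*", "C": "NN*"},
}


def label_matches(pattern, label):
    """Compare a label with a pattern; a trailing '*' is a prefix wildcard (assumption)."""
    if pattern.endswith("*"):
        return label.startswith(pattern[:-1])
    return pattern == label


def match_structure(structure, deps, pos):
    """Return every binding (variable -> token index) that satisfies the structure."""
    # For each TD pattern, collect the dependency edges whose relation fits.
    candidates = [[(h, d) for rel, h, d in deps if label_matches(p_rel, rel)]
                  for p_rel, _, _ in structure["tds"]]
    results = []
    for combo in product(*candidates):
        binding, consistent = {}, True
        for (_, head_var, dep_var), (h, d) in zip(structure["tds"], combo):
            for var, idx in ((head_var, h), (dep_var, d)):
                # A variable must refer to the same token in every TD pattern.
                if binding.setdefault(var, idx) != idx:
                    consistent = False
        if consistent and all(label_matches(p, pos[binding[v]])
                              for v, p in structure["pos"].items()):
            results.append(binding)
    return results


bindings = match_structure(SVOBJ, DEPS, POS)
words = [{var: TOKENS[idx] for var, idx in b.items()} for b in bindings]
print(words)
```

For the example sentence, the only binding found maps A to "upload", B to "user" and C to "report". An empty result would correspond to the unmatched case that is marked for developers' intervention.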

5.4.4 Transformation

With the predefined sentence structures, a list of transformation rules is used to extract relationships between domain elements. An example transformation rule is shown in Table 5.4. For a sentence marked as "SVObj", it denotes that actor B takes an action A on target C. Hence B and C are identified as domain entities, A is identified as a domain behavior of B, and B and C have a relationship via A.

Table 5.4: An Example of Transformation Rule

| Structure | Operation | Rule |
|---|---|---|
| SVObj | add_behavior | actor=B, target=C, action=A |

There are 32 transformation rules predefined in the proposed tool, which are further discussed in Section 5.5 and Appendix B. The rules try to identify domain entities and their corresponding attributes, behaviors, and relations. The output is an intermediate structure that can later be converted into any format for visualization and analysis purposes.

5.4.5 Visualization

The identified domain elements are reorganized and represented as a UML class diagram in which domain entities are potential UML design classes and attributes, domain behaviors are potential UML functions, and domain relations are potential UML associations between classes.

5.5 Proposed Domain Identification Tool

The proposed tool mainly uses Stanford CoreNLP [78] APIs version 4.2.2 to handle NLP tasks. The current tool consists of 5 components corresponding to the proposed processing steps. Each component works independently with its own input, output and configuration. The output from each step can be used as the input for the next step. This section discusses the detailed configurations and algorithms applied in each component.

5.5.1 Error Detector

This component reads raw data (a list of user stories) as the input. To avoid mistaken extraction of domain elements and eliminate ambiguity in the final domain model, an error detector detects illegal characters and potentially misspelled words. Two solutions are applied.

Algorithm 1: Misspelling Detection of Domain-specific Words
Require: sentences, ACCEPTED_EDIT_DISTANCE, ACCEPTED_USAGE
    function DetectDomainSpecificMisspelling
        candidates ← list of words that cannot be found in the dictionary
        word_frequency ← table of word frequency of the given corpus
        for word in candidates do
            if word_frequency[word] < ACCEPTED_USAGE then
                misspelling_candidates.add(word)
            else
                domain_specific_candidates.add(word)
            end if
        end for
        for word_m in misspelling_candidates do
            for word_c in domain_specific_candidates do
                if checkEditDistance(word_m, word_c) < ACCEPTED_EDIT_DISTANCE then
                    Fire potential misspelled word: word_m
                    Fire potential fixing suggestion word: word_c
                end if
            end for
        end for
    end function

Dictionary-based detection. An English dictionary is imported from the spaCy library (https://spacy.io/) to detect common misspelling errors.

Domain-specific detection. If a term that appears in the raw data cannot be found in the provided dictionary, it is marked as a domain-specific term. For example, given a sentence "Webcrawler can visit public sites, extract simple structured relational data.", "Webcrawler" is marked as a domain-specific term. In the raw data, another sentence "The webcraweler can gather head shots of players." mistakenly introduces an extra "e" in this word. In order to prompt a helpful error message and fix suggestions, a domain-specific misspelling detection is applied by using word frequency and edit distance [43]. In this solution (Algorithm 1), a word frequency table is created to record the usage frequency of each word in the raw data. Then string edit distance is applied to find the most similar word to each misspelling candidate. The word with the higher usage frequency is prompted as the potential edit suggestion for the one with the lower frequency. Furthermore, a misspelling candidate is prompted without an edit suggestion if no similar word is found.
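
Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the tool's implementation: a small in-memory word set stands in for the spaCy dictionary, the edit distance is a plain Levenshtein distance, the default thresholds of 2 are illustrative, and the third example sentence is made up so that "webcrawler" reaches the usage threshold.

```python
import re
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def detect_domain_specific_misspelling(sentences, dictionary,
                                       accepted_edit_distance=2,
                                       accepted_usage=2):
    """Return (misspelling candidate, suggestions) pairs, following Algorithm 1."""
    words = [w for s in sentences for w in re.findall(r"[a-z]+", s.lower())]
    word_frequency = Counter(words)
    # Words missing from the dictionary are either rare (likely misspelled)
    # or frequent (likely domain-specific terms).
    candidates = sorted(w for w in word_frequency if w not in dictionary)
    misspelling = [w for w in candidates if word_frequency[w] < accepted_usage]
    domain_specific = [w for w in candidates if word_frequency[w] >= accepted_usage]
    findings = []
    for word_m in misspelling:
        suggestions = [word_c for word_c in domain_specific
                       if edit_distance(word_m, word_c) < accepted_edit_distance]
        # A candidate with no similar word is still reported, without suggestions.
        findings.append((word_m, suggestions))
    return findings


# Stand-in dictionary (assumption); the tool uses a spaCy English dictionary.
DICTIONARY = set("can visit public sites extract simple structured relational "
                 "data the gather head shots of players store".split())
SENTENCES = [
    "Webcrawler can visit public sites, extract simple structured relational data.",
    "The webcraweler can gather head shots of players.",
    "Webcrawler can store data.",  # hypothetical extra story
]
findings = detect_domain_specific_misspelling(SENTENCES, DICTIONARY)
print(findings)
```

With these inputs, "webcraweler" is reported with "webcrawler" as its suggestion, because the two words are one edit apart and only the latter is used at least twice.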

5.5.2 Preprocessor

This component reads the well-modified user stories and rewrites them from a composite structure to a relatively simple one. Meanwhile, a domain glossary, which extracts all nouns from user stories, is built for communication purposes. There are 4 rewriting steps presented in this component.

Identify functional requirements. Based on the two formats presented in Section 5.2, a regular expression is used to distinguish functional requirements from other requirements.

Merge compound nouns and identify domain entities. This step removes the space between nouns for a phrase that consists of multiple nouns, and maps the new word with all semantically identical words in the phrase. A new sentence is produced with this mapping. For example, given a sentence "As a user, I can use WAT point to redeem item from a virtual store.", the output is:

[Glossary Mapping] "WATPoint" : ["point", "WAT point"]
[Rewriting Result] As a user, I can use WATPoint to redeem item from a virtual store.

This rewriting step also aims to identify consistent domain entities, including entities consisting of multiple words. It is built based on both TDs and POS-tags:
• Use POS-tags structure to identify noun (NN) or proper noun (NNP). The nouns and proper nouns can be easily identified by checking NN and NNP in the POS-tags structure. For example, given "User can use WAT point to redeem item from a virtual store." as the input, the following words can be identified: "User/NN, WAT/NNP, point/NN, store/NN"
• Use TDs structure to identify the compound nouns or compound nouns with an adjectival modifier (amod). For example, the structure rule compound(B,A), amod(B,C) can be used to identify the phrase "bad driver report" - compound("report", "driver") and amod("report", "bad").
• Use the combination of TDs and POS-tags to identify nominal modifier (nmod) with a subordinating conjunction (IN).
A modifier is an element that changes the meaning of another element in the sentence structure. For example, the structure rule nmod(A,C) with B/IN can be used to identify the phrase "point of WAT" - nmod("point", "WAT"), "of"/IN.

Eliminate the nongranular expression. A slash or a conjunction (e.g. "and") appearing in a user story makes a functional requirement nongranular. For example, the following sentence involves two desired functionalities (i.e. "view profile" and "update profile"). User stories are nevertheless often written in this way for the sake of time. This step splits a non-atomic sentence into multiple sentences:

[Sentence] "As a user, I can view/update my profile."
[Rewriting Results] "As a user, I can view my profile." "As a user, I can update my profile."

Replace pronouns with actor. This step resolves the coreference between the actor and its pronouns. In the following example, the pronoun (i.e. "I") and the possessive pronoun (i.e. "my") refer to the actor (i.e. "user").

[Sentence] "As a user, I can view my profile."
[Rewriting Result] "User can view user’s profile."

5.5.3 Sentence Structure Matcher

This component matches the rewritten functional requirements to predefined sentence structures, each of which comprises:
• Variables, the placeholders representing potential words to be captured from a sentence.
• TDs list, specifies key relationships of variables.
• POS-tags list, specifies the part of speech of variables.
• Keyword, specifies a specific word to be captured in a sentence. This is a reserved field that allows this component to process user requirements written in a formal format more accurately.

The predefined sentence structures are a set of sentence patterns that the approach can capture and transform. They should be comprehensive enough to include all simple sentence patterns and a few complex sentence patterns regarding requirement writing.
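
Two of the preprocessor's rewriting steps (Section 5.5.2), eliminating a nongranular slash expression and replacing pronouns with the actor, can be sketched with regular expressions. This is only an illustrative sketch: the function names are hypothetical, only the slash form is handled (splitting on "and" safely would presumably need the TDs), and only the pronouns "I" and "my" from the example are resolved.

```python
import re


def split_on_slash(story):
    """Split 'view/update' style alternatives into separate user stories."""
    match = re.search(r"(\w+)/(\w+)", story)
    if match is None:
        return [story]
    results = []
    for alternative in match.groups():
        rewritten = story[:match.start()] + alternative + story[match.end():]
        # Recurse in case the story contains further slash expressions.
        results.extend(split_on_slash(rewritten))
    return results


def replace_pronouns(story):
    """Rewrite 'As a <role>, I can ... my ...' with the role as explicit actor."""
    match = re.match(r"As an? (?P<role>[^,]+), (?P<rest>.+)$", story)
    if match is None:
        return story
    role = match.group("role").strip()
    rest = re.sub(r"\bmy\b", role + "'s", match.group("rest"))
    rest = re.sub(r"\bI\b", role, rest)
    return rest[0].upper() + rest[1:]


stories = split_on_slash("As a user, I can view/update my profile.")
rewritten = [replace_pronouns(s) for s in stories]
print(stories)
print(rewritten)
```

Applied to the example story, the first step yields the "view" and "update" stories, and the second step turns them into "User can view user's profile." and "User can update user's profile.".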
The sentence structures used in the proposed tool are initially derived from Hornby’s 25 verb patterns [59], in which a simple sentence can be written. After learning the corpus of early user stories from 59 projects, 6 simple patterns are deprecated and 7 complex patterns are newly added. The final definition of the proposed sentence structures is presented in Appendix A.

A sentence is allowed to be matched to multiple sentence structures to address different parts. Given "User can command the motor to start.", Table 5.5 shows that two structures are matched: "SVObj" captures the relation "user-command-motor" and "VOToInf" captures the relation "motor-start".

Table 5.5: An Example of Multiple Matching

| Structure | Notes | TDs | POS-tags | Example |
|---|---|---|---|---|
| SVObj | Subject-Verb-Object | nsubj*(A,B), obj(A,C) | B==NN*, C==NN*, A==VB* | User can command the motor to start. |
| VOToInf | Verb-Object-To-Infinitive | obj(A,B), advcl:to(A,C) | B==NN*, C==VB* | User can command the motor to start. |

A structure matching result may overlap with another. For example, in Table 5.6, "SVObj" matches a sentence where a subject comes first, a verb second, and an object third. It captures a more comprehensive domain relation (i.e. "poster-create-post") from the first example sentence. "SV" matches a sentence where a verb follows a subject. It captures "poster-create" only, thus "post" is missing from the relation. The less comprehensive capture adds redundant information to the application domain. However, "SV" is more applicable for capturing the relation from the second example sentence (i.e. "customer-purchase"). In order to decrease the redundancy, a topological order of sentence structures is considered during the processing. "SVObj" is explicitly a supplement structure to "SV", so "SVObj" has a higher priority to be matched, and a sentence matched by "SVObj" will not be examined by "SV".
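
The priority handling described above can be sketched with a topological sort. In this hedged sketch, `EXAMINE_AFTER` is a hypothetical map that records, for each structure, the more specific structures that must be tried first; only the SVObj/SV relation is taken from the text, and the function assumes the set of structures whose patterns fit a sentence has already been computed.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical priority map: each structure lists the more specific
# structures that must be examined before it.
EXAMINE_AFTER = {"SVObj": set(), "VOToInf": set(), "SV": {"SVObj"}}


def select_structures(matched):
    """Keep a matched structure unless a more specific one already covered it.

    `matched` is the set of structure names whose pattern fits the sentence.
    """
    selected, covered = [], set()
    for name in TopologicalSorter(EXAMINE_AFTER).static_order():
        if EXAMINE_AFTER[name] & covered:
            # A supplement structure was selected, so this one is not examined.
            covered.add(name)
        elif name in matched:
            selected.append(name)
            covered.add(name)
    return selected


# "Poster can create post." fits both SVObj and SV; only SVObj is kept.
print(select_structures({"SVObj", "SV"}))
# "Customer can purchase online." fits SV only, so SV is kept.
print(select_structures({"SV"}))
```

Structures that are unrelated in the map, such as SVObj and VOToInf, are both kept, which preserves the multiple matching shown in Table 5.5.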

Table 5.6: An Example of Overlapping Matching

| Structure | Notes | TDs | POS-tags | Example |
|---|---|---|---|---|
| SVObj | Subject-Verb-Object | nsubj*(A,B), obj(A,C) | B==NN*, C==NN*, A==VB* | Poster can create post. / Customer can purchase online. × |
| SV | Subject-Verb | nsubj*(A,B) | A==VB*, B==NN* | Poster can create post. / Customer can purchase online. |

5.5.4 Rule Transformer

This component performs a sequential procedure that extracts domain elements from sentences based on the identified sentence structures and the results from the preprocessing step. Given the values of the variables obtained from a sentence structure, this component takes the following 4 actions. Action 3 is the key action. Because information is usually embedded at different places in different sentences, it is extracted from the semantic relationships between variables obtained via the corresponding TDs list. Results obtained from the POS-tags list and the preprocessing step contribute to the remaining actions.
• Action 1: Identify Domain Entities. If a variable appears to be a noun (NN) or a proper noun (NNP), it is appended to the Domain Entity list.
• Action 2: Identify Entity Attributes. An Entity Attribute can be identified from a user story containing a possessive structure (e.g. "user’s profile"). An Attribute is also considered a Domain Entity because it is a noun term.
• Action 3: Identify Domain Behaviors. A Domain Behavior comprises a Behavior Actor, a Behavior, and a Behavior Target (optional, since a Behavior sometimes has no target). The Behavior is appended to the Domain Behavior list, with a mapping to its Actor and Target.
• Action 4: Identify Domain Relationships. A Domain Relationship is either 1) a general relationship, identified from a user story containing a preposition structure (e.g. "purchase with card", "send email to user"), or 2) a generalization relationship, identified from a user story containing a predicative verb (e.g. "admin is a user").
Different structures are marked correspondingly because they indicate different types of relationships.

This component currently applies 32 transformation rules, which address the actions mentioned above. These transformation rules are further presented in Appendix B.

5.5.5 Visualizer

This component uses the PlantWeb library (https://plantweb.readthedocs.io/) to render a UML Class diagram from the transformation result. A straightforward mapping is first conducted, that is,
• Create UML Classes from the Domain Entity list, including corresponding Entity Attributes.
• Add Functions to the corresponding Class based on the Domain Behavior list.
• Create UML Associations from the Domain Relationship list.

Additional modifications to the UML stereotype are conducted to bridge the gap between a conceptual model and a design Class diagram:
• A Class which represents a Role in the original user story (see Section 5.2) is modified into an Actor Class.
• An Association is modified into a Generalization Association based on its mark.
• Since an Entity Attribute is also considered a Domain Entity, some extra UML Classes are created. If such a Class has its own attributes or functions (which denotes that this object is a complex attribute, a.k.a. a nested domain object), it is removed from its host Class’s attribute list. Instead, an Aggregation Association is created between this Class and its host Class. Otherwise, this Class is removed from the Class list.

5.6 Evaluation

This section presents an experiment comparing the domain model generated by the proposed tool with those produced by software developers. The experiment and evaluation try to answer the following questions:
• How effective is the proposed tool in improving knowledge sharing?
• How effective is the proposed tool in eliminating risks caused by personnel factors?
• Are there any constraints hindering it?

5.6.1 Required Skills and Process

The participants of this experiment are 48 students selected from a master-level software engineering class. According to collected resumes, all participants have relevant industrial experience in individual development or team development. Relevant lectures, such as Agile Development, Software Requirements and Object-oriented Analysis and Design, had been given in the class before the experiment.

The data are collected from 13 software projects with real clients from various business domains. Each participant was involved in one of those project teams, which had well-guided client interactions and produced early user stories (~20 sentences per project) as the input to this evaluation process.

To initialize the experiment, a step-by-step instruction of the Domain Modeling Activity was given to clarify what was expected from the participants. Meanwhile, the proposed tool was deployed as a service, providing a user interface. It takes early user stories as the input, then produces a domain model (represented in a UML Class diagram) as the output.

During the experiment, each participant was asked to individually identify and produce an initial domain model from early user stories. Participants were given enough time to finish this work and were not allowed to discuss it with the others. These domain models were collected and compared to those generated by the proposed tool. Participants were then asked to use the proposed tool to generate the domain model, synchronize their work with the other participants in the same team to better digest the domain concepts, and collaboratively update the domain model. Finally, a questionnaire was given to collect their feedback.

5.6.2 Evaluation Methods

To evaluate the effectiveness of eliminating risks caused by personnel factors, the evaluation uses four quality metrics to assess the domain model generated by the proposed tool and those produced by software developers.
The analysis procedure is based on the experiments presented in [111, 124].

Average Entity Correctness (AvgEC)

Entity Correctness: $EC = (E_{\text{Attributes}} + E_{\text{Behaviors}} + E_{\text{Significance}} + E_{\text{Name}} + E_{\text{Type}})/5$. It computes the proportional correctness of a presented entity. AvgEC computes the average of EC for all the entities in the produced domain model.
• Correctness of Entity Attributes ($E_{\text{Attributes}}$) = # of correctly identified attributes / Total # of attributes that should be identified, if the Total # of attributes that should be identified > 0. Otherwise, 0.
• Correctness of Entity Behaviors ($E_{\text{Behaviors}}$) = # of correctly identified behaviors / Total # of behaviors that should be identified, if the Total # of behaviors that should be identified > 0. Otherwise, 0.
• Entity Significance ($E_{\text{Significance}}$) = 1, if an identified entity presents a meaningful and significant concept of the application domain. Otherwise, 0.
• Correctness of Entity Name ($E_{\text{Name}}$) = 1, if the entity name in the domain model is correct (i.e. should be a noun and is consistent with what is mentioned in user stories). Otherwise, 0.
• Correctness of Entity Stereotype ($E_{\text{Type}}$) = 1, if the correct stereotype is assigned to an entity in the domain model. Otherwise, 0.

Average Relation Correctness (AvgRC)

Relation Correctness: $RC = (R_{\text{Endpoint1}} + R_{\text{Endpoint2}} + R_{\text{Navigability}} + R_{\text{Significance}} + R_{\text{Message}} + R_{\text{Type}})/6$. It computes the proportional correctness of a presented relation. AvgRC computes the average of RC for all the relations in the produced domain model.
• Correctness of Endpoint1 ($R_{\text{Endpoint1}}$) = 1, if a relation correctly connects the first entity. Otherwise, 0.
• Correctness of Endpoint2 ($R_{\text{Endpoint2}}$) = 1, if a relation correctly connects the second entity. Otherwise, 0.
• Correctness of Navigability ($R_{\text{Navigability}}$) = 1, if the navigability of a relation is correct (especially when a generalization or an aggregation relation is presented). Otherwise, 0.
• Relation Significance ($R_{\text{Significance}}$) = 1, if an identified relation presents a meaningful and significant relationship of the application domain. Otherwise, 0.
• Correctness of Relation Message ($R_{\text{Message}}$) = 1, if the message assigned to a relation is correct (i.e. usually should be a verb and is consistent with what is mentioned in user stories). Otherwise, 0.
• Correctness of Relation Stereotype ($R_{\text{Type}}$) = 1, if the correct stereotype is assigned to a relation in the domain model. Otherwise, 0.

Domain Completeness (DC)

Domain Completeness: $DC = (C_{\text{Entity}} + C_{\text{Behavior}} + C_{\text{Relation}})/3$. It measures whether significant domain elements are identified from user stories. In other words, if a domain element is non-significant, or a significant domain element is missing, it decreases the value of DC.
• Entity Completeness ($C_{\text{Entity}}$) = # of identified significant entities / Total # of significant entities that should be identified.
• Behavior Completeness ($C_{\text{Behavior}}$) = # of identified significant behaviors / Total # of significant behaviors that should be identified.
• Relation Completeness ($C_{\text{Relation}}$) = # of identified significant relations / Total # of significant relations that should be identified.

Domain Redundancy (DR)

Domain Redundancy: $DR = (R_{\text{Entity}} + R_{\text{Behavior}} + R_{\text{Relation}})/3$. It measures whether redundant domain elements are identified. Unlike the three metrics above, a lower value is considered better for DR.
• Entity Redundancy ($R_{\text{Entity}}$) = # of identified redundant entities / Total # of identified entities in the domain model. Redundant entities include 1) those that should not be identified as entities, and 2) a single isolated entity which does not have any relation with other entities.
• Behavior Redundancy ($R_{\text{Behavior}}$) = # of identified redundant behaviors / Total # of identified behaviors in the domain model.
Redundant behaviors include those behaviors which are not really considered desired functions in the application domain.
• Relation Redundancy ($R_{\text{Relation}}$) = # of identified redundant relations / Total # of identified relations in the domain model. Redundant relations include those relations which are incorrectly assigned to two entities.

To evaluate the effectiveness of knowledge sharing improvement, a questionnaire (Table 5.7) is given after the experiment.

Table 5.7: Summary of Questions and Metrics in the Questionnaire

| Question | Metrics |
|---|---|
| (Q1) How familiar are you with domain modeling and constructing domain models? | (M1) Degree of familiarity of participants. |
| (Q2) How useful do you think the early domain identification is for your team? | (M2) Impression for usefulness of early domain identification. |
| (Q3) Did the tool provide insights to the tasks of domain identification and concepts synchronization? | (M3) Impression for usefulness of the proposed tool. |

5.6.3 Analysis and Results

To compare the domain models collected from participants’ manual work with those generated by the proposed tool, Table 5.8 shows the data set of avgEC, avgRC, DC and DR calculated for the proposed approach and each individual participant. Figure 5.3 shows the comparison results. It can be inferred that across the different projects, the proposed tool generally outperforms the participants.
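
The four quality metrics of Section 5.6.2 can be computed as in the sketch below. The function names and the sample counts are hypothetical and only illustrate the arithmetic; in addition, returning 0 for a completeness or redundancy ratio with a zero denominator is an assumption, since the definitions state that rule only for the entity attribute and behavior ratios.

```python
def ratio(found, expected):
    """found / expected, or 0 when the denominator is 0."""
    return found / expected if expected > 0 else 0.0


def entity_correctness(attributes, behaviors, significant, name_ok, type_ok):
    """EC of one entity; attributes and behaviors are (correct, expected) pairs,
    the remaining arguments are 0 or 1."""
    return (ratio(*attributes) + ratio(*behaviors)
            + significant + name_ok + type_ok) / 5


def relation_correctness(endpoint1, endpoint2, navigability,
                         significant, message, rel_type):
    """RC of one relation; every argument is 0 or 1."""
    return (endpoint1 + endpoint2 + navigability
            + significant + message + rel_type) / 6


def average(values):
    """AvgEC or AvgRC over all entities or relations of a model."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def domain_completeness(entities, behaviors, relations):
    """DC; each argument is (identified significant, total significant)."""
    return (ratio(*entities) + ratio(*behaviors) + ratio(*relations)) / 3


def domain_redundancy(entities, behaviors, relations):
    """DR; each argument is (redundant, total identified). Lower is better."""
    return (ratio(*entities) + ratio(*behaviors) + ratio(*relations)) / 3


# Hypothetical counts for a small model, not taken from the experiment.
ec_values = [entity_correctness((2, 4), (1, 1), 1, 1, 0),
             entity_correctness((0, 0), (2, 2), 1, 1, 1)]
avg_ec = average(ec_values)
avg_rc = average([relation_correctness(1, 1, 1, 1, 0, 1)])
dc = domain_completeness((3, 4), (1, 2), (1, 4))
dr = domain_redundancy((1, 4), (0, 2), (1, 4))
print(avg_ec, avg_rc, dc, dr)
```

In this made-up example the first entity has EC = (0.5 + 1 + 1 + 1 + 0)/5 = 0.7, and the model has DC = (0.75 + 0.5 + 0.25)/3 = 0.5.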

[Figure 5.3: Box Plots for the avgEC, avgRC, DC and DR of the Identified Domain Models. Four panels (Average Entity Correctness (avgEC), Average Relation Correctness (avgRC), Domain Completeness (DC), Domain Redundancy (DR)), each comparing Manual Work with the Proposed Tool.]

Table 5.8: avgEC, avgRC, DC and DR of the Identified Domain Models

| Project# | Approach | avgEC | avgRC | DC | DR |
|---|---|---|---|---|---|
| 1 | Proposed Tool | 0.9 | 0.98 | 1 | 0.12 |
| 1 | Manual Work | 0.85 | 0.8 | 0.48 | 0.17 |
| 1 | Manual Work | 0.76 | 0.83 | 0.38 | 0.13 |
| 1 | Manual Work | 0.83 | 0.98 | 0.57 | 0.1 |
| 1 | Manual Work | 0.69 | 0.68 | 0.45 | 0.21 |
| 2 | Proposed Tool | 0.91 | 0.97 | 0.81 | 0.18 |
| 2 | Manual Work | 0.81 | 0.83 | 0.57 | 0 |
| 2 | Manual Work | 0.87 | 0.72 | 0.62 | 0.13 |
| 2 | Manual Work | 0.79 | 0.77 | 0.72 | 0.57 |
| 2 | Manual Work | 0.65 | 0.65 | 0.42 | 0.24 |
| 3 | Proposed Tool | 0.93 | 0.95 | 0.99 | 0.07 |
| 3 | Manual Work | 0.87 | 0.99 | 0.51 | 0.09 |
| 3 | Manual Work | 0.77 | 0.75 | 0.43 | 0.23 |
| 3 | Manual Work | 0.84 | 0.8 | 0.45 | 0.1 |
| 3 | Manual Work | 0.92 | 0.92 | 0.69 | 0.17 |
| 4 | Proposed Tool | 0.93 | 0.98 | 0.96 | 0.12 |
| 4 | Manual Work | 0.89 | 0.93 | 0.73 | 0.23 |
| 4 | Manual Work | 0.8 | 0.75 | 0.63 | 0.23 |
| 4 | Manual Work | 0.71 | 1 | 0.08 | 0.33 |
| 5 | Proposed Tool | 0.95 | 0.98 | 0.95 | 0.12 |
| 5 | Manual Work | 0.9 | 0.81 | 0.84 | 0.19 |
| 5 | Manual Work | 0.88 | 0.81 | 0.42 | 0.21 |
| 5 | Manual Work | 0.89 | 1 | 0.6 | 0.33 |
| 6 | Proposed Tool | 0.94 | 1 | 0.78 | 0.06 |
| 6 | Manual Work | 0.72 | 0.71 | 0.39 | 0.27 |
| 6 | Manual Work | 0.83 | 0.78 | 0.27 | 0.08 |
| 7 | Proposed Tool | 0.95 | 0.99 | 1 | 0.04 |
| 7 | Manual Work | 0.9 | 0.98 | 0.45 | 0.07 |
| 7 | Manual Work | 0.81 | 0.76 | 0.53 | 0.11 |
| 7 | Manual Work | 0.93 | 0.92 | 0.63 | 0.19 |
| 8 | Proposed Tool | 0.97 | 0.99 | 0.91 | 0.06 |
| 8 | Manual Work | 0.9 | 0.82 | 0.78 | 0.07 |
| 8 | Manual Work | 0.89 | 0.78 | 0.68 | 0.15 |
| 8 | Manual Work | 0.84 | 0.79 | 0.6 | 0.15 |
| 8 | Manual Work | 0.86 | 0.78 | 0.59 | 0.18 |
| 8 | Manual Work | 0.85 | 0.95 | 0.42 | 0.15 |
| 8 | Manual Work | 0.85 | 0.98 | 0.44 | 0.07 |
| 9 | Proposed Tool | 0.98 | 1 | 0.84 | 0 |
| 9 | Manual Work | 0.94 | 0.97 | 1 | 0.12 |
| 9 | Manual Work | 0.87 | 0.84 | 0.68 | 0.32 |
| 9 | Manual Work | 0.93 | 0.94 | 0.81 | 0.25 |
| 9 | Manual Work | 0.88 | 0.84 | 0.74 | 0.47 |
| 9 | Manual Work | 0.88 | 0.91 | 0.46 | 0.3 |
| 10 | Proposed Tool | 0.97 | 1 | 0.91 | 0 |
| 10 | Manual Work | 0.89 | 0.93 | 0.61 | 0.06 |
| 11 | Proposed Tool | 0.97 | 0.97 | 1 | 0.1 |
| 11 | Manual Work | 0.87 | 1 | 0.53 | 0.33 |
| 11 | Manual Work | 0.79 | 0.79 | 0.77 | 0.3 |
| 11 | Manual Work | 0.86 | 0.92 | 0.68 | 0.22 |
| 11 | Manual Work | 0.84 | 0.86 | 0.5 | 0.12 |
| 11 | Manual Work | 0.87 | 0.98 | 0.69 | 0.15 |
| 12 | Proposed Tool | 0.99 | 1 | 1 | 0 |
| 12 | Manual Work | 0.93 | 0.96 | 0.88 | 0.14 |
| 12 | Manual Work | 0.91 | 0.93 | 0.63 | 0.12 |
| 12 | Manual Work | 0.92 | 0.92 | 0.58 | 0.32 |
| 13 | Proposed Tool | 0.95 | 0.98 | 0.98 | 0.11 |
| 13 | Manual Work | 0.76 | 0.64 | 0.36 | 0.09 |
| 13 | Manual Work | 0.89 | 1 | 0.76 | 0 |
| 13 | Manual Work | 0.85 | 0.97 | 1 | 0.11 |
| 13 | Manual Work | 0.88 | 0.69 | 0.67 | 0 |
| 13 | Manual Work | 0.71 | 0.8 | 0.4 | 0.25 |

[Figure 5.4: Degree of Familiarity of Participants]

To summarize the questionnaire results and feedback from participants, Figure 5.4 shows the distribution of participants’ familiarity with domain modeling skills:
• Novice, has no knowledge about domain modeling and domain model construction.
• Beginner, has basic knowledge, but no practical experience.
• Competent, has practical experience, but has not often used this skill in software development.
• Proficient, often applies this skill in software development.

Figure 5.5 shows participants’ impression of the usefulness of early domain identification. After performing this activity, most participants think it helps to make an overview of the application domain clearer at the early stages. The discussion and synchronization based on early domain models help to address domain objects which may bring confusion at a later stage. Vague domain concepts, such as semantic overload (a word or phrase has more than one meaning) and shared concepts (different words or phrases share the same meaning), can be resolved immediately after discussion with a visualized model.

[Figure 5.5: Impression for Usefulness of Early Domain Identification]

Figure 5.6 shows participants’ impressions of the usefulness of the proposed tool. Most participants can easily follow the domain identification task sequence at the early stages with this tool. It helps extract information from user stories better, and verify that the final domain model matches what is mentioned in the requirement description. However, the Novice category is less satisfied with this tool. This is mainly because the current tool may extract a few extraneous entities, or fail to capture a few relations from the user stories. Without basic domain modeling skills, these participants faced a learning curve in fixing the generated domain models.

[Figure 5.6: Impression for Usefulness of the Proposed Tool]
5.7 Conclusion

It can be concluded that domain identification and modeling are feasible at the very early stages. They help push stakeholders to synchronize domain knowledge and match the domain model to the requirement descriptions. Manually identifying domain models produces more incorrect results. The proposed tool eliminates most mistakes by automatically identifying domain models from user stories, with less time spent on the task. According to the evaluation data set and participants' feedback, a few extraneous or missing domain elements remain a problem. Hence basic domain modeling knowledge is still required to fix those incorrect domain elements. Defining more expressive sentence structures and transformation rules could be considered to address this concern.

Chapter 6
Early Microservices Generation

With application domain models identified at the early stages, a live sandbox system can be generated for rapid prototyping and for converging the viewpoints of different stakeholders. In a loosely coupled microservice implementation hierarchy, the challenges of code generation lie in the tool's flexibility in the face of team diversity, and in its sustainability in the later stages. This chapter proposes a microservices generation tool that generates back-end agnostic and re-targetable code at the early stages.

6.1 Technical Strategies

6.1.1 Intermediate Structure

The generation process requires an intermediate structure that unifies the formats of different sources and provides a persisted view for the stakeholders, along with traceability for potential analysis. This structure can be described in any textual format because conversion among these formats is straightforward. In the proposed approach, a JSON structure is chosen.

To make the discussion more concrete, the metamodels of two sample sources and their intermediate structure are presented. Figure 6.1 shows a simplified metamodel of the UML class model.
A Class may contain many Attributes and Operations. A Class may also have multiple Associations to other Classes. An Association can be one of three types: composition, aggregation or generalization. The abstract Classifier is specialized into Primitive Data Types, known as simple types, and Classes, known as complex types. The UML class metamodel gives a comprehensive and unified description of a domain model.

Figure 6.1: Simple UML Class Metamodel

Figure 6.2: Dictionary Metamodel

In the real world, a comprehensive and consistent representation cannot always be guaranteed when conducting code generation. For example, in one of the case studies, a different representation was exported from a legacy data extraction tool. Figure 6.2 shows its metamodel. In the space of Dictionaries, it uses Dictionary as the first level to represent the concept of a domain element. Each Dictionary refers to an Object, which comprises multiple Tags. A Tag can also refer to an Object to represent a nested Object.

With the above concern in mind, the first step of generation aims at the following goals:

• It maps different representations of domain models to a persisted view and provides query functions for the subsequent generation process.
• It interprets different definitions of elements into a set of consistent terms.
• It assigns default values if some information is missing from the source.

Figure 6.3: Intermediate Structure Metamodel

The metamodel in Figure 6.3 describes the intermediate structure. The transformation needs to achieve the following mappings:

• Domain mapping. For example, every Package in Figure 6.1 should be mapped to the Domain in Figure 6.3, as well as every Dictionary in Figure 6.2.
• Entity mapping. For example, every Class in Figure 6.1 should be mapped to the Entity in Figure 6.3, as well as every Dictionary in Figure 6.2.
• Attribute mapping. For example, every Attribute in Figure 6.1 should be mapped to the Attribute in Figure 6.3, as well as every Tag in Figure 6.2.
• Relation and Behavior mappings, if there are any, need to be processed in the same way.

6.1.2 Type System

The representation of an application domain model may not always be comprehensive, because its type system is only loosely restricted. For example, in Figure 6.2, a model contains many attributes with unknown types such as "Dollar" and "Money". At the early stages, we are unsure about the difference between these two types and whether they are simple types (e.g. a floating-point number) or complex types (e.g. a nested object). To address this concern and generate the code, a type mapping should be created between the source and the intermediate structure.

Due to the uncertainty of the application domain at the early stages, a default type (e.g. string) can be assigned to any unknown domain attribute. Detailed type mappings (e.g. "Money-to-Object" and "Dollar-to-Double" mappings) can be specified once this uncertainty is eliminated in later stages, and can be represented in a textual format. The following XSLT template shows an example of a type mapping definition.

    <xsl:template name="type_converter">
      <xsl:param name="type" as="xs:string" />
      <xsl:choose>
        <xsl:when test="$type='Money'">object</xsl:when>
        <xsl:when test="$type='Dollar'">double</xsl:when>
        <xsl:otherwise>string</xsl:otherwise>
      </xsl:choose>
    </xsl:template>

6.1.3 Generation of RESTful APIs

On the server side, endpoints can be generated as RESTful APIs from the domain elements in the corresponding intermediate structure. For example, given a simple sub-model of the use case "user management" (Figure 6.4) as the input, the output is a list of endpoints comprising basic CRUD operations and behavior signatures. The former help a developer fully utilize a sandbox environment, while the latter target the functional requirements of the desired system.

Figure 6.4: Example of RESTful APIs Generation

6.1.3.1 API Specifications

Services communicate via the HTTP protocol.
The usage of HTTP verbs and status codes should comply with the HTTP specification [1]. In addition, a set of expressive APIs should provide useful features such as authentication handling, paging and hypermedia to cover common usage scenarios.

6.1.3.2 Two-level API Constraint

In order to support continuous microservices development, additional API constraints are proposed in terms of development authority. Each generated microservice instance should have a two-level API constraint:

• The Application-level REST (such as register and update in Figure 6.4) addresses functionalities required by the desired application. For data security, an access token is generated along with the sandbox environment. Application REST can only be accessed by requests that carry that token.
• The Development-level REST (such as the essential CRUD functions) is mainly used in a development or testing environment. Some Development REST may be switched to Application REST at a later stage if it realizes a desired functionality.

By default, all RESTful APIs are initialized as Development REST. An implementation team is allowed to switch each endpoint between the Application level and the Development level. This constraint functions as a gatekeeper between the prototyping/development environment and the production environment.

6.1.4 Generation of Database Schema

On the server side, a common data definition can be generated from the intermediate structure. More specifically, a domain element can be mapped to a data entity, and the properties of an entity can be obtained from the attributes of a domain element. This mapping allows the intermediate structure to be further transformed into data definition language (DDL) for creating different databases (Figure 6.5).

Figure 6.5: Example of Database Schema Generation

6.1.4.1 Schema Transformation Rules for Multiplicity

In Figure 6.5, Entity bad_driver_report aggregates in Entity account with many-to-one multiplicity.
According to this Relation, a new Attribute is added to Entity bad_driver_report with a reference to Entity account. This transformation rule conforms to the definition of the UML class model and to database normalization rules [36]. Similarly, for a Relation with many-to-many multiplicity, a new Entity is introduced with references to both existing Entities in a relational database. In a document-oriented database, both existing Entities have an Attribute referring to each other.

6.1.4.2 Three-level Data Schema Constraint

In order to deal with the progression from uncertainty to certainty, a three-level data schema constraint is proposed. While we conduct rapid and continuous prototyping from the very beginning of a project, the data schema changes frequently because each independent team is working on a partial system and may require different attributes from the database. The first level of constraint allows developers to store anything in the database collection without validation. In the middle stage of a project, when the sub-domains or the partial systems are almost finalized, the second level of constraint validates the fixed data model but still allows new properties to be added. When a project transitions into production, the third level of constraint validates the fixed data model.

The first- and second-level constraints can be realized by a NoSQL instance with loose and moderate restrictions on its schema definition. For the third level, the implementation team can consider using a NoSQL database with a strict restriction, or re-target to a relational database for reasons of performance, consistency, etc. At any stage of development, it is possible to switch the constraint level and then immediately regenerate the data schema definition. A template-based approach (mentioned in Section 6.2) makes this possible.
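The three constraint levels can be illustrated with a small validator. This is a minimal sketch under stated assumptions, not the tool's implementation: the level names, the ACCOUNT_SCHEMA fields and the validate function are hypothetical, and rejecting unknown properties at the third level is one reading of "validates the fixed data model".

```python
from enum import Enum
from typing import Dict, List


class SchemaLevel(Enum):
    OPEN = 1        # level 1: store anything, no validation
    EXTENSIBLE = 2  # level 2: validate known fields, allow new properties
    STRICT = 3      # level 3: validate known fields, reject unknown ones (assumed)


# Hypothetical fixed data model for one entity: field name -> expected type.
ACCOUNT_SCHEMA: Dict[str, type] = {"email": str, "name": str}


def validate(doc: dict, schema: Dict[str, type], level: SchemaLevel) -> List[str]:
    """Return the list of violations; an empty list means the document is accepted."""
    if level is SchemaLevel.OPEN:
        return []
    errors = []
    # Levels 2 and 3 both check the fields of the fixed data model.
    for field, expected in schema.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(f"wrong type for {field}")
    # Only level 3 forbids properties outside the fixed data model.
    if level is SchemaLevel.STRICT:
        for field in doc:
            if field not in schema:
                errors.append(f"unknown field: {field}")
    return errors


if __name__ == "__main__":
    doc = {"email": "a@example.com", "name": "Ann", "nickname": "A"}
    for level in SchemaLevel:
        print(level.name, validate(doc, ACCOUNT_SCHEMA, level))
```

Switching the level is a change of one argument, which mirrors the idea that the constraint can be changed at any stage and the schema definition regenerated.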
6.1.5 Generation of API Reference and Configuration Files

After a service instance is successfully launched, its API reference needs to be neatly presented so that each implementation team can quickly comprehend and utilize the running instance. In a microservices hierarchy, APIs should be exposed to other teams to reduce communication costs and coding effort. Hence the API reference generation should not target only a simple web page or document. It should target a format that can be reproduced and visualized, and that can be imported and integrated into the source code of other microservice instances.

Last but not least, to streamline the build pipeline, configuration files need to be generated for the deployment of multiple microservice instances. These files are used to set up networking, data storage, and service registration and discovery.

6.1.6 Reusable Features

Many projects have common requirements for certain functionalities. For example, a user-intensive system often requires an email module to send notifications, and a crowdsourcing system usually expects a module to process large files. Additionally, projects always have nonfunctional requirements for server security and observability. Realizing these functionalities from scratch always costs a lot of effort.

To address this concern, these requirements are identified as reusable use cases, and their functional code can be predefined as featured packages. These packages can be added to a generated service and further configured in a development iteration.

6.1.7 Consistency Analysis

Since sub-domains are worked on in parallel by different teams, they are expected to be integrated at the end of each development iteration. The purpose of integration is better management of domain knowledge for subsequent development. The proposed approach conducts the following basic analyses against the intermediate structure:

• Name matching.
All existing domains and domain entities are expected to find their names in the domain glossary, which is extracted from user stories. If any entity is missing, domain knowledge may be incomplete.
• Shared entity identification. If an entity is found in different sub-domains, it is highly likely that these sub-domains require the same data. The ownership of this entity needs to be further discussed.
• Unknown type identification. If a domain entity's type is neither a primitive type nor defined in the type system, this entity needs to be revisited.

The analysis script uses Stanford CoreNLP [78] to extract the domain glossary by detecting common nouns, proper nouns and compound nouns in user stories. String matching is then executed between the domain glossary and the intermediate structure. The result is provided to implementation teams for further insight. A further update can be reflected in either the user stories or the domain models. In this way, the consistency between domain knowledge and implementation is guaranteed.

6.2 Proposed Service Generation Tool

The system components of the current implementation are shown in Figure 6.6. The core system has four main components. It is built as a cloud service with a simple interactive interface for infrastructure management. The system's input can be a domain model constructed and uploaded by the implementation team. The output is a microservices instance ready for use.

Figure 6.6: Overview of the Code Generator Components

Model Parser transforms the domain model into an intermediate structure. It maps any domain representation to a uniform and concrete representation, which is used in the subsequent code generation process. A standard Model Parser is created for processing domain models in XMI format (the OMG standard XML Metadata Interchange format), which can be exported by domain modelers such as Enterprise Architect¹ (EA) and Visual Paradigm² (VP).
In addition, an XSLT [32] parser is built for processing input in other formats.

Code Generator applies a template-based technique for code generation. By combining the intermediate structure with predefined code templates, a complete server-side script can be generated with little effort. This happens early in a project, enabling developers to work in parallel with live services immediately. On the server side, it generates the DB schema and basic CRUD functions for data manipulation. It generates RESTful APIs for data access and for the specific behaviors of desired functionalities. It generates API references based on OpenAPI, which allows the current service to expose its interfaces to other services. This format of API reference can also be rendered as a web page to help developers understand the usage. On the client side, it generates an SDK, which wraps the remote procedure calls and provides functional interfaces to the implementation team.

¹ http://www.sparxsystems.com/products/ea/
² https://www.visual-paradigm.com/

Service Deployer sets up a virtual environment in a container and deploys the generated scripts as a microservice instance.

Authenticator takes care of API authority. An implementation team can have full access to the resources of their sandbox service with generated authentication tokens.

6.2.1 Code Templates

A code template can be thought of as a skeleton with holes that are dynamically filled at generation time. Templates can be considered a degenerate form of generation rules [39]. Static code can also be part of a template if it is reused in many common scenarios. An expressive template can be reused by many projects and re-targeted to different infrastructure settings. The code templates for the microservices infrastructure are considered skeletons for full-stack development: they target both the front end and the back end, but do not address the development of the user interface.
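The idea of a skeleton with holes can be illustrated with Python's standard string.Template class. This is only a sketch: the template text, the shape of the entity dictionary and the naive pluralization are assumptions made for illustration, and the actual templates of the tool are not reproduced here.

```python
from string import Template

# Hypothetical skeleton for Express-style CRUD routes. The static text is
# emitted as-is; the $-prefixed holes are filled at generation time.
ROUTE_TEMPLATE = Template(
    "// CRUD routes for $entity (generated)\n"
    "router.post('/$plural', create$Entity);\n"
    "router.get('/$plural/:id', read$Entity);\n"
    "router.put('/$plural/:id', update$Entity);\n"
    "router.delete('/$plural/:id', delete$Entity);\n"
)


def render_routes(entity: dict) -> str:
    """Fill the route skeleton from one entity of an intermediate structure.

    `entity` is assumed to be a dict with at least a "name" key; appending
    "s" for the plural is a deliberate simplification.
    """
    name = entity["name"]
    return ROUTE_TEMPLATE.substitute(
        entity=name,
        Entity=name.capitalize(),
        plural=name + "s",
    )


if __name__ == "__main__":
    print(render_routes({"name": "account", "attributes": ["email", "name"]}))
```

Because the holes are named, the same entity data could be passed to a different skeleton (for example one targeting another server-side framework), which is the sense in which a template-based generator can be re-targeted.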
After several revisions based on the ideas above, the current code templates consist of the following skeletal classes:

Database utilization and access skeleton:

- DB Adapter bridges the common database schema and a target database system. It provides basic manipulation functions (e.g. Create, Read, Update, Delete) on the target database system. The current skeletons provide full support for a NoSQL database and a relational database. This covers the development needs of the case studies and demonstrates the re-targetability of the proposed framework. To support more database systems, the Object-Relational Mapping (ORM) technique can be applied.
- Database configuration is the template for initializing the connection and user privileges for the target database system. It also helps to achieve the goal of the three-level data schema constraint described in Section 6.1.4.2.

Server-side service registration skeleton:

- Service configuration reserves the variables required for setting up the back-end environment. They are automatically synthesized when a service is launched.
- Entity Class is used for constructing the registered entities, including their deployed endpoints, data schema and data access objects.
- Behavior Mapper connects endpoints to the functions that achieve a specific functional requirement. Usually, a functional requirement can be mapped to one or more behaviors of registered entities. This template outputs the behavior signature and hands the behavior body over to programmers to implement its business logic.
- API Reference is the template that conforms to the format of the OpenAPI Specification (OAS) [87]. It is the API exposure solution for Section 6.1.5, and can be either rendered as an API document or imported by the source code of other services.
- Authenticator communicates with the authentication module of the central system to verify user authority. It helps to achieve the goal of the two-level API constraint described in Section 6.1.3.2.
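The gatekeeping role of the two-level API constraint can be sketched as follows. The access rules encoded here are one possible reading of Section 6.1.3.2, namely that Application REST requires the generated access token and that Development REST is reachable only outside production. The class, function and environment names are hypothetical and do not come from the tool.

```python
from dataclasses import dataclass
from typing import Optional

DEVELOPMENT = "development"
APPLICATION = "application"


@dataclass
class Endpoint:
    method: str
    path: str
    # By default, every generated endpoint starts as Development REST.
    level: str = DEVELOPMENT


def switch_level(endpoint: Endpoint, level: str) -> None:
    """Let the implementation team move an endpoint between the two levels."""
    if level not in (DEVELOPMENT, APPLICATION):
        raise ValueError(f"unknown level: {level}")
    endpoint.level = level


def is_allowed(endpoint: Endpoint, environment: str,
               token: Optional[str], expected_token: str) -> bool:
    """Decide whether a request may reach the endpoint (assumed rules)."""
    if endpoint.level == APPLICATION:
        # Application REST: only requests carrying the generated token.
        return token is not None and token == expected_token
    # Development REST: assumed to be blocked in the production environment.
    return environment != "production"


if __name__ == "__main__":
    crud = Endpoint("GET", "/accounts")
    print(is_allowed(crud, "sandbox", None, "t0k3n"))
    print(is_allowed(crud, "production", None, "t0k3n"))
```

A production system would compare tokens in constant time and delegate verification to the central authentication module; the plain comparison above only keeps the sketch short.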
Client-side service access skeleton:

- Adapter is used for wrapping remote procedure calls from any client side to a server-side instance. It is a piece of static code.
- Model Controller directs procedure calls to the Adapter. It also defines the interfaces of entity classes, along with their data validators.
- Configuration File contains a descriptive script for automated integration with the other parts of the services.

With the above skeletons, the current framework contains generation templates that target Express as the server-side framework, and MongoDB and MySQL as the candidate database systems. The client-side SDK, which targets Java, Swift and JavaScript, enables both web-based and mobile development.

6.2.2 Enabling Continuous Deployment

The infrastructure is viewed as an indispensable service that includes not only the build and runtime environment but also a toolset for continuous deployment. It helps a developer create or update a microservice instance quickly and allows the implementation team to test or deliver what they build at will. The COTS tools employed in the deployment pipeline are shown in Table 6.1. This setting allows executable code to be automatically deployed into a sandbox instance running on containers. Additionally, it satisfies the basic demands of version control, build integration and load balancing. The infrastructure is deployed on Amazon EC2, providing a set of deployment APIs and a web interface for service management purposes. As a supplement, Figure 6.7 shows the deployment workflow.

Figure 6.7: Infrastructure for Continuous Deployment
Table 6.1: COTS Enables Continuous Deployment

Usage                 Tool
Code Repository       Gogs - https://gogs.io/
Code Integration      Jenkins - https://jenkins.io/
Instance Container    Docker - https://www.docker.com/
Service Discovery     Consul - https://www.consul.io/
Infrastructure Host   AWS EC2 - https://aws.amazon.com/ec2/

6.3 Case Studies

Three master-level student projects and one industrial research project have been carried out as case study instances, with a focus on the following questions:

• How does the proposed tool fit into the domain modeling activity in microservices development?
• To what degree does it contribute to risk mitigation?
• How does it perform regarding effectiveness and efficiency in continuous development?
• Are there any constraints hindering it in a real development environment?

A tradeoff between the level of control and the degree of realism is deliberately made. The student projects are less complex but more deterministic. Hence they can simulate an environment with a controllable level of uncertainty in which to evaluate the efficiency of the proposed approach. Project BDR is used to explain how the proposed tool is applied in the domain modeling activity; its rate of personnel turnover reached 86% on average during its 6 iterations. Similarly, project TIKI is the latest project instance, with more complicated domain concepts. In project PicShare, a controlled experiment is conducted and analyzed based on the issue tickets and effort collected from two independent implementation teams. Effectiveness at a larger project size in an industrial environment was investigated in project MGD, in which 160+ entities were present in the initial domain models. Some points of improvement and valuable feedback were collected while studying these projects.

6.3.1 Required Infrastructures, Skills and Process

In student projects, implementation teams were formed based on the proficiency of software development skills.
Specifically, each development position was assigned to a developer with relevant experience. All selected teams had a short training session on the process model before launching the project instances. All developers were expected to have basic knowledge of UML so that they were able to carry out the domain modeling activity.

In order to track effort, developers were asked to create Jira tickets. Effort was recorded in the corresponding work log in person-hours. Ticket creation and updates followed a well-defined workflow and labelling rules, in which Dev. (development), Doc. (documentation) and Others (team activity) are the first-level categories, each with its own defined sub-categories. Weekly or bi-weekly surveys were conducted to ensure that everyone contributed reasonably in each development iteration.

A parallel agile process [96] was applied in all case studies. This process is built upon the combination of use-case-driven and agile approaches. In a development iteration, each implementation group works on an independent use case; that is, each group works on a sub-domain model and updates the relevant entities with the partial information specified. Hence within a development iteration, several sub-models are developed concurrently.

The current framework was deployed on Amazon Web Services (AWS), providing a simple interface for developers to upload domain models and manage microservices.

6.3.2 The First Case Study - Project BDR

The first case study, project BDR, was to develop a DUI (Driving Under the Influence) reporting system, which uses crowdsourcing to gather drunk driving videos and report them to law enforcement and insurance companies. It was launched in an academic environment in 2016 and then turned into production in 2018. This case study aims to observe how the proposed approach and tool perform in the domain modeling activity.

Figure 6.8: Domain Model Evolves from Uncertainty to Certainty in Project BDR
The author was part of the implementation team in its first two iterations, providing training and technical support. For the remaining iterations, the author only gave a one-hour training session during the launch meeting. In each iteration, the project owner collected sub-domain changes from developers and incrementally updated the overall design architecture. Activity and effort data were collected bi-weekly.

At the first meeting, the initial user stories were identified after interacting with the project owner:

• As a Mobile App user, I can upload Video and its Metadata to Report Poster's Account.
• As a Report Poster, I can add Report Description and post the Unreviewed Bad Driver Report to a Review Queue.
• As a Report Reviewer, I can review Unreviewed Bad Driver Reports from the Review Queue.
• As a Report Reviewer, I can verify a Reviewed Bad Driver Report in the Bad Driver Database.
• As a Report Consumer, I can query by license plate number and can see Reviewed Bad Driver Reports.

Figure 6.8(a) shows a conceptual model of this domain. It presents a high-level description of the use case for managing drunk driver reports. This model only contains entities and actors identified from the given user stories. When this model was initialized, its accuracy could not be confirmed until it was verified by the product owners. However, the proposed approach allowed a back-end service to be set up immediately at this point. This happened before any attributes had been discovered, and even before the use cases had been modeled.

In order to eliminate the uncertainty, the use cases were further decomposed, and each developer conducted iterative prototyping on the sandbox instances in parallel. The second model (Figure 6.8(b)) shows a subsequent transitional state. Some attributes were added to the entities, and associations were redefined. This model was iterated several times, with more domain information being discussed and incrementally identified by different developers.
Then a new service baseline was generated in the second week after the project started.

The service was then further prototyped with entities reorganized (e.g. merged or decomposed). New attributes and relationships were identified to satisfy implementation needs and client expectations. The last model (Figure 6.8(c)) shows its revision in the subsequent iteration. Attributes inside entities were further identified, and associations among entities were updated. This domain model was modified by a different implementation team, with 89% personnel turnover from the previous iteration. The proposed approach made it easier for new developers to understand and modify the old system.

The personnel turnover rate varied from 60% to 133% and reached 86% on average during the 6 iterations, with a cumulative total of 69 developers involved in this project. The proposed approach enabled the microservices to be generated immediately once a domain model or a sub-model was updated. It took 4 person-hours on average for a new developer to understand the problem domain and the infrastructure before he or she could start working on the source code with a live sandbox instance. The service generation tool has been applied along with the domain modeling activity in each iteration, whenever old sub-domains were updated or new use cases were identified. The result so far shows a 30% saving in total effort compared with the estimated effort.

6.3.3 The Second Case Study - Project TIKI

The second case study, project TIKI, observes the microservices development of a mobile game. This project contains many domain-specific terms, specifically for its location-based features and inventory system. The author held a one-hour training session at the beginning of each iteration, and observed the team activities by accessing their development artifacts and conducting bi-weekly surveys over a period of 4 iterations.
In the first two iterations, the implementation team conducted rapid prototyping against its operational workflow. In front-end development, most effort was put into constructing 3D graphics. The proposed approach was used to build the back end so that front-end developers were able to prototype the game with live services. The combination of a NoSQL database (flexible and evolvable schema) and RESTful APIs (accessible from the mobile app) allowed multiple developers to verify their ideas and collaborate on them in "sandbox" mode. It took 50 person-hours for 2 front-end developers to release the first version of the mobile app, which connects to compatible back-end services.

The project owner decided to accelerate the development of use cases by adding more developers. This situation arose regularly in the middle stage of the project. Complex use cases were split into simple ones and allocated to development staff. Some use cases were finished early, while others ran late. Additional developers were then added to those running late.

Speeding up progress by conducting parallel development on a per-use-case-per-developer basis is not always easy. One important precondition is that use cases have to be decomposed smartly by someone with rich architectural insight. This point is discussed in Section 6.4. Another precondition is that the additional developers assigned to help the late developers must have relevant development experience and must quickly acquire the relevant domain knowledge. The proposed framework alleviated this pressure by providing consistent user stories and domain models to these additional developers and enabling them to start working from the generated code after a short time.

The observations indicate that developers who switched to a new sub-domain spent an average of 2.5 person-hours before they could manipulate their sandbox services. This includes the time spent on domain modeling.
Additional staff were able to work on existing services after a 0.5-hour discussion and re-baselining.

6.3.4 The Third Case Study - Project PicShare

The third case study was the development of a content sharing project. It aims to evaluate the efficiency and effectiveness of the proposed approach and tool. A controlled experiment was conducted in which a six-developer team (PSRA) was taught to start prototyping with the proposed framework at the early stages. A comparison team (PSAA) followed a traditional agile method. Both teams were given the same functional requirements and worked with the same client, and each team was required to develop a fully functional system. The author monitored the two teams by accessing their development artifacts and participating in the monthly artifact review boards with the same client over a period of two iterations of three months each. In order to observe and compare their activities and issues, both teams were asked to apply the same ticket posting rules.

Figure 6.9: Effort Distribution in Project PicShare

Figure 6.10: Ticket Distribution in Project PicShare

In the first iteration, the experimental group (PSRA) posted more tickets (Figure 6.10) and spent more effort (Figure 6.9). As the project moved to its second iteration, the experimental group (PSRA) had fewer issues and spent less time on re-baselining and on synchronizing knowledge among team members. In contrast, the control group (PSAA) generated more tickets and spent more effort in the same period.

Table 6.2: Effort and Percentage of Issues on domain modeling in Project PicShare

Project  Early Effort  Total Effort  Dev.        Doc.        Others
         (man hours)   (man hours)   (tickets%)  (tickets%)  (tickets%)
PSAA     148           1120          76.43       3.18        20.38
PSRA     83.25         772           36.11       36.11       27.78

Team PSRA did domain modeling and iterative prototyping as early as possible so that issues could be solved at a relatively early stage.
In the later stage, PSRA had relatively fewer issues because domain knowledge had been well constructed and formatted, and service generation accelerated the development process. Effort and the percentage of tickets per category are shown in Table 6.2. Compared with PSAA, PSRA saved about 44% of the effort at the early stages (64.75 of 148 man-hours) and about 31% of the total effort (348 of 1120 man-hours) over these two development iterations.

6.3.5 The Fourth Case Study - Project MGD

In this case study, project MGD, the client was from an industrial internet service provider (ISP) that wanted to upgrade its infrastructure from a legacy technical stack. The existing problem domain contained 160+ entities, which could be categorized into 5+ sub-domains. The initial goal was to verify the feasibility of refactoring. The author devoted 20 person-hours per week over a period of three months to this project, working with the client’s team and collecting feedback.

In this case study, domain models were represented by a specific XML structure (Figure 6.2) generated by the client’s legacy system. Hence, instead of using a standard UML parser, a specific mapping was created for this task. The main effort was spent on mapping from the input model to the intermediate structure and on reconstructing the type system. It took 80 person-hours to extract the information from the given models and generate microservices for the new infrastructure.

The service generation tool helped resolve entity duplication between sub-domains during the project lifecycle. When the same entity appeared in two sub-domains, the proposed activity triggered a discussion by raising an issue with the responsible teams. For example, when a duplicated entity was caused by semantic overloading, further discussion addressed the naming issue, leading to an update of the sub-domain models and regeneration of services. When it turned out that both sub-domains genuinely shared an entity, further discussion addressed its ownership.
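The duplication check described above can be pictured as a comparison of entity names across sub-domain models. The following is a minimal sketch under stated assumptions, not the tool used in the case study: the function name, the case-insensitive comparison, and the sample sub-domains and entities are all invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_shared_entities(sub_domains: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each entity name declared in more than one sub-domain to those sub-domains."""
    owners: Dict[str, List[str]] = defaultdict(list)
    # Iterate sub-domains in sorted order so the reported lists are deterministic.
    for domain, entities in sorted(sub_domains.items()):
        for entity in entities:
            owners[entity.lower()].append(domain)
    return {name: doms for name, doms in owners.items() if len(doms) > 1}


# Hypothetical sub-domain models; names are not taken from project MGD.
models = {
    "billing": {"Customer", "Invoice", "Plan"},
    "provisioning": {"Customer", "Device", "Plan"},
    "support": {"Ticket", "Customer"},
}

for entity, domains in sorted(find_shared_entities(models).items()):
    print(f"issue: entity '{entity}' appears in {', '.join(domains)}")
```

A name-based check of this kind can only flag candidates. Whether a flagged entity is a semantic overload to be renamed or a genuinely shared entity needing an owner remains a decision for the responsible teams.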
In the shared-entity case, the generated API reference exposed the service to the team that lost ownership by providing the access point and the corresponding configuration file.

6.4 Conclusion

The proposed framework leverages domain modeling activity during a project life cycle. It suggests establishing an intermediate structure before targeting any executable code. This step maps different representations of domain models to a persisted view so that a consistent structure can always be used in domain modeling activity.

First of all, the existing workflow of domain modeling activity can be decomposed into three phases (Figure 4.3): Inception, Foundations (Re-baseline) and Development. This decomposition makes it adaptable to the iterative process model. In the Inception Phase, the main activities are initial scoping and concept definition. Meetings are held to collect user stories and a domain glossary, and to identify the initial domain model. In the Foundations Phase, a "sandbox" environment is generated for each model so that prototyping can be conducted in the Development Phase. After that, as the incremental Development and Rebaselining phases intertwine while the project moves forward, domain models are iteratively refined by different teams and integrated at the end of an iteration. Executable code is regenerated whenever domain models are updated.

The above workflow suggests introducing a KOTPV function [37] as a central role between customers and implementation teams, providing the richest insight into domain analysis and smoothing out the difficulty of domain integration. This function is very important, especially for the development of large systems. Hence a concern regarding the absence of a KOTPV is raised. For instance, it may take an experienced engineer two hours to integrate sub-models, while it takes a whole day for a novice unfamiliar with the problem domain to produce the same artifacts.
Manually checking or resolving domain integration issues adds effort and waiting time to cross-team cooperation and is often error-prone. Computer-aided features are therefore under exploration to better address these concerns.

Chapter 7 Conclusions and Future Work

7.1 General Conclusions

Microservices development usually proceeds in a feedback-driven manner. Management uncertainty leads to extra effort spent on resolving domain uncertainty at the early stage of each development iteration.

This research tries to balance the feedback-driven and plan-driven methodologies for microservices development. It highlights the necessity of domain models and domain modeling activity within the development iteration and over the SDLC. The tool support for Executable Domain Models comprises Domain Identification and Microservices Generation (Figure 7.1). These tools allow a live service instance to be generated once the initial requirements are collected. In this framework, the domain concepts are incrementally developed along with rapid prototyping on live service instances. Whenever new requirements are collected, an updated domain model can be obtained for knowledge sharing and regeneration of live service instances. The intermediate artifacts help the implementation team to better reconcile requirements.

Overall, domain models are used for knowledge synchronization within a team and across teams, and early prototyping is enabled to support feedback-driven development. Early prototyping also serves as a requirement discovery mechanism for better planning. The case studies and evaluation confirm that this framework helps eliminate domain uncertainty and mitigate risks caused by management uncertainty at the early stages.

Figure 7.1: Summary of Executable Domain Model Framework

7.2 Summary of Contributions

In summary, this dissertation centers around EDM. It discusses the reason for the need, describes the activities involved, and proposes the tools to be applied.
The evaluations and case studies show its feasibility, benefits and potential concerns. The concept of EDM has further been incorporated into an agile process called Parallel Agile (PA) [94, 95, 96]. The automated EDM generation approach [116, 117] has further been implemented as PA’s Codebot.

7.3 Future Work

Two future directions are suggested according to the concerns found in the case studies and evaluations.

7.3.1 Improvement of the Current Work

• Completeness and redundancy improvement. Although the domain identification tool produces more correct results, its completeness and redundancy leave room for improvement, which can be addressed by exploring more expressive sentence structures and transformation rules.
• Experiment in more industrial environments. More experiments should be repeated in industrial environments to better understand the benefits and limitations.
• Process guideline. A guideline for the overall process, including roles, activities, tasks and tool usage, could be better established.

7.3.2 Potential Capabilities

• Domain model integration. Sub-models could be automatically or semi-automatically integrated. This action happens at the end of each development iteration. The output is an updated domain model, which serves as the input for the next iteration. Research on this topic integrates domain models at the instance level and at the structure level [101, 9]. Structure-level integration views the domain models as graphs and merges them based on their structural information. Instance-level integration focuses on the data set resulting from prototyping and testing, in which the data types, values and formats stored in the database are used as the matching criteria. Based on these two approaches, some hybrid or composite approaches have been derived for better accuracy.
• Data migration. The data from previous prototyping and testing results could be automatically migrated from an old service instance to a new instance.
This step could happen after a data migration script is synthesized based on the original domain models and the updated ones. Some existing research on software synthesis can be applied to this capability. For example, in [122], the initial model and the updated model are both viewed as tree structures, and a synthesizer can furcate the tree and infer the path transformation. With a predefined code template, the path transformation can be mapped to the corresponding data transformation script.

Building an integration or migration system for microservices is not easy because services vary by infrastructure. Differences in the syntax of domain models or databases may lead to extra effort spent on the matching script. In this study, an intermediate structure is introduced to resolve the differences between representations, especially structure-level differences. Moreover, once an integration among multiple intermediate structures is done, code can be immediately regenerated from it.

References

[1] H. T. P. (HTTP/1.1). Semantics and Content". IETF. Retrieved 16, 2017.
[2] Acceleo, an open template-based source code generation technology developed inside of the eclipse foundation. url: https://www.eclipse.org/acceleo/. Accessed: Mar. 2019.
[3] Actifsource, build your domain specific development tool that turns your software specification into running code. url: http://www.actifsource.com/. Accessed: Mar. 2019.
[4] A. Adamko. Modeling data-oriented web applications using uml. In EUROCON 2005-The International Conference on "Computer as a Tool", volume 1, pages 752–755. IEEE, 2005.
[5] M. A. Akbar, J. Sang, A. A. Khan, F.-E. Amin, S. Hussain, M. K. Sohail, H. Xiang, B. Cai, et al. Statistical analysis of the effects of heavyweight and lightweight methodologies on the six-pointed star model. IEEE Access, 6:8066–8079, 2018.
[6] M. M. Albakri and M. R. J. Qureshi. Empirical estimation of cocomo i and cocomo ii using a case study.
In Proceedings of the International Conference on Software Engineering Research and Practice (SERP), page 1. The Steering Committee of The World Congress in Computer Science, Computer ..., 2012.
[7] A. J. Albrecht and J. E. Gaffney. Software function, source lines of code, and development effort prediction: a software science validation. IEEE transactions on software engineering, (6):639–648, 1983.
[8] A. Alshamrani and A. Bahattab. A comparison between three sdlc models waterfall model, spiral model, and incremental/iterative model. International Journal of Computer Science Issues (IJCSI), 12(1):106, 2015.
[9] A. A. Alwan, A. Nordin, M. Alzeber, and A. Z. Abualkishik. A survey of schema matching research using database schemas and instances. International Journal Of Advanced Computer Science And Applications, 8(10), 2017.
[10] Apache isis, domain driven applications, quickly. url: https://isis.apache.org/. Accessed: Mar. 2019.
[11] Apimatic, an api documentation tool that provides a complete set of dx components. url: https://apimatic.io/. Accessed: Mar. 2019.
[12] C. Arora, M. Sabetzadeh, S. Nejati, and L. Briand. An active learning approach for improving the accuracy of automated domain model extraction. ACM Transactions on Software Engineering and Methodology (TOSEM), 28(1):1–34, 2019.
[13] M. Awad. A comparison between agile and traditional software development methodologies. University of Western Australia, 30, 2005.
[14] B. michael. these 3 industries have the highest talent turnover rates. url: https://www.linkedin.com/business/talent/blog/talent-strategy/industries-with-the-highest-turnover-rates. Accessed: Jan. 2021.
[15] R. D. Banker, R. J. Kauffman, and R. Kumar. An empirical test of object-based output measurement metrics in a computer aided software engineering (case) environment. Journal of Management Information Systems, 8(3):127–150, 1991.
[16] B. Barry. Software engineering economics. Prentice Hall, 1981.
[17] P. Bocciarelli and A. D’Ambrogio.
A model-driven method for describing and predicting the reliability of composite services. Software & Systems Modeling, 10(2):265–280, 2011.
[18] B. Boehm and R. Turner. Balancing Agility and Discipline: A Guide for the Perplexed. Addison-Wesley Professional, 2003.
[19] B. Boehm and R. Turner. Management challenges to implementing agile processes in traditional development organizations. IEEE software, 22(5):30–39, 2005.
[20] B. Boehm and R. Turner. Using risk to balance agile and plan-driven methods. Computer, 36(6):57–66, 2003.
[21] B. Boehm. Get ready for agile methods, with care. Computer, 35(1):64–69, 2002.
[22] B. Boehm. Requirements that handle ikiwisi, cots, and rapid change. Computer, 33(7):99–102, 2000.
[23] B. Boehm, B. Clark, E. Horowitz, C. Westland, R. Madachy, and R. Selby. Cost models for future software life cycle processes: cocomo 2.0. Annals of software engineering, 1(1):57–94, 1995.
[24] B. W. Boehm, J. A. Lane, S. Koolmanojwong, and R. Turner. The incremental commitment spiral model: principles and practices for successful systems and software. Addison-Wesley, 2014.
[25] J. Bogner, S. Wagner, and A. Zimmermann. Automatically measuring the maintainability of service-and microservice-based systems: a literature review. In Proceedings of the 27th International Workshop on Software Measurement and 12th International Conference on Software Process and Product Measurement, pages 107–115, 2017.
[26] G. Booch. The unified modeling language user guide. Pearson Education India, 2005.
[27] M. Borg, P. Chatzipetrou, K. Wnuk, E. Alégroth, T. Gorschek, E. Papatheocharous, S. M. A. Shah, and J. Axelsson. Selecting component sourcing options: a survey of software engineering’s broader make-or-buy decisions. Information and Software Technology, 112:18–34, 2019.
[28] J. Bosch. Continuous software engineering: An introduction. Springer, 2014, pages 3–13.
[29] L. Buglione and A. Abran. Improving the user story agile technique using the invest criteria.
In 2013 Joint Conference of the 23rd International Workshop on Software Measurement and the 8th International Conference on Software Process and Product Measurement, pages 49–53. IEEE, 2013.
[30] J. Cardoso, A. Barros, N. May, and U. Kylau. Towards a unified service description language for the internet of services: requirements and first developments. In 2010 IEEE International Conference on Services Computing, pages 602–609. IEEE, 2010.
[31] L. Chen. Microservices: architecting for continuous delivery and devops. In 2018 IEEE International conference on software architecture (ICSA), pages 39–397. IEEE, 2018.
[32] J. Clark. Xsl transformations (xslt). World Wide Web Consortium (W3C). URL http://www.w3.org/TR/xslt:103, 1999.
[33] P. Clarke and R. V. O’Connor. The situational factors that affect the software development process: towards a comprehensive reference framework. Information and software technology, 54(5):433–447, 2012.
[34] P. M. Clarke, P. Elger, and R. V. O’Connor. Technology enabled continuous software development. In Proceedings of the International Workshop on Continuous Software Evolution and Delivery, pages 48–48, 2016.
[35] A. Cockburn and J. Highsmith. Agile software development, the people factor. Computer, 34(11):131–133, 2001.
[36] E. F. Codd. Further normalization of the data base relational model. Data base systems, 6:33–64, 1972.
[37] B. Curtis, H. Krasner, and N. Iscoe. A field study of the software design process for large systems. In Communications of the ACM, pages 1268–1287, 1988. 31(11).
[38] D. Cutting, J. Kupiec, J. Pedersen, and P. Sibun. A practical part-of-speech tagger. In Third conference on applied natural language processing, pages 133–140, 1992.
[39] K. Czarnecki and S. Helsen. Feature-based survey of model transformation approaches. IBM Systems Journal, 45(3):621–645, 2006. issn: 0018-8670.
[40] K. Czarnecki and S. Helsen. Feature-based survey of model transformation approaches. IBM systems journal, 45(3):621–645, 2006.
[41] A. Dallas. RESTful Web Services with Dropwizard. Packt Publishing Ltd, 2014.
[42] M.-C. De Marneffe and C. D. Manning. The stanford typed dependencies representation. In Coling 2008: proceedings of the workshop on cross-framework and cross-domain parser evaluation, pages 1–8, 2008.
[43] R. Dijkman, M. Dumas, B. Van Dongen, R. Käärik, and J. Mendling. Similarity of business process models: metrics and evaluation. Information Systems, 36(2):498–516, 2011.
[44] Docker, empowering app development for developers. url: https://www.docker.com/. Accessed: Jan. 2021.
[45] E. Domı, B. Pérez, and Á. L. Rubio. A systematic review of code generation proposals from state machine specifications. Information and Software Technology, 54(10):1045–1066, 2012.
[46] N. Dragoni, S. Giallorenzo, A. L. Lafuente, M. Mazzara, F. Montesi, R. Mustafin, and L. Safina. Microservices: yesterday, today, and tomorrow. Present and ulterior software engineering:195–216, 2017.
[47] N. Dragoni, I. Lanese, S. T. Larsen, M. Mazzara, R. Mustafin, and L. Safina. Microservices: how to make your application scale. In International Andrei Ershov Memorial Conference on Perspectives of System Informatics, pages 95–104. Springer, 2017.
[48] E. Evans and E. J. Evans. Domain-driven design: tackling complexity in the heart of software. Addison-Wesley Professional, 2004.
[49] Express.js, a minimal and flexible node.js web application framework that provides a robust set of features for web and mobile applications. url: https://expressjs.com/. Accessed: Mar. 2019.
[50] A. Finkelstein, J. Kramer, B. Nuseibeh, L. Finkelstein, and M. Goedicke. Viewpoints: a framework for integrating multiple perspectives in system development. International Journal of Software Engineering and Knowledge Engineering, 2(01):31–57, 1992.
[51] M. Fowler, J. Highsmith, et al. The agile manifesto. Software development, 9(8):28–35, 2001.
[52] M. Fowler and J. Lewis. Microservices: a definition of this new architectural term.
url: https://martinfowler.com/articles/microservices.html. Accessed: Mar. 2019.
[53] R. T. Futrell, D. F. Shafer, and L. Shafer. Quality software project management, volume 1. Prentice Hall Professional, 2002.
[54] Gherkin uses a set of special keywords to give structure and meaning to executable specifications. url: https://cucumber.io/docs/gherkin/reference/. Accessed: Jan. 2021.
[55] Git, a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. url: https://git-scm.com/. Accessed: Jan. 2021.
[56] Gradle is an open-source build automation tool focused on flexibility and performance. url: https://docs.gradle.org/. Accessed: Jan. 2021.
[57] S. Gulwani, O. Polozov, R. Singh, et al. Program synthesis. Foundations and Trends® in Programming Languages, 4(1-2):1–119, 2017.
[58] D. Gurunule and M. Nashipudimath. A review: analysis of aspect orientation and model driven engineering for code generation. Procedia Computer Science, 45:852–861, 2015.
[59] P. Hanks. Lexical patterns: from hornby to hunston and beyond. In Proceedings of the XIII EURALEX International Congress, volume 1 of number 1, pages 89–129, 2008.
[60] H. Harmain and R. Gaizauskas. Cm-builder: a natural language-based case tool for object-oriented analysis. Automated Software Engineering, 10(2):157–181, 2003.
[61] J. A. Highsmith and J. Highsmith. Agile software development ecosystems. Addison-Wesley Professional, 2002.
[62] J. Highsmith and A. Cockburn. Agile software development: the business of innovation. Computer, 34(9):120–127, 2001.
[63] R. Hoehndorf, A.-C. N. Ngomo, and H. Herre. Developing consistent and modular software models with ontologies. In SoMeT, pages 399–412, 2009.
[64] J. Holck and N. Jørgensen. Continuous integration and quality assurance: a case study of two open source projects. Australasian Journal of Information Systems, 11(1), 2003.
[65] M. Ilieva and O. Ormandjieva.
Models derived from automatically analyzed textual user requirements. In Fourth International Conference on Software Engineering Research, Management and Applications (SERA’06), pages 13–21. IEEE, 2006.
[66] M. Imaz and D. Benyon. How stories capture interactions. In INTERACT, volume 99, pages 321–328, 1999.
[67] T. Jarratt, C. M. Eckert, N. H. Caldwell, and P. J. Clarkson. Engineering change: an overview and perspective on the literature. Research in engineering design, 22(2):103–124, 2011.
[68] Jenkins, an open source automation server which enables developers around the world to reliably build, test, and deploy their software. url: https://www.jenkins.io/. Accessed: Jan. 2021.
[69] Jet, a model-to-text project. url: https://www.eclipse.org/modeling/m2t/?project=jet. Accessed: Mar. 2019.
[70] M. Kassab. The changing landscape of requirements engineering practices over the past decade. In 2015 IEEE Fifth International Workshop on Empirical Requirements Engineering (EmpiRE), pages 1–8. IEEE, 2015.
[71] S. Kehrer and W. Blochinger. Autogenic: automated generation of self-configuring microservices. In CLOSER, pages 35–46, 2018.
[72] S. Khan, A. B. Dulloo, and M. Verma. Systematic review of requirement elicitation techniques. India, 2014.
[73] B. Klatt. Xpand: a closer look at the model2text transformation language. Language, 10(16):2008, 2007.
[74] C. Lee, J. Gottschlich, and D. Roth. Toward code generation: a survey and lessons from semantic parsing. arXiv preprint arXiv:2105.03317, 2021.
[75] L. Leite, C. Rocha, F. Kon, D. Milojicic, and P. Meirelles. A survey of devops concepts and challenges. ACM Computing Surveys (CSUR), 52(6):1–35, 2019.
[76] D. Liu, K. Subramaniam, A. Eberlein, and B. H. Far. Natural language requirements analysis and class model generation using ucda. In International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, pages 295–304. Springer, 2004.
[77] G. Lucassen, M. Robeer, F. Dalpiaz, J. M. E.
Van Der Werf, and S. Brinkkemper. Extracting conceptual models from user stories with visual narrator. Requirements Engineering, 22(3):339–358, 2017.
[78] C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60, 2014.
[79] D. Martin, M. Paolucci, S. McIlraith, M. Burstein, D. McDermott, D. McGuinness, B. Parsia, T. Payne, M. Sabou, M. Solanki, et al. Bringing semantics to web services: the owl-s approach. In International Workshop on Semantic Web Services and Web Process Composition, pages 26–42. Springer, 2004.
[80] G. S. Matharu, A. Mishra, H. Singh, and P. Upadhyay. Empirical study of agile software development methodologies: a comparative analysis. ACM SIGSOFT Software Engineering Notes, 40(1):1–6, 2015.
[81] D. McKinney. Impact of commercial off-the-shelf (cots) software and technology on systems engineering. Presentation to INCOSE Chapters:1–19, 2001.
[82] M. Morisio, C. B. Seaman, V. R. Basili, A. T. Parra, S. E. Kraft, and S. E. Condon. Cots-based software development: processes and open issues. Journal of Systems and Software, 61(3):189–199, 2002.
[83] M. Morisio, C. B. Seaman, A. T. Parra, V. R. Basili, S. E. Kraft, and S. E. Condon. Investigating and improving a cots-based software development. In Proceedings of the 22nd international conference on Software engineering, pages 32–41, 2000.
[84] N. M. A. Munassar and A. Govardhan. A comparison between five models of software engineering. International Journal of Computer Science Issues (IJCSI), 7(5):94, 2010.
[85] J. Munch and K. Schmid. Domain Modeling and Domain Engineering: Key Tasks in Requirements Engineering. Perspectives on the Future of Software Engineering: Essays in Honor of Dieter Rombach, 9783642373, 2013.
[86] R. V. O’Connor, P. Elger, and P. M. Clarke. Continuous software engineering—a microservices architecture perspective.
Journal of Software: Evolution and Process, 29(11):e1866, 2017.
[87] Openapi specification. 2019. url: https://www.openapis.org/. Accessed: Jun. 2019.
[88] Openxava, ajax java framework for rapid application development. url: http://www.openxava.org/. Accessed: Mar. 2019.
[89] S. Overmyer, L. Benoit, and R. Owen. Conceptual modeling through linguistic analysis using LIDA. Proceedings of the 23rd International Conference on Software Engineering. ICSE 2001:401–410, 2001. issn: 0270-5257.
[90] C. Pacheco, I. García, and M. Reyes. Requirements elicitation techniques: a systematic literature review based on the maturity of the techniques. IET Software, 12(4):365–378, 2018.
[91] R. Pawson. Naked Objects. PhD thesis, Trinity College, Dublin, 2004.
[92] Roma framework, the new way to conceive web applications. url: http://www.romaframework.org/. Accessed: Mar. 2019.
[93] D. Rosenberg and M. Stephens. Use Case Driven Object Modeling with UML: Theory and Practice. ITPro collection. Apress, 2007.
[94] D. Rosenberg, B. Boehm, M. Stephens, C. Suscheck, S. R. Dhalipathi, and B. Wang. Parallel Agile–faster delivery, fewer defects, lower cost. Springer, 2020.
[95] D. Rosenberg, B. Boehm, B. Wang, and K. Qi. Rapid, evolutionary, reliable, scalable system and software development: the resilient agile process. In Proceedings of the 2017 International Conference on Software and System Process, pages 60–69, 2017.
[96] D. Rosenberg, B. W. Boehm, B. Wang, and K. Qi. The parallel agile process: applying parallel processing techniques to software engineering. Journal of Software: Evolution and Process, 31(6):e2144, 2019.
[97] A. Safwat and M. Senousy. Addressing challenges of ultra large scale system on requirements engineering. Procedia Computer Science, 65:442–449, 2015.
[98] N. Samarasinghe and S. S. Somé. Generating a domain model from a use case model. IASSE, 278, 2005.
[99] G. Sebastián, J. A. Gallud, and R. Tesoriero.
Code generation using model driven architecture: a systematic mapping study. Journal of Computer Languages, 56:100935, 2020.
[100] Sensu, the observability pipeline that delivers monitoring as code on any cloud. url: https://sensu.io/. Accessed: Jan. 2021.
[101] P. Shvaiko and J. Euzenat. A survey of schema-based matching approaches:146–171, 2005.
[102] A. Singleton. Unblock! A Guide to the New Continuous Agile. Assembla, Inc., 2014. url: http://www.continuousagile.com/unblock/. Accessed: Mar. 2019.
[103] G. Sousa, W. Rudametkin, and L. Duchien. Automated setup of multi-cloud environments for microservices applications. In 2016 IEEE 9th International Conference on Cloud Computing (CLOUD), pages 327–334. IEEE, 2016.
[104] Spring boot, makes it easy to create stand-alone, production-grade spring based applications. url: http://spring.io/projects/spring-boot. Accessed: Mar. 2019.
[105] S. Staab, T. Walter, G. Gröner, and F. S. Parreiras. Model driven engineering with ontology technologies. In Reasoning Web International Summer School, pages 62–98. Springer, 2010.
[106] R. J. Sternberg, E. L. Grigorenko, and L.-f. Zhang. Styles of learning and thinking matter in instruction and assessment. Perspectives on psychological science, 3(6):486–506, 2008.
[107] V. Stray, N. B. Moe, and G. R. Bergersen. Are daily stand-up meetings valuable? a survey of developers in software teams. In International Conference on Agile Software Development, pages 274–281. Springer, 2017.
[108] Swagger, simplify api development for users, teams, and enterprises with our open source and professional toolset. url: https://swagger.io/. Accessed: Mar. 2019.
[109] E. Syriani, L. Luhunu, and H. Sahraoui. Systematic mapping study of template-based code generation. Computer Languages, Systems & Structures, 52:43–62, 2018.
[110] A. Y. Teka, N. Condori-Fernandez, and B. Sapkota. A systematic literature review on service description methods.
In International Working Conference on Requirements Engineering: Foundation for Software Quality, pages 239–255. Springer, 2012.
[111] J. S. Thakur and A. Gupta. Automatic generation of analysis class diagrams from use case specifications. arXiv preprint arXiv:1708.01796, 2017.
[112] J. S. Thakur and A. Gupta. Identifying domain elements from textual specifications. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ASE, pages 566–577, Singapore, Singapore, 2016. isbn: 978-1-4503-3845-5.
[113] J. Thönes. Microservices. IEEE software, 32(1):116–116, 2015.
[114] M. Torchiano and M. Morisio. Overlooked aspects of cots-based development. IEEE software, 21(2):88–93, 2004.
[115] Trails, a modern web application framework for node.js. url: https://trailsjs.io/. Accessed: Mar. 2019.
[116] B. Wang and B. W. Boehm. Process implications of executable domain models for microservices development. In Proceedings of the International Conference on Software and System Processes, pages 41–50, 2020.
[117] B. Wang, D. Rosenberg, and B. W. Boehm. Rapid realization of executable domain models via automatic code generation. In 2017 IEEE 28th Annual Software Technology Conference (STC), pages 1–6. IEEE, 2017.
[118] Y. Wautelet, S. Heng, M. Kolp, and I. Mirbel. Unifying and extending user story models. In International conference on advanced information systems engineering, pages 211–225. Springer, 2014.
[119] S. A. White. Introduction to bpmn. IBM Cooperation, 2, 2004.
[120] K. Wnuk. Involving relevant stakeholders into the decision process about software components. In 2017 IEEE International Conference on Software Architecture Workshops (ICSAW), pages 129–132. IEEE, 2017.
[121] L. R. Wong, D. S. Mauricio, G. D. Rodriguez, et al. A systematic literature review about software requirements elicitation. Journal of Engineering Science and Technology, 12(2):296–317, 2017.
[122] N. Yaghmazadeh, C. Klinger, I. Dillig, and S. Chaudhuri.
Synthesizing transformations on hierarchically structured data. In ACM SIGPLAN Notices, volume 51, number 6, pages 508–521. ACM, 2016.
[123] C. Yang, Y. Liu, and C. Yin. Recent advances in intelligent source code generation: a survey on natural language based studies. Entropy, 23(9):1174, 2021.
[124] T. Yue, L. C. Briand, and Y. Labiche. Automatically Deriving a UML Analysis Model from a Use Case Model. Technical Report 2010-15 (Version 2), Simula Research Laboratory, (October), 2010.

Appendix A
Sentence Structures for Domain Identification

Table A.1: Sentence Structures
# | Structure | Note | TDs | POS-tags | Indep. | Sentence Example
1 | SVObj | Subject-Verb-Object | nsubj*(A,B), obj(A,C) | B==NN*, C==NN*, A==VB* | 4, 6, 22, 23, 25 | As a poster, I can create posts.
2 | SVOblOn | Subject-Verb-OblOn | nsubj(A,B), obl:on(A,C) | B==NN*, C==NN*, A==VB* | | As a student, I work on a project.
3 | SVOblIn | Subject-Verb-OblIn | nsubj(A,B), obl:in(A,C) | B==NN*, C==NN*, A==VB* | | As a student, I work in an university.
4 | SVCcO | Subject-Verb-Clausal Compl-Object | nsubj(A,B), xcomp(A,C), obj(C,D), ~obj(A,*) | B==NN*, D==NN*, C==VB* | | As a poster, I can keep updating the blog.
5 | SVCc | Subject-Verb-Clausal Compl | nsubj(A,B), xcomp(A,C), ~obj(A,*) | B==NN*, C==VB* | 4 | As a gas keeper, I stop filling when the tank’s status is full.
6 | SVpO | Subject-Verb Phrase-Object | nsubj(A,B), compound:prt(A,C), obj(A,D) | B==NN*, C==RP, D==NN* | | As a robot, I can turn off the fans.
7 | SVp | Subject-Verb Phrase | nsubj(A,B), compound:prt(A,C) | B==NN*, C==RP | 6 | As a robot, I can wake up.
8 | SV | Subject-Verb | nsubj*(A,B) | A==VB*, B==NN* | 1, 2, 3, 4, 5, 6, 7, 10, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 | As a student, I study online when the course is over.
9 | Poss | Possessive | nmod:poss(A,B) | A==NN*, B==NN* | | As a gas keeper, I stop filling when the tank’s status is full.
10 | SPassPartVByS | Subject-Be-Passive Verb-By-Subject | nsubj:pass(A,B), obl:agent(A,C) | A==VBN, B==NN*, C==NN* | | As a customer, I can be charged by the seller.
11 | VOToInf | Verb-Object-To-Infinitive | obj(A,B), advcl:to(A,C) | B==NN*, C==VB* | | As a user, I can command the motor to start.
12 | SVThatClause | Subject-Verb-ThatClause | nsubj(A,B), ccomp(A,C), mark(C,D), nsubj*(C,E) | D==IN, A==VB*, B==NN*, E==NN* | | As a seller, I can validate that the coupon is expired.
13 | SVOblWith | Subject-Verb-OblWith | nsubj(A,B), obl:with(A,C) | B==NN*, C==NN*, A==VB* | | As a customer, I can purchase goods with a card.
14 | SVByVO | Subject-Verb-By-Verb-Object | nsubj(A,B), advcl:by(A,C), obj(C,D) | B==NN*, C==VBG, D==NN* | | As a customer, I can purchase goods by using a coupon.
15 | SVByV | Subject-Verb-By-Verb | nsubj(A,B), advcl:by(A,C) | B==NN*, C==VBG | 14 | As a customer, I can get goods by paying.
16 | AO | Noun-Clausal Modifier | acl(A,B), obj(B,C) | A==NN*, B==VBG, C==NN* | | As a GM, I can make decision affecting results.
17 | SVCVO | Subject-Verb-CC-Verb-Object | nsubj(A,B), conj*(A,C), obj(A,D), ~obj(C,*) | B==NN*, C==VB*, D==NN* | | As a seller, I can replace, buy and sell phones.
18 | SVVOMultiComma | Subject-Verb-Verb-Object | nsubj(A,B), conj*(A,C), conj*(A,D), obj(D,E), ~obj(A,*), ~obj(C,*) | B==NN*, A==VB*, C==VB*, D==VB*, E==NN* | | As a seller, I can replace, buy, sell phones.
19 | SVVO1Comma | Subject-Verb-Verb-Object (one comma) | nsubj(A,B), dep(A,C), obj(C,D), ~obj(A,*) | B==NN*, A==VB*, C==VB*, D==NN* | | As a seller, I can replace, sell phones.
20 | SVOCO | Subject-Verb-Object-CC-Object | nsubj(A,B), obj(A,C), conj*(C,D) | B==NN*, A==VB*, C==NN*, D==NN* | | As a customer, I can buy fruits, meats and vegetables.
21 | SVOOComma | Subject-Verb-Object-Object | nsubj(A,B), obj(A,C), appos(C,D) | B==NN*, A==VB*, D==NN* | | As a customer, I can buy fruits, meats, vegetables.
22 | SVIOO | Subject-Verb-Indirect Object-Object | nsubj(A,B), iobj(A,C), obj(A,D) | A==VB*, B==NN*, C==NN*, D==NN* | | As a admin, I can send users an email.
23 | SVOTo | Subject-Verb-Object-OblTo | nsubj(A,B), obj(A,C), obl:to(A,D) | A==VB*, B==NN*, C==NN*, D==NN* | | As a host, I can send a gift to customers.
24 | OPassPartVByS | Object-Passive Verb-By-Subject | nsubj(A,B), obl:by(A,C) | A==VBN, B==NN*, C==NN* | | As a admin, I can validate the record submitted by the customer.
25 | SVOPassPart | Subject-Verb-Object-Passive Verb | nsubj(A,B), dep(A,C), nsubj*(C,D) | A==VB*, B==NN*, C==VBN, D==NN* | | As a admin, I can validate the profile submitted by the customer.
26 | SPredicativeV | Subject-Be-Noun | nsubj(A,B), cop(A,C) | A==NN*, B==NN* | | As a user, I can be an admin.

Rather than matching the exact TD and POS-tag names, the structure patterns use two wildcard characters:
• "*" character. Matches any item whose name starts with the preceding text. For example, given "A == NN*", A could be a singular or mass noun (NN) or a proper noun (NNP). Similarly, "nsubj*(A,B)" also matches a passive nominal subject relation ("nsubj:pass") between A and B.
• "~" character. Negates the match if the item that follows it appears. For example, "~obj(A,B)" denotes that B should not be the object of A in the matching. Similarly, "~obj(A,*)" denotes that A should have no object identified in the matching.
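The two wildcards can be illustrated with a small matcher. The following Python sketch is not the thesis toolkit; the function names, the tuple encoding of typed dependencies (TDs), and the hand-written parses are all assumptions made for illustration. In particular, the example parse places the role noun ("poster") directly in subject position, since how the original tool resolves "I" to the role is not shown in this appendix, and words are used as identifiers on the assumption that each word occurs once per sentence.

```python
def name_matches(pattern, name):
    """A trailing '*' matches any name that starts with the preceding text."""
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def match_structure(td_patterns, pos_patterns, deps, tags):
    """Return every variable binding under which a structure matches.

    td_patterns:  list of (relation, head_var, dep_var); a relation with a
                  leading '~' is a negated pattern (hypothetical encoding).
    pos_patterns: dict mapping a variable to a POS-tag pattern.
    deps:         list of (relation, head_word, dep_word) from a parser.
    tags:         dict mapping each word to its POS tag.
    """
    positive = [p for p in td_patterns if not p[0].startswith("~")]
    negative = [(r[1:], h, d) for r, h, d in td_patterns if r.startswith("~")]

    def consistent(binding, var, word):
        # '*' in a variable slot accepts any word; a bound variable must agree.
        return var == "*" or binding.get(var, word) == word

    def violates(binding):
        # A negated pattern rejects the match if any dependency fits it.
        return any(
            name_matches(rel, d_rel)
            and consistent(binding, head, d_head)
            and consistent(binding, dep, d_dep)
            for rel, head, dep in negative
            for d_rel, d_head, d_dep in deps
        )

    def extend(binding, remaining):
        if not remaining:
            tags_ok = all(
                name_matches(pat, tags[binding[var]])
                for var, pat in pos_patterns.items()
            )
            if tags_ok and not violates(binding):
                yield dict(binding)
            return
        rel, head, dep = remaining[0]
        for d_rel, d_head, d_dep in deps:
            if (name_matches(rel, d_rel)
                    and consistent(binding, head, d_head)
                    and consistent(binding, dep, d_dep)):
                yield from extend({**binding, head: d_head, dep: d_dep},
                                  remaining[1:])

    return list(extend({}, positive))


# Structure 1 (SVObj) against a hand-written parse of "a poster creates posts".
svobj = [("nsubj*", "A", "B"), ("obj", "A", "C")]
svobj_pos = {"A": "VB*", "B": "NN*", "C": "NN*"}
deps1 = [("nsubj", "create", "poster"), ("obj", "create", "posts")]
tags1 = {"create": "VB", "poster": "NN", "posts": "NNS"}
svobj_matches = match_structure(svobj, svobj_pos, deps1, tags1)

# Structure 5 (SVCc) requires that the main verb has no object at all.
svcc = [("nsubj", "A", "B"), ("xcomp", "A", "C"), ("~obj", "A", "*")]
svcc_pos = {"B": "NN*", "C": "VB*"}
deps2 = [("nsubj", "stop", "keeper"), ("xcomp", "stop", "filling")]
tags2 = {"stop": "VBP", "keeper": "NN", "filling": "VBG"}
svcc_matches = match_structure(svcc, svcc_pos, deps2, tags2)
```

Here `svobj_matches` holds one binding (A is "create", B is "poster", C is "posts"), and `svcc_matches` holds one binding for the gas keeper example. Adding any `obj` dependency headed by "stop" to `deps2` makes the SVCc match fail, which is the effect of "~obj(A,*)".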
Appendix B
Transformation Rules for Domain Identification

Table B.1: Transformation Rules
# | Structure | Action | Rule
1 | SVObj | add_behavior | actor=B, target=C, action=A
2 | SVOblOn | add_behavior | actor=B, target=C, action=A
3 | SVOblIn | add_behavior | actor=B, target=C, action=A
4 | SVCcO | add_behavior | actor=B, target=D, action=C
5 | SVCc | add_behavior | actor=B, action=C
6 | SVpO | add_behavior | actor=B, action=AC, target=D
7 | SVp | add_behavior | actor=B, action=AC
8 | SV | add_behavior | actor=B, action=A
9 | Poss | add_entity_attribute | entity_name=B, attr_name=A
10 | SPassPartVByS | add_behavior | actor=C, target=B, action=A
11 | VOToInf | add_behavior | actor=B, action=C
12 | SVThatClause | add_behavior | actor=B, target=E, action=A
13 | SVOblWith | add_relation | source=B, dest=C, msg="supported_by", ass_type="association"
14 | SVByVO | add_behavior | actor=B, target=D, action=C
15 | SVByV | add_behavior | actor=B, action=C
16 | AO | add_behavior | actor=A, target=C, action=B
17 | SVCVO | add_behavior | actor=B, target=D, action=C
18 | SVVOMultiComma | add_behavior | actor=B, target=E, action=A
19 | SVVOMultiComma | add_behavior | actor=B, target=E, action=C
20 | SVVOMultiComma | add_behavior | actor=B, target=E, action=D
21 | SVVO1Comma | add_behavior | actor=B, target=D, action=C
22 | SVVO1Comma | add_behavior | actor=B, target=D, action=A
23 | SVOCO | add_behavior | actor=B, target=C, action=A
24 | SVOCO | add_behavior | actor=B, target=D, action=A
25 | SVOOComma | add_behavior | actor=B, target=D, action=A
26 | SVIOO | add_behavior | actor=B, target=D, action=A
27 | SVIOO | add_relation | source=D, dest=C, msg="to", ass_type="association"
28 | SVOTo | add_behavior | actor=B, target=C, action=A
29 | SVOTo | add_relation | source=C, dest=D, msg="to", ass_type="association"
30 | OPassPartVByS | add_behavior | actor=C, target=B, action=A
31 | SVOPassPart | add_behavior | actor=B, target=D, action=A
32 | SPredicativeV | add_relation | source=A, dest=B, ass_type="generalization"
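To show how a rule in Table B.1 turns a matched binding into a model action, here is a minimal Python sketch. It is an illustration only, not the thesis implementation: the `RULES` dictionary, `resolve`, and `apply_rules` are hypothetical names, only three structures are encoded, and reading "action=AC" as the verb A followed by its particle C (for example "turn off") is an assumption about the table's notation.

```python
# A few rows of Table B.1, encoded as structure -> list of (action, fields).
# Quoted values are literals; unquoted values name pattern variables.
RULES = {
    "SVObj": [
        ("add_behavior", {"actor": "B", "target": "C", "action": "A"}),
    ],
    "SVpO": [
        ("add_behavior", {"actor": "B", "action": "AC", "target": "D"}),
    ],
    "SVOTo": [
        ("add_behavior", {"actor": "B", "target": "C", "action": "A"}),
        ("add_relation", {"source": "C", "dest": "D",
                          "msg": '"to"', "ass_type": '"association"'}),
    ],
}


def resolve(value, binding):
    """Turn a rule value into a concrete string."""
    if value.startswith('"') and value.endswith('"'):
        return value[1:-1]  # literal such as "association"
    # One or more variable letters; several letters are joined with a space
    # (assumed meaning of "AC": verb followed by its particle).
    return " ".join(binding[var] for var in value)


def apply_rules(structure, binding):
    """Return the (action, arguments) pairs produced for one matched structure."""
    return [
        (action, {key: resolve(value, binding) for key, value in fields.items()})
        for action, fields in RULES[structure]
    ]


# "As a host, I can send a gift to customers." matched as SVOTo.
svoto_actions = apply_rules(
    "SVOTo", {"A": "send", "B": "host", "C": "gift", "D": "customers"}
)

# "As a robot, I can turn off the fans." matched as SVpO.
svpo_actions = apply_rules(
    "SVpO", {"A": "turn", "B": "robot", "C": "off", "D": "fans"}
)
```

In this sketch the SVOTo binding yields two actions, a behavior ("host" sends "gift") and an association from "gift" to "customers", mirroring rules 28 and 29. Keeping the rules as data rather than code means a new structure only needs a new dictionary entry.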
Abstract
Microservices have been recognized as an important enabler of continuous development for many cloud-based systems. A wide range of development methodologies and tools is available for delivering microservices at a fast pace, and microservices development has proven to be agile. However, common agile practices are not always the best fit for microservices development. Their suitability depends on the nature of microservices and on several technical and social factors across the project development life cycle. Applying agile methodology to microservices development without careful reasoning may result in undesired overhead.
A major concern arising from the adoption of agile methodology in microservices development is the uncertainty that exists at the early stages of each development iteration. This research identifies ways to improve the situation and develops an approach to eliminate the delays caused by this uncertainty. The proposed approach, Executable Domain Models, consists mainly of a process elaboration and a toolkit that helps implementation teams identify application domain models and generate microservices at the early stages. The approach aligns this toolkit with the development process and coordinates domain modeling activity over project life cycles.
Empirical studies have been conducted to assess the effectiveness and efficiency of Executable Domain Models. Experiments and several minimum viable products have been completed over the past years. The collected project data shows that the domain identification tool generates more correct domain models for knowledge sharing purposes. The service generation tool, bound with domain modeling activity, results in a 10% saving of effort and fewer issues at later stages. Effort saving increases to 30% under an extreme condition with a high rate of personnel turnover.
Asset Metadata
Creator: Wang, Bo (author)
Core Title: Process implications of executable domain models for microservices development
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Degree Conferral Date: 2022-08
Publication Date: 07/21/2022
Defense Date: 05/13/2022
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: agile, code generation, continuous development, domain identification, domain modeling, microservices, OAI-PMH Harvest
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Boehm, Barry (committee chair), Gupta, Sandeep (committee member), Nakano, Aiichiro (committee member)
Creator Email: wang736@usc.edu,xixixhalu@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC111373670
Unique identifier: UC111373670
Legacy Identifier: etd-WangBo-10886
Document Type: Dissertation
Rights: Wang, Bo
Type: texts
Source: 20220721-usctheses-batch-958 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu