DOMAIN SPECIFIC SOFTWARE ARCHITECTURE FOR LARGE-SCALE SCIENTIFIC SOFTWARE

by

David Woollard

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

May 2011

Copyright 2011 David Woollard

Dedication

nulla tenaci invia est via

To my family, by whom I've been shown that no road is impassable.

Acknowledgments

No work of this magnitude is undertaken without the support of friends, family, and colleagues. You know who you are.

Table of Contents

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract

Chapter 1. Introduction
1.1 Historical Trends
1.1.1 Historical Trends in Hardware
1.1.2 Historical Trends in Software Infrastructure
1.2 The in silico Process
1.3 Current Software Development Methods
1.3.1 Mythos of High Performance
1.3.2 State of Development Methodology
1.3.3 Shortcomings
1.4 Research Agenda
1.4.1 Research Statement and Hypotheses
1.4.2 Assumptions and Threats to Validity
1.4.3 Contributions to the State of the Art
1.5 Organization of the Dissertation

Chapter 2. Background and Related Work
2.1 Software Architecture
2.1.1 Architecture Description Languages
2.1.2 Domain-Specific Software Architecture
2.1.3 Architectural Recovery
2.1.4 Software Connectors
2.2 Support for In Silico Experimentation
2.2.1 Workflow Systems
2.2.2 Workflow Services
2.3 Component-Based Scientific Software

Chapter 3. Approach
3.1 Characterization of Workflow Services
3.1.1 Example Services
3.1.2 Interaction Mechanisms
3.1.3 Workflow Service Granularity
3.1.4 The Workflow Services Challenge
3.2 Technical Approach
3.2.1 Insights
3.2.2 Decomposition
3.2.3 Re-architecting Workflow Components
3.2.4 Deployment onto the Grid

Chapter 4. KADRE: Decomposing Existing Software Systems
4.1 Approach to Decomposition
4.1.1 Clustering Process
4.1.2 Formal Definition of Similarity
4.1.3 An Example
4.1.4 Parameterization
4.2 Parameter Training

Chapter 5. SWSA: A Domain-Specific Software Architecture
5.1 SWSA - A Domain-Specific Software Architecture
5.1.1 Service Encapsulation
5.1.2 The Role of Connectors
5.2 Middleware Support
5.3 Support for Architectural Styles
5.3.1 Dimensions of Style Variance
5.4 SWSA's Implementation
5.4.1 SWSA Components
5.4.2 SWSA Connectors
5.4.3 SWSA Topology
5.4.4 Overhead

Chapter 6. Evaluation
6.1 Research Thesis and Hypotheses
6.2 Validation Approach
6.2.1 Evaluation Suite
6.2.2 Measuring Cluster Accuracy
6.3 Evaluation of KADRE's Accuracy
6.3.1 Cross Validation
6.3.2 Comparison to Existing Techniques
6.3.3 Comparison to Manual Decomposition
6.4 Evaluation of KADRE's Run Time
6.4.1 Algorithmic Analysis
6.4.2 Empirical Performance Analysis
6.5 Evaluation of SWSA's Performance

Chapter 7. Conclusion and Future Work
7.1 Conclusions
7.2 Contributions
7.3 Future Work

Bibliography

Appendix. Evaluation Suite Source Code
A.1 LUD Source Code
A.2 Crypt Source Code
A.3 Sparse Source Code
A.4 FFT Source Code
A.5 Euler Source Code
A.6 MD Source Code
A.7 Search Source Code

List of Tables

3.1 Characterization of Workflow Services
4.1 Weights associated with LUDecomp resources
6.1 Suite of training programs
6.2 Error counts by category for both KADRE and Bauhaus

List of Figures

1.1 Transistor count in commercial CPUs over time [Cor05, Lud08, Cor03]
1.2 Capacity of spinning disk storage over time in Mb [Kom09]
1.3 Cost of 1 Gb of spinning disk storage over time [Kom09]
1.4 Percentage of Top500 list with cluster architecture [Top10]
1.5 Phases of the "in silico" process
2.1 Generalized grid workflow management system [YB05]
2.2 Taxonomy of workflow management systems from Yu and Buyya [YB05]
2.3 Typical workflow-level provenance cataloging system architecture
2.4 Typical task-level provenance cataloging system architecture
2.5 Exception-handling framework proposed by Hwang and Kesselman
4.1 A general program element model and hierarchical tree view
4.2 Elided source code for LUD
4.3 LUD's undirected call tree
4.4 Affinity matrix for LUD assuming even weighting
4.5 Reduced affinity matrix for LUD after two clusterings
4.6 Initial clustering produced by Bauhaus
4.7 Three different clusterings of LUD
5.1 A DSSA for scientific workflow stages including connectors to workflow services
5.2 Componentization of workflow services allows both asynchronous and synchronous communications
5.3 UML class diagram view of Prism-MW. Middleware core classes are highlighted
5.4 Prism-MW's support for architectural styles
6.1 Three different clusterings of LUD
6.2 MoJo's 3-level containment tree model
6.3 Clusterings X and Y of program P labeled
6.4 Bar graph showing KADRE's accuracy during leave-one-out cross-validation
6.5 Euler's call tree
6.6 Bar graph showing KADRE's accuracy vs. Bauhaus, a state-of-the-art general-purpose software clustering utility
6.7 Runtime performance of KADRE
6.8 Performance impacts of SWSA on scientific kernels

Abstract

Scientists today increasingly rely on computers to perform simulations of the physical world that are impractical (in the case of drug discovery), impossible (nano-scale fracture dynamics), or unsafe (nuclear physics). This type of in silico experimentation has taken its place alongside in vitro and in vivo scientific experiments in helping scientists to understand the world around us.

Scientific software systems, the code that has replaced the test bench in the scientist's lab, are complex and quite costly to produce. As such, they represent a significant investment on the part of computational scientists. In order to improve upon the state of the art in scientific software development methodologies, we must move from large, monolithic software systems to more modularized systems capable of being evolved and redeployed rapidly, at a pace dictated by the evolution of the experiment rather than by the rate of code development.

The state-of-the-art methodology today for developing scientific software systems is to produce monolithic systems that, while performant, are difficult to understand, modify, and reconfigure, all necessary activities as the scientist evolves the scientific experiment that the code implements.
This dissertation asks and answers the question of how software engineering can be leveraged to provide improved development support to scientists conducting in silico experimentation.

A modular software system, decomposed into software components with explicit communications through software connectors, is the heart of the software architectural approach, and the first, necessary step toward an architected scientific software system that supports evolution, replication, scaling, and third-party validation.

This dissertation presents two thrusts of research developed to support the scientist performing in silico research. The first thrust is to analyze existing scientific code in order to identify snippets that can be encapsulated and modularized. A domain-specific software architectural recovery technique called KADRE, trained specifically for scientific software, implements this approach.

The second thrust of research presented in this dissertation is the implementation of a domain-specific software architecture for scientific software that orchestrates these identified modules into a software system with an explicit software architecture. This architecture helps scientists manipulate the software system at the level of the scientific experiment being conducted, while at the same time supporting the software engineer making decisions to improve the performance of the software system in terms of its components and connectors. This architecture is called SWSA, or the Scientific Workflow Software Architecture.

Chapter 1. Introduction

Software plays a vital role today in conducting science, allowing researchers to run powerful climate models [CBJ+00], fold proteins for drug discovery [Dre00], and model the explosion of nuclear weapons without doing irreversible harm to the environment [Val98], to name a few achievements.

While much focus, including conferences [Sup10], corporate involvement [Mic09, CGR05], and government funding [Hig10], has been on the high-performance aspects of scientific software, recent studies [Wil09a, Seg09] suggest that a growing number of computational scientists in all sectors work on scientific code either individually or in small groups. This burgeoning middle class of computational scientists is made possible by the technological advances of the past three decades.

Computational scientists today working in a university or government laboratory have access to computational resources the likes of which technologists could not have dreamed about only thirty years ago. Not only are these scientists able to purchase small (and in many cases substantial) private computational clusters for themselves or their research groups, but they also undoubtedly have access to larger departmental and university or institutional hardware. Recently, even elastic cloud platforms such as Amazon's EC2 environment [ama10] have become viable substrates on which to conduct scientific research.

1.1 Historical Trends

This section explores the historical trends and contributions that have made this abundance of computational resources possible, including increases in processor speed and spinning disk capacity, decreases in the cost of commodity hardware, the introduction of Beowulf technology that transformed this commodity hardware into computational clusters, and the advent of grid technologies that have allowed for the growth of virtual organizations.

1.1.1 Historical Trends in Hardware

Moore's Law [Moo65] states that the speed of computer processors will roughly double every two years.
At first driven by transistor counts (as illustrated by the exponential growth in Figure 1.1), the underlying technology behind this "law" is now transitioning to multi-core processor technology.

Figure 1.1: Transistor count in commercial CPUs over time [Cor05, Lud08, Cor03].

Not only do consumers have access to much faster processors; spinning disk hard drives have seen a similar exponential growth in capacity. Figure 1.2 illustrates Kryder's Law [Wal05].

Figure 1.2: Capacity of spinning disk storage over time in Mb [Kom09].

Though the price point of the fastest consumer processor has remained stable at approximately $1,000 US for the past decade, the cost of spinning disk has experienced an exponential decline. In Figure 1.3, we can see that, while a gigabyte of hard disk would have cost the consumer hundreds of thousands of dollars in the early 1980s, the same storage costs pennies today.

Not only have hardware trends allowed experimentalists in many scientific fields access to faster and cheaper hardware and storage, but a number of advancements in software infrastructure, including the emergence of commodity hardware clusters and grid computing platforms, have paved the way for in silico experimentation as well.

Figure 1.3: Cost of 1 Gb of spinning disk storage over time [Kom09].

1.1.2 Historical Trends in Software Infrastructure

Clusters, or multiple computers linked together via networking infrastructure that can be commanded, to an extent, as a single entity, have their origins in the depths of computing history. Indeed, in his book In Search of Clusters, Greg Pfister wrote, "Virtually every press release from DEC mentioning clusters says 'DEC, who invented clusters...'. IBM did not invent them either. Customers invented clusters, as soon as they could not fit all their work on one computer, or needed a backup. The date of the first is unknown, but it would be surprising if it was not in the 1960's, or even the late 1950's [Pfi95]."

The innovation that brought the cluster out of the realm of the computer science lab and into the physicist's, materials scientist's, and even biologist's was the commodity cluster. In 1994, as CPUs were growing faster and hard disk was getting cheaper, Thomas Sterling and Don Becker of the Center for Excellence in Space Data and Information Sciences (CESDIS), part of the NASA Earth and Space Science (ESS) project, built a cluster of 16 DX4 processors which they called Beowulf [Mer10].

Figure 1.4: Percentage of Top500 list with cluster architecture [Top10].

Ubiquitous at this point, cluster architectures have even gained a remarkable share of the top-end computing market, comprising over 80% of the top 500 fastest supercomputers in the world (see Figure 1.4). In fact, the same underlying cluster technology drives the server farms at computing powerhouses like Facebook [Ham09] and Google [BDH03].

Originally, scientists interested in developing scientific software for a cluster had limited software support. Most software running on clusters in the late 1990s was either a set of independently executing programs managed by a combination of scripts and batch processing software such as PBS [pbs10], or was developed as a parallel program with language-level support through extensions such as MPI [mpi10] and PVM [pvm10].

Perhaps the most important, and most controversial, evolution in software infrastructure support for the computational sciences in the last decade has been Grid technology.
Though the term first appeared in the mid 1990s [FK99], Grid had certainly emerged as a new paradigm by the time Ian Foster, Carl Kesselman, and Steven Tuecke of the Globus Alliance wrote their now seminal paper, The Anatomy of the Grid [FKT01], in 2001.

In [Fos02], Foster briefly outlined the evolution of the definition of Grid, starting with his definition in [FK99]: "A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities."

Over time, Foster, Kesselman, and Tuecke revised and refined their definition in [FKT01]: "The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering. This sharing is, necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs. A set of individuals and/or institutions defined by such sharing rules form what we call a virtual organization."

For the purposes of this dissertation, Foster, Kesselman, and Tuecke's definition of Grid from [FKT01] will be used. From this definition, one can see an important evolution in the software infrastructure supporting computational scientists. With the advent of Grid, scientists were no longer limited to the computational resources that their individual lab or university could provide; Grid provides a software vehicle for building large virtual clusters out of smaller physical clusters.

In the next section of this chapter, I will discuss the process of in silico experimentation, including its role in inductive reasoning about physical phenomena, how scientists conduct experiments in silico, and why reproducibility and scalability are key to successful experimentation.

1.2 The in silico Process

In order to understand the role of the in silico experiment, it is first important to establish the scientific process in which a computational scientist works. This effort includes identification of the distinct phases of software development in in silico experimentation.

Over the last century, scientists and students of the philosophy of science have argued that there is a role for probabilities and statistics in both deductive and inductive reasoning about scientific hypotheses [Gow97]. That is, scientists today tend toward a process of building evidence in order to reach a particular conclusion rather than designing a single, perfect experiment or arriving at a conclusion purely through logic. When deductive reasoning is paired with inductive reasoning, or experimentalism, one can reach the strongest validation of a hypothesis.

One sees this pairing of inductive and deductive reasoning used to build validation of hypotheses today through in silico experimentation. Theory is codified as scientific software (i.e., simulation of the physical phenomena under investigation, visualization tools, etc.), and experimental results are compared against observation in an iterative effort to validate the theory.

Like all scientific endeavors, in silico science has distinctive phases in which this iterative process is achieved. Shown in Figure 1.5, there are three phases of in silico science that largely mirror the processes of in vivo and in vitro science.
These phases, which I first introduced in [WMGM08], are discovery, production, and distribution.

Figure 1.5: Phases of the "in silico" process.

Discovery in this context describes the phase of development in which algorithms and techniques are tested, the scientific "solution space" is explored, and scientists arrive at a process that yields the desired result. This is what Kepner has described as the lone researcher process [Kep03]. Production is the engineering and scientific effort to reproduce the process established in the discovery phase on a large scale. Finally, distribution is the phase in which results of individual processes are shared and validated, and new research goals are formulated.

Scientists conducting in silico experiments face a number of technical challenges, including language and platform selection, algorithm development, data management, and the mechanics of the experiment (i.e., process orchestration, data provenance, etc.).

In the next section, I explore the limitations of current methodologies for performing in silico experiments, focusing on three specific activities that the in silico experimentalist must perform in the discovery and distribution phases of the in silico process: experiment repetition, validation, and scaling. I do so through the lens of the aforementioned technical challenges.

1.3 Current Software Development Methods

Computational scientists, working on their grids and clusters, are attempting, through in silico experiments, to validate theories of physical phenomena that would be too costly or difficult to test in the physical world. In this section, I will discuss the current state of the art in the software development strategies employed by computational scientists, specifically focusing on the disconnects between the goals scientists have for their experiments and the existing techniques they use to develop their software.

1.3.1 Mythos of High Performance

With all of the clusters, grids, and clouds at their disposal, and with much of the press regarding scientific breakthroughs focused on large teams of scientists sequencing genomes, simulating nuclear weapons, and exploring the nature of the universe, it is hard to separate this mythos of high performance from the realities of the computational scientist. Two recent studies, a large survey of computational scientists produced by Greg Wilson [Wil09a] and a number of case studies in a variety of scientific domains conducted by Judith Segal [Seg09], have suggested that these visions of computational science are not applicable to the average computational scientist.

Large Teams

Despite the mythos of large virtual co-laboratories of computational scientists (of which there certainly are a number of well-documented examples, such as [Pro08]), the majority of scientists report that they develop code on their own or in teams of two to three, and rarely do other scientists use the code they build [Wil09a, Seg09].

Bleeding Edge Hardware

In his survey of nearly 2,000 scientists, Wilson asked the survey participants to list all hardware that they utilized in their day-to-day activities. Rather than listing the latest exotic supercomputers or cloud computing environments, approximately 80% primarily use their own desktop machine for building and running scientific code, while only 20% utilize larger clusters and supercomputers [Wil09a].
Large Codebases

Though many proponents of high-performance scientific computing have popularized the notion of scientific programs as millions of lines of custom Fortran code (and again, there are a number of notable examples, such as the Department of Energy ASCI codes [PK04]), over 80% of the scientists interviewed and surveyed by Wilson and Segal worked with much smaller codebases of no more than 5,000 lines of code [Wil09a, Seg09].

1.3.2 State of Development Methodology

The current architecture of most scientific code is most aptly described by Ronald Boisvert and Ping Tak Peter Tang in their preface to The Architecture of Scientific Software: "[T]he software architecture of a scientific computing application has typically been that the computation is effected not by just a single program but by the operation of a suite of related programs acting on a common database. With such a suite, the individual programs are structured from subprograms. Some of these subprograms are obtained from libraries provided by various commercial suppliers, and some from public domain sources. A few subprograms will be unique to this suite of programs, representing the particular modeled science in the suite and the desired sequences of operations to be performed [IFI01]."

Put another way, the state of the art in software development for scientific systems is much like the state of the art in general software development forty years ago. Scientists, working by themselves or in small groups, produce software to conduct experiments in an ad hoc manner. Orchestration of different programs and subroutines is implemented as part of a monolithic system or a set of software scripts maintained by the scientist. Data management and experiment tracking are seldom emphasized [Bra04a]. If a scientist produces a useful bit of new software, she gives it directly to her immediate community or creates a software library that packages the code for general use. As an observation, even the software language of choice, Fortran, is more than forty years old.

There are a number of reasons for the lack of good software engineering in the current development practices of computational scientists. Firstly, and perhaps most importantly, scientists are not software engineers but domain experts. The nature of the software they are developing is such that it requires deep domain knowledge [Seg09]. Owing not only to the complexity of the software being developed, but also to the lack of funding for developing good software (scientific research, even that based on in silico experimentation, is findings-driven), the nature of scientific competition, and the race to publish, scientists are often unmotivated to support users beyond themselves.

Additionally, scientific software enjoys a longevity not often experienced in other domains. Unlike so-called business logic, where the nature of the activity reified in software evolves at a pace in line with or ahead of software engineering, scientific principles change on a much longer timeline. This does not diminish cutting-edge research, though it does encourage scientists to reuse long-lived libraries that implement the basic scientific and mathematical concepts on which their new research is based.

1.3.3 Shortcomings

The contrast between the reality of developing scientific software today and the goals of the computational scientist conducting in silico experimentation is stark.
In order to validate experiments and appropriately apply the scientific method, there are a number of areas of in silico experimentation, especially in the aforementioned discovery and distribution phases, in which we must improve upon the state of practice. This dissertation will focus on software support for three such activities: scaling the experiment, repeating the experiment, and third-party validation. Unlike other potential improvements to the state of the art in in silico experimentation, these three activities are of primary interest because they represent significant engineering efforts and so involve both scientist and engineer stakeholders.

Scaling the Experiment

Because in silico experiments attempt to validate researchers' theories of the physical world, conducting many runs of the experiment with different parameters (in the process, gaining an understanding of the complex interactions of multiple phenomena) and scaling the experiment to simulate a larger portion of the physical world (with the goal of producing results verifiable through traditional means of observation) are important aspects of in silico experimentation.

For example, computational scientists working to model complex chemical interactions in the upper atmosphere as part of their research in climate science produce data for much finer physical grids than the current state-of-the-art high-resolution space-based observational platforms. A challenge for computational scientists in this domain is to be able to simulate regional and global trends observable from space rather than the micro-simulations they are currently capable of running [1].

[1] This challenge was first discussed by the author with scientists at NASA's Jet Propulsion Laboratory working to bridge satellite observations for the next generation of Intergovernmental Panel on Climate Change (IPCC) reports.

Repeatability

A major component of all experiment-based scientific methodology, in silico or otherwise, is repeatability. A recent paper on the digital library techniques being employed by the German Research Foundation (DFG) and the German National Library of Science and Technology (TIB) even cites discussion of possible falsification of scientific results as a major reason for supporting the repeatability of computational experiments [Bra04a].

There are two thrusts of repeatability with which this dissertation is concerned: (1) support for access to raw data, and (2) capturing the configuration (or orchestration) of the experiment at a level of granularity, and in a form, that allows scientists to essentially "replay" the experiment, as sketched below.
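To make the second thrust concrete, the following is a minimal Java sketch (Java being the language of SWSA's reference implementation) of what a replayable experiment configuration might look like. The class and field names here are hypothetical illustrations, not drawn from any system described in this dissertation.

import java.io.Serializable;
import java.util.List;
import java.util.Map;

// Hypothetical record of one workflow stage: which code ran, with which parameters.
class StageRecord implements Serializable {
    final String stageName;                // e.g., "calibrate", "grid", "visualize"
    final Map<String, String> parameters;  // input parameters used at this stage
    final List<String> inputDataIds;       // identifiers of the raw data consumed

    StageRecord(String stageName, Map<String, String> parameters, List<String> inputDataIds) {
        this.stageName = stageName;
        this.parameters = parameters;
        this.inputDataIds = inputDataIds;
    }
}

// An experiment configuration is the ordered sequence of stage records;
// serializing it captures enough orchestration detail to "replay" the run.
class ExperimentConfiguration implements Serializable {
    final List<StageRecord> stages;
    ExperimentConfiguration(List<StageRecord> stages) { this.stages = stages; }
}

Replaying then amounts to iterating over the recorded stages in order and re-invoking each stage's code with the recorded parameters against the identified raw data.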
Partitioners of the current scientic software methodology described above are especially challenged by this requirement when working with scientic software libraries which have been compiled for a specic target environment or set of environments and do not support others 2 . Portability also emerged as a concern of the scientic software teams studies in [CKSP07]. 1.4 Research Agenda In this section, I will discuss the research agenda for this dissertation, including a succinct statement of research and related hypotheses that will be evaluated, the assumptions and attacks to validity of this research, and the original contributions to the the state of art made by this dissertation. 1.4.1 Research Statement and Hypotheses The state-of-the-art in software development methodology for scientic computing lacks support for many activities that are part of the production and distribu- tion phases of in silico experimentation, including the scaling of experiments, the repeatability of experiments, and support for third-party validation. Furthermore, good software engineering practices are not a motivation of computational scien- tists. Scientists are focused on, and funded by, producing good scientic ndings and not on producing good software. 2 A notable example of this problem was discussed by the author during a recent collaboration between a current NASA sponsored mission, UAVSAR (Uninhabited Aerial Vehicle Synthetic Aperture Radar) and a project at the Jet Propulsion Laboratory called the Airborne Cloud Computing Environment. Porting UAVSAR code from SGI Altix machines to Amazon EC2 instances is currently estimated to take over 2 person-years of software re-development. 14 Additionally, despite a wealth of computational resources are their disposal, most computational scientists conducting in silico experiments do not use the grids, clusters, and clouds on which much of the latest research in software engineering is focused. Though scientists have been slow to adopt grid technology, there is much in grid to support the activities of production and distribution, including data and metadata management systems which form distributed data repositories, work ow systems that create models of the orchestration of an in silico experiment, allowing users to both rationalize about the experiment and also to rerun work ows as part of validation activities, and virtualization of computational resources, allowing for the smooth platform migration required for third-party validation. Finally, a successful collaboration between scientists with deep domain knowl- edge and software engineers with appropriate technical expertise is key to suc- cessfully transitioning scientic software systems from the discovery phase to the production and distribution phases. In [CKSP07], a computational scientist sur- veyed made this same observation: \In these types of high performance, scalable computing [applications], in addition to the physics and mathematics, computer science plays a very major role. Especially when looking at optimization, memory management and making [the code] perform better You need an equal mixture of subject theory, the actual physics, and technology expertise." These observations have led to the following research question: How can soft- ware engineering, and specically software architecture [TMD09, PW92], be lever- aged in order to provide improved development support to scientists conducting production and distribution activities as part of in silico experimentation? 
My thesis is that a domain-specic software architecture, or DSSA [Tra95, HR + 95], for grid-based scientic software could provide a separation of concerns 15 between the computational scientist, who has deep domain knowledge and must validate the science being conducted, and the software engineer who can be con- cerned with the engineering associated with scaling experiments, management data products, and orchestrating scientic code through work ows. My hypotheses, which I aim to validate during the discourse of this dissertation, are: Hypothesis 1: Software modules can be accurately identied algorithmically in existing scientic software that not only encapsulate code in a functional way, but also reify a single step in the scientic process being employed at the conceptual level. Accuracy in this context will be dened as 80% correct identication when compared to expertly-identied software modules as measured by a quality metric develop by Tzerpos and Holt [TH99]. Hypothesis 2: An ecient (in run-time) algorithm exists to automatically identify the software modules as specied in Hypothesis 1. Ecient run-time will be dened as polynomial complexity. Hypothesis 3: A scientic software system that is composed of these identied modules, properly componentized with explicit interfaces, performs correctly and within acceptable bounds in both computation time and memory footprint when compared to its equivalent monolithic software system. Acceptable in this context will be within 10% of the original computation time and within 50% of the original memory footprint. To this end, I have developed two major research platforms described in detail and evaluated in Chapters 4 and 5. The rst platform is called KADRE { Ker- nel Analysis, Decomposition, and Re-Engineering { a domain-specic architec- ture recovery approach and toolset to aid automatic and accurate identication of potential SWSA components in existing scientic software. The second platform 16 is the Scientic Work ow Software Architecture, or SWSA, which is comprised of a domain-specic software architecture for scientic software and a reference implementation in Java. 1.4.2 Assumptions and Threats to Validity Throughout this research, I have made a number of assumptions. Specically, I assume language agnosticism with regard to scientic code. All tools presented in this dissertation work with Java programs and, as such, the evaluation suite is comprised of scientic code written in Java. Though other languages, such as Fortran and C, are more prevalent in the scientic domain, this approach is, by and large, language independent. Where language agnosticism has been violated, it is both documented and parameterized appropriately (as is the case with the feature extraction used by KADRE to perform static analysis). Additionally, I assume that collaboration between scientists, as domain experts, and software engineers is key to successfully developing scientic applications and as such, these parties are not excluded from any tools or techniques developed in this dissertation. The ramication of this decision is that the decomposition of existing scientic software is with a human-in-the-loop rather than using a fully-automated technique (a domain expert makes all nal decisions regarding decomposition). Outside of the threats to validity introduced by the scope of this project, the dominant source of threats to validity with this approach is the training set utilized in our supervised learning algorithm. 
1.4.2 Assumptions and Threats to Validity

Throughout this research, I have made a number of assumptions. Specifically, I assume language agnosticism with regard to scientific code. All tools presented in this dissertation work with Java programs and, as such, the evaluation suite is comprised of scientific code written in Java. Though other languages, such as Fortran and C, are more prevalent in the scientific domain, this approach is, by and large, language independent. Where language agnosticism has been violated, it is both documented and parameterized appropriately (as is the case with the feature extraction used by KADRE to perform static analysis).

Additionally, I assume that collaboration between scientists, as domain experts, and software engineers is key to successfully developing scientific applications, and as such, these parties are not excluded from any tools or techniques developed in this dissertation. The ramification of this decision is that the decomposition of existing scientific software is performed with a human in the loop rather than with a fully automated technique (a domain expert makes all final decisions regarding decomposition).

Outside of the threats to validity introduced by the scope of this project, the dominant source of threats to validity for this approach is the training set utilized in our supervised learning algorithm. As with all such techniques, a threat to the validity of this approach is how well the training set represents real-world scientific algorithms.

Also, there is a distinct lack of existing "gold standards" for proper decompositions of monolithic source code into component-based structures in the domain of scientific software. In Chapter 6, I will address my approach to overcoming this challenge, which involves creating a custom reference suite.

1.4.3 Contributions to the State of the Art

This dissertation contributes to the state of the art in developing scientific software in a number of crucial ways. Primarily, this dissertation is targeted at impacting the development of scientific software by aiding scientists in developing modularized software in which modules represent distinct steps in the scientific process they are employing.

This type of development approach yields a number of advantages over the current state of practice, including support for module-based reuse, targeted evolution of the scientific process, code porting and deployment reconfiguration targeted to specific modules rather than monolithic codebases, and a separation of concerns between scientists and software engineers.

By providing a domain-specific software architecture that aids this separation of concerns between scientists and software engineers, the tools developed in this dissertation increase scientists' ability to scale experiments, repeat experiments, and utilize third-party validation as a means of improving transparency and increasing confidence in the resulting work. Targeted porting allows scientists conducting in silico research to utilize the latest computational grid platforms in their research. Finally, module-based software reuse is more efficient than the current state of practice (low-level, library-based reuse), improving scientists' productivity in developing and evolving in silico experiments.

1.5 Organization of the Dissertation

The rest of this dissertation is organized as follows: In Chapter 2, I will discuss related work, including research into software architectures, computational platforms, and previous support for in silico experimentation. Chapter 3 delves into the motivations behind my domain-specific software architecture and outlines my technical approach. In Chapters 4 and 5, I discuss my two research platforms, KADRE and SWSA. Chapter 6 will present my evaluation of each of the hypotheses listed in this chapter. Finally, I will conclude with a chapter on contributions, scientific conclusions, and future work.

Chapter 2. Background and Related Work

In this chapter, I will discuss work related to this dissertation in two major areas of study: (1) software design, specifically focusing on software architectures, and (2) current efforts of computational scientists to support scientific software development. Finally, I will discuss a number of examples of component-based software engineering applied to scientific software, including existing software systems that support in silico experimentation and their limitations.

2.1 Software Architecture

Software architecture captures the principal design decisions behind a software system and is often represented in terms of components (units of computation), connectors (interactions amongst components), and configurations (arrangements of components and connectors and the rules that guide their composition) [TMD09, PW92].
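As a minimal illustration of this vocabulary, not tied to any particular ADL or to SWSA itself, the three concepts can be rendered in Java roughly as follows (all names here are illustrative assumptions):

import java.util.ArrayList;
import java.util.List;

// A component is a unit of computation with a named interface.
interface Component { String name(); void receive(Object message); }

// A connector mediates all interaction among components.
interface Connector { void route(Object message, Component from, Component to); }

// A configuration arranges components and connectors, and enforces the
// rule that components in it interact only through its connectors.
class Configuration {
    private final List<Component> components = new ArrayList<>();
    private final List<Connector> connectors = new ArrayList<>();

    void add(Component c) { components.add(c); }
    void add(Connector c) { connectors.add(c); }

    void send(Object message, Component from, Component to, Connector via) {
        if (!components.contains(from) || !components.contains(to) || !connectors.contains(via))
            throw new IllegalArgumentException("not part of this configuration");
        via.route(message, from, to);
    }
}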
There are a number of key features of software architecture that make it a powerful tool for addressing the challenges of scientific software development. Specifically, software architects believe that every system has an architect and every system has an architecture. By making the architecture of scientific software systems explicit, we can help both scientists and software engineers developing scientific software share a common domain vocabulary, as well as a shared understanding of the design of the system.

Additionally, a central tenet of software architecture is the principle of faithful implementation. This principle suggests that a software system should be built to an architectural specification that is well documented, and that both the software and the architecture (including its documentation) evolve over time to reflect the state of the software system. This principle aids the scientist in evolving their software over time as well as in conveying their software design (and potentially their experiment design) to other people.

Software architecture exploits a number of existing design principles that are also relevant to scientists and software engineers developing scientific software. These design principles include separation of concerns, isolation of likely change, and explicit identification of control and data flow in software.

There have been two largely disjoint paths of research in the field of software architectures: one path has focused on the design issues, formal foundations, and analysis of architectures, while the other has resulted in technologies for implementing software architectures. The first approach has focused on architectural styles, formal architecture description languages (ADLs), and their supporting tools.

2.1.1 Architecture Description Languages

To date, most architectural tools have focused on the simulation and analysis of architectural models to exploit the semantic power of ADLs. At the same time, insufficient progress has been made on supporting the implementation of applications based on styles and ADL models. The second approach to software architectures has focused on providing software frameworks, often through object-oriented (OO) reuse techniques such as design patterns and object hierarchies. However, software implementations resulting from such use of frameworks often differ widely from their conceptual models and lack adequate semantic underpinnings for analytic purposes.

Three architecture description languages that have focused on linking conceptual architectural models to software implementation are ArchJava, Aura, and Aesop, each described below. ArchJava [ACN02] is an extension to Java that unifies software architecture with implementation, ensuring that the implementation conforms to architectural constraints. Due to ArchJava's lack of support for enforcing topological constraints, it is not possible to capture a software system's architectural style in ArchJava.

Aura [SG02] is an architectural style and supporting middleware for ubiquitous computing applications with a special focus on user mobility, context awareness, and context switching. Aura has explicit, first-class connectors. Aura also provides a set of components that perform task management, environment monitoring, context observation, and service supplying. However, Aura does not provide support for specifying a system's architectural style, nor does it support the creation of new architectural styles supported by the middleware.
Aesop [GAO94] was an early toolkit for constructing open architectural design environments that support architectural styles. Aesop makes it easy to define new styles and then use those styles to create architectural designs. Aesop's goal was to provide architects with a visual environment that guides them in creating the system's architecture, but it stopped short of providing implementation-level support.

2.1.2 Domain-Specific Software Architecture

A domain-specific software architecture is a codification of domain knowledge in a software architecture that is applicable to software systems developed for a given application domain [Tra95]. Hayes-Roth et al. extended this definition to include not only a reference architecture, but other reusable artifacts, including reference components capturing domain knowledge and configuration methods for selecting these components and configuring the reference architecture [HR+95].

Like patterns and frameworks, domain-specific software architectures codify design knowledge. When working in a specific, well-understood domain like scientific software simulations, domain-specific software architectures allow the developer to leverage more existing design knowledge than other approaches do. It is because of this increased level of support that domain-specific software architecture is applicable to the problem of supporting scientific software development.

There are a number of elements of domain-specific architectures that have been developed, including domain modeling, domain requirements, and domain reference architectures [TMD09]. With domain modeling, one codifies the specific language of the domain, including its constituent elements. This is particularly applicable to the domain of scientific software, as the language of the domain is the experimental process rather than a software-development-centric vocabulary. Confirming the meaning of elements of the architecture is important in order to allow all parties to the development of scientific software to understand one another.

The second element of domain-specific software architecture is domain requirements. Capturing the engineering concerns regarding the performance of production-style in silico experiments as a set of domain-specific software requirements also aids in the communication of goals to all stakeholders in scientific software development.

Finally, reference architectures, including implementations of reference components, allow developers in the domain to convey appropriate design elements to others. In the process, components that can fulfill canonical needs in the domain can be developed in advance of the development of a specific system and leveraged to reduce the time and complexity of creating in silico experiments.

2.1.3 Architectural Recovery

Most scientific software is long-lived and has a lengthy development timeline, as studied in [CKSP07]. As a result, scientific software is seen as a great commodity and, ultimately, as a capital investment. Any technique proposed to develop better scientific software should, therefore, allow for the reengineering of existing software. To that end, software architectural recovery is the process of elucidating an existing software system's architecture from its source code, available documentation, and other artifacts (e.g., runtime behavior) [DP09, MJ06].

There are several different automatic approaches to general architectural recovery, most of which result in partial architectural descriptions [DP09]. These partial descriptions are developed utilizing feature extraction from both static analysis and runtime analysis, and agglomeration techniques such as clustering [MB07]. The fundamental notion in software clustering is discerning a similarity between elements of the input source code. Similarity-based software clustering techniques provide a means of grouping related program entities (procedures/methods, source code, variables, and other programming-language-level constructs) based on their relationship to one another.
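The following sketch illustrates the basic mechanics of similarity-based agglomerative clustering as described above. The similarity input is a stand-in (for instance, affinities derived from call-graph edges) rather than the weighted measure KADRE actually uses, which is defined in Chapter 4.

import java.util.*;

// Agglomerative clustering over program entities: repeatedly merge the
// most similar pair of clusters until no pair exceeds a threshold.
class AgglomerativeClustering {
    // similarity[i][j]: affinity between entities i and j (e.g., from call edges).
    static List<Set<Integer>> cluster(double[][] similarity, double threshold) {
        List<Set<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < similarity.length; i++)
            clusters.add(new HashSet<>(Set.of(i)));   // start with singleton clusters
        while (true) {
            int bestA = -1, bestB = -1;
            double best = threshold;
            for (int a = 0; a < clusters.size(); a++)
                for (int b = a + 1; b < clusters.size(); b++) {
                    double s = groupSimilarity(clusters.get(a), clusters.get(b), similarity);
                    if (s > best) { best = s; bestA = a; bestB = b; }
                }
            if (bestA < 0) return clusters;            // nothing similar enough remains
            clusters.get(bestA).addAll(clusters.remove(bestB)); // merge the best pair
        }
    }

    // Average pairwise affinity between two clusters (average-link criterion).
    static double groupSimilarity(Set<Integer> x, Set<Integer> y, double[][] sim) {
        double total = 0;
        for (int i : x) for (int j : y) total += sim[i][j];
        return total / (x.size() * y.size());
    }
}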
Approaches to automatic clustering have leveraged genetic algorithms [DMM99], latent semantic analysis of source code [MV99], and software evolution and change [BN05], while semi-automatic techniques have focused on metrics [GKS97] and on means of assessing a cluster's quality [KE00].

Maqbool and Babri recently reviewed hierarchical clustering techniques for software architecture recovery and provided an analysis of the behavior of various similarity and distance measures [MB07]. As part of their conclusion, the authors emphasize that the quality of clustering depends on the characteristics of the analyzed software systems.

Domain-specific recovery methodologies [HH00, M+09b] that utilize human interpretation to cluster source code into more highly refined architectural models [MJ06, MM01] have emerged in recent years. These methodologies recover domain-specific architectures utilizing domain knowledge such as source code, domain-specific terms, system documentation, and the experience of architects who have built systems within the particular domain.

2.1.4 Software Connectors

An area of software architecture research that touches on the reconstruction of software systems following software architectural tenets is the utilization of explicit software connectors, as described and taxonomized by Mehta et al. [MMP00], to modularize existing software (via wrappers) and to make communications explicit in software. Mattmann et al. [Mat06] studied the composition of connectors as an approach to imbue connectors with complex functionality.

Spitznagel and Garlan studied wrappers, connectors capable of bridging interfaces between existing software and specified interfaces, in [SG03], but formalized the approach using six dimensions of functionality largely orthogonal to scientific software concerns (such as security).

Of particular interest in the software connector literature is the exogenous connector model proposed by Lau et al. in [LOW06a]. In the exogenous connector model, computational units (components) are invoked via exogenous invoking connectors, which communicate not only data but also control flow to the module. No modules are capable of calling other modules directly; rather, control flow is isolated in a hierarchically composed set of pipe and branch connectors. These connectors can mimic any control flow found in structured programs, and so provide the type of communication patterns required of the modules specified in the first hypothesis of this dissertation (see Section 1.4.1).
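A minimal Java sketch of the exogenous style, using hypothetical names rather than the notation of [LOW06a], might look as follows. Note that components expose computation only, while sequencing and branching live entirely in connectors.

import java.util.function.Function;
import java.util.function.Predicate;

// A component is a pure unit of computation; it never invokes another component.
interface Component<I, O> { O compute(I input); }

// An invoker connector delivers both data and control flow to exactly one component.
class Invoker<I, O> implements Function<I, O> {
    private final Component<I, O> target;
    Invoker(Component<I, O> target) { this.target = target; }
    public O apply(I input) { return target.compute(input); }
}

// A pipe connector sequences two connectors; longer pipelines are pipes of pipes,
// giving the hierarchical composition described above.
class Pipe<A, B, C> implements Function<A, C> {
    private final Function<A, B> first;
    private final Function<B, C> second;
    Pipe(Function<A, B> first, Function<B, C> second) { this.first = first; this.second = second; }
    public C apply(A input) { return second.apply(first.apply(input)); }
}

// A branch connector selects one of two downstream connectors.
class Branch<I, O> implements Function<I, O> {
    private final Predicate<I> condition;
    private final Function<I, O> onTrue, onFalse;
    Branch(Predicate<I> condition, Function<I, O> onTrue, Function<I, O> onFalse) {
        this.condition = condition; this.onTrue = onTrue; this.onFalse = onFalse;
    }
    public O apply(I input) { return condition.test(input) ? onTrue.apply(input) : onFalse.apply(input); }
}

Because components never name one another, the entire control structure of a program built this way can be recovered, and reconfigured, by inspecting the connector hierarchy alone.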
2.2 Support for In Silico Experimentation

In addition to general research in the area of software architecture, it is also important to discuss relevant literature in the area of scientific software. I will focus this literature review on software-based support of the in silico process, including the areas of workflow systems, provenance support, and fault handling, each of which is a necessary capability to support scaling and repeating experiments, as well as facilitating third-party validation (the three areas of concern from Chapter 1).

2.2.1 Workflow Systems

A useful construct that has recently been applied to the scientific computing domain to help orchestrate in silico processes is the workflow model. Workflows are a construct originating in the business processing community. The basic elements of workflows are actors, tasks, data, and rules [YB05]. A workflow specifies which actors initiate which tasks to process particular data once rules have been satisfied. Actors can be other tasks as well, so workflows are very useful for specifying a high-level process with many steps, alternate paths, data dependencies, etc.

In business processing, there are many instances of human-initiated tasks, such as interviews conducted by loan officers or workers retrieving records from a paper archive. In the scientific community, however, workflow developers have tended to remove actors from the workflow model, further simplifying it to tasks, rules to apply to the tasks, and dependencies between tasks in the form of data.

Throughout this dissertation, when I use the term workflow system, I will be referring to systems providing an explicit, language-independent processing model fitting the workflow paradigm of tasks, data, and rules. The generalized workflow management system shown in Figure 2.1 illustrates that, in this definition of workflow, a model is developed by the user and interpreted by a workflow engine, which executes applications using a grid infrastructure.

Workflow models are specified as some form of graph (directed acyclic graphs and Petri nets are both used extensively, as well as UML sequence diagrams to a lesser extent) in which tasks form nodes and data dependencies are edges between tasks. This type of static representation is often augmented with a dynamic state machine view, capturing tasks as states and rules as conditions for transition.

Figure 2.1: Generalized grid workflow management system [YB05].

Workflows have recently been applied at NASA's Jet Propulsion Laboratory in the development of ground data systems [M+09a]. In addition to the rigid processing pipelines in which data is transformed using a number of processing components sequentially, workflows have allowed JPL scientists to execute alternative paths, deploy to multiple different underlying computing environments, and integrate more dynamic execution.

Workflow systems have been studied extensively in the literature. Yu and Buyya presented the most comprehensive review of existing workflow management systems to date in [YB05]. Their taxonomy is recreated in Figure 2.2. In Yu and Buyya's workflow taxonomy, workflow systems are differentiated by workflow representation (XML, Petri nets, etc.), scheduling algorithm, and fault-tolerance strategies, to name a few dimensions of comparison.
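A minimal sketch of this task/data/rule model, with hypothetical names (no resemblance to any particular engine's API is intended), might represent a workflow as a DAG and execute each task once its data dependencies and rule are satisfied:

import java.util.*;
import java.util.function.Supplier;

// A task runs only after its dependencies have run and its rule is satisfied.
class Task {
    final String name;
    final List<Task> dependencies;   // DAG edges: data produced by upstream tasks
    final Supplier<Boolean> rule;    // precondition guarding execution
    final Runnable action;           // the actual processing step

    Task(String name, List<Task> dependencies, Supplier<Boolean> rule, Runnable action) {
        this.name = name; this.dependencies = dependencies; this.rule = rule; this.action = action;
    }
}

// A toy engine: executes the DAG in topological order, checking each task's rule.
class WorkflowEngine {
    void run(List<Task> tasks) {
        Set<Task> done = new HashSet<>();
        while (done.size() < tasks.size()) {
            boolean progressed = false;
            for (Task t : tasks) {
                if (!done.contains(t) && done.containsAll(t.dependencies) && t.rule.get()) {
                    t.action.run();
                    done.add(t);
                    progressed = true;
                }
            }
            if (!progressed) throw new IllegalStateException("cycle or unsatisfiable rule");
        }
    }
}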
Workflow System Example: Wings

An example of a workflow system that has gained traction in the scientific software development community is Wings [K+07, G+07]. Wings represents workflows using semantic metadata properties of both processing tasks (which it calls components) and datasets, represented in the Web Ontology Language (OWL). Wings uses (1) workflow representations that are expressed in a manner independent of the execution environment and (2) the Pegasus mapping and execution system, which submits and monitors workflow executions [K+07].

In addition, Wings has constructs that express in a compact manner the parallel execution of components to concurrently process subsets of a given dataset [G+07]. Codes are encapsulated so that any execution requirements (such as target architectures or software library dependencies), as well as input/output dataset requirements and properties, are explicitly stated.

Figure 2.2: Taxonomy of workflow management systems from Yu and Buyya [YB05].

Code parameters that can be used to configure the codes (for scientists, these may correspond to different models in the experiment) are also represented explicitly and recorded within the workflow representation.
OODT consists of three major components: the File Manager (responsible for cataloging and archiving data products ingested into and created by the system), the Workflow Manager (a component that manages the different data generation and manipulation tasks initiated by data ingests and/or science users), and the Resource Manager (the component that executes workflow tasks on heterogeneous computational hardware). OODT's Workflow Manager models workflows as a set of consecutively run workflow tasks, each of which the Workflow Manager translates into a job that the Resource Manager then tracks throughout its execution. Dependencies between jobs, and also between datasets, are modeled with task preconditions.

2.2.2 Workflow Services

Like workflow systems, workflow services have developed over time to address the needs of computational scientists. Workflow services provide non-scientific functionality to workflow practitioners. For example, the scientist can use a provenance tracking service to record the sequence of workflow stages and the input/data parameters used at each stage in order to repeat the experiment at a later time. In production environments, these services provide additional engineering functionality, such as a fault detection service that can monitor abnormal workflow stage exit codes and annotate associated datasets appropriately. I will formally define a workflow service as follows:

Workflow Service: A cross-cutting aspect of a scientific workflow, not primarily scientific but rather supportive of the experiment, that is encapsulated as a service callable by a workflow engine and/or workflow stages.

In the rest of this section, I will explore a number of representative examples of provenance tracking and fault handling workflow services in order to illustrate existing support for in silico experimentation.

Grid Services Example: Provenance

Bose and Frew [BF05] present a survey of provenance systems in which they make a differentiation between systems in which the provenance mechanism is provided in the workflow engine and provenance services which operate independently of the workflow manager and provide querying capabilities to users. The two basic architectures surveyed by Bose and Frew are shown in Figures 2.3 and 2.4.

At the workflow level, services record information provided directly by the workflow engine rather than interface to the individual tasks in the production system. Eder et al. have suggested that the workflow processing logs themselves can be queried for the information required by many users interested in provenance information [EOG02]. Szomszor and Moreau have proposed a hybrid architecture in which provenance information is provided to the provenance recording service using workflow logs and can be queried by workflow tasks via a web service interface [SM03].

Figure 2.3: Typical workflow-level provenance cataloging system architecture (the workflow engine feeds a provenance service backed by a provenance store, separate from the tasks).

Workflow-level provenance services require traceability to individual data products generated by the production system. Digital library techniques such as DOI have been proposed as a method to track data sets as a facilitator for provenance studies [Bra04b]. Figure 2.3 shows a typical workflow-level provenance mechanism architecture.

Semantics are a common mechanism for resolving the relationship between these identifiers and workflow configuration. The Stanford Knowledge Provenance Infrastructure [dSMM03] is a notable semantics-based provenance query system from the Semantic Web domain. Semantic Grid [KAB+05, SGPHGCGP06] has expanded the notion of a semantics-based provenance system. Taverna is a provenance service developed for the myGrid project that provides semantics-based provenance querying [ZGS06, ZWG+04].
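The workflow-level pattern of Figure 2.3 can be sketched in a few lines. The following is a minimal, hypothetical illustration (the class names and record format are mine, not those of any surveyed system) of a provenance service that is fed directly by the engine as each stage completes:

    import java.util.*;

    // Hypothetical sketch of workflow-level provenance recording: the
    // engine reports each stage execution; tasks never see the service.
    class ProvenanceRecord {
        final String workflowId, stage;
        final Map<String, String> parameters;
        final long timestamp = System.currentTimeMillis();
        ProvenanceRecord(String wf, String stage, Map<String, String> params) {
            this.workflowId = wf; this.stage = stage; this.parameters = params;
        }
    }

    class ProvenanceService {
        private final List<ProvenanceRecord> store = new ArrayList<>(); // stand-in for a real store

        // Called by the workflow engine, not by individual tasks.
        void record(ProvenanceRecord r) { store.add(r); }

        // Query capability exposed to users: replay the stages of one run.
        List<ProvenanceRecord> lineage(String workflowId) {
            List<ProvenanceRecord> result = new ArrayList<>();
            for (ProvenanceRecord r : store)
                if (r.workflowId.equals(workflowId)) result.add(r);
            return result;
        }
    }

Because only the engine writes records, such a service can capture stage sequencing and parameters, but, as the task-level designs below make explicit, it cannot see state internal to a task.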
With task-level provenance, task programmers are often responsible for interfacing to the provenance recording service directly. Notably, Groth et al. have proposed using services to record provenance using a shared protocol and to securely validate provenance [GLM04, GMF+05, MCS+04]. In their architecture, the task programmer is responsible for providing provenance information in the required format. Another interesting task-level provenance recording system is Chimera and its later incarnation as the Virtual Data System (VDS) [FVWZ02]. VDS provides not only an interface to perform provenance queries, but also catalogs data.

Figure 2.4: Typical task-level provenance cataloging system architecture (tasks report directly to a provenance service backed by a provenance store, independent of the workflow engine).

Figure 2.4 shows a typical task-level provenance mechanism. In this architecture, tasks are often wrapped in order to provide data to the provenance service without modifying the original science code.

Wrappers are utilized by a number of task-granule provenance services to record appropriate provenance-related information. The Earth System Science Server (ES3) [Fre04, FB01], a predecessor of a number of the mission science systems at JPL, uses metadata and a Metadata Management component to record data provenance. ES3 uses wrappers around science "scripts" to record provenance metadata. The Karma provenance framework [SPG06] is another provenance recording service that monitors an event bus. Metadata about which process created a data product is published to the event bus by wrappers around Fortran executables.

Grid Services Example: Fault Handling

Gartner in [Gar99] offers a discourse on the study of fault tolerance in which he identifies safety and liveness as the critical dimensions of fault tolerance and distinguishes four different types of fault-handling systems: masking (safe and live), non-masking (live, not safe), fail-safe (safe, not live), and none (neither safe nor live).

Fault tolerance at the workflow level has been developed in a number of existing workflow systems. It is a very common design feature of workflow systems that each workflow task return an exit code to the manager [Luo00]. In practice, this exit code is limited to a binary determination of success or failure of the task rather than any sort of user-defined exit status, making these services non-masking. Abawajy suggested that fault tolerance be part of the scheduling policy for grid resources and that multiple replicated tasks be deployed in an attempt to mask faults [Aba04]. Another masking workflow-level service is provided by the Exotica business workflow system at IBM [AAA+95, AKA+94], which supports component replication and persistent messaging.
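The non-masking, exit-code pattern common to the workflow-level services above can be sketched as follows (a minimal, hypothetical illustration; task names and exit codes are invented):

    // Hypothetical sketch of non-masking, exit-code fault detection: the
    // manager learns only that a task failed, not why, and cannot mask
    // the failure from the rest of the workflow.
    class FaultDetector {
        boolean check(String taskName, int exitCode) {
            if (exitCode != 0) { // nonzero exit: record fault, annotate datasets
                System.err.println("task " + taskName + " failed, exit code " + exitCode);
                return false;
            }
            return true;
        }
    }

    public class ManagerLoop {
        public static void main(String[] args) {
            FaultDetector fd = new FaultDetector();
            String[] tasks   = {"ingest", "calibrate", "project"};
            int[] exitCodes  = {0, 0, 139}; // stand-in for codes reported by tasks
            for (int i = 0; i < tasks.length; i++)
                if (!fd.check(tasks[i], exitCodes[i]))
                    break; // non-masking: the workflow simply stops on failure
        }
    }

A masking service, by contrast, would have to combine such detection with replication or checkpoint/restart, as in the systems described above.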
At the task level, a number of services rely on wrappers and local job managers to provide information about the execution of the task to fault-handling services listening for explicit errors and implicit faults via heartbeat monitors. For example, Condor-G [FTL+02] uses a mobile sandboxing environment to trap system calls and communicate via heartbeat monitors. A system that has advocated intra-task error handling in conjunction with a grid workflow management system is Gridworkflow [HK03a, HK03b], seen in Figure 2.5.

Figure 2.5: Exception-handling framework proposed by Hwang and Kesselman (a grid client with a notification listener and listening port communicates with a generic grid server, which uses a heartbeat monitor to poll and (un)register tasks running under a local resource management system).

The authors of this system have advocated for user-specific fault handling semantics. They provide an event notification API that users can access in order to notify the grid workflow system of specific failures. From the real-time grid domain, Jin et al. have provided task programmers with fault-handling libraries [JCC+04, JZC+03]. In each of these systems, the task author is responsible for intra-task fault handling.

2.3 Component-Based Scientific Software

In addition to workflow systems and associated services, there are a number of examples of software systems that combine the notion of software components and software architecture with support for scientific applications. Component-based techniques have focused on language interoperability [EKK01], object-orientation in support of mathematical constructs [TAL+01], and improved approaches to library integration [GL01]. One initiative of note is the Common Component Architecture (CCA) [A+06]. Developed by a consortium of universities and national laboratories, CCA seeks to define a common component interface and framework that allows scientific developers to reuse code that has been written to the standard.

While component-based software development has been helpful for system developers, it suffers from certain pitfalls, including architectural mismatch [GAO95]. Additionally, each of these component-based techniques has focused specifically on high-performance scientific computing, a sub-domain in which practitioners favor efficiency and performance over general usability of code. Because of this choice of audience, none of these techniques has found widespread adoption.

Chapter 3
Approach

As described in Chapter 2, workflow systems have great potential for scientific computing, though current workflow services that might support scientists' engineering concerns utilize a number of integration patterns and mechanisms. In order to effectively utilize workflow systems and services to compose scientific software, scientists must not only be able to access workflow services via a standardized mechanism, but also integrate their code at a level of granularity appropriate for the workflow (i.e., task level, workflow level, etc.).

In this chapter, I will first characterize in greater detail existing workflow integration mechanisms and also offer the reader a greater understanding of workflow granularity. From there, I will describe an approach to support scientists in integrating their scientific software into existing workflow systems and in utilizing existing workflow services to address engineering concerns in the software while keeping these concerns separate from the science being conducted.
3.1 Characterization of Workflow Services

Though a comprehensive listing of all workflow services available today is beyond the scope of this dissertation, in this section I present a set of representative services, describing both access mechanisms (the method with which the developer calls the provided functionality) and expected interaction. Interaction, in this case, is the granularity at which the service operates and expects data (for example, at the workflow level or the workflow stage level). Table 3.1 summarizes this information.

3.1.1 Example Services

Chimera: Chimera is a workflow system that provides provenance recording at the workflow specification level [FVWZ02]. Both the workflow model and provenance information are captured using the Virtual Data Language (VDL). Provenance is implicit in the execution of workflows, and so no provenance information is captured at the workflow stage level. Chimera has recently been rolled into the Swift project, though provenance support in Swift is presently in development.

Grid-WFS: Grid-WFS is an event-based generic fault detection service developed by Hwang and Kesselman and is built on top of the Globus toolkit [HK03a, HK03b]. Grid-WFS separates failure handling policy from application logic and provides a mechanism for users to specify custom error handling policies. Because Grid-WFS can handle both workflow stage-level crashes as well as user-defined failures, it operates at both the workflow and workflow stage level.

Karma: The Karma Provenance Framework is a provenance service that subscribes to an event bus onto which provenance events at the workflow and workflow stage level are published [SPG06]. Workflow stage events are the responsibility of the application developer to produce. The framework was developed for the Linked Environments for Atmospheric Discovery (LEAD) project.

Lab Notebook: The Lab Notebook is part of the Earth System Science Server, or ES3 [Fre04]. The Lab Notebook records provenance information at the workflow stage level and presents an API that is used by script-based invoking wrappers around scientific code to generate XML-based metadata events. These events are in turn recorded by the Lab Notebook database.

Table 3.1: Characterization of Workflow Services.

    Chimera
        Description:      A workflow engine that uses VDL to model both workflows and workflow-level provenance information.
        Workflow System:  Chimera
        Access Mechanism: Implicit Recording
        Interaction:      Workflow Level

    Grid-WFS
        Description:      A workflow service that records faults and can execute custom fault handling based on user specifications.
        Workflow System:  Globus Toolkit
        Access Mechanism: Web Service
        Interaction:      Workflow & Task Level

    Karma
        Description:      An event-based provenance recorder that listens on an event bus for provenance recording events from both a workflow engine and workflow stages.
        Workflow System:  Workflow System Independent
        Access Mechanism: Event-based
        Interaction:      Workflow & Task Level

    Lab Notebook
        Description:      A component of the Earth System Science Server; records provenance as metadata at the workflow stage level.
        Workflow System:  Earth System Science Server (ES3)
        Access Mechanism: API
        Interaction:      Task Level

    OODT Profile Server
        Description:      A data grid service that provides transparent access to disparate data repositories.
        Workflow System:  Workflow System Independent
        Access Mechanism: Web Service
        Interaction:      Workflow & Task Level

    PReServ
        Description:      A web service based provenance recorder that uses the PReP protocol to record provenance information independent of the workflow system.
        Workflow System:  Workflow System Independent
        Access Mechanism: Web Service + Custom Protocol
        Interaction:      Task Level

    VisTrails
        Description:      A workflow specification environment and execution engine that records workflow-level provenance to track workflow evolution over time.
        Workflow System:  VisTrails
        Access Mechanism: API
        Interaction:      Workflow Level
OODT Profile Server: The OODT Profile Server is a data-grid resource discovery service that allows transparent access to disparate data repositories [M+06]. It is web-service enabled (the server provides an XML-RPC interface) and can operate at both the workflow level and the workflow stage level.

PReServ: PReServ is a web services-based provenance recording service that implements a custom provenance notation in a protocol called PReP [GLM04, GMF+05]. Recording of provenance is entirely separate from the execution of the workflow, so PReServ can be used with multiple workflow engines as a standard for establishing provenance, though it is the responsibility of the workflow stage developer to manage all communications with the service.

Provenance in VisTrails: VisTrails is a workflow specification and execution environment specifically developed for scientific visualization workflows [CFS+06]. VisTrails provides a provenance service that automatically captures provenance and versioning information about the workflow instance, tracking the evolution of the workflow over time. This provenance service operates at the workflow level and is provided with information via the VisTrails workflow engine, so only a query API is exposed to the user.

3.1.2 Interaction Mechanisms

As shown in Table 3.1, the workflow services described in this section interact with scientific software via a number of different mechanisms. In order to utilize Chimera's provenance mechanism, the scientist must explicitly record information in the workflow itself utilizing a custom workflow specification. Similarly, the scientist interested in using PReServ to track provenance must write events in the PReP protocol. Other mechanisms are web services and traditional APIs.

A significant challenge to the scientist interested in utilizing these services is to understand each of the different interaction mechanisms for each service, as well as to understand the compatibility between these services and the workflow system they have chosen to use.

3.1.3 Workflow Service Granularity

In addition to understanding the mechanisms of interaction required to utilize a service, scientists must also understand how the granularity of workflow tasks in a scientific workflow affects the use of these services. Some workflow services, such as VisTrails, record scientific provenance by analyzing the workflow specification and do not require the user to make any effort beyond the workflow specification itself. Unfortunately, with such workflow services, the information available to the service is limited (such a provenance-recording mechanism will be able to record task inputs and outputs, but will be unable to record individual task states).

Likewise, a similar granularity challenge in utilizing workflow services is that the selection of the workflow tasks themselves alters the state information exposed to the workflow service. This is especially challenging for workflow check-pointing and fault-tolerance services, as they are highly dependent on correct and full reporting of state.
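To make the fragmentation of interaction mechanisms concrete, the following hypothetical sketch (the interfaces, endpoints, and payload formats are invented for illustration, not taken from the systems above) shows what a single workflow stage must do to report the same event through an API-based service, an event-bus service, and a web service:

    // Hypothetical sketch: one provenance event, three interaction styles.
    // Each interface stands in for a different real-world mechanism the
    // scientist would otherwise have to learn separately.
    interface ApiService { void record(String stage, String event); }       // direct API/library call
    interface EventBus   { void publish(String topic, String payload); }    // event-bus style (cf. Karma)
    interface WebService { String call(String endpoint, String xmlBody); }  // web-service style (cf. PReServ)

    class Stage {
        void reportCompletion(ApiService api, EventBus bus, WebService ws) {
            String stage = "calibrate";
            // Three different call conventions, payload formats, and
            // failure modes for the same piece of information:
            api.record(stage, "completed");
            bus.publish("provenance." + stage, "{\"event\":\"completed\"}");
            ws.call("http://example.org/provenance",
                    "<event stage='" + stage + "'>completed</event>");
        }
    }

The approach described in the remainder of this chapter aims to hide exactly this variability behind a single, standard mechanism.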
3.1.4 The Workflow Services Challenge

The challenge for the scientist in utilizing workflow services is the fragmentation of existing workflow services (especially regarding mechanism of interaction) and the selection of appropriate workflow granularity.

As an example, a scientist utilizing the Globus Toolkit to provide a workflow system will need to understand how to utilize the Grid-WFS web services front-end in order to record workflow faults and restart portions of a scientific workflow as part of fault handling. In order to properly use the checkpointing features of Grid-WFS, the scientist will need to decompose their scientific software into a number of tasks at a granularity that takes checkpointing and other Grid-WFS services into consideration, potentially forcing a rewrite of the code in addition to the challenge of understanding the web services provided by Grid-WFS.

To record workflow provenance, the same scientist would also need to utilize a provenance-recording workflow service like PReServ, as it is workflow-system independent. This code will need to be implemented in the individual tasks rather than handled at the level of the workflow itself (costing the scientist the effort of learning a second set of web services and another potential rewrite of the scientific code). Finally, if the scientist were unable to orchestrate all data locally, then she might need to utilize a data discovery service such as the OODT Profile Server, causing her to have to understand a third set of web services.

3.2 Technical Approach

In order to address these challenges and allow scientists to develop scientific workflows more easily, scientists must be supported in composing (and decomposing) scientific software into workflow tasks and integrating these tasks into a workflow system. Additionally, the scientist should be able to interact with new and existing workflow services utilizing a standard mechanism.

3.2.1 Insights

A number of insights suggest that software architecture, and more precisely, domain-specific software architecture, can aid the scientist in developing workflow-based scientific software. Because workflow systems compose software tasks through explicit control and data flow, there is a direct parallel between workflow compositions and dataflow architectures [TMD09]. Workflow tasks can be characterized as components in a dataflow architecture. Like workflow tasks, workflow services can form additional components that codify domain knowledge such as fault handling, data provenance, data discovery, etc. Communication with these services can then be isolated to connectors that transfer control/data flow (state in dataflow architectures).

The specification of a particular workflow is then a composition of these components into a software architecture that adheres to the rules of the domain and orchestrates the scientific experiment. My approach for composing these architectures consists of three major steps: 1) identification of workflow tasks via the decomposition of an existing scientific application, 2) composition of the resulting code snippets into components in a domain-specific software architecture, connecting these components via dataflow connectors, and 3) deployment of these systems onto an existing grid infrastructure to be executed in a production environment. The following subsections give more detail for each step in the approach.
3.2.2 Decomposition

Decomposition of existing scientific software is important not only because it aids the scientist in utilizing software that is difficult to produce, representing a significant investment by the scientific community, but also because it can aid the scientist in re-aggregating workflow tasks as required by workflow services.

Decomposition of scientific software is the identification of scientific software kernels in source code. These kernels are software modules that are both functional in nature (free of side effects) and reify a single step in the scientific process being conducted.

In order to identify kernels in scientific software, I have developed a domain-specific approach to software architectural recovery, a process that identifies components in monolithic software systems, tuned specifically to the domain of scientific software. This approach to support scientists in identifying scientific kernels, as well as its implementation as a tool called KADRE, are discussed in Chapter 4.

3.2.3 Re-architecting Workflow Components

Once kernels have been identified in existing scientific software, they must be made to conform to a domain-specific software architecture through the use of componentizing wrappers. The resulting components are then integrated into an architecture that reifies the intended workflow via orchestrating connectors.

Finally, workflow services, also wrapped as components, can be integrated into the architecture to address engineering-related concerns via a special form of connector called an invoking connector, which temporarily transfers control and data flow to workflow services prior to returning it to the componentized workflow tasks. These wrappers, connectors, and composition rules are implemented as a domain-specific software architecture in an architecturally-aware middleware called Prism-MW. The resulting DSSA, called the Scientific Workflow Software Architecture, or SWSA, is detailed in Chapter 5.

3.2.4 Deployment onto the Grid

The final step in my approach is to deploy this architecture onto an existing grid system in order to create a production system. Because this step in the approach is largely an implementation issue, it is out of the scope of this dissertation, though it is included in the approach for the sake of completeness.

Chapter 4
KADRE: Decomposing Existing Software Systems

As discussed in the preceding chapters, software architecture captures the principal design decisions behind a software system [TMD09]. A domain-specific software architecture encompasses not only the traditional architectural elements, rationale, and design, tailored to a specific domain, but also a codification of methods and domain knowledge [HR+95]. Architectural recovery, often thought of as the process of elucidating a software system's architecture from its source code, available documentation, and other artifacts (e.g., runtime behavior) [DP09, MJ06], is a natural candidate for such a domain-specific methodology.

Thus, I define domain-specific architectural recovery to be an architectural recovery process that utilizes domain knowledge to extract system architectures compliant to a domain-specific reference architecture. KADRE (Kernel Analysis, Decomposition, and Re-Engineering), a domain-specific architectural recovery technique, identifies a specific type of component: a workflow component, which is functional in nature (free of side effects) and represents a single step in the scientific process (a single domain concept), known as a kernel.
4.1 Approach to Decomposition

Based on experiences refactoring scientific software used at JPL, recovering a dataflow architecture from an existing monolithic scientific software system is a task in which software components are identified and used as stages in a workflow process; in turn, the workflow process implements the actual scientific experiment being attempted by the software's authors. In this dissertation, these workflow components are called scientific kernels. A scientific kernel is a snippet of source code that implements a single step in the scientific process being reified in software. A kernel is similar to a filter in dataflow architectures [TMD09]: it is stateless and exchanges all necessary data as part of the transfer of control flow.

Existing scientific software systems are written in a variety of languages (Fortran, C/C++, and increasingly Java) and also contain varying degrees of encapsulation (subsystems, modules, classes, objects, functions, etc.) that form the elements of the software system. During past manual decompositions of different scientific software systems into kernels, a number of sources of information about these elements were utilized, in addition to a conceptual understanding of the scientific process being implemented. These sources of information include:

- Proximity between elements in source code (i.e., are two functions in the same class or the same subsystem?).
- Call distance between elements (i.e., are two functions called by the same parent function? Are they executed far from one another temporally?).
- Data dependencies between elements (i.e., do two functions share a large amount of data? Do they manipulate completely separate resources?).

These experiences and observations have led to the thesis that underlies KADRE: automatically agglomerating low-level software elements, namely functions, into clusters that are meaningful scientific kernels is possible if (1) a clustering process is used that incorporates these forms of information about the elements, and (2) the clustering process is further tailored to the domain of scientific software by use of an appropriately representative training set of sample decompositions.

4.1.1 Clustering Process

This clustering process is codified in a tool called KADRE. As with organizational terminology, in which a cadre is a group of key personnel or entities in an organization that form its core, KADRE is a tool that aids the scientist in the decomposition of monolithic code into a workflow-based system by automatically identifying scientific kernels. In order to identify these kernels, KADRE uses an affinity clustering algorithm that implements a clustering technique originally inspired by the manual approach to decomposing scientific software systems described above. Elements of the program, in this case functions, are iteratively combined until the resulting clusters exhibit maximum internal cohesion and are minimally similar to one another.

Algorithm 1: Iterative Element Clustering Algorithm.
    input:  program P
    output: set C of element clusters
    C <- all elements in P
    while |C| > 1 and there exists (a,b) in C x C with Sim(a,b) > θ do
        find (a,b) = argmax over a,b in C of Sim(a,b)
        remove a and b from C
        add <a,b> to C

As with other clustering techniques, KADRE employs a similarity metric in order to measure the "distance" between code elements. The similarity of clusters is measured using full linkage chaining: the similarity of cluster a and cluster b is taken as the minimum pairwise similarity of all elements in a and all elements in b. The clustering process terminates when the similarity between all clusters is below a threshold value θ, which is a tunable parameter. As with the manual process, similarity in my clustering algorithm is affected by three sources of information: (1) proximity, (2) call distance, and (3) data dependency, each of which is explored in greater detail in this chapter.
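A minimal sketch of this agglomerative loop follows (hypothetical types; this is a sketch of Algorithm 1, not KADRE's actual implementation). Clusters are merged greedily by highest similarity until no pair exceeds the threshold θ:

    import java.util.*;

    // Minimal sketch of Algorithm 1: greedy agglomerative clustering with
    // a full-linkage similarity and a termination threshold theta.
    class Clusterer {
        interface Similarity { double sim(String a, String b); } // pairwise element similarity

        static List<Set<String>> cluster(Set<String> elements, Similarity s, double theta) {
            List<Set<String>> clusters = new ArrayList<>();
            for (String e : elements) clusters.add(new HashSet<>(Set.of(e)));

            while (clusters.size() > 1) {
                int bi = -1, bj = -1;
                double best = theta; // only merges at or above theta are allowed
                for (int i = 0; i < clusters.size(); i++)
                    for (int j = i + 1; j < clusters.size(); j++) {
                        double d = linkage(clusters.get(i), clusters.get(j), s);
                        if (d >= best) { best = d; bi = i; bj = j; }
                    }
                if (bi < 0) break; // no pair is similar enough: terminate
                clusters.get(bi).addAll(clusters.remove(bj)); // merge the best pair
            }
            return clusters;
        }

        // Full linkage chaining: cluster similarity is the *minimum*
        // pairwise similarity across the two clusters' elements.
        static double linkage(Set<String> a, Set<String> b, Similarity s) {
            double min = Double.POSITIVE_INFINITY;
            for (String x : a) for (String y : b) min = Math.min(min, s.sim(x, y));
            return min;
        }
    }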
4.1.2 Formal Definition of Similarity

During the affinity clustering process given in Subsection 4.1.1, the key metric employed to judge "distance" (and clustering potential) is the similarity between any two elements in the program. As noted in this chapter, during the manual clustering process for discovering scientific kernels, a number of sources of information are utilized in order to determine if two program elements should be clustered together. This information is formalized into a similarity metric (seen in Algorithm 2), Sim(a,b), given as Equation 4.1. As with the manual process, similarity in this clustering algorithm is affected by three sources of information: (1) the proximity measure between a and b, given as prox(a,b), (2) the scaled distance between the entities in the program's call tree, given as call(a,b), and (3) the data dependency measure between a and b, given as data(a,b):

    Sim(a,b) = (α·prox(a,b) + β·call(a,b) + γ·data(a,b)) / (α + β + γ)        (4.1)

In Equation 4.1, α, β, and γ are weighting parameters that adjust the influence of these measures on the overall similarity of two software features. In the rest of this subsection, I will describe each of the measures in greater detail. In Section 4.2, I will show how the weighting parameters are trained to be tuned specifically for the domain of scientific software.

Proximity

The code-level proximity between any two software elements is best understood by visualizing the hierarchical composition of the elements in the program as a tree. For an object-oriented language like Java, the base node in the tree represents the program, its children the modules (in Java, these are packages), the modules' children the classes, the classes' children the functions (or methods, in the case of Java), and the functions' children the lines of code. This entity relationship is shown in Figure 4.1.

Figure 4.1: A general program element model and hierarchical tree view.

If two elements contained in program P have the same parent, they are highly proximate to one another. On the other hand, if they are not contained in the same 2-level sub-tree of the containment tree (i.e., their parents' parents are not identical), they are considered unrelated. For example, if two functions do not exist in the same module, they are considered proximally unrelated.

If each of the elements' parents is contained by the same parent container, then their proximity is language dependent, because various languages support different degrees of encapsulation. For languages that do not support objects, the element hierarchy tree is shallower than the general model presented in Figure 4.1. In Fortran77, for example, programs contain modules, modules contain functions/subroutines, and functions and subroutines contain lines of code. In order to accommodate different source languages, the proximity of elements with this relationship is assigned the value δ, a parameter such that 1 > δ > 0. δ is an adjustable parameter used to adapt the proximity metric a priori to different element hierarchies. For languages with shallower entity trees, entities are more likely to share a common parent regardless of whether they should be clustered, so the value of δ is lowered to compensate. The equation for prox(a,b) is given in Equation 4.2.

    prox(A,B) = 1   if par(A) = par(B)
                δ   if par(par(A)) = par(par(B))                              (4.2)
                0   if par(par(A)) ≠ par(par(B))

    where 1 > δ > 0 and par(·) denotes the parent entity of its argument.
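A sketch of the proximity measure, under the assumption that each element carries a reference to its parent in the containment tree (all types here are hypothetical), might read:

    // Sketch of Equation 4.2: proximity from the containment tree.
    // delta (0 < delta < 1) adapts the metric to shallower hierarchies.
    class Element {
        final String name;
        final Element parent; // null at the root of the containment tree
        Element(String name, Element parent) { this.name = name; this.parent = parent; }
    }

    class Proximity {
        static double prox(Element a, Element b, double delta) {
            if (a.parent != null && a.parent == b.parent) return 1.0;  // same parent
            Element ga = (a.parent == null) ? null : a.parent.parent;  // grandparent of a
            Element gb = (b.parent == null) ? null : b.parent.parent;  // grandparent of b
            if (ga != null && ga == gb) return delta;                  // same grandparent
            return 0.0;                                                // proximally unrelated
        }
    }

For Java programs, δ = 0.4 is the value used in the worked example of Subsection 4.1.3.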
Call Relationship

The second measure affecting the similarity between two program elements is the call relationship, or call(a,b). Notionally, the call relationship between two program elements is the elements' relationship in the dynamic execution of the program. If two functions in a program are executed in sequence, then they are good candidates for clustering, whereas, for example, a function that is called only at the beginning of execution is not a good candidate to cluster with a function that is called only at the program's end.

The measure call(a,b) is the normalized distance between the nodes representing the program elements in the undirected version of the program's call graph. Specifically, an element a is judged similar to an element b in terms of call relationships by calculating the minimal path in the undirected call graph, call it P′, between the nodes a and b. This measure is normalized by dividing by the longest path in P′. Since this is a distance measure, the complement is taken to get a similarity metric. The equation for calculating call(a,b) is given as Equation 4.3.

    call(a,b) = 1 − (minimal path from a to b in graph P′) / (longest path in P′)        (4.3)

    where P′ is the undirected call graph of program P.
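Equation 4.3 can be computed directly from the undirected call graph. A minimal sketch follows (hypothetical representation; since edges are unweighted, breadth-first search suffices, and the "longest path" is interpreted as the graph diameter, consistent with the LUD example below):

    import java.util.*;

    // Sketch of Equation 4.3: call(a,b) = 1 - dist(a,b)/longestPath,
    // computed on an undirected, unweighted call graph via BFS.
    class CallGraph {
        private final Map<String, Set<String>> adj = new HashMap<>();

        void addCall(String caller, String callee) { // store edges undirected
            adj.computeIfAbsent(caller, k -> new HashSet<>()).add(callee);
            adj.computeIfAbsent(callee, k -> new HashSet<>()).add(caller);
        }

        int dist(String a, String b) { // shortest path length by BFS
            Map<String, Integer> d = new HashMap<>(Map.of(a, 0));
            Deque<String> q = new ArrayDeque<>(List.of(a));
            while (!q.isEmpty()) {
                String u = q.poll();
                if (u.equals(b)) return d.get(u);
                for (String v : adj.getOrDefault(u, Set.of()))
                    if (!d.containsKey(v)) { d.put(v, d.get(u) + 1); q.add(v); }
            }
            return -1; // unreachable
        }

        // Longest shortest path (graph diameter), used as the normalizer.
        int longestPath() {
            int max = 0;
            for (String a : adj.keySet())
                for (String b : adj.keySet()) max = Math.max(max, dist(a, b));
            return max;
        }

        double call(String a, String b) { return 1.0 - (double) dist(a, b) / longestPath(); }
    }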
In Equations 4.5 and 4.6, W is a weighting parameter given by multiplying the size (memory footprint) of the shared resource normalized to overall program footprint by its Shannon information content [Sha48] { a measure of the resource's frequency of use. Equation 4.7 shows how the weights of the parameters determined to be both common and distinct are applied. W (X) = P x2X w(x) P e2E w(e) where w(x) = log(prob(x))size(x) (4.7) 53 In Equation 4.7,prob(x) is the probability that, given an element in program P, it will utilizex as a data resource. Additionally, the setE is the set of all elements in program P. From these equations, one can see that if a particular resource is shared by all elements of a program (itsprob(x) = 1), then thelog term in Equation 4.7 zeroes out: it provides no information, so its Shannon information content is zero. Additionally the size of x is a multiplier to give more weight to variables with larger memory footprints. 4.1.3 An Example In order to illustrate the clustering process, and specically, to show how similar- ity between program elements is utilized, I will use an example common to the scientic domain: a Lower-Upper Matrix Decomposition package called LUD that is used to solve systems of linear equations. Conceptual Overview The LUD software solves a system of equations, a _ x =b for x, given in the form: a = 2 6 6 6 6 6 6 6 4 a 11 a 12 a 1n a 21 a 22 a 2n . . . . . . . . . . . . a m1 a m2 a mn 3 7 7 7 7 7 7 7 5 x = 2 6 6 6 6 6 6 6 4 x 1 x 2 . . . x n 3 7 7 7 7 7 7 7 5 b = 2 6 6 6 6 6 6 6 4 b 1 b 2 . . . b n 3 7 7 7 7 7 7 7 5 by performing four conceptual steps: 1. Populate matrices a[][] and b[] 2. Perform a partial pivot operation on a[][] 3. Solve the system a[][] x[] = b[] 54 4. Compute b[] by multiplying out a[][] with the result to step 3 and com- puting the error Because each kernel represents a conceptual step in the scientic process, the clustering algorithm should produce four clusters, each corresponding to one of the conceptual steps outlined above. In order to properly illustrate the clustering process, elided source code for LUD is shown in Figure 4.2; specically, method signatures, methods calls, and variables shared by the methods are shown, although the actual method logic is omitted for clarity. Clustering Process In Figure 4.2 there are seven methods, or program features, being clustered, namely main, init, run, validate, dmxpy, dgefa, and dgesl. In order to determine if any of the elements are to be clustered together, the pairwise similarity of each element is calculated, as is each of its constituent measures: proximity, call relationship, and data dependency. I will illustrate the calculation of KADRE's similarity metric on two methods, run and dmxpy; the reader should be able to verify the full anity matrix for LUD given below. In order to calculate Sim(run;dmxpy), we must calculate the three individual features: prox(run, dmxpy), call(run, dmxpy), and data(run, dmxpy). In order to calculate proximity for Java programs, a value of = 0:4 was determined after some experimentation. Because LUD is a small example com- prising a single class, all methods are highly proximate to one another, and so prox(run;dmxpy) = 1. 55 1 public class LUDf 2 double a[][], b[], x[]; 3 int ipvt[], n; 5 public void init(int size)f ... /*populate a[n][n], b[n]*/ 21 g 23 public void run()f 24 dgefa(a,ipvt,n); //perform partial pivot 25 dgesl(a,ipvt,b,n) //solve a[][]*x[]=b[] 26 g 28 public void validate()f ... 
4.1.3 An Example

In order to illustrate the clustering process, and specifically, to show how similarity between program elements is utilized, I will use an example common to the scientific domain: a Lower-Upper Matrix Decomposition package called LUD that is used to solve systems of linear equations.

Conceptual Overview

The LUD software solves a system of equations, a·x = b, for x, where a is an m×n matrix of coefficients [a_ij], x is the column vector of unknowns (x_1, ..., x_n), and b is the column vector of constants (b_1, ..., b_n), by performing four conceptual steps:

1. Populate matrices a[][] and b[]
2. Perform a partial pivot operation on a[][]
3. Solve the system a[][] x[] = b[]
4. Compute b[] by multiplying out a[][] with the result of step 3 and computing the error

Because each kernel represents a conceptual step in the scientific process, the clustering algorithm should produce four clusters, each corresponding to one of the conceptual steps outlined above. In order to properly illustrate the clustering process, elided source code for LUD is shown in Figure 4.2; specifically, method signatures, method calls, and variables shared by the methods are shown, although the actual method logic is omitted for clarity.

Clustering Process

In Figure 4.2 there are seven methods, or program features, being clustered, namely main, init, run, validate, dmxpy, dgefa, and dgesl. In order to determine if any of the elements are to be clustered together, the pairwise similarity of each element is calculated, as is each of its constituent measures: proximity, call relationship, and data dependency. I will illustrate the calculation of KADRE's similarity metric on two methods, run and dmxpy; the reader should be able to verify the full affinity matrix for LUD given below.

In order to calculate Sim(run, dmxpy), we must calculate the three individual features: prox(run, dmxpy), call(run, dmxpy), and data(run, dmxpy).

In order to calculate proximity for Java programs, a value of δ = 0.4 was determined after some experimentation. Because LUD is a small example comprising a single class, all methods are highly proximate to one another, and so prox(run, dmxpy) = 1.

    public class LUD {
        double a[][], b[], x[];
        int ipvt[], n;

        public void init(int size) {
            ... /* populate a[n][n], b[n] */
        }

        public void run() {
            dgefa(a, ipvt, n);     // perform partial pivot
            dgesl(a, ipvt, b, n);  // solve a[][]*x[] = b[]
        }

        public void validate() {
            ... /* estimate roundoff error */
            dmxpy(a, b, x, n);     // calculate a[][]*b[]
            ... /* compare residual */
        }

        private void dmxpy(double a[][], double b[], double x[], int n) {
            ... /* calculate a[][]*x[] = b[] */
        }

        private void dgefa(double a[][], int ipvt[], int n) {
            ... /* perform partial pivot on a[][] */
        }

        private void dgesl(double a[][], int ipvt[], double b[], int n) {
            ... /* solve for b[] */
        }

        public static void main(String[] args) {
            LUD lud = new LUD();
            lud.init(500);
            lud.run();
            lud.validate();
        }
    }

Figure 4.2: Elided source code for LUD.

Figure 4.3: LUD's undirected call tree.

In order to calculate the call relationship between each of the methods in LUD using Equation 4.3, we must first recover the call graph of LUD. The undirected version of this call graph is shown in Figure 4.3. As shown in the undirected graph in Figure 4.3, the longest path in LUD's undirected call tree is 4 (dgefa() → run() → main() → validate() → dmxpy()). Using this information and Figure 4.3, we can calculate call(run, dmxpy) = 0.25.

To calculate the data dependency relationship between run and dmxpy, the weighting for each of the data resources in LUD must be determined. From Figure 4.2, there are five shared resources, namely a[][], x[], b[], ipvt[], and n. Table 4.1 shows calculations of prob(x), size(x), and w(x) for each resource in LUD.

Table 4.1: Weights associated with LUDecomp resources.

    x        prob(x)   size(x)   w(x)
    a[][]    1.0       0.5       0.0
    x[]      0.5       0.2       0.4
    b[]      0.83      0.2       0.4
    ipvt[]   0.67      0.1       0.2
    n        1.0       0.0       0.0

We can calculate data(run, dmxpy) as follows:

    data(run, dmxpy) = W({a[][], b[], n}) / (W({a[][], b[], n}) + W({x[], ipvt[]}))

Solving this equation, data(run, dmxpy) = 0.4. Combining these features produces the following calculation for Sim(run, dmxpy):

    Sim(run, dmxpy) = (α(1) + β(0.25) + γ(0.4)) / (α + β + γ)

With even weights (α = β = γ = 1), this evaluates to (1 + 0.25 + 0.4)/3 = 0.55, which is the run/dmxpy entry in the affinity matrix of Figure 4.4. In the next subsection, I will show how the tuning of these weighting parameters, as well as the threshold parameter θ, creates different clusterings, some of which more accurately identify scientific kernels.

4.1.4 Parameterization

In order to train KADRE to produce clusters that are meaningful scientific kernels, weights for each of the features as well as a threshold parameter must be determined. Revisiting the previous example, assuming an even distribution of weights to each of the relationships above (e.g., α = 1, β = 1, and γ = 1), the affinity matrix shown as Figure 4.4 would be calculated for the LUD example.

                init   run    validate  dmxpy  dgefa  dgesl
    init          -    0.7    0.767     0.683  0.48   0.617
    run                 -     0.633     0.55   0.69   0.917
    validate                     -      0.917  0.42   0.55
    dmxpy                                 -    0.33   0.467
    dgefa                                        -    0.667
    dgesl                                               -

Figure 4.4: Affinity matrix for LUD assuming even weighting.

From Figure 4.4, the pair of entities to be clustered first is either dmxpy and validate, or run and dgesl. Assuming the former pair is clustered first and the latter pair second, a reduced affinity matrix is produced in which the next pair of entities to agglomerate is init and the cluster formed by validate and dmxpy. We can see from Figure 4.5, however, that the similarity of this pair has fallen much lower than the similarities of the previously combined program elements.

                   init   valid/dmxpy   dgefa    run/dgesl
    init             -    0.683         0.4833   0.617
    valid/dmxpy              -          0.333    0.467
    dgefa                                 -      0.667
    run/dgesl                                      -

Figure 4.5: Reduced affinity matrix for LUD after two clusterings.

Recalling Algorithm 2, program elements are only combined if their similarity is greater than the threshold parameter θ. Like α, β, and γ, θ is a trainable parameter.
In this example, a θ greater than the similarity between init and the cluster consisting of validate and dmxpy (see Figure 4.5) stops KADRE's clustering process at a point at which the clusters are meaningful scientific kernels (described in Subsection 4.1.3).

θ is not the only tuning parameter that plays a role in KADRE's clustering process. Had the weightings α, β, and γ not been set equal, but instead favored call relationships more heavily (α = 0.2, β = 0.5, and γ = 0.3, for example), then the first two clusterings would once again take place, but it would be dgefa and the cluster formed of run and dgesl that would be the third potential clustering. If these two methods are clustered, then conceptual steps 2 and 3 of LUD (recalling Section 4.1.3) would be combined, violating the definition of scientific kernels. In fact, an affinity-based clustering tool called Bauhaus, which implements a widely-cited component recovery technique [GK97, Kos00], gave just this clustering (see Figure 4.6).

Figure 4.6: Initial clustering produced by Bauhaus.

In the next section, I will show how KADRE leverages domain specificity to create a training set of sample scientific software systems and also how it utilizes this training set to incrementally improve the clusters produced by KADRE without the manual intervention of the user.

4.2 Parameter Training

As illustrated by the LUDecomp example in the last section, different values of α, β, γ, and θ effect a weighting mechanism that produces different clusterings of program elements. Figure 4.7 shows three different clusterings of LUD, each produced with different weights associated with the features calculated by Sim(a,b) and also different thresholds.

Clustering A matches the manually decomposed scientific kernels for LUD. Clustering B is a clustering that would be produced if all features (prox, call, and data) were equally weighted and 0.34 < θ < 0.683. This example was discussed above. As another example, Clustering C is produced if α = 0.2, β = 0.5, γ = 0.3, and 0.565 < θ < 0.6.

Figure 4.7: Three different clusterings of LUD.

In order to refine the weighting of the individual measures in the Sim(a,b) metric, clusterings produced by KADRE can be compared to expertly decomposed scientific kernels, adjusting α, β, γ, and θ in order to more closely align the clusters with scientific kernels. Adjusting these values via a supervised learning process will improve KADRE's accuracy at identifying true scientific kernels. In Chapter 6, I will describe this evaluation suite as well as the results of the supervised learning used to train KADRE's parameters, evaluating KADRE's accuracy in producing scientific kernels.

Chapter 5
SWSA: A Domain-Specific Software Architecture

In the second step of my approach, scientific kernels are integrated into a domain-specific software architecture that supports the scientist in orchestrating workflow tasks and integrating workflow services.

5.1 SWSA - A Domain-Specific Software Architecture

The Scientific Workflow Software Architecture (SWSA) is a domain-specific software architecture for scientific workflow systems that promotes encapsulation of workflow services separate from scientific algorithms, accessible via first-class connectors. This separation of concerns allows both scientists and software engineers to converse about workflow stage design without requiring that each domain practitioner become expert in both the scientific and engineering fields [WMGM08].
In order to expose internal breakpoints during which to communicate with workflow services (as required for workflow stage-level communications), SWSA uses a decomposition of an existing workflow stage into scientific kernels as provided by KADRE (see Chapter 4). In addition to the scientific kernels, the control/data flow of the original source code, in the form of a graph of the calls made to execute each kernel (its control) and the data dependencies between each kernel (its dataflow), is utilized to build orchestrations.

Figure 5.1: A DSSA for scientific workflow stages including connectors to workflow services.

SWSA uses a componentizing wrapper to provide these kernels with a component interface, creating Modules as seen in Figure 5.1. A hierarchically-composed exogenous connector [LOW06b] is used to mimic the original control and data flow of the program. An event capturing both the data and control flow of the program at the start of the kernel execution is sent from the exogenous connector to the module via an invoking connector.

The crux of this architecture is the use of an invoking connector to communicate with workflow services. Each invoking connector has a custom event handler that can be written to access workflow services either synchronously or asynchronously (see the pseudo-code in Figure 5.1). Because both control and data flow are captured in the exogenous connector, the state of the workflow stage is captured in the control event, so the handler is able to communicate the necessary data to each service.

5.1.1 Service Encapsulation

Much like the modules that componentize each kernel in the workflow stage, SWSA uses wrappers to transform existing workflow services, both API-based and web or grid service enabled, into components with event-based interfaces (see Figure 5.2). Existing workflow services can be wrapped a priori and reused in subsequent scientific workflows, or new workflow services can be built to directly leverage event-based communication from the invoking connector.

Figure 5.2: Componentization of workflow services allows both asynchronous and synchronous communications.

In order to communicate with a synchronous workflow service, the invoking connector passes an event to an event processor. The event processor can either instantiate the workflow service directly, in the case of API-based workflow services (including event-bus based services), or utilize a web-services interface, in the case of grid or web-service enabled workflow services. Since an invoking connector can also communicate asynchronously with a workflow service, the DSSA allows for a thread pool and event queue, though these elements are not necessary in order to wrap a purely synchronous workflow service.
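The invoking-connector pattern can be illustrated with a small sketch. Everything below is hypothetical (Prism-MW's actual classes are described in the next section; these simplified types are mine): the connector intercepts each control event, notifies the wrapped services, and then forwards control to the kernel module:

    import java.util.*;
    import java.util.concurrent.*;

    // Hypothetical sketch of an SWSA invoking connector: control/data
    // flow arrives as an event, wrapped workflow services are notified,
    // and the event is then forwarded to the componentized kernel.
    class StateEvent {
        final String kernel;
        final Map<String, Object> state; // data flow carried as payload
        StateEvent(String kernel, Map<String, Object> state) {
            this.kernel = kernel; this.state = state;
        }
    }

    interface ServiceComponent { void handle(StateEvent e); }  // wrapped workflow service
    interface KernelModule    { void execute(StateEvent e); }  // wrapped scientific kernel

    class InvokingConnector {
        private final List<ServiceComponent> services = new ArrayList<>();
        private final ExecutorService pool = Executors.newFixedThreadPool(2); // for async services
        private final KernelModule target;
        InvokingConnector(KernelModule target) { this.target = target; }

        void attach(ServiceComponent s) { services.add(s); }

        // Custom handler: replicate the event to every attached service
        // (asynchronously here), then pass control on to the kernel.
        void route(StateEvent e) {
            for (ServiceComponent s : services) pool.submit(() -> s.handle(e));
            target.execute(e);
        }
    }

A provenance recorder or fault monitor wrapped as a ServiceComponent thus sees every control event without any change to the kernel code.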
5.1.2 The Role of Connectors

Chapter 3 explored a number of representative workflow services, each with different methods of communication and different requirements in terms of the level of information required from a workflow stage. Existing communication methods do not provide the workflow stage developer, often a domain scientist and not a software engineer, with the necessary guidance on how to interface to the workflow service. The domain scientist, then, must understand the API and link libraries at the code level, understand message formats and event buses, or identify and access web services in order to satisfy the engineering aspects of their scientific workflows.

A domain-specific software architecture removes the burden of understanding each of these communication methods from the domain scientist. The combination of an exogenous connector and invoking connectors provides the necessary communication between kernels and workflow services without requiring a deep understanding of the scientific process. This separation of concerns between the scientific algorithm and the engineering aspects of the scientific workflow allows us to clearly delineate the role of the scientist and the role of the software engineer in the development of production scientific workflows.

5.2 Middleware Support

Prism-MW supports architectural abstractions by providing classes for representing each architectural element, with methods for creating, manipulating, and destroying the element. These abstractions enable a direct mapping between an architecture and its implementation. The below description of Prism-MW's design is adapted from [MMRM05].

Figure 5.3 shows the class design view of Prism-MW. The shaded classes constitute the middleware core, which represents a minimal subset of Prism-MW that enables implementation and execution of architectures in a single address space. Only the dark gray classes of Prism-MW's core are directly relevant to the application developer.

Figure 5.3: UML class diagram view of Prism-MW. Middleware core classes are highlighted.

Brick is an abstract class that represents an architectural building block. It encapsulates common features of its subclasses (Architecture, Component, Connector, and Port). Architecture records the configuration of its constituent components, connectors, and ports, and provides facilities for their addition, removal, and reconnection, possibly at system runtime. A distributed application is implemented as a set of interacting Architecture objects.

Events are used to capture communication in an architecture. An event consists of a name and a payload. An event's payload includes a set of typed parameters for carrying data and meta-level information (e.g., sender, type, and so on). An event's type is either a request for a recipient component to perform an operation or a reply that a sender component has performed an operation.

Ports are the loci of interaction in an architecture. A link between two ports is made by welding them together. A port can be welded to at most one other port. Each port has a type, which is either request or reply, so that request events are forwarded from request ports to reply ports, while reply events are forwarded in the opposite direction.

Components perform computations in an architecture and may maintain their own internal state. A component is dynamically associated with its application-specific functionality via a reference to the AbstractImplementation class. This allows the developer to perform dynamic changes to a component's application-specific behavior without having to replace the entire component. Each component can have an arbitrary number of attached ports. When a component generates an event, it places copies of that event on each of its ports whose type corresponds to the generated event type. Components may interact either directly (through ports) or via connectors.

Connectors are used to control the routing of events among the attached components. Like components, each connector can have an arbitrary number of attached ports. Components attach to connectors by creating a link between a component port and a single connector port. Connectors may support arbitrary event delivery semantics (e.g., unicast, multicast, broadcast).

Finally, Prism-MW provides support for event dispatching, event queuing, architectural monitoring, and reflection facilities that the developer can associate with the system's architecture [MMRM05].
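To give a flavor of how these constructs compose, the following self-contained toy (emphatically not Prism-MW itself; these minimal classes merely mirror the concepts just described, and no real Prism-MW signatures are implied) routes a request event from one component to another through a connector:

    import java.util.*;

    // Self-contained toy, NOT Prism-MW: minimal Brick/Component/Connector/
    // Event classes mirroring the constructs described above, with a
    // connector that broadcasts request events to its attached components.
    class Event {
        enum Type { REQUEST, REPLY }
        final String name; final Type type;
        final Map<String, Object> payload = new HashMap<>();
        Event(String name, Type type) { this.name = name; this.type = type; }
    }

    abstract class Brick { final String id; Brick(String id) { this.id = id; } }

    class Component extends Brick {
        Connector attached; // stand-in for a welded port
        Component(String id) { super(id); }
        void generate(Event e) { if (attached != null) attached.route(e, this); }
        void handle(Event e) { System.out.println(id + " handling " + e.name); }
    }

    class Connector extends Brick {
        final List<Component> components = new ArrayList<>();
        Connector(String id) { super(id); }
        void weld(Component c) { components.add(c); c.attached = this; }
        // Broadcast semantics: deliver the event to every attached
        // component except its sender.
        void route(Event e, Component sender) {
            for (Component c : components) if (c != sender) c.handle(e);
        }
    }

    public class ToyArchitecture {
        public static void main(String[] args) {
            Component pivot = new Component("dgefa");
            Component solve = new Component("dgesl");
            Connector pipe = new Connector("pipe");
            pipe.weld(pivot); pipe.weld(solve);
            pivot.generate(new Event("solve-system", Event.Type.REQUEST)); // routed to dgesl
        }
    }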
5.3 Support for Architectural Styles

Prism-MW's design provides a separation of concerns via its explicit architectural constructs and its use of abstract classes and interfaces. This has resulted in an extensible middleware and has enabled Prism-MW to directly support multiple architectural styles, even within a single application. In the following sections, I will discuss the dimensions of architectural variance which Prism-MW supports through the use of these abstractions, as well as provide example architectural styles to illustrate the construction of style variants through implementations of extensible Prism-MW features.

5.3.1 Dimensions of Style Variance

The work of [MRMM02] indicates that architectural styles vary along five dimensions: structure, topology, behavior, interaction, and data. I will explore each of these dimensions in turn and illustrate Prism-MW's ability to support these variants.

1. Structure. The structure of an architecture describes the "physical" properties of the components and connectors. An example of structure is the C2 style's requirement that all components contain exactly one port which sends requests and receives notifications on its top and one port that receives requests and sends notifications on its bottom [T+96].

2. Topology. The topology of an architecture represents the organization of components and connectors into a system. An example is pipe-and-filter's requirement that a pipe connect the source of one filter to the sink of another filter [Fie00].

3. Behavior. Every component and connector in an architectural style has an associated internal behavior. In this context, the behavior is specific to the architecture and independent of the functionality of the application-specific code. An example is that in the client-server architecture, a client sends a request to a server and then blocks until it receives a matching reply.

4. Interaction. Architectural styles are often characterized primarily by the guidelines they provide for the interaction between architectural elements. An example of interaction in the C2 style is that requests flow upward through the vertical component topology while asynchronous notifications flow downward.

5. Data. Data is the form of the information exchanged by architectural elements in a particular style. For example, in many event-based architectural styles, data takes the form of request and reply events.

Extensibility in Prism-MW is built around an unchanging software core. To that end, the core constructs (Component, Connector, Port, Event, and Architecture) are subclassed via specialized classes (ExtensibleComponent, ExtensibleConnector, ExtensiblePort, ExtensibleEvent, and ExtensibleArchitecture), each of which has a reference to a number of abstract classes. As seen in Figure 5.4, each of the abstractions of a class can have multiple implementations, thus enabling selection of the desired functionality inside each instance of a given Extensible class. If a reference to an abstract class is instantiated in a given Extensible class instance, that instance will exhibit the behavior realized inside the implementation of that abstract class.

Figure 5.4: Prism-MW's support for architectural styles.

All software artifacts developed in Prism-MW are inherited from the Brick class.
Specification of a particular style element such as a client, server, pipe, or filter is accomplished using an attribute that identifies its style-specific type. Each of these components and connectors communicates in Prism-MW using Events. The two native event types in Prism-MW are Request and Reply, though new event types can be specified via the ExtensibleEvent class. This allows for the introduction of new event types when required by the style, and in this way Prism-MW supports Data variants.

Topological and Structural constraints for each of these style-specific dimensions are represented in Prism-MW via the implementation of AbstractTopology that is installed on the ExtensibleArchitecture object. Each implementation of AbstractTopology provides a style-specific implementation of the weld method, structuring the components and connecting them according to the rules of the particular style. For example, a client-server-specific implementation of AbstractTopology provides a weld which adds request and reply ports to both the Server and Client components and specifies that the Client's request port is matched to the Server's reply port and vice versa. Additionally, Prism-MW provides services for distributed architectural styles through ExtensiblePort. ExtensiblePort has an associated implementation of the AbstractDistribution class to support inter-process communication (see Figure 5.4.c).

In order to specify the Behavior of an architectural style's elements, Prism-MW provides the abstract class ExtensibleComponent. ExtensibleComponent can be implemented with component-specific behavior, such as component synchronism. In order to specify the Interaction aspects of a style, ExtensibleConnector has an associated implementation of the AbstractHandler class to support style-specific event routing policies (see Figure 5.4.a). For example, unidirectional data forwarding and bidirectional event broadcast can be developed as implementations of the AbstractHandler.

5.4 SWSA's Implementation

In this section, I will discuss the implementation of SWSA in Prism-MW, including the components, connectors, and topologies specific to the architecture, as well as the overhead induced by re-architecting a monolithic software system as an event-based system.
Like kernels, workflow services are also integrated into Prism-MW as Components. These components can be built a priori by the software engineer who understands the service API (or other interaction mechanism). In this case, the implementation of the component's handler code is slightly different in that the translation to implement is from event to API or service call.

5.4.2 SWSA Connectors

Like workflow service components, SWSA connectors can be developed a priori by the software engineer. These connectors implement specific control and data flow (as specified by the workflow) and are exogenous connectors [LOW06a]. Each connector transfers events that carry program state as payload. When events are transferred to a component's handler, control and data flow are transferred to the particular workflow task.

The simplest form of SWSA connector is a pipe. In addition, SWSA connectors that support branching (with conditional routing based on event types) and bounding have been created. These connectors suffice to implement control and data flow in most existing workflow systems [YB05], though more complex control and data flow (such as Multi-Choice [RtHvdAM06]) can be implemented by composing these SWSA connectors or by creating further custom connectors using the Prism-MW framework. In addition to these connectors, SWSA's invoking connector is implemented with a custom handler that replicates events and passes them to all connected workflow service components.
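A minimal sketch of the invoking connector's replication policy follows. The WorkflowServiceComponent type and the deliver and handle methods are hypothetical simplifications of the SWSA artifacts, shown only to illustrate how one event fans out to every attached service.

  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical service-side interface (e.g., provenance or monitoring services).
  interface WorkflowServiceComponent {
    void deliver(Object event);
  }

  // Simplified invoking connector: its handler passes each event
  // to all attached workflow service components.
  class InvokingConnector {
    private final List<WorkflowServiceComponent> services = new ArrayList<>();

    void attach(WorkflowServiceComponent s) { services.add(s); }

    void handle(Object event) {
      for (WorkflowServiceComponent s : services) {
        s.deliver(event);  // each service receives the replicated event
      }
    }
  }

  public class InvokeDemo {
    public static void main(String[] args) {
      InvokingConnector inv = new InvokingConnector();
      inv.attach(e -> System.out.println("provenance saw " + e));
      inv.attach(e -> System.out.println("monitor saw " + e));
      inv.handle("TaskStarted");  // both services receive the event
    }
  }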
5.4.3 SWSA Topology

SWSA constrains components from transferring data or control directly, so welds between components are disallowed. Additionally, all components must be connected to an invoking connector, though any number of other connector types can be used to transfer events between invoking connectors. These topology rules are implemented as a SWSA Topology that inherits from the AbstractTopology interface and is installed in the Topology of the workflow instance.

5.4.4 Overhead

A key concern when reimplementing scientific software with architectural constructs is the introduction of performance overhead in both computational time and memory footprint. Indeed, Chapter 1 put forth the hypothesis that a properly architected workflow system would perform correctly and within certain performance bounds. In the next chapter, I will discuss my evaluation of the overhead induced by Prism-MW wrappers and show that SWSA performs within the specified bounds.

Chapter 6

Evaluation

In this chapter, I recapitulate my research thesis and hypotheses and present an evaluation of each of the research platforms discussed in detail in Chapters 4 and 5. Additionally, I will discuss the experiment design, challenges, results, and conclusions drawn specifically with reference to the aforementioned hypotheses.

6.1 Research Thesis and Hypotheses

Recalling from Chapter 1, this dissertation postulates that a domain-specific software architecture, or DSSA, for grid-based scientific software can provide a separation of concerns between the computational scientist, who has deep domain knowledge and must validate the science being conducted, and the software engineer, who is concerned with the engineering associated with scaling experiments, managing data products, and orchestrating scientific code through workflows. The following hypotheses support this thesis:

Hypothesis 1: Software modules can be accurately identified algorithmically in existing scientific software that not only encapsulate code in a functional way, but also reify a single step in the scientific process being employed at the conceptual level. Accuracy in this context is 80% correct identification when compared to expertly-identified software modules.

Hypothesis 2: An efficient (in run-time) algorithm exists to automatically identify the software modules as specified in Hypothesis 1. An algorithm will be considered efficient if it exhibits polynomial time complexity.

Hypothesis 3: A scientific software system that is composed of these identified modules, properly componentized with explicit interfaces, performs correctly and within acceptable bounds in both computation time and memory footprint when compared to its equivalent monolithic software system. Acceptable is defined as within 10% of the original computation time and within 50% of the original memory footprint.

6.2 Validation Approach

KADRE, or Kernel Analysis, Decomposition, and Re-Engineering, is a domain-specific architecture recovery approach and toolset that aids automatic and accurate identification of potential scientific workflow components in existing scientific software. In order to validate Hypothesis 1, I will measure the accuracy of KADRE in identifying scientific kernels. In order to evaluate Hypothesis 2, I will analyze KADRE's runtime performance.

To test Hypothesis 3, I will measure the impacts of SWSA, the Scientific Workflow Software Architecture, on the performance, in both memory and runtime, of existing scientific software. Each of these evaluations is described in detail in the rest of this section.

6.2.1 Evaluation Suite

In order to evaluate KADRE's ability to accurately cluster program elements into scientific kernels, it was necessary to develop a suite of representative scientific software systems. Seven systems from the Java Grande Benchmarking Suite [B+00] were utilized to build the suite: LUD, a lower-upper decomposition program; Crypt, a cryptography application; Sparse, a sparse matrix multiplication program; FFT, a fast Fourier transform program; Euler, a program that simulates fluid dynamics in a channel; MD, a molecular dynamics simulation; and Search, an A.I. application that uses alpha-beta pruning to explore a complex space.

The Java Grande Benchmark is a set of scientific and engineering applications originally produced to test the performance of the Java programming language for use in the computational science domain. Each program was selected as representative of the domain by computational science experts from academia, industry, and government [B+00]. The programs are also representative of a variety of coding practices and code structures (object orientation, hierarchical composition, and class encapsulation are used to varying degrees), as they were initially written by different developers.

Based on a recent survey, these programs share a number of characteristics with the average scientific software system, including size; Wilson found that more than two-thirds of the computational scientists he surveyed worked with scientific software of less than 5 KSLOC [Wil09b]. Despite the small size of these programs as measured in SLOC, they are still very complex (200 lines of byte-level data manipulations in the case of Crypt, for example).

For each of the seven software programs, a group of scientific computing experts clustered the system manually, identifying scientific kernels.
During manual decomposition, the static and dynamic properties of the software systems were leveraged, as well as knowledge of the domain and, specifically, an understanding of the scientific and mathematical processes implemented by each of the systems, to arrive at a decomposition representative of the high-level steps in the systems' respective scientific processes.

In addition to these decompositions, a study was conducted in which twenty-three teams of computer science graduate students (eighty-two students in all) were instructed on how to decompose scientific applications. The students were enrolled in an advanced graduate-level software architecture course and were given extensive guidance in order to apply the manual decomposition strategy used by the domain experts. Three to four teams decomposed each system in the evaluation suite; these decompositions were combined to create a "best of breed" decomposition, a decomposition in which the software clusters are true representations of the scientific steps being conducted.

As further validation for the selection of decompositions, each program was redeveloped as a workflow-based orchestration utilizing the selected kernels, and the equivalence of the scientific results produced by the workflow and the original, monolithic system was validated. A summary of the evaluation suite is presented in Table 6.1, and full source code for each of the programs in the suite is available in Appendix A.

Table 6.1: Suite of training programs.

Name    Description                                                    SLOC  Classes  Functions  Resources  Kernels
LUD     A program that uses lower-upper decomposition to solve
        systems of linear equations (this was the example
        program used in Section 4.1.3).                                 143     1         7          5         4
Crypt   A program that applies the IDEA (International Data
        Encryption Algorithm) block cipher to encrypt and
        decrypt an array of bytes.                                      226     1         8          2         3
Sparse  A program that multiplies matrices in the Yale storage
        scheme, or compressed-row structure.                             83     1         5          6         3
FFT     A Fast Fourier Transform solver.                                173     1        10          1         4
Euler   A time-dependent Euler equation solver simulating flow
        in a channel with a "bump" on one of the walls, using a
        fourth-order Runge-Kutta method with local timestepping.        832     4        24         33         5
MD      An N-body molecular dynamics code modeling particles
        interacting under a Lennard-Jones potential in a cubic
        spatial volume with periodic boundary conditions.               399     4        18         59         3
Search  Search solves a game of connect-4 on a 6 x 7 board using
        an alpha-beta pruning technique. The search uses a
        transposition table with the twoBig replacement strategy.       410     4        27         36         3

6.2.2 Measuring Cluster Accuracy

Accuracy, or quality, of a clustering produced by KADRE is defined as the inverse distance between the clustering and an expert decomposition of the software system into scientific kernels. Notionally, one can measure the distance between two clusterings as the effort required to transform one clustering into the other.

To illustrate this transformation process, Figure 4.7 from Chapter 4 is reproduced as Figure 6.1 below.

Figure 6.1: Three different clusterings of LUD.

Using Clusterings A and C from Figure 6.1, one can see that if dgefa were moved to the cluster containing run and dgesl, Clustering A would transform into Clustering C. Similarly, if one were to move init to the cluster containing valid and dmxpy, Clustering C would be transformed into Clustering B.

In [TH99], Tzerpos and Holt defined the MoJo metric as the number of Move and Join operations required to convert one set of clusters to another set. MoJo treats each clustering of a program P as a three-level containment tree in which all clusters belong to the same parent and all clustered entities are leaf nodes at the third level. This tree is illustrated in Figure 6.2. In Figure 6.2, a clustering X contains two clusters, X1 and X2.
Cluster X1 consists of two elements: B and E. Cluster X2 consists of three elements: A, C, and D. In order to judge the distance between this clustering and an alternative clustering of the same program, MoJo calculates the minimum number of Move and Join operations needed to transform one clustering into the other.

Figure 6.2: MoJo's 3-level containment tree model.

The first step in determining this distance is to label each of the elements of one clustering with the cluster in the second clustering in which the element is contained. In Figure 6.3, the elements in Clustering X (illustrated as circles) are labeled with the corresponding cluster names from Clustering Y (the rectangular labels). Once this labeling is done, a center cluster is determined for each cluster in X: the cluster in Y whose label appears most often among that X cluster's elements. For example, the center cluster of X2 is 3, as the greatest number of its elements are tagged 3.

Figure 6.3: Clusterings X and Y of program P, labeled.

Once center clusters are determined, elements in a cluster of Clustering X that are labeled with a Y cluster different from the center cluster are moved to the appropriate center cluster, though any profitable Joins are performed prior to these Moves. A profitable Join is the combination of two clusters such that the Join results in fewer Moves than if the two clusters remained distinct. In Figure 6.3, X2 would be split [TH99] by moving C to its own cluster (accomplished with one Move operation), and E would be moved from X1 to X2, making MoJo(X, Y) = 2.

This metric will be used to judge the distance between KADRE's clusters and the expert decompositions of the systems detailed above, and therefore the accuracy of the clustering produced by KADRE. Cluster accuracy, A(a, b), is defined as:

    A(a, b) = (1 - MoJo(a, b) / n) x 100%        (6.1)

where n is the number of elements clustered [TH99]. In the above example from Figure 6.3, if one considers Clustering X to be the correct clustering of elements, then Clustering Y is 60% accurate ((1 - 2/5) x 100% = 60%, since there are five elements, A through E, in each clustering and MoJo(X, Y) = 2).
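To ground the metric, the following sketch computes this accuracy figure from two clusterings, using the center-cluster labeling described above to count Moves. For brevity it omits the search for profitable Joins, so it yields an upper bound on MoJo (and thus a lower bound on accuracy) rather than the exact minimum. The representation of a clustering as a map from cluster name to element set is my own choice, not KADRE's internal form, and the composition of Y below is an assumption consistent with the two Moves described in the text.

  import java.util.*;

  public class MojoAccuracy {
    // A clustering maps a cluster name to its set of elements.
    static int approxMojo(Map<String, Set<String>> x, Map<String, Set<String>> y) {
      // Label each element of X with the Y cluster that contains it.
      Map<String, String> yLabel = new HashMap<>();
      y.forEach((cluster, elems) -> elems.forEach(e -> yLabel.put(e, cluster)));

      int moves = 0;
      for (Set<String> xCluster : x.values()) {
        // The center cluster is the most frequent Y label in this X cluster.
        Map<String, Integer> counts = new HashMap<>();
        for (String e : xCluster) counts.merge(yLabel.get(e), 1, Integer::sum);
        int center = Collections.max(counts.values());
        moves += xCluster.size() - center;  // every non-center element moves
      }
      return moves;
    }

    static double accuracy(Map<String, Set<String>> x, Map<String, Set<String>> y) {
      int n = x.values().stream().mapToInt(Set::size).sum();
      return (1.0 - (double) approxMojo(x, y) / n) * 100.0;  // Equation 6.1
    }

    public static void main(String[] args) {
      // The example of Figure 6.3: X1 = {B, E}, X2 = {A, C, D}.
      Map<String, Set<String>> x = Map.of(
          "X1", Set.of("B", "E"), "X2", Set.of("A", "C", "D"));
      // Assumed Y, consistent with the Moves described above.
      Map<String, Set<String>> y = Map.of(
          "Y1", Set.of("B"), "Y2", Set.of("E", "A", "D"), "Y3", Set.of("C"));
      System.out.println(accuracy(x, y) + "%");  // prints 60.0%
    }
  }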
6.3 Evaluation of KADRE's Accuracy

I have made three measurements to judge KADRE's accuracy in clustering scientific applications. The first measure of accuracy is a leave-one-out cross-validation to measure KADRE's ability to cluster previously untrained scientific applications (a proxy for KADRE's ability to handle new scientific applications not in the evaluation set). The second measure of accuracy is a comparison to state-of-the-art general-purpose software clustering tools that are not targeted specifically at scientific software systems; Bauhaus, a commercially available clustering tool, was used to measure against KADRE. The final measure of accuracy is a comparison of KADRE to manual decomposition by software engineers who are not scientific software experts but are trained to decompose scientific software. Specifically, the variance among different groups of engineers producing clusters is used to characterize the error-prone nature of manual decomposition.

6.3.1 Cross Validation

In order to evaluate the accuracy of the clusterings produced by KADRE, a leave-one-out cross-validation technique common to the evaluation of training algorithms was used. KADRE's parameters (as described in Chapter 4) were trained for each program in the evaluation suite in order to optimize the clustering results for all other programs in the suite (the training set). The goal of this optimization was to maximize the joint accuracy of KADRE across all training programs, as measured by MoJo.

Figure 6.4: Bar graph showing KADRE's accuracy during leave-one-out cross-validation.

For each of the programs that was left out, the accuracy of KADRE in decomposing that system was measured using the parameters from the training process. The results of this analysis are shown in Figure 6.4. Overall, KADRE performed well in leave-one-out analysis, performing at greater than 80% accuracy on average.

Figure 6.5: Euler's call tree.

This analysis does highlight one shortcoming of KADRE. Specifically, KADRE assumes that each function in a scientific program should appear in only one kernel. Recalling from Chapter 4, KADRE analyzes call trees such as the call tree for the Euler program in the evaluation suite (seen in Figure 6.5). Unlike the other programs in the evaluation suite, Euler utilizes a small number of functions that would, in other languages, be considered macros or "library" functions. These functions are specifically highlighted in the box labeled Replicated Functions and are called multiple times by the function doIteration. As such, when KADRE clusters Euler's functions, it must select a single kernel in which to place these functions. The absence of these functions from the other kernels results in the detection of inaccuracies by MoJo analysis, and a lower accuracy than for the other scientific software programs in the evaluation suite.

Because KADRE is an analysis that aids the scientist in decomposing scientific systems rather than a fully automatic approach, KADRE's utility to the scientist in these cases could be improved by flagging potential "library" functions based on Shannon information content (as measured during the calculation of data(a, b), described in Chapter 4). In other words, KADRE can flag for the scientist functions that are utilized by the majority of other functions in the code. Removing these functions from the MoJo calculation of KADRE's accuracy, KADRE would be 75% accurate in its kernel identification within Euler. I will discuss this proposed improvement further in the Future Work section of Chapter 7.

6.3.2 Comparison to Existing Techniques

A coarse-grained exhaustive search over all possible parameter values ranging from 0.1 to 1.0, inclusive, for the four weighting parameters, optimizing the joint accuracy over all programs in the evaluation suite, provides a second measure of accuracy. The distance between KADRE's clustering of each of the evaluation suite programs and the "best of breed" scientific kernels was evaluated using the most successful weighting generated during this exhaustive search; the weights evaluated were 0.1, 0.3, 0.6, and 0.4, respectively. Further, these clusters were compared to the clusters produced by an untuned version of Bauhaus, as described in Koschke's work [Kos00]. Results of this comparison, using the accuracy metric described above, are shown in Figure 6.6.
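The following sketch illustrates the shape of this coarse-grained search. The Objective interface, which would run KADRE's clustering under a given weighting and return the mean MoJo-based accuracy over the training set, is a hypothetical stand-in; only the 0.1-stepped grid over four weights mirrors the search described above.

  public class GridSearch {
    // Hypothetical: runs KADRE with the given weights over the training set
    // and returns the mean MoJo-based accuracy (0-100).
    interface Objective { double evaluate(double[] weights); }

    static double[] search(Objective objective) {
      double best = Double.NEGATIVE_INFINITY;
      double[] bestW = new double[4];
      // Coarse grid: each of the four weights ranges over 0.1 .. 1.0 in 0.1 steps.
      for (int a = 1; a <= 10; a++)
        for (int b = 1; b <= 10; b++)
          for (int c = 1; c <= 10; c++)
            for (int d = 1; d <= 10; d++) {
              double[] w = {a / 10.0, b / 10.0, c / 10.0, d / 10.0};
              double score = objective.evaluate(w);
              if (score > best) { best = score; bestW = w; }
            }
      return bestW;
    }

    public static void main(String[] args) {
      // Dummy objective for illustration; the real one would invoke KADRE.
      Objective dummy = w -> 100 - Math.abs(w[0] - 0.1) - Math.abs(w[1] - 0.3)
                                 - Math.abs(w[2] - 0.6) - Math.abs(w[3] - 0.4);
      System.out.println(java.util.Arrays.toString(search(dummy)));
    }
  }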
Looking more closely at the types of errors produced by both KADRE and Bauhaus, these errors can be classified into one of three types: a Join error, in which the clustering tool suggests that two clusters should be joined when they are separate clusters in the expert decomposition; a Move error in which KADRE or Bauhaus performs Moves because it considers a cluster over-aggregated (the opposite of a Join error); and a Move error in which the clustering tool interprets a boundary between clusters incorrectly and recommends Moves incorrectly. Error counts by type are reported in Table 6.2.

Figure 6.6: Bar graph showing KADRE's accuracy vs. Bauhaus, a state-of-the-art general-purpose software clustering utility.

Interpreting the errors in Table 6.2, Bauhaus produced significant over-aggregation or significant under-aggregation (i.e., it either clustered most functions together or decided not to cluster any elements of the code). KADRE produced many fewer Join errors, not suffering from the same under-aggregation. Overall, KADRE was highly successful at automatically creating clusters that match expertly-created scientific kernels.

It is important to note that Bauhaus could have performed better had its affinity clustering algorithm been experimentally tuned. Nonetheless, the comparison presented in Figure 6.6 is fair considering that any tuning of Bauhaus would be left to the scientist [Kos00], and may be arbitrary without significant knowledge of the clustering algorithm.

Table 6.2: Error counts by category for both KADRE and Bauhaus.

                      KADRE                                 Bauhaus
Program  Total    Moves                 Joins   Total    Moves                 Joins
         Errors   Boundary  OverAgg.            Errors   Boundary  OverAgg.
LUD        1         0         0          1       1         0         1          0
Crypt      1         1         0          0       2         0         0          2
Sparse     0         0         0          0       2         0         0          2
FFT        0         0         0          0       3         0         3          0
Euler      4         1         1*         1       4         0         0          4
MD         4         3         1          0       5         2         0          3
Search     4         0         2*         0       6         1         1          4

* Indicates that this error was recorded as multiple Move errors when measuring clustering accuracy with MoJo.

6.3.3 Comparison to Manual Decomposition

As part of an advanced graduate-level course in software architecture, twenty-three teams of computer science students (both full-time students and working engineers) were each asked to decompose one of the programs in the evaluation suite and orchestrate the program via workflow. The manual decomposition took an average of nine developer-weeks to accomplish, based on self-reporting in required status reports, whereas the orchestration took an average of one developer-week.

While most teams generally agreed with one another on the decompositions, as well as with the expert decompositions, teams could potentially fall into two traps during manual decomposition: under-decomposition as a result of a lack of understanding of the overall data flow in the program, and over-decomposition that correlated with a lack of understanding of the scientific concepts being reified (these are the same types of errors reflected in clustering via both KADRE and Bauhaus, as shown in Table 6.2). In comparing the various manual decompositions with one another, there was an overall variance of 22% in A(a, b), suggesting that manual decomposition is more prone to error than KADRE-based decomposition.
6.4 Evaluation of KADRE's Run Time

In Chapter 1, Hypothesis 2 is parameterized to define KADRE as a performant algorithm (i.e., efficient in run-time) if it exhibits polynomial runtime complexity. In this section, I will not only analyze KADRE's complexity, but, because even polynomial-time algorithms can be impractical to utilize owing to the runtime effects of constant multipliers, I have also charted KADRE's performance in analyzing the evaluation suite discussed in Section 6.2.1.

6.4.1 Algorithmic Analysis

KADRE first uses a one-pass parser to build a map of clusterable elements and their features and then executes the algorithm given in Chapter 4. For the sake of clarity, KADRE's clustering algorithm is reproduced below.

Algorithm 2: Iterative Element Clustering Algorithm.
  input:  program P
  output: set C of element clusters
  set C <- all elements in P
  while |C| > 1 and there exists (a, b) in C x C : Sim(a, b) do
      find max over all a, b in C of Sim(a, b)
      remove a and b from C
      add <a, b> to C

In the above algorithm, KADRE's implementation of clustering runs in O(n^3), where n is the number of clusterable elements or, more specifically, the number of function points. In practice, however, the algorithm performs significantly below O(n^3), as the number of clusterable elements is reduced at every iteration. Nonetheless, KADRE's clustering algorithm satisfies Hypothesis 2 in that it runs in polynomial time.

6.4.2 Empirical Performance Analysis

Beyond algorithmic analysis, it is important to empirically test KADRE's performance in order to understand the impact of constant multipliers on runtime. Using the evaluation suite of scientific programs, I tested KADRE's performance on a 3.06 GHz dual-core PC with 4 GB RAM.

Figure 6.7: Runtime performance of KADRE.

KADRE not only performs a polynomial-time analysis, but is also a practical analysis tool for scientific software developers, as it can work very quickly to analyze and cluster function points. From Figure 6.7, KADRE does perform in polynomial time. Fitting a third-order polynomial to the data, KADRE's runtime performance follows the equation:

    0.0002x^3 - 0.0004x^2 + 0.0581x + 0.1334

From both an empirical performance analysis and an algorithmic runtime analysis of KADRE's clustering algorithm, KADRE runs in polynomial time, satisfying the performance metric specified in Hypothesis 2 and validating the hypothesis.

6.5 Evaluation of SWSA's Performance

This dissertation's final hypothesis, that a scientific software system composed of these identified modules, properly componentized with explicit interfaces, performs correctly and within acceptable bounds in both computation time and memory footprint when compared to its equivalent monolithic software system, requires evaluation of the runtime impacts of SWSA on scientific code. The target metric for "performant" in this hypothesis is performance within 10% of the original computation time and within 50% of the original memory footprint.

In order to test this hypothesis, I implemented a scientific kernel that multiplies an identity matrix of Doubles by itself. Matrix manipulations of this sort are common to a number of scientific simulation codes, and this is a good test of the impacts of SWSA's componentizing wrapper, as a significant amount of data is passed into and out of the kernel. In addition to Java, this kernel was developed in Fortran. As I will discuss further in Chapter 7, many legacy scientific applications are developed in Fortran, and I therefore consider it a language to target in future work.
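A minimal Java version of this benchmark kernel is sketched below. The class and method names are illustrative rather than the dissertation's actual harness, and the timing and memory instrumentation used in the evaluation is omitted.

  // Illustrative benchmark kernel: multiply an n x n identity matrix of
  // doubles by itself, passing the full matrix in and out so that the
  // wrapper's data-marshaling cost is exercised.
  public class MatMulKernel {
    static double[][] identity(int n) {
      double[][] m = new double[n][n];
      for (int i = 0; i < n; i++) m[i][i] = 1.0;
      return m;
    }

    static double[][] multiply(double[][] a, double[][] b) {
      int n = a.length;
      double[][] c = new double[n][n];
      for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)      // i-k-j order for better cache locality
          for (int j = 0; j < n; j++)
            c[i][j] += a[i][k] * b[k][j];
      return c;
    }

    public static void main(String[] args) {
      int n = 200;  // the smallest matrix size reported in the evaluation
      double[][] m = identity(n);
      double[][] result = multiply(m, m);
      System.out.println("result[0][0] = " + result[0][0]);  // 1.0
    }
  }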
Comparing the performance of the original code to the SWSA version of the kernel in both computational time and memory footprint, SWSA is performant as specified in Hypothesis 3. Specifically, the impact on computation time made by SWSA wrappers is due almost entirely to the overhead of launching the Java Virtual Machine and is therefore negligible for larger matrices. For the smallest matrix size tested (200 x 200 Doubles), the computational impact (in wall-clock time of execution) was 20% greater for the Fortran implementation of the kernel and 2% greater for the Java version (see Figure 6.8 for a plot of the performance impacts on the Fortran kernel), though this impact, because JVM instantiation is a fixed cost, was quickly marginalized with increases in problem size.

Figure 6.8: Performance impacts of SWSA on scientific kernels (computation time and spacetime versus matrix size for the Fortran 77 kernel with thin and full wrappers).

The impact on the memory footprint of scientific kernels produced by SWSA is more significant for Fortran code, but was negligible for Java code. As shown in Figure 6.8, the impact of SWSA on the memory footprint of a Fortran matrix multiplication kernel is 50% (i.e., the SWSA-componentized version of the Fortran code ran in a memory footprint 150% of the size of that of the original source code). This impact is due to the size of the data being passed by SWSA into the kernel at the component interface. In order to explore this impact further, a version of the code was developed (shown as the "Full Wrapper" in Figure 6.8) that passed two full matrices rather than a single identity matrix. The difference in memory impact between the "Thin Wrapper" version and the "Full Wrapper" version shows that the impact is indeed caused by the amount of data passed through SWSA's interface (and through the native C bindings used by SWSA to execute Fortran code).

Overall, SWSA, while not meeting the performance metric specified for computational time for very small computations when using Fortran scientific code, should still be considered performant, as the impact on computation time is negligible for larger problem sizes, which are, in turn, more realistic for the scientific domain. In my future work, I will explore engineering optimizations that can reduce the impact of SWSA's wrappers on Fortran code, including trades that can be made in buffering data in the C bindings that link SWSA's Java wrappers with Fortran kernels.

Chapter 7

Conclusion and Future Work

In this chapter, I will present my conclusions and the contributions of the work presented in this dissertation to the state of the art in the domain of scientific computing, and I will discuss proposed future work to extend these findings.

7.1 Conclusions

Chapter 1 asked the question, "How can software engineering, and specifically software architecture [TMD09, PW92], be leveraged in order to provide improved development support to scientists conducting production and distribution activities as part of in silico experimentation?"

A modular software system, decomposed into software components with explicit communication through software connectors, is the heart of the software architectural approach, and the first, necessary step toward an architected software system.
To that end, this dissertation defined a specific type of software component for scientific software systems, called a scientific kernel. These kernels form workflow stages in a workflow-based orchestration of a scientific experiment. KADRE, a clustering algorithm and tool that scientists can use to analyze existing scientific code, which represents a significant investment in both time and money, can identify scientific kernels in monolithic scientific software.

KADRE accomplishes this by looking at data dependencies, call dependencies, and artifact-level proximity. KADRE is a domain-specific software architecture recovery tool because, unlike general-purpose tools such as Bauhaus [Kos00], the relative weighting of each of these measures is trained specifically for scientific software. Indeed, as shown in Chapter 6, KADRE outperforms Bauhaus significantly when analyzing scientific software systems.

In analyzing how tuning KADRE specifically for scientific software systems affects parameter weighting, data dependencies and call relationships are more significant than proximity in determining a clustering of two elements. Additionally, data dependencies factor more heavily than call distance, reflecting the goal of minimizing data transfer in dataflow connectors (elements of the software system that exchange more data are more likely to be clustered together than elements of code that exchange little or no data).

Once scientists writing scientific software systems have identified scientific kernels, orchestrating them into scientific workflow applications is the next step in creating more flexible scientific experiments in silico. Workflow-based orchestrations allow scientists to:

Scale their experiments, conducting many runs of the experiment with different parameters (in the process, gaining an understanding of the complex interactions of multiple phenomena) and simulating a larger portion of the physical world (with the goal of producing results verifiable through traditional means of observation).

Repeat their experiments, leveraging workflow configuration to capture the orchestration and parameters of the experiment at a level of granularity and in a form that allows scientists to essentially "replay" the experiment.

Validate the experiments of other researchers, leveraging the workflow configuration not only to reason about the experiment at a high level of understanding but also to validate the experiment by conducting it themselves.

In order to support this type of orchestration while also allowing software engineers to manipulate the code in terms of its explicit software architecture, I have developed SWSA, the Scientific Workflow Software Architecture. Scientific kernels are developed as explicit components in the resulting software system, and orchestration is handled by dataflow connectors.

A major concern of scientific software developers is that their scientific simulations be performant, so Chapter 6 validated the hypothesis that these kernels can be orchestrated via an "architected" software system in a performant manner. Orchestration of these kernels does not greatly impact the performance of the resulting workflow.

7.2 Contributions

Firstly, the notion of a scientific kernel, a snippet of source code that implements a particular step in a scientific process, is side-effect free, and can act as a workflow stage in a workflow-based orchestration of a scientific experiment, has been developed in this dissertation.
Secondly, this dissertation has shown that the identification of kernels, a necessary step in creating a workflow-based orchestration of a scientific experiment reified in a software system, is not only possible via program analysis, but that software clustering can yield accurate scientific kernels efficiently when the clustering process takes specific aspects of scientific software into account.

KADRE, a program analysis tool that clusters software elements in order to identify kernels in scientific software systems, is both accurate (greater than 80%) and efficient (polynomial runtime), and is more accurate in kernel identification than Bauhaus, an often-cited clustering tool based on Koschke's work [Kos00].

Kernel-based scientific software development allows scientists to compose existing experiments in a more explicit manner, reifying individual steps of the experiment as a workflow that can be replayed, either by the original experimenter or by others looking to validate the work. Additionally, explicit steps in an experimental process can be refined over time more easily by the experimenter, both because workflow-based orchestration allows scientists to work at a level of abstraction with which they are more familiar, and because it allows for more targeted changes (for performance optimization, parallelization, or evolution of code).

To support this type of orchestration while at the same time supporting both the role of the scientist and the role of the software engineer in creating large-scale scientific simulations, I have developed SWSA, a software architecture that not only formalizes the notions of workflow systems, but also implements domain-specific functionality, such as workflow services, as components that can be utilized by developers of SWSA-based systems in the future.

7.3 Future Work

Though it was not part of this dissertation, because it is primarily a matter of engineering effort rather than scientific research, automatic deployment of rearchitected scientific software, implemented with SWSA, on Grid systems is necessary to support the complete lifecycle of in silico experimentation.

In the future, I plan to improve KADRE's identification of scientific kernels in two ways. Firstly, KADRE will be improved by expanding the scientific software suite developed as part of the dissertation (elaborated in Chapter 6). An expanded evaluation suite would allow for further studies regarding the training of KADRE, including analysis of the impact of increasing training set sizes on KADRE's accuracy. Secondly, as discussed in Chapter 6, KADRE's analysis will be modified to identify functions that are called by the majority of other functions (such as simple math routines like vector multiplication, etc.) and, as such, should be replicated as aspects, macros, or library routines rather than assigned to a particular kernel. Identifying these functions to the scientist and removing them from the clustering process should aid the accuracy of identified kernels.

Additionally, as other programming languages, such as C and Fortran, are more prevalent than Java for scientific software development, extending KADRE's ability to parse these languages into its common language model is a future effort. Finally, I plan to incorporate SWSA more specifically into the design process for scientific software systems, rather than focusing primarily on the recovery aspects of existing scientific software systems.
At institutions such as NASA's Jet Propulsion Laboratory, scientific software that analyzes data from orbiting spacecraft, interplanetary rovers (such as the Mars rovers), and airborne instruments is developed every day and can utilize SWSA to orchestrate more flexible processing pipelines. Such an effort is underway at JPL in the form of an experimental test-bed for the algorithm developers of the Soil Moisture Active/Passive (SMAP) Mission [M+09a].

Bibliography

[A+06] Benjamin A. Allen et al. A component architecture for high-performance scientific computing. The International Journal of High Performance Computing Applications, 20(2):163-202, 2006.

[AAA+95] Gustavo Alonso, Divyakant Agrawal, Amr El Abbadi, C. Mohan, Roger Gunthor, and Mohan Kamath. Exotica/FMQM: A persistent message-based architecture for distributed workflow management. In IFIP WG8.1 Working Conference on Information Systems Development for Decentralized Organizations, Trondheim, Norway, 1995.

[Aba04] Jemal H. Abawajy. Fault-tolerant scheduling policy for grid computing systems. In 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), Santa Fe, New Mexico, USA, 2004.

[ACN02] J. Aldrich, C. Chambers, and D. Notkin. ArchJava: Connecting software architecture to implementation. In International Conference on Software Engineering, 2002.

[AKA+94] Gustavo Alonso, Mohan Kamath, Divyakant Agrawal, Amr El Abbadi, Roger Gunthor, and C. Mohan. Failure handling in large scale workflow management systems. Technical Report RJ9913, IBM Almaden Research Center, 1994.

[ama10] Amazon Elastic Compute Cloud (Amazon EC2). http://aws.amazon.com/ec2/, 2010.

[B+00] J. M. Bull et al. A benchmark suite for high performance Java. Concurrency: Practice and Experience, pages 375-388, 2000.

[BDH03] Luiz Andre Barroso, Jeffrey Dean, and Urs Holzle. Web search for a planet: The Google cluster architecture. IEEE Micro, 23(2):22-28, 2003.

[BF05] Rajendra Bose and James Frew. Lineage retrieval for scientific data processing: a survey. ACM Comput. Surv., 37(1):1-28, 2005.

[BN05] D. Beyer and A. Noack. Clustering software artifacts based on frequent common changes. In Int. Work. on Program Comprehension, pages 259-268, May 2005.

[Bra04a] Jan Brase. Using digital library techniques - registration of scientific primary data. Lecture Notes in Computer Science, 3232/2004:488-494, 2004.

[Bra04b] Jan Brase. Using digital library techniques - registration of scientific primary data. Lecture Notes in Computer Science, 3232/2004:488-494, 2004.

[CBJ+00] Peter M. Cox, Richard A. Betts, Chris D. Jones, Steven A. Spall, and Ian J. Totterdell. Acceleration of global warming due to carbon-cycle feedbacks in a coupled climate model. Nature, 408:184-187, 2000.

[CFS+06] Steven P. Callahan, Juliana Freire, Emanuele Santos, Carlos E. Scheidegger, Claudio T. Silva, and Huy T. Vo. Using provenance to streamline data exploration through visualization. Technical Report UUSCI-2006-016, SCI Institute, University of Utah, 2006.

[CGR05] G. L.-T. Chiu, M. Gupta, and A. K. Royyuru. Blue Gene. IBM Journal of Research and Development, 49, March/May 2005.

[CKSP07] Jeffrey Carver, Richard P. Kendall, Susan E. Squires, and Douglass E. Post. Software development environments for scientific and engineering software: A series of case studies. In Proceedings of the International Conference on Software Engineering, 2007.

[Cor03] Xilinx Corporation. Revolutionary architecture for the next generation platform FPGAs, 2003.
[Cor05] Intel Corporation. Moore's law: Raising the bar, 2005.

[DMM99] D. Doval, S. Mancoridis, and B. S. Mitchell. Automatic clustering of software systems using a genetic algorithm. In Proc. of Software Technology and Engineering Practice, page 73, 1999.

[DP09] Stephane Ducasse and Damien Pollet. Software architecture reconstruction: A process-oriented taxonomy. IEEE Trans. Softw. Eng., 35(4):573-591, 2009.

[Dre00] Jurgen Drews. Drug discovery: a historical perspective. Science, 287(5460):1960-1964, 2000.

[dSMM03] Paulo Pinheiro da Silva, Deborah L. McGuinness, and Rob McCool. Knowledge provenance infrastructure. Data Engineering Bulletin, 26(4):26-32, 2003.

[EKK01] Tom Epperly, Scott R. Kohn, and Gary Kumfert. Component technology for high-performance scientific simulation software. In Proceedings of the IFIP TC2/WG2.5 Working Conference on the Architecture of Scientific Software, pages 69-86, Deventer, The Netherlands, 2001. Kluwer, B.V.

[EOG02] Johann Eder, Georg E. Olivotto, and Wolfgang Gruber. A data warehouse for workflow logs. In EDCIS '02: Proceedings of the First International Conference on Engineering and Deployment of Cooperative Information Systems, pages 1-15, London, UK, 2002. Springer-Verlag.

[FB01] James Frew and Rajendra Bose. Earth system science workbench: A data management infrastructure for earth science products. In SSDBM '01: Proceedings of the Thirteenth International Conference on Scientific and Statistical Database Management, page 180, Washington, DC, USA, 2001. IEEE Computer Society.

[Fie00] R. Fielding. Architectural Styles and the Design of Network-Based Software Architectures. PhD dissertation, University of California at Irvine, 2000.

[FK99] Ian Foster and Carl Kesselman. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, 1999.

[FKT01] I. Foster, C. Kesselman, and S. Tuecke. The anatomy of the grid: Enabling scalable virtual organizations. International Journal of Supercomputer Applications, 15, 2001.

[Fos02] I. Foster. What is the grid? A three point checklist. GRIDToday, July 2002.

[Fre04] James Frew. Earth system science server (ES3): Local infrastructure for earth science product management. In Proceedings of the Fourth Earth Science Technology Conference, 2004.

[FTL+02] James Frey, Todd Tannenbaum, Miron Livny, Ian Foster, and Steven Tuecke. Condor-G: A computation management agent for multi-institutional grids. Cluster Computing, 5(3):237-246, 2002.

[FVWZ02] Ian T. Foster, Jens-S. Vockler, Michael Wilde, and Yong Zhao. Chimera: A virtual data system for representing, querying, and automating data derivation. In SSDBM '02: Proceedings of the 14th International Conference on Scientific and Statistical Database Management, pages 37-46, Washington, DC, USA, 2002. IEEE Computer Society.

[G+07] Y. Gil et al. Wings for Pegasus: Creating large-scale scientific applications using semantic representations of computational workflows. In Proceedings of the 19th Annual Conference on Innovative Applications of Artificial Intelligence (IAAI), July 2007.

[GAO94] D. Garlan, R. Allen, and J. Ockerbloom. Exploiting style in architectural design environments. In Proc. of SIGSOFT '94 Symposium on the Foundations of Software Engineering, 1994.

[GAO95] David Garlan, Robert Allen, and John Ockerbloom. Architectural mismatch: Why reuse is so hard. IEEE Software, 12(6):17-26, 1995.

[Gar99] Felix C. Gartner. Fundamentals of fault-tolerant distributed computing in asynchronous environments. ACM Comput. Surv., 31(1):1-26, 1999.

[GDE+07] Yolanda Gil, Ewa Deelman, Mark Ellisman, Thomas Fahringer, Geoffrey Fox, Dennis Gannon, Carole Goble, Miron Livny, Luc Moreau, and Jim Myers. Examining the challenges of scientific workflows. Computer, 40(12):24-32, 2007.

[GGCD07] Y. Gil, P. A. Gonzalez-Calero, and E. Deelman. On the black art of designing computational workflows. In Proceedings of the Second Workshop on Workflows in Support of Large-Scale Science (WORKS'07), in conjunction with the IEEE International Symposium on High Performance Distributed Computing, June 2007.

[GK97] J. Girard and R. Koschke. Finding components in a hierarchy of modules: a step towards architectural understanding. In Proc. of ICSM, 1997.

[GKS97] J.-F. Girard, R. Koschke, and G. Schied. A metric-based approach to detect abstract data types and state encapsulations. Autom. Softw. Eng., 1997.

[GL01] Samuel Z. Guyer and Calvin Lin. Broadway: A software architecture for scientific computing. In Proceedings of the IFIP TC2/WG2.5 Working Conference on the Architecture of Scientific Software, pages 175-192, Deventer, The Netherlands, 2001. Kluwer, B.V.

[GLM04] Paul Groth, Michael Luck, and Luc Moreau. Formalising a protocol for recording provenance in grids. In Proceedings of the UK OST e-Science second All Hands Meeting 2004 (AHM'04), 2004.

[GMF+05] Paul Groth, Simon Miles, Weijian Fang, Sylvia C. Wong, Klaus-Peter Zauner, and Luc Moreau. Recording and using provenance in a protein compressibility experiment. In Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing (HPDC-14), Research Triangle Park, North Carolina, 2005.

[Gow97] Barry Gower. Scientific Method: An Historical and Philosophical Introduction. Routledge, 1997.

[Ham09] Jeff Hammerbacher. Analyzing petabytes of data with Hadoop. Talk at Jet Propulsion Laboratory, Pasadena, CA, October 30, 2009.

[HH00] Ahmed E. Hassan and Richard C. Holt. A reference architecture for web servers. In Proc. of WCRE, 2000.

[Hig10] High Productivity Computer Systems website. http://www.highproductivity.org/, 2010.

[HK03a] Soonwook Hwang and Carl Kesselman. A generic failure detection service for the grid. Technical Report ISI-TR-568, Information Sciences Institute, University of Southern California, 2003.

[HK03b] Soonwook Hwang and Carl Kesselman. GridWorkflow: A flexible failure handling framework for the grid. In HPDC '03: Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing (HPDC'03), page 126, Washington, DC, USA, 2003. IEEE Computer Society.

[HR+95] B. Hayes-Roth et al. A domain-specific software architecture for adaptive intelligent systems. IEEE Transactions on Software Engineering, 21:288-301, 1995.

[IFI01] IFIP TC2/WG2.5 Working Conference on the Architecture of Scientific Software. The Architecture of Scientific Software. Kluwer Academic Publishers, 2001.

[JCC+04] Hai Jin, Hanhua Chen, Jian Chen, Ping Kuang, Li Qi, and Deqing Zou. Real-time strategy and practice in service grid. In COMPSAC '04: Proceedings of the 28th Annual International Computer Software and Applications Conference (COMPSAC'04), pages 161-166, Washington, DC, USA, 2004. IEEE Computer Society.

[JZC+03] Hai Jin, DeQing Zou, HanHua Chen, JianHua Sun, and Song Wu. Fault-tolerant grid architecture and practice. J. Comput. Sci. Technol., 18(4):423-433, 2003.

[K+07] J. Kim et al. Provenance trails in the Wings/Pegasus workflow system. To appear in Concurrency and Computation: Practice and Experience, Special Issue on the First Provenance Challenge, 2007.

[KAB+05] Ioannis Kotsiopoulos, Pinar Alper, Sean Bechhofer, Oscar Corcho, Carole Goble, Dean Kuo, Paolo Missier, and Maria de los Santos Perez-Hernandez. Towards a semantic grid architecture. In First CoreGrid Workshop on Knowledge and Data Management in Grids, 2005.

[KE00] Rainer Koschke and Thomas Eisenbarth. A framework for experimental evaluation of clustering techniques. In Proc. of IWPC, pages 201-210, 2000.

[Kep03] J. Kepner. HPC productivity: An overarching view. In J. Kepner, editor, IJHPCA Special Issue on HPC Productivity, volume 18, 2003.

[Kom09] Matt Komorowski. A history of storage cost. http://www.mkomo.com/cost-per-gigabyte, 2009.

[Kos00] Rainer Koschke. Atomic architectural component recovery for program understanding and evolution. Dissertation, University of Stuttgart, 2000.

[LOW06a] K.-K. Lau, M. Ornaghi, and Z. Wang. A software component model and its preliminary formalisation. In F.S. de Boer et al., editors, Proc. 4th International Symposium on Formal Methods for Components and Objects, LNCS 4111, pages 1-21. Springer-Verlag, 2006.

[LOW06b] K.-K. Lau, M. Ornaghi, and Z. Wang. A software component model and its preliminary formalisation. In Proceedings of the 4th International Symposium on Formal Methods for Components and Objects, pages 1-21. Springer-Verlag, 2006.

[Lud08] Christian Ludloff. Sandpile.org: The world's leading source for pure technical x86 processor information. http://www.sandpile.org/, 2008.

[Luo00] ZongWei Luo. Checkpointing for workflow recovery. In ACM-SE 38: Proceedings of the 38th Annual Southeast Regional Conference, pages 79-80, New York, NY, USA, 2000. ACM Press.

[M+06] Chris A. Mattmann et al. A software architecture-based framework for highly distributed and data intensive scientific applications. In International Conference on Software Engineering, pages 721-730, 2006.

[M+09a] C. Mattmann et al. A reusable process control system framework for the Orbiting Carbon Observatory and NPP Sounder PEATE missions. In IEEE Intl. Conference on Space Mission Challenges for Information Technology (SMC-IT 2009), pages 165-172, 2009.

[M+09b] Chris A. Mattmann et al. The anatomy and physiology of the grid revisited. In WICSA/ECSA, 2009.

[Mat06] Chris Mattmann. Software connectors for highly distributed and voluminous data intensive systems. In ASE, pages 331-334, 2006.

[MB07] Onaiza Maqbool and Haroon Babri. Hierarchical clustering for software architecture recovery. IEEE Trans. Softw. Eng., 33(11):759-780, 2007.

[MCS+04] Luc Moreau, Syd Chapman, Andreas Schreiber, Rolf Hempel, Omer Rana, Laszlo Varga, Ulises Cortes, and Steven Willmott. Provenance-based trust for grid computing - position paper, 2004.

[Mer10] Phil Merkey. Beowulf project overview. http://www.beowulf.org/overview/history.html, 2010.

[Mic09] Sun Microsystems. Pathways to open petascale computing: The Sun Constellation System - designed for performance, 2009.

[MJ06] Nenad Medvidovic and Vladimir Jakobac. Using software evolution to focus architectural recovery. Autom. Softw. Eng., 13(2):225-256, 2006.

[MM01] Brian S. Mitchell and Spiros Mancoridis. Comparing the decompositions produced by software clustering algorithms using similarity measurements. In Proc. of ICSM, page 744, 2001.

[MMP00] Nikunj R. Mehta, Nenad Medvidovic, and Sandeep Phadke. Towards a taxonomy of software connectors. In ICSE, pages 178-187, 2000.

[MMRM05] S. Malek, M. Mikic-Rakic, and N. Medvidovic. A style-aware architectural middleware for resource-constrained, distributed systems. IEEE Trans. on Software Engineering, 31, 2005.

[Moo65] Gordon Moore. Cramming more components onto integrated circuits. Electronics Magazine, page 4, 1965.

[mpi10] The Message Passing Interface (MPI) standard. http://www.mcs.anl.gov/research/projects/mpi/, 2010.

[MRMM02] Marija Mikic-Rakic, Nikunj R. Mehta, and Nenad Medvidovic. Architectural style requirements for self-healing systems. In First Workshop on Self-Healing Systems, November 2002.

[MV99] Jonathan I. Maletic and Naveen Valluri. Automatic software clustering via latent semantic analysis. In Autom. Softw. Eng., page 251, 1999.

[pbs10] PBS Works. http://www.pbsgridworks.com/Product.aspx?id=11, 2010.

[Pfi95] G. F. Pfister. In Search of Clusters: The Coming Battle in Lowly Parallel Computing. Prentice-Hall, Inc., 1995.

[PK04] D. E. Post and R. P. Kendall. Lessons learned from ASCI. The International Journal of High Performance Computing Applications, 18:399-416, 2004.

[Pro08] Human Genome Project. Genomics and its impact on science and society: A 2008 primer, 2008.

[pvm10] PVM: Parallel Virtual Machine. http://www.csm.ornl.gov/pvm/, 2010.

[PW92] D. E. Perry and A. L. Wolf. Foundations for the study of software architectures. ACM SIGSOFT Software Engineering Notes, October 1992.

[RtHvdAM06] N. Russell, A.H.M. ter Hofstede, W.M.P. van der Aalst, and N. Mulyar. Workflow control-flow patterns: A revised view. Technical Report BPM-06-22, BPM Center, 2006.

[Seg09] Judith Segal. Some challenges facing software engineers developing software for scientists. In SECSE '09: Proceedings of the 2009 ICSE Workshop on Software Engineering for Computational Science and Engineering, pages 9-14, 2009.

[SG02] J. P. Sousa and D. Garlan. Aura: an architectural framework for user mobility in ubiquitous computing environments. In The Working IEEE/IFIP Conference on Software Architecture, 2002.

[SG03] Bridget Spitznagel and David Garlan. A compositional formalization of connector wrappers. In ICSE, pages 374-384, 2003.

[SGPHGCGP06] Manuel Sanchez-Gestido, Maria S. Perez-Hernandez, Rafael Gonzalez-Cabero, and Asuncion Gomez-Perez. Improving a satellite mission system by means of a semantic grid architecture. In GGF16 Semantic Grid Workshop, 2006.

[Sha48] Claude Elwood Shannon. A mathematical theory of communication. Bell System Tech. Journal, 27, 1948.

[SM03] Martin Szomszor and Luc Moreau. Recording and reasoning over data provenance in web and grid services. In International Conference on Ontologies, Databases and Applications of Semantics (ODBASE'03), 2003.

[SPG06] Yogesh L. Simmhan, Beth Plale, and Dennis Gannon. Performance evaluation of the Karma provenance framework for scientific workflows. In International Provenance and Annotation Workshop (IPAW), LNCS, volume 4145, 2006.

[Sup10] The International Conference for High Performance Computing, Networking, Storage, and Analysis website. http://www.sc-conference.org/, 2010.

[T+96] R.N. Taylor et al. A component- and message-based architectural style for GUI software. IEEE Trans. on Software Engineering, June 1996.

[TAL+01] Michael Thune, Krister Ahlander, Malin Ljungberg, Markus Norden, Kurt Otto, and Jarmo Rantakokko. Object-oriented modeling of parallel PDE solvers. In Proceedings of the IFIP TC2/WG2.5 Working Conference on the Architecture of Scientific Software, pages 159-174, Deventer, The Netherlands, 2001. Kluwer, B.V.

[TH99] Vassilios Tzerpos and R.C. Holt. MoJo: A distance metric for software clusterings. In Working Conf. on Reverse Engineering, page 187, 1999.
[TMD09] R.N. Taylor, N. Medvidovic, and E.M. Dashofy. Software Architecture: Foundations, Theory, and Practice. John Wiley & Sons, 2009.

[Top10] Top500 supercomputing sites. http://www.top500.org, 2010.

[Tra95] Will Tracz. DSSA (domain-specific software architecture): Pedagogical example. ACM SIGSOFT Software Engineering Notes, 20, 1995.

[Val98] Greg A. Valentine. Damage to structures by pyroclastic flows and surges, inferred from nuclear weapons effects. Journal of Volcanology and Geothermal Research, 87(1-4):117-140, 1998.

[Wal05] Chip Walter. Kryder's law. Scientific American, August 2005.

[Wil09a] Greg Wilson. How do scientists really use computers? American Scientist, 97(5), September/October 2009.

[Wil09b] Greg Wilson. How do scientists really use computers? American Scientist, 97(5), September/October 2009.

[WMGM08] David Woollard, Nenad Medvidovic, Yolanda Gil, and Chris Mattmann. Scientific software as workflows: From discovery to distribution. IEEE Software, 25(4):37-43, 2008.

[YB05] J. Yu and R. Buyya. A taxonomy of scientific workflow systems for grid computing. Special Issue on Scientific Workflows, ACM SIGMOD Record, 34(3), 2005.

[ZGS06] Jun Zhao, Carole Goble, and Robert Stevens. An identity crisis in the life sciences. In 1st International Provenance and Annotation Workshop (IPAW'06), 2006.

[ZWG+04] Jun Zhao, Chris Wroe, Carole Goble, Robert Stevens, Dennis Quan, and Mark Greenwood. Using semantic web technologies for representing e-science provenance. In 3rd International Semantic Web Conference (ISWC2004), 2004.

A.1 LUD Source Code

      eps = Math.abs(gamma - 1.0);
    }

    for (int i = 0; i < n; i++)
      x[i] = b[i];
    double norma = 0.0;
    int init = 1325;
    for (int i = 0; i < n; i++)
      for (int j = 0; j < n; j++) {
        init = 3125 * init % 65536;
        a[j][i] = (init - 32768.0) / 16384.0;
        norma = (a[j][i] > norma) ? a[j][i] : norma;
      }
    for (int i = 0; i < n; i++)
      b[i] = 0.0;
    for (int j = 0; j < n; j++)
      for (int i = 0; i < n; i++)
        b[i] += a[j][i];
    for (int i = 0; i < n; i++)
      b[i] = -b[i];
    dmxpy(a, b, x, n);
    for (int i = 0; i < n; i++) {
      resid = (resid > Math.abs(b[i])) ? resid : Math.abs(b[i]);
      normx = (normx > Math.abs(x[i])) ? normx : Math.abs(x[i]);
    }
    resid = resid / (n * norma * normx * eps);
    System.out.println(resid);
    if (resid > 6.0)
      System.err.println("Validation failed");
  }
println("Validation failed"); g 71 private void dgefa(double a [][] , int ipvt [] , int n)f double[] col k , col j ; double t , dmax, dtemp;; 76 if (n>0)f for(int k=0; k< n1; k++)f col k = a[k]; int l = 0; if ((nk) >= 1)f 81 dmax = Math.abs(col k [k]); for(int i=1;i<nk; i++)f dtemp = Math.abs(col k [ i+k]); if (dtemp> dmax)f l = i ; 86 dmax = dtemp; g g l++; g 91 ipvt [k] = l ; if (col k [ l ] != 0)f if (l!=k)f t = col k [ l ]; col k [ l ] = col k [k]; 96 col k [k] = t; g t = 1.0 / col k [k]; if ((n(k+1))>0) for(int i=0;i<n(k+1);i++) 101 col k [ i+k+1] = t; for(int j=k+1;j<n; j++)f col j = a[ j ]; t = col j [ l ]; if (l!=k)f 106 col j [ l ] = col j [k]; col j [k] = t; g 110 if ((n(k+1))>0 && t!=0) 111 for(int i=0;i<(n(k+1));i++) col j [ i+k+1] = tcol k [ i+k+1]; g g g 116 g ipvt [n1] = n1; g private void dgesl(double a [][] , int ipvt [] , double b[] , int n)f 121 double t; if (n > 1)f for(int k=0;k<n1;k++)f int l = ipvt [k]; t = b[ l ]; 126 if (l!=k)f b[ l ] = b[k]; b[k] = t; g if ((n(k+1))>0 && t!=0) 131 for(int i=0;i<n(k+1);i++) b[ i+k+1]= ta[k][ i+k+1]; g for(int kb = 0; kb< n; kb++)f int k = n(kb+1); 136 b[k] /= a[k][k]; t =b[k]; if (k>0 && t!=0) for(int i=0;i<k; i++) b[ i ] = ta[k][ i ]; 141 g g g private void dmxpy(double a [][] , double b[] , double x[] , int n)f 146 for(int i=0;i<n; i++) for(int j=0;j<n; j++) b[ i ] += x[ i ] a[ j ][ i ]; g 151 public static void main(String [] args)f LUD lud = new LUD(); lud. init (500); lud.run(); lud. validate (); 156 g g 111 A.2 Crypt Source Code / KADRE Benchmark Project 3 Crypt / package edu.usc. softarch .kadre.bench. crypt ; import java. util .Random; 8 public class Crypt f private int array rows ; 13 private byte[] plain1 ; private byte[] crypt1; private byte[] plain2 ; private short [] userkey; 18 private int [] Z; private int [] DK; public void initialize (int size) f array rows = size ; 23 Random rndnum = new Random(136506717L); plain1 = new byte[ array rows ]; 28 crypt1 = new byte[ array rows ]; plain2 = new byte[ array rows ]; userkey = new short [8]; Z = new int [52]; 33 DK = new int [52]; for (int i = 0; i < 8; i++) f userkey [ i ] = (short) rndnum. nextInt (); g 38 calcEncryptKey (); calcDecryptKey (); for (int i = 0; i < array rows ; i++) f 43 plain1 [ i ] = (byte) i ; g g public void run() f 48 cipher idea(plain1 , crypt1 , Z); cipher idea(crypt1 , plain2 , DK); g public void validate () f 53 boolean error ; error = false ; for (int i = 0; i < array rows ; i++)f error = (plain1 [ i ] != plain2 [ i ]); if (error)f 58 System.out. println("Validation failed"); System.out. println("Original Byte " + i + " = " + plain1 [ i ]); 112 System.out. println("Encrypted Byte " + i + " = " + crypt1[ i ]); System.out. println("Decrypted Byte " + i + " = " + plain2 [ i ]); g 63 g g private void cipher idea(byte[] text1 , byte[] text2 , int [] key) f 68 int i1 = 0; int i2 = 0; int ik ; int x1, x2, x3, x4, t1 , t2; int r; 73 for (int i = 0; i < text1 . 
length ; i += 8) f ik = 0; r = 8; 78 x1 = text1 [ i1++] & 0xff ; x1 j= (text1 [ i1++] & 0xff) << 8; x2 = text1 [ i1++] & 0xff ; x2 j= (text1 [ i1++] & 0xff) << 8; x3 = text1 [ i1++] & 0xff ; 83 x3 j= (text1 [ i1++] & 0xff) << 8; x4 = text1 [ i1++] & 0xff ; x4 j= (text1 [ i1++] & 0xff) << 8; do f 88 x1 = (int) ((long) x1 key[ ik++] % 0x10001L & 0xffff ); x2 = x2 + key[ ik++] & 0xffff ; x3 = x3 + key[ ik++] & 0xffff ; x4 = (int) ((long) x4 key[ ik++] % 0x10001L & 0xffff ); t2 = x1 ^ x3; 93 t2 = (int) ((long) t2 key[ ik++] % 0x10001L & 0xffff ); t1 = t2 + (x2 ^ x4) & 0xffff ; t1 = (int) ((long) t1 key[ ik++] % 0x10001L & 0xffff ); t2 = t1 + t2 & 0xffff ; x1 ^= t1; 98 x4 ^= t2; t2 ^= x2; x2 = x3 ^ t1; x3 = t2; 103 g while (r != 0); x1 = (int) ((long) x1 key[ ik++] % 0x10001L & 0xffff ); x3 = x3 + key[ ik++] & 0xffff ; x2 = x2 + key[ ik++] & 0xffff ; 108 x4 = (int) ((long) x4 key[ ik++] % 0x10001L & 0xffff ); text2 [ i2++] = (byte) x1; text2 [ i2++] = (byte) (x1 >>> 8); text2 [ i2++] = (byte) x3; 113 text2 [ i2++] = (byte) (x3 >>> 8); text2 [ i2++] = (byte) x2; text2 [ i2++] = (byte) (x2 >>> 8); text2 [ i2++] = (byte) x4; text2 [ i2++] = (byte) (x4 >>> 8); 118 g g 113 123 private void calcEncryptKey() f int j ; for (int i = 0; i < 52; i++) Z[ i ] = 0; 128 for (int i = 0; i < 8; i++) f Z[ i ] = userkey [ i ] & 0xffff ; g 133 for (int i = 8; i < 52; i++) f j = i % 8; if (j < 6) f Z[ i ] = ((Z[ i 7] >>> 9) j (Z[ i 6] << 7)) & 0xFFFF; 138 continue; g if (j == 6) f Z[ i ] = ((Z[ i 7] >>> 9) j (Z[ i 14] << 7)) & 0xFFFF; 143 continue; g Z[ i ] = ((Z[ i 15] >>> 9) j (Z[ i 14] << 7)) & 0xFFFF; g 148 g private void calcDecryptKey() f int j , k; int t1 , t2 , t3; 153 t1 = inv(Z[0]); t2 =Z[1] & 0xffff ; t3 =Z[2] & 0xffff ; 158 DK[51] = inv(Z[3]); DK[50] = t3; DK[49] = t2; DK[48] = t1; 163 j = 47; k = 4; for (int i = 0; i < 7; i++) f t1 = Z[k++]; DK[j] = Z[k++]; 168 DK[j] = t1; t1 = inv(Z[k++]); t2 =Z[k++] & 0xffff ; t3 =Z[k++] & 0xffff ; DK[j] = inv(Z[k++]); 173 DK[j] = t2; DK[j] = t3; DK[j] = t1; g 178 t1 = Z[k++]; DK[j] = Z[k++]; DK[j] = t1; t1 = inv(Z[k++]); t2 =Z[k++] & 0xffff ; 183 t3 =Z[k++] & 0xffff ; DK[j] = inv(Z[k++]); DK[j] = t3; 114 DK[j] = t2; DK[j] = t1; 188 g private int inv(int x) f int t0 , t1; int q, y; 193 if (x <= 1) return (x); t1 = 0x10001 / x; 198 y = 0x10001 % x; if (y == 1) return ((1 t1) & 0xFFFF); t0 = 1; 203 do f q = x / y; x = x % y; t0 += q t1; if (x == 1) 208 return (t0 ); q = y / x; y = y % x; t1 += q t0; g while (y != 1); 213 return ((1 t1) & 0xFFFF); g public static void main(String [] args) f 218 Crypt c = new Crypt(); c. initialize (500); c.run(); c. validate (); 223 g g 115 A.3 Sparse Source Code / KADRE Benchmark Project Sparse / 5 package edu.usc. softarch .kadre.bench. sparse ; import java. util .Random; public class Sparse f 10 private static final long RANDOMSEED = 10101010; private Random R = new Random(RANDOMSEED); private double[] x; 15 private double[] y; private double[] val ; private int [] col ; private int [] row; private double ytotal ; 20 public void initialize (int size m , int size n , int size nz) f x = RandomVector(size n , R); 25 y = new double[ size m ]; val = new double[ size nz ]; col = new int[ size nz ]; row = new int[ size nz ]; 30 ytotal = 0.0; for (int i=0; i<size nz ; i++) f row[ i ] = Math.abs(R. nextInt ()) % size m ; 35 col [ i ] = Math.abs(R. nextInt ()) % size n ; val [ i ] = R.nextDouble (); g g 40 public void run(int iter ) f int nz = val . 
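Worth noting about the listing above: run() makes the benchmark self-inverting. calcDecryptKey() derives DK from the inverses of the Z subkeys, so applying cipher_idea twice must reproduce the plaintext, which is exactly what validate() checks byte for byte. A minimal round-trip driver under that assumption (the CryptRoundTrip class is mine, not part of the benchmark; the size just needs to be a multiple of IDEA's 8-byte block, as in main()):

import edu.usc.softarch.kadre.bench.crypt.Crypt;

public class CryptRoundTrip {
  public static void main(String[] args) {
    Crypt c = new Crypt();
    c.initialize(1024);  // 1024 bytes = 128 IDEA blocks; Z and DK come from one random key
    c.run();             // plain1 --Z--> crypt1 --DK--> plain2
    c.validate();        // prints nothing when plain2 matches plain1 byte-for-byte
  }
}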
A.3 Sparse Source Code

/*
 * KADRE Benchmark Project
 * Sparse
 */
package edu.usc.softarch.kadre.bench.sparse;

import java.util.Random;

public class Sparse {

  private static final long RANDOM_SEED = 10101010;
  private Random R = new Random(RANDOM_SEED);

  private double[] x;
  private double[] y;
  private double[] val;
  private int[] col;
  private int[] row;
  private double ytotal;

  public void initialize(int size_m, int size_n, int size_nz) {
    x = RandomVector(size_n, R);
    y = new double[size_m];
    val = new double[size_nz];
    col = new int[size_nz];
    row = new int[size_nz];
    ytotal = 0.0;
    for (int i = 0; i < size_nz; i++) {
      row[i] = Math.abs(R.nextInt()) % size_m;
      col[i] = Math.abs(R.nextInt()) % size_n;
      val[i] = R.nextDouble();
    }
  }

  public void run(int iter) {
    int nz = val.length;
    for (int reps = 0; reps < iter; reps++) {
      for (int i = 0; i < nz; i++) {
        y[row[i]] += x[col[i]] * val[i];
      }
    }
    for (int i = 0; i < nz; i++) {
      ytotal += y[row[i]];
    }
  }

  public void validate() {
    double dev = Math.abs(ytotal - 75.02484945753453);
    if (dev > 1.0e-12) {
      System.out.println("Validation failed");
      System.out.println("ytotal = " + ytotal + "  " + dev);
    }
  }

  private static double[] RandomVector(int N, java.util.Random R) {
    double A[] = new double[N];
    for (int i = 0; i < N; i++)
      A[i] = R.nextDouble() * 1e-6;
    return A;
  }

  public static void main(String[] args) {
    Sparse s = new Sparse();
    s.initialize(50000, 50000, 250000);
    s.run(200);
    s.validate();
  }
}
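Restating the kernel above in matrix terms: the triples (row[i], col[i], val[i]) form a coordinate-format (COO) sparse matrix A, and each pass of the inner loop applies one nonzero of the product y = y + Ax:

\[ y_{\mathrm{row}[i]} \leftarrow y_{\mathrm{row}[i]} + \mathrm{val}[i] \cdot x_{\mathrm{col}[i]}, \qquad i = 0, \ldots, nz - 1. \]

Because the random triples may repeat a row index, duplicate entries simply accumulate into the same y component, which is why validate() can reduce the whole computation to the single scalar ytotal.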
A.4 FFT Source Code

/*
 * KADRE Benchmark Project
 * FFT
 */
package edu.usc.softarch.kadre.bench.fft;

import java.util.Random;

public class Fft {

  private double[] x;
  private static final long RANDOM_SEED = 10101010;
  private Random R;
  private double total = 0.0;
  private double totali = 0.0;

  public void initialize(int size) {
    R = new Random(RANDOM_SEED);
    x = RandomVector(2 * size, R);
  }

  public void run() {
    transform(x);
    inverse(x);
  }

  public void validate() {
    double refval = 1.726962988395339;
    double refvali = 2.0974756152524314;
    double dev = Math.abs(total - refval);
    double devi = Math.abs(totali - refvali);
    if (dev > 1.0e-12) {
      System.out.println("Validation failed");
      System.out.println("total = " + total + "  " + dev);
    }
    if (devi > 1.0e-12) {
      System.out.println("Validation failed");
      System.out.println("totalinverse = " + totali + "  " + dev);
    }
  }

  private static double[] RandomVector(int N, java.util.Random R) {
    double A[] = new double[N];
    for (int i = 0; i < N; i++)
      A[i] = R.nextDouble() * 1e-6;
    return A;
  }

  private void transform(double data[]) {
    transform_internal(data, -1);
    for (int i = 0; i < data.length; i++)
      total += data[i];
  }

  private void transform_internal(double data[], int direction) {
    int n = data.length / 2;
    if (n == 1)
      return;
    int logn = log2(n);
    bitreverse(data);
    for (int bit = 0, dual = 1; bit < logn; bit++, dual *= 2) {
      double w_real = 1.0;
      double w_imag = 0.0;
      double theta = 2.0 * direction * Math.PI / (2.0 * (double) dual);
      double s = Math.sin(theta);
      double t = Math.sin(theta / 2.0);
      double s2 = 2.0 * t * t;
      /* a = 0 */
      for (int b = 0; b < n; b += 2 * dual) {
        int i = 2 * b;
        int j = 2 * (b + dual);
        double wd_real = data[j];
        double wd_imag = data[j + 1];
        data[j] = data[i] - wd_real;
        data[j + 1] = data[i + 1] - wd_imag;
        data[i] += wd_real;
        data[i + 1] += wd_imag;
      }
      for (int a = 1; a < dual; a++) {
        {
          // advance the twiddle factor by the trigonometric recurrence
          double tmp_real = w_real - s * w_imag - s2 * w_real;
          double tmp_imag = w_imag + s * w_real - s2 * w_imag;
          w_real = tmp_real;
          w_imag = tmp_imag;
        }
        for (int b = 0; b < n; b += 2 * dual) {
          int i = 2 * (b + a);
          int j = 2 * (b + a + dual);
          double z1_real = data[j];
          double z1_imag = data[j + 1];
          double wd_real = w_real * z1_real - w_imag * z1_imag;
          double wd_imag = w_real * z1_imag + w_imag * z1_real;
          data[j] = data[i] - wd_real;
          data[j + 1] = data[i + 1] - wd_imag;
          data[i] += wd_real;
          data[i + 1] += wd_imag;
        }
      }
    }
  }

  public void inverse(double data[]) {
    transform_internal(data, +1);
    int nd = data.length;
    int n = nd / 2;
    double norm = 1 / ((double) n);
    for (int i = 0; i < nd; i++)
      data[i] *= norm;
    for (int i = 0; i < data.length; i++)
      totali += data[i];
  }

  private int log2(int n) {
    int log = 0;
    for (int k = 1; k < n; k *= 2, log++);
    if (n != (1 << log)) {
      System.err.println("FFT: Data length is not a power of 2!: " + n);
      System.exit(1);
    }
    return log;
  }

  private void bitreverse(double data[]) {
    int n = data.length / 2;
    for (int i = 0, j = 0; i < n - 1; i++) {
      int ii = 2 * i;
      int jj = 2 * j;
      int k = n / 2;
      if (i < j) {
        double tmp_real = data[ii];
        double tmp_imag = data[ii + 1];
        data[ii] = data[jj];
        data[ii + 1] = data[jj + 1];
        data[jj] = tmp_real;
        data[jj + 1] = tmp_imag;
      }
      while (k <= j) {
        j -= k;
        k /= 2;
      }
      j += k;
    }
  }

  public static void main(String[] args) {
    Fft fft = new Fft();
    fft.initialize(2097152);
    fft.run();
    fft.validate();
  }
}
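On the interleaved real/imaginary pairs (data[2k], data[2k+1]), the two inner loops of transform_internal are the standard radix-2 butterfly. In complex notation each pass computes

\[ z_i' = z_i + w\,z_j, \qquad z_j' = z_i - w\,z_j, \qquad w = e^{2\pi\sqrt{-1}\,\cdot\,\mathrm{direction}\,\cdot\, a/(2\,\mathrm{dual})}, \]

with the twiddle factor w advanced by the numerically stable sine recurrence held in s and s2. direction = -1 gives the forward transform and +1 the inverse, which inverse() then rescales by 1/n before accumulating the checksum.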
A.5 Euler Source Code

/*
 * KADRE Benchmark Project
 * Euler
 */
package edu.usc.softarch.kadre.bench.euler;

import java.io.*;

public class Euler extends Tunnel {

  public void setsize(int size) {
    this.size = size;
  }

  public void init() {
    try {
      initialise();
    } catch (FileNotFoundException e) {
      System.err.println("Could not find file tunnel.dat");
      System.exit(0);
    } catch (IOException e) {
      System.err.println("IOException in initialisation");
      System.exit(0);
    }
  }

  public void application() {
    runiters();
  }

  public void validate() {
    double refval[] = { 0.0033831416599344965, 0.006812543658280322 };
    double dev = Math.abs(error - refval[size]);
    if (dev > 1.0e-12) {
      System.out.println("Validation failed");
      System.out.println("Computed RMS pressure error = " + error);
      System.out.println("Reference value = " + refval[size]);
    }
  }

  public void tidyup() {
    a = null;
    deltat = null;
    opg = null;
    pg = null;
    pg1 = null;
    sxi = null;
    seta = null;
    tg = null;
    tg1 = null;
    xnode = null;
    ynode = null;
    d = null;
    f = null;
    g = null;
    r = null;
    ug1 = null;
    ug = null;
    System.gc();
  }

  public void run(int size) {
    setsize(size);
    init();
    application();
    validate();
    tidyup();
  }

  public static void main(String[] argv) {
    Euler e = new Euler();
    e.run(0);
  }
}

/*
 * KADRE Benchmark Project
 * Euler
 */
package edu.usc.softarch.kadre.bench.euler;

import java.io.*;

public class Tunnel {

  // Fields read or cleared by the Euler driver subclass are declared protected.
  protected int size;
  private int[] datasizes;
  private double machff;
  public double secondOrderDamping;
  public double fourthOrderDamping;
  public int ntime;
  private int scale;
  protected double error;

  protected double[][] a;
  protected double[][] deltat;
  protected double[][] opg;
  protected double[][] pg;
  protected double[][] pg1;
  protected double[][] sxi;
  protected double[][] seta;
  protected double[][] tg;
  protected double[][] tg1;
  protected double[][] xnode;
  protected double[][] ynode;

  private double[][][] oldval;
  private double[][][] newval;

  private double cff, uff, vff, pff, rhoff, tff;
  private double jplusff, jminusff;
  private double datamax, datamin;
  private int iter;
  private int imax, jmax;
  private int imaxin, jmaxin;
  private int nf = 6;

  protected Statevector[][] d;
  protected Statevector[][] f;
  protected Statevector[][] g;
  protected Statevector[][] r;
  protected Statevector[][] ug1;
  protected Statevector[][] ug;

  private double Cp, Cv;
  private double gamma, rgas;
  private double fourthOrderNormalizer, secondOrderNormalizer;

  public void initialise() throws IOException, FileNotFoundException {
    datasizes = new int[] { 8, 12 };
    machff = 0.7;
    secondOrderDamping = 1.0;
    fourthOrderDamping = 1.0;
    ntime = 1;
    iter = 100;
    nf = 6;
    Cp = 1004.5;
    Cv = 717.5;
    gamma = 1.4;
    rgas = 287.0;
    fourthOrderNormalizer = 0.02;
    secondOrderNormalizer = 0.02;

    int i, j, k, n;
    double scrap, scrap2;

    scale = datasizes[size];

    FileReader instream = new FileReader("tunnel.dat");
    StreamTokenizer intokens = new StreamTokenizer(instream);
    if (intokens.nextToken() == StreamTokenizer.TT_NUMBER)
      imaxin = (int) intokens.nval;
    else
      throw new IOException();
    if (intokens.nextToken() == StreamTokenizer.TT_NUMBER)
      jmaxin = (int) intokens.nval;
    else
      throw new IOException();

    oldval = new double[nf][imaxin + 1][jmaxin + 1];
    for (i = 0; i < imaxin; i++) {
      for (j = 0; j < jmaxin; j++) {
        for (k = 0; k < nf; k++) {
          if (intokens.nextToken() == StreamTokenizer.TT_NUMBER) {
            oldval[k][i][j] = (double) intokens.nval;
          } else {
            throw new IOException();
          }
        }
      }
    }

    imax = (imaxin - 1) * scale + 1;
    jmax = (jmaxin - 1) * scale + 1;

    // bilinear interpolation of the input grid onto the scaled grid
    newval = new double[nf][imax][jmax];
    for (k = 0; k < nf; k++) {
      for (i = 0; i < imax; i++) {
        for (j = 0; j < jmax; j++) {
          int iold = i / scale;
          int jold = j / scale;
          double xf = ((double) (i % scale)) / ((double) scale);
          double yf = ((double) (j % scale)) / ((double) scale);
          newval[k][i][j] =
              (1.0 - xf) * (1.0 - yf) * oldval[k][iold][jold]
            + (1.0 - xf) * yf * oldval[k][iold][jold + 1]
            + xf * (1.0 - yf) * oldval[k][iold + 1][jold]
            + xf * yf * oldval[k][iold + 1][jold + 1];
        }
      }
    }

    deltat = new double[imax + 1][jmax + 2];
    opg = new double[imax + 2][jmax + 2];
    pg = new double[imax + 2][jmax + 2];
    pg1 = new double[imax + 2][jmax + 2];
    sxi = new double[imax + 2][jmax + 2];
    seta = new double[imax + 2][jmax + 2];
    tg = new double[imax + 2][jmax + 2];
    tg1 = new double[imax + 2][jmax + 2];
    ug = new Statevector[imax + 2][jmax + 2];
    a = new double[imax][jmax];
    d = new Statevector[imax + 2][jmax + 2];
    f = new Statevector[imax + 2][jmax + 2];
    g = new Statevector[imax + 2][jmax + 2];
    r = new Statevector[imax + 2][jmax + 2];
    ug1 = new Statevector[imax + 2][jmax + 2];
    xnode = new double[imax][jmax];
    ynode = new double[imax][jmax];

    for (i = 0; i < imax + 2; ++i)
      for (j = 0; j < jmax + 2; ++j) {
        d[i][j] = new Statevector();
        f[i][j] = new Statevector();
        g[i][j] = new Statevector();
        r[i][j] = new Statevector();
        ug[i][j] = new Statevector();
        ug1[i][j] = new Statevector();
      }

    cff = 1.0;
    vff = 0.0;
    pff = 1.0 / gamma;
    rhoff = 1.0;
    tff = pff / (rhoff * rgas);

    for (i = 0; i < imax; i++) {
      for (j = 0; j < jmax; j++) {
        xnode[i][j] = newval[0][i][j];
        ynode[i][j] = newval[1][i][j];
        ug[i + 1][j + 1].a = newval[2][i][j];
        ug[i + 1][j + 1].b = newval[3][i][j];
        ug[i + 1][j + 1].c = newval[4][i][j];
        ug[i + 1][j + 1].d = newval[5][i][j];
        scrap = ug[i + 1][j + 1].c / ug[i + 1][j + 1].a;
        scrap2 = ug[i + 1][j + 1].b / ug[i + 1][j + 1].a;
        tg[i + 1][j + 1] = ug[i + 1][j + 1].d / ug[i + 1][j + 1].a
            - (0.5 * (scrap * scrap + scrap2 * scrap2));
        tg[i + 1][j + 1] = tg[i + 1][j + 1] / Cv;
        pg[i + 1][j + 1] = rgas * ug[i + 1][j + 1].a * tg[i + 1][j + 1];
      }
    }

    // cell areas
    for (i = 1; i < imax; ++i)
      for (j = 1; j < jmax; ++j)
        a[i][j] = 0.5 * ((xnode[i][j] - xnode[i - 1][j - 1])
                       * (ynode[i - 1][j] - ynode[i][j - 1])
                       - (ynode[i][j] - ynode[i - 1][j - 1])
                       * (xnode[i - 1][j] - xnode[i][j - 1]));

    oldval = newval = null;
  }

  public void doIteration() {
    double scrap;
    int i, j;

    for (i = 1; i < imax; ++i)
      for (j = 1; j < jmax; ++j) {
        opg[i][j] = pg[i][j];
      }

    calculateDummyCells(pg, tg, ug);
    calculateDeltaT();
    calculateDamping(pg, ug);

    /* stage 1 */
    calculateF(pg, tg, ug);
    calculateG(pg, tg, ug);
    calculateR();
    for (i = 1; i < imax; ++i)
      for (j = 1; j < jmax; ++j) {
        ug1[i][j].a = ug[i][j].a - 0.25 * deltat[i][j] / a[i][j] * (r[i][j].a - d[i][j].a);
        ug1[i][j].b = ug[i][j].b - 0.25 * deltat[i][j] / a[i][j] * (r[i][j].b - d[i][j].b);
        ug1[i][j].c = ug[i][j].c - 0.25 * deltat[i][j] / a[i][j] * (r[i][j].c - d[i][j].c);
        ug1[i][j].d = ug[i][j].d - 0.25 * deltat[i][j] / a[i][j] * (r[i][j].d - d[i][j].d);
      }
    calculateStateVar(pg1, tg1, ug1);

    /* stage 2 */
    calculateDummyCells(pg1, tg1, ug1);
    calculateF(pg1, tg1, ug1);
    calculateG(pg1, tg1, ug1);
    calculateR();
    for (i = 1; i < imax; ++i)
      for (j = 1; j < jmax; ++j) {
        ug1[i][j].a = ug[i][j].a - 0.33333 * deltat[i][j] / a[i][j] * (r[i][j].a - d[i][j].a);
        ug1[i][j].b = ug[i][j].b - 0.33333 * deltat[i][j] / a[i][j] * (r[i][j].b - d[i][j].b);
        ug1[i][j].c = ug[i][j].c - 0.33333 * deltat[i][j] / a[i][j] * (r[i][j].c - d[i][j].c);
        ug1[i][j].d = ug[i][j].d - 0.33333 * deltat[i][j] / a[i][j] * (r[i][j].d - d[i][j].d);
      }
    calculateStateVar(pg1, tg1, ug1);

    /* stage 3 */
    calculateDummyCells(pg1, tg1, ug1);
    calculateF(pg1, tg1, ug1);
    calculateG(pg1, tg1, ug1);
    calculateR();
    for (i = 1; i < imax; ++i)
      for (j = 1; j < jmax; ++j) {
        ug1[i][j].a = ug[i][j].a - 0.5 * deltat[i][j] / a[i][j] * (r[i][j].a - d[i][j].a);
        ug1[i][j].b = ug[i][j].b - 0.5 * deltat[i][j] / a[i][j] * (r[i][j].b - d[i][j].b);
        ug1[i][j].c = ug[i][j].c - 0.5 * deltat[i][j] / a[i][j] * (r[i][j].c - d[i][j].c);
        ug1[i][j].d = ug[i][j].d - 0.5 * deltat[i][j] / a[i][j] * (r[i][j].d - d[i][j].d);
      }
    calculateStateVar(pg1, tg1, ug1);

    /* stage 4 */
    calculateDummyCells(pg1, tg1, ug1);
    calculateF(pg1, tg1, ug1);
    calculateG(pg1, tg1, ug1);
    calculateR();
    for (i = 1; i < imax; ++i)
      for (j = 1; j < jmax; ++j) {
        ug[i][j].a -= deltat[i][j] / a[i][j] * (r[i][j].a - d[i][j].a);
        ug[i][j].b -= deltat[i][j] / a[i][j] * (r[i][j].b - d[i][j].b);
        ug[i][j].c -= deltat[i][j] / a[i][j] * (r[i][j].c - d[i][j].c);
        ug[i][j].d -= deltat[i][j] / a[i][j] * (r[i][j].d - d[i][j].d);
      }
    calculateStateVar(pg, tg, ug);

    error = 0.0;
    for (i = 1; i < imax; ++i)
      for (j = 1; j < jmax; ++j) {
        scrap = pg[i][j] - opg[i][j];
        error += scrap * scrap;
      }
    error = Math.sqrt(error / (double) ((imax - 1) * (jmax - 1)));
  }

  private void calculateStateVar(double[][] localpg, double[][] localtg,
      Statevector[][] localug) {
    double temp, temp2;
    int i, j;
    for (i = 1; i < imax; ++i) {
      for (j = 1; j < jmax; ++j) {
        temp = localug[i][j].b;
        temp2 = localug[i][j].c;
        localtg[i][j] = localug[i][j].d / localug[i][j].a
            - 0.5 * (temp * temp + temp2 * temp2)
              / (localug[i][j].a * localug[i][j].a);
        localtg[i][j] = localtg[i][j] / Cv;
        localpg[i][j] = localug[i][j].a * rgas * localtg[i][j];
      }
    }
  }

  private void calculateR() {
    double deltax, deltay;
    double temp;
    int i, j;
    Statevector scrap;
    for (i = 1; i < imax; ++i) {
      for (j = 1; j < jmax; ++j) {
        r[i][j].a = 0.0;
        r[i][j].b = 0.0;
        r[i][j].c = 0.0;
        r[i][j].d = 0.0;

        /* first side */
        deltay = (ynode[i][j] - ynode[i][j - 1]);
        deltax = (xnode[i][j] - xnode[i][j - 1]);
        temp = 0.5 * deltay;
        r[i][j].a += temp * (f[i][j].a + f[i + 1][j].a);
        r[i][j].b += temp * (f[i][j].b + f[i + 1][j].b);
        r[i][j].c += temp * (f[i][j].c + f[i + 1][j].c);
        r[i][j].d += temp * (f[i][j].d + f[i + 1][j].d);
        temp = -0.5 * deltax;
        r[i][j].a += temp * (g[i][j].a + g[i + 1][j].a);
        r[i][j].b += temp * (g[i][j].b + g[i + 1][j].b);
        r[i][j].c += temp * (g[i][j].c + g[i + 1][j].c);
        r[i][j].d += temp * (g[i][j].d + g[i + 1][j].d);

        /* second side */
        deltay = (ynode[i][j - 1] - ynode[i - 1][j - 1]);
        deltax = (xnode[i][j - 1] - xnode[i - 1][j - 1]);
        temp = 0.5 * deltay;
        r[i][j].a += temp * (f[i][j].a + f[i][j - 1].a);
        r[i][j].b += temp * (f[i][j].b + f[i][j - 1].b);
        r[i][j].c += temp * (f[i][j].c + f[i][j - 1].c);
        r[i][j].d += temp * (f[i][j].d + f[i][j - 1].d);
        temp = -0.5 * deltax;
        r[i][j].a += temp * (g[i][j].a + g[i][j - 1].a);
        r[i][j].b += temp * (g[i][j].b + g[i][j - 1].b);
        r[i][j].c += temp * (g[i][j].c + g[i][j - 1].c);
        r[i][j].d += temp * (g[i][j].d + g[i][j - 1].d);

        /* third side */
        deltay = (ynode[i - 1][j - 1] - ynode[i - 1][j]);
        deltax = (xnode[i - 1][j - 1] - xnode[i - 1][j]);
        temp = 0.5 * deltay;
        r[i][j].a += temp * (f[i][j].a + f[i - 1][j].a);
        r[i][j].b += temp * (f[i][j].b + f[i - 1][j].b);
        r[i][j].c += temp * (f[i][j].c + f[i - 1][j].c);
        r[i][j].d += temp * (f[i][j].d + f[i - 1][j].d);
        temp = -0.5 * deltax;
        r[i][j].a += temp * (g[i][j].a + g[i - 1][j].a);
        r[i][j].b += temp * (g[i][j].b + g[i - 1][j].b);
        r[i][j].c += temp * (g[i][j].c + g[i - 1][j].c);
        r[i][j].d += temp * (g[i][j].d + g[i - 1][j].d);

        /* fourth side */
        deltay = (ynode[i - 1][j] - ynode[i][j]);
        deltax = (xnode[i - 1][j] - xnode[i][j]);
        temp = 0.5 * deltay;
        r[i][j].a += temp * (f[i][j].a + f[i + 1][j].a);
        r[i][j].b += temp * (f[i][j].b + f[i + 1][j].b);
        r[i][j].c += temp * (f[i][j].c + f[i + 1][j].c);
        r[i][j].d += temp * (f[i][j].d + f[i + 1][j].d);
        temp = -0.5 * deltax;
        r[i][j].a += temp * (g[i][j].a + g[i][j + 1].a);
        r[i][j].b += temp * (g[i][j].b + g[i][j + 1].b);
        r[i][j].c += temp * (g[i][j].c + g[i][j + 1].c);
        r[i][j].d += temp * (g[i][j].d + g[i][j + 1].d);
      }
    }
  }

  private void calculateG(double[][] localpg, double[][] localtg,
      Statevector[][] localug) {
    double temp, temp2, temp3;
    double v;
    int i, j;
    for (i = 0; i < imax + 1; ++i) {
      for (j = 0; j < jmax + 1; ++j) {
        v = localug[i][j].c / localug[i][j].a;
        g[i][j].a = localug[i][j].c;
        g[i][j].b = localug[i][j].b * v;
        g[i][j].c = localug[i][j].c * v + localpg[i][j];
        temp = localug[i][j].b * localug[i][j].b;
        temp2 = localug[i][j].c * localug[i][j].c;
        temp3 = localug[i][j].a * localug[i][j].a;
        g[i][j].d = localug[i][j].c
            * (Cp * localtg[i][j] + (0.5 * (temp + temp2) / (temp3)));
      }
    }
  }

  private void calculateF(double[][] localpg, double[][] localtg,
      Statevector[][] localug) {
    double u;
    double temp1, temp2, temp3;
    int i, j;
    for (i = 0; i < imax + 1; ++i) {
      for (j = 0; j < jmax + 1; ++j) {
        u = localug[i][j].b / localug[i][j].a;
        f[i][j].a = localug[i][j].b;
        f[i][j].b = localug[i][j].b * u + localpg[i][j];
        f[i][j].c = localug[i][j].c * u;
        temp1 = localug[i][j].b * localug[i][j].b;
        temp2 = localug[i][j].c * localug[i][j].c;
        temp3 = localug[i][j].a * localug[i][j].a;
        f[i][j].d = localug[i][j].b
            * (Cp * localtg[i][j] + (0.5 * (temp1 + temp2) / (temp3)));
      }
    }
  }

  private void calculateDamping(double[][] localpg, Statevector[][] localug) {
    double adt, sbar;
    double nu2;
    double nu4;
    double tempdouble;
    int ascrap, i, j;
    Statevector temp = new Statevector();
    Statevector temp2 = new Statevector();
    Statevector scrap2 = new Statevector(), scrap4 = new Statevector();

    nu2 = secondOrderDamping * secondOrderNormalizer;
    nu4 = fourthOrderDamping * fourthOrderNormalizer;

    // pressure-based switching functions in the xi and eta directions
    for (i = 1; i < imax; ++i)
      for (j = 1; j < jmax; ++j) {
        sxi[i][j] = Math.abs(localpg[i + 1][j]
            - 2.0 * localpg[i][j] + localpg[i - 1][j]) / localpg[i][j];
        seta[i][j] = Math.abs(localpg[i][j + 1]
            - 2.0 * localpg[i][j] + localpg[i][j - 1]) / localpg[i][j];
      }

    for (i = 1; i < imax; ++i) {
      for (j = 1; j < jmax; ++j) {
        /* xi+ contribution */
        if (i > 1 && i < imax - 1) {
          adt = (a[i][j] + a[i + 1][j]) / (deltat[i][j] + deltat[i + 1][j]);
          sbar = (sxi[i + 1][j] + sxi[i][j]) * 0.5;
        } else {
          adt = a[i][j] / deltat[i][j];
          sbar = sxi[i][j];
        }
        tempdouble = nu2 * sbar * adt;
        scrap2.a = tempdouble * (localug[i + 1][j].a - localug[i][j].a);
        scrap2.b = tempdouble * (localug[i + 1][j].b - localug[i][j].b);
        scrap2.c = tempdouble * (localug[i + 1][j].c - localug[i][j].c);
        scrap2.d = tempdouble * (localug[i + 1][j].d - localug[i][j].d);
        if (i > 1 && i < imax - 1) {
          temp = localug[i + 2][j].svect(localug[i - 1][j]);
          temp2.a = 3.0 * (localug[i][j].a - localug[i + 1][j].a);
          temp2.b = 3.0 * (localug[i][j].b - localug[i + 1][j].b);
          temp2.c = 3.0 * (localug[i][j].c - localug[i + 1][j].c);
          temp2.d = 3.0 * (localug[i][j].d - localug[i + 1][j].d);
          tempdouble = -nu4 * adt;
          scrap4.a = tempdouble * (temp.a + temp2.a);
          scrap4.b = tempdouble * (temp.a + temp2.b);
          scrap4.c = tempdouble * (temp.a + temp2.c);
          scrap4.d = tempdouble * (temp.a + temp2.d);
        } else {
          scrap4.a = 0.0; scrap4.b = 0.0; scrap4.c = 0.0; scrap4.d = 0.0;
        }
        temp.a = scrap2.a + scrap4.a;
        temp.b = scrap2.b + scrap4.b;
        temp.c = scrap2.c + scrap4.c;
        temp.d = scrap2.d + scrap4.d;
        d[i][j] = temp;

        /* xi- contribution */
        if (i > 1 && i < imax - 1) {
          adt = (a[i][j] + a[i - 1][j]) / (deltat[i][j] + deltat[i - 1][j]);
          sbar = (sxi[i][j] + sxi[i - 1][j]) * 0.5;
        } else {
          adt = a[i][j] / deltat[i][j];
          sbar = sxi[i][j];
        }
        tempdouble = -nu2 * sbar * adt;
        scrap2.a = tempdouble * (localug[i][j].a - localug[i - 1][j].a);
        scrap2.b = tempdouble * (localug[i][j].b - localug[i - 1][j].b);
        scrap2.c = tempdouble * (localug[i][j].c - localug[i - 1][j].c);
        scrap2.d = tempdouble * (localug[i][j].d - localug[i - 1][j].d);
        if (i > 1 && i < imax - 1) {
          temp = localug[i + 1][j].svect(localug[i - 2][j]);
          temp2.a = 3.0 * (localug[i - 1][j].a - localug[i][j].a);
          temp2.b = 3.0 * (localug[i - 1][j].b - localug[i][j].b);
          temp2.c = 3.0 * (localug[i - 1][j].c - localug[i][j].c);
          temp2.d = 3.0 * (localug[i - 1][j].d - localug[i][j].d);
          tempdouble = nu4 * adt;
          scrap4.a = tempdouble * (temp.a + temp2.a);
          scrap4.b = tempdouble * (temp.a + temp2.b);
          scrap4.c = tempdouble * (temp.a + temp2.c);
          scrap4.d = tempdouble * (temp.a + temp2.d);
        } else {
          scrap4.a = 0.0; scrap4.b = 0.0; scrap4.c = 0.0; scrap4.d = 0.0;
        }
        d[i][j].a += scrap2.a + scrap4.a;
        d[i][j].b += scrap2.b + scrap4.b;
        d[i][j].c += scrap2.c + scrap4.c;
        d[i][j].d += scrap2.d + scrap4.d;

        /* eta+ contribution */
        if (j > 1 && j < jmax - 1) {
          adt = (a[i][j] + a[i][j + 1]) / (deltat[i][j] + deltat[i][j + 1]);
          sbar = (seta[i][j] + seta[i][j + 1]) * 0.5;
        } else {
          adt = a[i][j] / deltat[i][j];
          sbar = seta[i][j];
        }
        tempdouble = nu2 * sbar * adt;
        scrap2.a = tempdouble * (localug[i][j + 1].a - localug[i][j].a);
        scrap2.b = tempdouble * (localug[i][j + 1].b - localug[i][j].b);
        scrap2.c = tempdouble * (localug[i][j + 1].c - localug[i][j].c);
        scrap2.d = tempdouble * (localug[i][j + 1].d - localug[i][j].d);
        if (j > 1 && j < jmax - 1) {
          temp = localug[i][j + 2].svect(localug[i][j - 1]);
          temp2.a = 3.0 * (localug[i][j].a - localug[i][j + 1].a);
          temp2.b = 3.0 * (localug[i][j].b - localug[i][j + 1].b);
          temp2.c = 3.0 * (localug[i][j].c - localug[i][j + 1].c);
          temp2.d = 3.0 * (localug[i][j].d - localug[i][j + 1].d);
          tempdouble = -nu4 * adt;
          scrap4.a = tempdouble * (temp.a + temp2.a);
          scrap4.b = tempdouble * (temp.a + temp2.b);
          scrap4.c = tempdouble * (temp.a + temp2.c);
          scrap4.d = tempdouble * (temp.a + temp2.d);
        } else {
          scrap4.a = 0.0; scrap4.b = 0.0; scrap4.c = 0.0; scrap4.d = 0.0;
        }
        d[i][j].a += scrap2.a + scrap4.a;
        d[i][j].b += scrap2.b + scrap4.b;
        d[i][j].c += scrap2.c + scrap4.c;
        d[i][j].d += scrap2.d + scrap4.d;

        /* eta- contribution */
        if (j > 1 && j < jmax - 1) {
          adt = (a[i][j] + a[i][j - 1]) / (deltat[i][j] + deltat[i][j - 1]);
          sbar = (seta[i][j] + seta[i][j - 1]) * 0.5;
        } else {
          adt = a[i][j] / deltat[i][j];
          sbar = seta[i][j];
        }
        tempdouble = -nu2 * sbar * adt;
        scrap2.a = tempdouble * (localug[i][j].a - localug[i][j - 1].a);
        scrap2.b = tempdouble * (localug[i][j].b - localug[i][j - 1].b);
        scrap2.c = tempdouble * (localug[i][j].c - localug[i][j - 1].c);
        scrap2.d = tempdouble * (localug[i][j].d - localug[i][j - 1].d);
        if (j > 1 && j < jmax - 1) {
          temp = localug[i][j + 1].svect(localug[i][j - 2]);
          temp2.a = 3.0 * (localug[i][j - 1].a - localug[i][j].a);
          temp2.b = 3.0 * (localug[i][j - 1].b - localug[i][j].b);
          temp2.c = 3.0 * (localug[i][j - 1].c - localug[i][j].c);
          temp2.d = 3.0 * (localug[i][j - 1].d - localug[i][j].d);
          tempdouble = nu4 * adt;
          scrap4.a = tempdouble * (temp.a + temp2.a);
          scrap4.b = tempdouble * (temp.a + temp2.b);
          scrap4.c = tempdouble * (temp.a + temp2.c);
          scrap4.d = tempdouble * (temp.a + temp2.d);
        } else {
          scrap4.a = 0.0; scrap4.b = 0.0; scrap4.c = 0.0; scrap4.d = 0.0;
        }
        d[i][j].a += scrap2.a + scrap4.a;
        d[i][j].b += scrap2.b + scrap4.b;
        d[i][j].c += scrap2.c + scrap4.c;
        d[i][j].d += scrap2.d + scrap4.d;
      }
    }
  }

  private void calculateDeltaT() {
    double xeta, yeta, xxi, yxi;
    int i, j;
    double mint;
    double c, q, r;
    double safety_factor = 0.7;
    for (i = 1; i < imax; ++i)
      for (j = 1; j < jmax; ++j) {
        xxi = (xnode[i][j] - xnode[i - 1][j]
             + xnode[i][j - 1] - xnode[i - 1][j - 1]) * 0.5;
        yxi = (ynode[i][j] - ynode[i - 1][j]
             + ynode[i][j - 1] - ynode[i - 1][j - 1]) * 0.5;
        xeta = (xnode[i][j] - xnode[i][j - 1]
              + xnode[i - 1][j] - xnode[i - 1][j - 1]) * 0.5;
        yeta = (ynode[i][j] - ynode[i][j - 1]
              + ynode[i - 1][j] - ynode[i - 1][j - 1]) * 0.5;
        q = (yeta * ug[i][j].b - xeta * ug[i][j].c);
        r = (-yxi * ug[i][j].b + xxi * ug[i][j].c);
        c = Math.sqrt(gamma * rgas * tg[i][j]);
        deltat[i][j] = safety_factor * 2.8284 * a[i][j]
            / ((Math.abs(q) + Math.abs(r)) / ug[i][j].a
               + c * Math.sqrt(xxi * xxi + yxi * yxi + xeta * xeta + yeta * yeta
                               + 2.0 * Math.abs(xeta * xxi + yeta * yxi)));
      }

    // for time-accurate stepping, use the global minimum everywhere
    if (ntime == 1) {
      mint = 100000.0;
      for (i = 1; i < imax; ++i)
        for (j = 1; j < jmax; ++j)
          if (deltat[i][j] < mint)
            mint = deltat[i][j];
      for (i = 1; i < imax; ++i)
        for (j = 1; j < jmax; ++j)
          deltat[i][j] = mint;
    }
  }

  private void calculateDummyCells(double[][] localpg, double[][] localtg,
      Statevector[][] localug) {
    double c;
    double jminus;
    double jplus;
    double s;
    double rho, temp, u, v;
    double scrap, scrap2;
    double theta;
    double uprime;
    int i, j;
    Vector2 norm = new Vector2();
    Vector2 tan = new Vector2();
    Vector2 u1 = new Vector2();

    uff = machff;
    jplusff = uff + 2.0 / (gamma - 1.0) * cff;
    jminusff = uff - 2.0 / (gamma - 1.0) * cff;

    for (i = 1; i < imax; ++i) {
      /* lower wall */
      tan.ihat = xnode[i][0] - xnode[i - 1][0];
      tan.jhat = ynode[i][0] - ynode[i - 1][0];
      norm.ihat = -(ynode[i][0] - ynode[i - 1][0]);
      norm.jhat = xnode[i][0] - xnode[i - 1][0];
      scrap = tan.magnitude();
      tan.ihat = tan.ihat / scrap;
      tan.jhat = tan.jhat / scrap;
      scrap = norm.magnitude();
      norm.ihat = norm.ihat / scrap;
      norm.jhat = norm.jhat / scrap;

      rho = localug[i][1].a;
      localtg[i][0] = localtg[i][1];
      u1.ihat = localug[i][1].b / rho;
      u1.jhat = localug[i][1].c / rho;
      u = u1.dot(tan) + u1.dot(norm) * tan.jhat / norm.jhat;
      u = u / (tan.ihat - (norm.ihat * tan.jhat / norm.jhat));
      v = -(u1.dot(norm) + u * norm.ihat) / norm.jhat;

      localug[i][0].a = localug[i][1].a;
      localug[i][0].b = rho * u;
      localug[i][0].c = rho * v;
      localug[i][0].d = rho * (Cv * localtg[i][0] + 0.5 * (u * u + v * v));
      localpg[i][0] = localpg[i][1];

      /* upper wall */
      tan.ihat = xnode[i][jmax - 1] - xnode[i - 1][jmax - 1];
      tan.jhat = ynode[i][jmax - 1] - ynode[i - 1][jmax - 1];
      norm.ihat = ynode[i][jmax - 1] - ynode[i - 1][jmax - 1];
      norm.jhat = -(xnode[i][jmax - 1] - xnode[i - 1][jmax - 1]);
      scrap = tan.magnitude();
      tan.ihat = tan.ihat / scrap;
      tan.jhat = tan.jhat / scrap;
      scrap = norm.magnitude();
      norm.ihat = norm.ihat / scrap;
      norm.jhat = norm.jhat / scrap;

      rho = localug[i][jmax - 1].a;
      temp = localtg[i][jmax - 1];
      u1.ihat = localug[i][jmax - 1].b / rho;
      u1.jhat = localug[i][jmax - 1].c / rho;
      u = u1.dot(tan) + u1.dot(norm) * tan.jhat / norm.jhat;
      u = u / (tan.ihat - (norm.ihat * tan.jhat / norm.jhat));
      v = -(u1.dot(norm) + u * norm.ihat) / norm.jhat;

      localug[i][jmax].a = localug[i][jmax - 1].a;
      localug[i][jmax].b = rho * u;
      localug[i][jmax].c = rho * v;
      localug[i][jmax].d = rho * (Cv * temp + 0.5 * (u * u + v * v));
      localtg[i][jmax] = temp;
      localpg[i][jmax] = localpg[i][jmax - 1];
    }

    for (j = 1; j < jmax; ++j) {
      /* inlet */
      norm.ihat = ynode[0][j - 1] - ynode[0][j];
      norm.jhat = xnode[0][j] - xnode[0][j - 1];
      scrap = norm.magnitude();
      norm.ihat = norm.ihat / scrap;
      norm.jhat = norm.jhat / scrap;
      theta = Math.acos((ynode[0][j - 1] - ynode[0][j])
          / Math.sqrt((xnode[0][j] - xnode[0][j - 1]) * (xnode[0][j] - xnode[0][j - 1])
                    + (ynode[0][j - 1] - ynode[0][j]) * (ynode[0][j - 1] - ynode[0][j])));

      u1.ihat = localug[1][j].b / localug[1][j].a;
      u1.jhat = localug[1][j].c / localug[1][j].a;
      uprime = u1.ihat * Math.cos(theta);
      c = Math.sqrt(gamma * rgas * localtg[1][j]);

      if (uprime < -c) {
        localug[0][j].a = rhoff;
        localug[0][j].b = rhoff * uff;
        localug[0][j].c = rhoff * vff;
        localug[0][j].d = rhoff * (Cv * tff + 0.5 * (uff * uff + vff * vff));
        localtg[0][j] = tff;
        localpg[0][j] = pff;
      } else if (uprime < 0.0) {
        jminus = u1.ihat - 2.0 / (gamma - 1.0) * c;
        s = Math.log(pff) - gamma * Math.log(rhoff);
        v = vff;
        u = (jplusff + jminus) / 2.0;
        scrap = (jplusff - u) * (gamma - 1.0) * 0.5;
        localtg[0][j] = (1.0 / (gamma * rgas)) * scrap * scrap;
        localpg[0][j] = Math.exp(s) / Math.pow((rgas * localtg[0][j]), gamma);
        localpg[0][j] = Math.pow(localpg[0][j], 1.0 / (1.0 - gamma));
        localug[0][j].a = localpg[0][j] / (rgas * localtg[0][j]);
        localug[0][j].b = localug[0][j].a * u;
        localug[0][j].c = localug[0][j].a * v;
        localug[0][j].d = localug[0][j].a * (Cv * tff + 0.5 * (u * u + v * v));
      } else {
        System.err.println("You have outflow at the inlet, which is not allowed.");
      }

      /* outlet */
      norm.ihat = ynode[0][j] - ynode[0][j - 1];
      norm.jhat = xnode[0][j - 1] - xnode[0][j];
      scrap = norm.magnitude();
      norm.ihat = norm.ihat / scrap;
      norm.jhat = norm.jhat / scrap;
      scrap = xnode[0][j - 1] - xnode[0][j];
      scrap2 = ynode[0][j] - ynode[0][j - 1];
      theta = Math.acos((ynode[0][j] - ynode[0][j - 1])
          / Math.sqrt(scrap * scrap + scrap2 * scrap2));

      u1.ihat = localug[imax - 1][j].b / localug[imax - 1][j].a;
      u1.jhat = localug[imax - 1][j].c / localug[imax - 1][j].a;
      uprime = u1.ihat * Math.cos(theta);
      c = Math.sqrt(gamma * rgas * localtg[imax - 1][j]);

      if (uprime > c) {
        localug[imax][j].a = 2.0 * localug[imax - 1][j].a - localug[imax - 2][j].a;
        localug[imax][j].b = 2.0 * localug[imax - 1][j].b - localug[imax - 2][j].b;
        localug[imax][j].c = 2.0 * localug[imax - 1][j].c - localug[imax - 2][j].c;
        localug[imax][j].d = 2.0 * localug[imax - 1][j].d - localug[imax - 2][j].d;
        localpg[imax][j] = 2.0 * localpg[imax - 1][j] - localpg[imax - 2][j];
        localtg[imax][j] = 2.0 * localtg[imax - 1][j] - localtg[imax - 2][j];
      } else if (uprime < c && uprime > 0) {
        jplus = u1.ihat + 2.0 / (gamma - 1) * c;
        v = localug[imax - 1][j].c / localug[imax - 1][j].a;
        s = Math.log(localpg[imax - 1][j])
          - gamma * Math.log(localug[imax - 1][j].a);
        u = (jplus + jminusff) / 2.0;
        scrap = (jplus - u) * (gamma - 1.0) * 0.5;
        localtg[imax][j] = (1.0 / (gamma * rgas)) * scrap * scrap;
        localpg[imax][j] = Math.exp(s) / Math.pow((rgas * localtg[imax][j]), gamma);
        localpg[imax][j] = Math.pow(localpg[imax][j], 1.0 / (1.0 - gamma));
        rho = localpg[imax][j] / (rgas * localtg[imax][j]);
        localug[imax][j].a = rho;
        localug[imax][j].b = rho * u;
        localug[imax][j].c = rho * v;
        localug[imax][j].d = rho * (Cv * localtg[imax][j] + 0.5 * (u * u + v * v));
      } else if (uprime < -c) {
        localug[0][j].a = rhoff;
        localug[0][j].b = rhoff * uff;
        localug[0][j].c = rhoff * vff;
        localug[0][j].d = rhoff * (Cv * tff + 0.5 * (uff * uff + vff * vff));
        localtg[0][j] = tff;
        localpg[0][j] = pff;
      } else if (uprime < 0.0) {
        jminus = u1.ihat - 2.0 / (gamma - 1.0) * c;
        s = Math.log(pff) - gamma * Math.log(rhoff);
        v = vff;
        u = (jplusff + jminus) / 2.0;
        scrap = (jplusff - u) * (gamma - 1.0) * 0.5;
        localtg[0][j] = (1.0 / (gamma * rgas)) * scrap * scrap;
        localpg[0][j] = Math.exp(s) / Math.pow((rgas * localtg[0][j]), gamma);
        localpg[0][j] = Math.pow(localpg[0][j], 1.0 / (1.0 - gamma));
        localug[0][j].a = localpg[0][j] / (rgas * localtg[0][j]);
        localug[0][j].b = localug[0][j].a * u;
        localug[0][j].c = localug[0][j].a * v;
        localug[0][j].d = localug[0][j].a * (Cv * tff + 0.5 * (u * u + v * v));
      } else {
        System.err.println("You have inflow at the outlet, which is not allowed.");
      }
    }

    localug[0][0] = localug[1][0];
    localug[imax][0] = localug[imax][1];
    localug[0][jmax] = localug[1][jmax];
    localug[imax][jmax] = localug[imax][jmax - 1];
  }

  public void runiters() {
    for (int i = 0; i < iter; i++) {
      doIteration();
    }
  }
}

class Statevector {

  double a;
  double b;
  double c;
  double d;

  Statevector() {
    a = 0.0;
    b = 0.0;
    c = 0.0;
    d = 0.0;
  }

  // m * (this + that)
  public Statevector amvect(double m, Statevector that) {
    Statevector answer = new Statevector();
    answer.a = m * (this.a + that.a);
    answer.b = m * (this.b + that.b);
    answer.c = m * (this.c + that.c);
    answer.d = m * (this.d + that.d);
    return answer;
  }

  // this + that
  public Statevector avect(Statevector that) {
    Statevector answer = new Statevector();
    answer.a = this.a + that.a;
    answer.b = this.b + that.b;
    answer.c = this.c + that.c;
    answer.d = this.d + that.d;
    return answer;
  }

  // m * this
  public Statevector mvect(double m) {
    Statevector answer = new Statevector();
    answer.a = m * this.a;
    answer.b = m * this.b;
    answer.c = m * this.c;
    answer.d = m * this.d;
    return answer;
  }

  // this - that
  public Statevector svect(Statevector that) {
    Statevector answer = new Statevector();
    answer.a = this.a - that.a;
    answer.b = this.b - that.b;
    answer.c = this.c - that.c;
    answer.d = this.d - that.d;
    return answer;
  }

  // m * (this - that)
  public Statevector smvect(double m, Statevector that) {
    Statevector answer = new Statevector();
    answer.a = m * (this.a - that.a);
    answer.b = m * (this.b - that.b);
    answer.c = m * (this.c - that.c);
    answer.d = m * (this.d - that.d);
    return answer;
  }
}

class Vector2 {

  double ihat;
  double jhat;

  Vector2() {
    ihat = 0.0;
    jhat = 0.0;
  }

  public double magnitude() {
    double mag;
    mag = Math.sqrt(this.ihat * this.ihat + this.jhat * this.jhat);
    return mag;
  }

  public double dot(Vector2 that) {
    double answer;
    answer = this.ihat * that.ihat + this.jhat * that.jhat;
    return answer;
  }
}
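The four update loops in doIteration() above amount to a four-stage Runge-Kutta-style scheme on the cell averages. Writing U for the state vector in a cell of area A, R for the flux residual of calculateR(), and D for the damping of calculateDamping() (computed once per step), stage k performs

\[ U^{(k)} = U^{n} - \alpha_k \, \frac{\Delta t}{A} \left( R\!\left(U^{(k-1)}\right) - D \right), \qquad (\alpha_1, \alpha_2, \alpha_3, \alpha_4) = (0.25,\; 0.33333,\; 0.5,\; 1.0), \]

with U^{(0)} = U^{n} and U^{n+1} = U^{(4)}; calculateStateVar() then refreshes pressure and temperature from the updated conserved variables, and the per-step RMS pressure change becomes the error value that Euler.validate() checks.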
A.6 MD Source Code

/*
 * KADRE Benchmark Project
 * MD
 */
package edu.usc.softarch.kadre.bench.md;

public class MD extends Moldyn {

  public void setsize(int size) {
    this.size = size;
  }

  public void init() {
    initialise();
  }

  public void application() {
    runiters();
  }

  public void validate() {
    double refval[] = { 1731.4306625334357, 7397.392307839352 };
    double dev = Math.abs(ek - refval[size]);
    if (dev > 1.0e-12) {
      System.out.println("Validation failed");
      System.out.println("Kinetic Energy = " + ek + "  " + dev + "  " + size);
    }
  }

  public void tidyup() {
    one = null;
    System.gc();
  }

  public void run(int size) {
    setsize(size);
    init();
    application();
    validate();
    tidyup();
  }

  public static void main(String argv[]) {
    MD md = new MD();
    md.run(0);
  }
}

/*
 * KADRE Benchmark Project
 * MD
 */
package edu.usc.softarch.kadre.bench.md;

import java.util.*;
import java.text.NumberFormat;

public class Moldyn {

  public int ITERS;
  public double LENGTH;
  public double m;
  public double mu;
  public double kb;
  public double TSIM;
  public double deltat;

  public particle[] one;

  public double epot;
  public double vir;
  public double count;

  public int size;
  public int[] datasizes;
  public int interactions;

  public int i, j, k, lg;
  public int mdsize;
  public int move;
  public int mm;

  public double l;
  public double rcoff, rcoffs;
  public double side, sideh;
  public double hsq, hsq2;
  public double vel;
  public double a;
  public double r;
  public double sum;
  public double tscale;
  public double sc;
  public double ekin;
  public double ek;
  public double ts;
  public double sp;
  public double den;
  public double tref;
  public double h;
  public double vaver, vaverh;
  public double rand;
  public double etot;
  public double temp;
  public double pres;
  public double rp;
  public double u1, u2, v1, v2;
  public double s;

  public int ijk;
  public int npartm;
  public int PARTSIZE;
  public int iseed;
  public int tint;
  public int irep;
  public int istop;
  public int iprint;
  public int movemx;

  public random randnum;

  public NumberFormat nbf;
  public NumberFormat nbf2;
  public NumberFormat nbf3;

  public void initialise() {
    ITERS = 100;
    LENGTH = 50e-10;
    m = 4.0026;
    mu = 1.66056e-27;
    kb = 1.38066e-23;
    TSIM = 50;
    deltat = 5e-16;
    one = null;
    epot = 0.0;
    vir = 0.0;
    count = 0.0;
    datasizes = new int[] { 8, 13 };
    interactions = 0;
    den = 0.83134;
    tref = 0.722;
    h = 0.064;
    irep = 10;
    istop = 19;
    iprint = 10;
    movemx = 50;

    // give the pair-force loops access to the shared accumulators
    particle.md = this;

    nbf = NumberFormat.getInstance();
    nbf.setMaximumFractionDigits(4);
    nbf.setMinimumFractionDigits(4);
    nbf.setGroupingUsed(false);
    nbf2 = NumberFormat.getInstance();
    nbf2.setMaximumFractionDigits(1);
    nbf2.setMinimumFractionDigits(1);
    nbf3 = NumberFormat.getInstance();
    nbf3.setMaximumFractionDigits(6);
    nbf3.setMinimumFractionDigits(6);

    mm = datasizes[size];
    PARTSIZE = mm * mm * mm * 4;
    mdsize = PARTSIZE;
    one = new particle[mdsize];
    l = LENGTH;
    side = Math.pow((mdsize / den), 0.3333333);
    rcoff = mm / 4.0;
    a = side / mm;
    sideh = side * 0.5;
    hsq = h * h;
    hsq2 = hsq * 0.5;
    npartm = mdsize - 1;
    rcoffs = rcoff * rcoff;
    tscale = 16.0 / (1.0 * mdsize - 1.0);
    vaver = 1.13 * Math.sqrt(tref / 24.0);
    vaverh = vaver * h;

    // place particles on an fcc lattice
    ijk = 0;
    for (lg = 0; lg <= 1; lg++) {
      for (i = 0; i < mm; i++) {
        for (j = 0; j < mm; j++) {
          for (k = 0; k < mm; k++) {
            one[ijk] = new particle((i * a + lg * a * 0.5),
                (j * a + lg * a * 0.5), (k * a),
                0.0, 0.0, 0.0, 0.0, 0.0, 0.0);
            ijk = ijk + 1;
          }
        }
      }
    }
    for (lg = 1; lg <= 2; lg++) {
      for (i = 0; i < mm; i++) {
        for (j = 0; j < mm; j++) {
          for (k = 0; k < mm; k++) {
            one[ijk] = new particle((i * a + (2 - lg) * a * 0.5),
                (j * a + (lg - 1) * a * 0.5), (k * a + a * 0.5),
                0.0, 0.0, 0.0, 0.0, 0.0, 0.0);
            ijk = ijk + 1;
          }
        }
      }
    }

    // assign Maxwellian velocities
    iseed = 0;
    v1 = 0.0;
    v2 = 0.0;
    randnum = new random(iseed, v1, v2);

    for (i = 0; i < mdsize; i += 2) {
      r = randnum.seed();
      one[i].xvelocity = r * randnum.v1;
      one[i + 1].xvelocity = r * randnum.v2;
    }
    for (i = 0; i < mdsize; i += 2) {
      r = randnum.seed();
      one[i].yvelocity = r * randnum.v1;
      one[i + 1].yvelocity = r * randnum.v2;
    }
    for (i = 0; i < mdsize; i += 2) {
      r = randnum.seed();
      one[i].zvelocity = r * randnum.v1;
      one[i + 1].zvelocity = r * randnum.v2;
    }

    // remove net momentum and accumulate kinetic energy
    ekin = 0.0;
    sp = 0.0;
    for (i = 0; i < mdsize; i++) {
      sp = sp + one[i].xvelocity;
    }
    sp = sp / mdsize;
    for (i = 0; i < mdsize; i++) {
      one[i].xvelocity = one[i].xvelocity - sp;
      ekin = ekin + one[i].xvelocity * one[i].xvelocity;
    }
    sp = 0.0;
    for (i = 0; i < mdsize; i++) {
      sp = sp + one[i].yvelocity;
    }
    sp = sp / mdsize;
    for (i = 0; i < mdsize; i++) {
      one[i].yvelocity = one[i].yvelocity - sp;
      ekin = ekin + one[i].yvelocity * one[i].yvelocity;
    }
    sp = 0.0;
    for (i = 0; i < mdsize; i++) {
      sp = sp + one[i].zvelocity;
    }
    sp = sp / mdsize;
    for (i = 0; i < mdsize; i++) {
      one[i].zvelocity = one[i].zvelocity - sp;
      ekin = ekin + one[i].zvelocity * one[i].zvelocity;
    }

    // scale velocities to the reference temperature
    ts = tscale * ekin;
    sc = h * Math.sqrt(tref / ts);
    for (i = 0; i < mdsize; i++) {
      one[i].xvelocity = one[i].xvelocity * sc;
      one[i].yvelocity = one[i].yvelocity * sc;
      one[i].zvelocity = one[i].zvelocity * sc;
    }
  }

  public void runiters() {
    move = 0;
    for (move = 0; move < movemx; move++) {
      for (i = 0; i < mdsize; i++) {
        one[i].domove(side);
      }
      epot = 0.0;
      vir = 0.0;
      for (i = 0; i < mdsize; i++) {
        one[i].force(side, rcoff, mdsize, i);
      }
      sum = 0.0;
      for (i = 0; i < mdsize; i++) {
        sum = sum + one[i].mkekin(hsq2);
      }
      ekin = sum / hsq;
      vel = 0.0;
      count = 0.0;
      for (i = 0; i < mdsize; i++) {
        vel = vel + one[i].velavg(vaverh, h);
      }
      vel = vel / h;
      if ((move < istop) && (((move + 1) % irep) == 0)) {
        sc = Math.sqrt(tref / (tscale * ekin));
        for (i = 0; i < mdsize; i++) {
          one[i].dscal(sc, 1);
        }
        ekin = tref / tscale;
      }
      if (((move + 1) % iprint) == 0) {
        ek = 24.0 * ekin;
        epot = 4.0 * epot;
        etot = ek + epot;
        temp = tscale * ekin;
        pres = den * 16.0 * (ekin - vir) / mdsize;
        vel = vel / mdsize;
        rp = (count / mdsize) * 100.0;
      }
    }
  }
}

class particle {

  // owning simulation, set in Moldyn.initialise(); gives the pair loops
  // access to one[], epot, vir, count, and interactions
  public static Moldyn md;

  public double xcoord, ycoord, zcoord;
  public double xvelocity, yvelocity, zvelocity;
  public double xforce, yforce, zforce;

  public particle(double xcoord, double ycoord, double zcoord,
      double xvelocity, double yvelocity, double zvelocity,
      double xforce, double yforce, double zforce) {
    this.xcoord = xcoord;
    this.ycoord = ycoord;
    this.zcoord = zcoord;
    this.xvelocity = xvelocity;
    this.yvelocity = yvelocity;
    this.zvelocity = zvelocity;
    this.xforce = xforce;
    this.yforce = yforce;
    this.zforce = zforce;
  }

  public void domove(double side) {
    xcoord = xcoord + xvelocity + xforce;
    ycoord = ycoord + yvelocity + yforce;
    zcoord = zcoord + zvelocity + zforce;
    // periodic boundary conditions
    if (xcoord < 0) {
      xcoord = xcoord + side;
    }
    if (xcoord > side) {
      xcoord = xcoord - side;
    }
    if (ycoord < 0) {
      ycoord = ycoord + side;
    }
    if (ycoord > side) {
      ycoord = ycoord - side;
    }
    if (zcoord < 0) {
      zcoord = zcoord + side;
    }
    if (zcoord > side) {
      zcoord = zcoord - side;
    }
    xvelocity = xvelocity + xforce;
    yvelocity = yvelocity + yforce;
    zvelocity = zvelocity + zforce;
    xforce = 0.0;
    yforce = 0.0;
    zforce = 0.0;
  }

  public void force(double side, double rcoff, int mdsize, int x) {
    double sideh;
    double rcoffs;
    double xx, yy, zz, xi, yi, zi, fxi, fyi, fzi;
    double rd, rrd, rrd2, rrd3, rrd4, rrd6, rrd7, r148;
    double forcex, forcey, forcez;
    int i;

    sideh = 0.5 * side;
    rcoffs = rcoff * rcoff;

    xi = xcoord;
    yi = ycoord;
    zi = zcoord;
    fxi = 0.0;
    fyi = 0.0;
    fzi = 0.0;

    for (i = x + 1; i < mdsize; i++) {
      xx = xi - md.one[i].xcoord;
      yy = yi - md.one[i].ycoord;
      zz = zi - md.one[i].zcoord;
      // minimum-image convention
      if (xx < (-sideh)) {
        xx = xx + side;
      }
      if (xx > (sideh)) {
        xx = xx - side;
      }
      if (yy < (-sideh)) {
        yy = yy + side;
      }
      if (yy > (sideh)) {
        yy = yy - side;
      }
      if (zz < (-sideh)) {
        zz = zz + side;
      }
      if (zz > (sideh)) {
        zz = zz - side;
      }

      rd = xx * xx + yy * yy + zz * zz;
      if (rd <= rcoffs) {
        rrd = 1.0 / rd;
        rrd2 = rrd * rrd;
        rrd3 = rrd2 * rrd;
        rrd4 = rrd2 * rrd2;
        rrd6 = rrd2 * rrd4;
        rrd7 = rrd6 * rrd;
        md.epot = md.epot + (rrd6 - rrd3);
        r148 = rrd7 - 0.5 * rrd4;
        md.vir = md.vir - rd * r148;
        forcex = xx * r148;
        fxi = fxi + forcex;
        md.one[i].xforce = md.one[i].xforce - forcex;
        forcey = yy * r148;
        fyi = fyi + forcey;
        md.one[i].yforce = md.one[i].yforce - forcey;
        forcez = zz * r148;
        fzi = fzi + forcez;
        md.one[i].zforce = md.one[i].zforce - forcez;
        md.interactions++;
      }
    }
    xforce = xforce + fxi;
    yforce = yforce + fyi;
    zforce = zforce + fzi;
  }

  public double mkekin(double hsq2) {
    double sumt = 0.0;
    xforce = xforce * hsq2;
    yforce = yforce * hsq2;
    zforce = zforce * hsq2;
    xvelocity = xvelocity + xforce;
    yvelocity = yvelocity + yforce;
    zvelocity = zvelocity + zforce;
    sumt = (xvelocity * xvelocity) + (yvelocity * yvelocity)
         + (zvelocity * zvelocity);
    return sumt;
  }

  public double velavg(double vaverh, double h) {
    double velt;
    double sq;
    sq = Math.sqrt(xvelocity * xvelocity + yvelocity * yvelocity
                 + zvelocity * zvelocity);
    if (sq > vaverh) {
      md.count = md.count + 1.0;
    }
    velt = sq;
    return velt;
  }

  public void dscal(double sc, int incx) {
    xvelocity = xvelocity * sc;
    yvelocity = yvelocity * sc;
    zvelocity = zvelocity * sc;
  }
}

class random {

  public int iseed;
  public double v1, v2;

  public random(int iseed, double v1, double v2) {
    this.iseed = iseed;
    this.v1 = v1;
    this.v2 = v2;
  }

  // Park-Miller style linear congruential generator in split arithmetic.
  public double update() {
    double rand;
    double scale = 4.656612875e-10;
    int is1, is2, iss2;
    int imult = 16807;
    int imod = 2147483647;
    if (iseed <= 0) {
      iseed = 1;
    }
    is2 = iseed % 32768;
    is1 = (iseed - is2) / 32768;
    iss2 = is2 * imult;
    is2 = iss2 % 32768;
    is1 = (is1 * imult + (iss2 - is2) / 32768) % (65536);
    iseed = (is1 * 32768 + is2) % imod;
    rand = scale * iseed;
    return rand;
  }

  // Box-Muller polar method; v1 and v2 carry the two deviates.
  public double seed() {
    double s, u1, u2, r;
    s = 1.0;
    do {
      u1 = update();
      u2 = update();
      v1 = 2.0 * u1 - 1.0;
      v2 = 2.0 * u2 - 1.0;
      s = v1 * v1 + v2 * v2;
    } while (s >= 1.0);
    r = Math.sqrt(-2.0 * Math.log(s) / s);
    return r;
  }
}
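The pair loop in particle.force() above is a cut-off Lennard-Jones kernel in reduced units. With rrd = 1/r^2, the accumulated pair energy and the per-coordinate force factor are

\[ u(r) = r^{-12} - r^{-6}, \qquad \texttt{r148} = r^{-14} - \tfrac{1}{2} r^{-8} = -\frac{u'(r)}{12\,r}, \]

so forcex = xx * r148 is the x-component of the pair force up to a constant, and runiters() folds the conventional factors back in through the ek = 24.0 * ekin and epot = 4.0 * epot rescalings before validate() compares ek against its reference value.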
A.7 Search Source Code

/*
 * KADRE Benchmark Project
 * Search
 */
package edu.usc.softarch.kadre.bench.search;

public class Search extends SearchGame {

  private int size;

  public void setsize(int size) {
    this.size = size;
  }

  public void initialise() {
    reset();
    for (int i = 0; i < startingMoves[size].length(); i++)
      makemove(startingMoves[size].charAt(i) - '0');
    emptyTT();
  }

  public void application() {
    int result = solve();
  }

  public void validate() {
    int i, works[];
    int ref[][] = {
        { 422, 97347, 184228, 270877, 218810, 132097, 72059, 37601, 18645,
          9200, 4460, 2230, 1034, 502, 271, 121, 55, 28, 11, 6, 4, 2, 1,
          0, 0, 0, 0, 0, 0, 0, 0, 0 },
        { 0, 1, 9, 2885, 105101, 339874, 282934, 156627, 81700, 40940,
          20244, 10278, 4797, 2424, 1159, 535, 246, 139, 62, 28, 11, 11,
          3, 0, 3, 0, 0, 0, 0, 0, 0, 0 } };
    works = new int[32];
    for (i = 0; i < 32; i++)
      works[i] = 0;
    for (i = 0; i < TRANSIZE; i++)
      works[he[i] & 31]++;
    for (i = 0; i < 32; i++) {
      int error = works[i] - ref[size][i];
      if (error != 0) {
        System.out.print("Validation failed for work count " + i);
        System.out.print("Computed value = " + works[i]);
        System.out.print("Reference value = " + ref[size][i]);
      }
    }
  }

  public void tidyup() {
    ht = null;
    he = null;
    System.gc();
  }

  public void run(int size) {
    setsize(size);
    initialise();
    application();
    validate();
    tidyup();
  }

  public static void main(String argv[]) {
    Search sb = new Search();
    sb.run(0);
  }
}

/*
 * KADRE Benchmark Project
 * Search
 */
package edu.usc.softarch.kadre.bench.search;

public class SearchGame extends TransGame {

  public int[][] history;
  public long nodes;
  public long msecs;

  public SearchGame() {
    history = new int[][] {
        { -1, -1, -1, -1, -1, -1, -1, -1,
          -1, 0, 1, 2, 4, 2, 1, 0,
          -1, 1, 3, 5, 7, 5, 3, 1,
          -1, 2, 5, 8, 10, 8, 5, 2,
          -1, 2, 5, 8, 10, 8, 5, 2,
          -1, 1, 3, 5, 7, 5, 3, 1,
          -1, 0, 1, 2, 4, 2, 1, 0 },
        { -1, -1, -1, -1, -1, -1, -1, -1,
          -1, 0, 1, 2, 4, 2, 1, 0,
          -1, 1, 3, 5, 7, 5, 3, 1,
          -1, 2, 5, 8, 10, 8, 5, 2,
          -1, 2, 5, 8, 10, 8, 5, 2,
          -1, 1, 3, 5, 7, 5, 3, 1,
          -1, 0, 1, 2, 4, 2, 1, 0 } };
  }

  public int solve() {
    int i, side;
    int x, work, score;
    long poscnt;

    nodes = 0L;
    msecs = 1L;
    side = (plycnt + 1) & 1;
    for (i = 0; ++i <= 7;)
      if (height[i] <= 6) {
        if (wins(i, height[i], 1 << side) || colthr[columns[i]] == (1 << side))
          return (side != 0 ? WIN : LOSE) << 5;
      }
    if ((x = transpose()) != ABSENT) {
      if ((x & 32) == 0)
        return x;
    }
    score = ab(LOSE, WIN);
    poscnt = posed;
    for (work = 1; (poscnt >>= 1) != 0; work++)
      ;
    return score << 5 | work;
  }

  // Negamax alpha-beta search with history-ordered moves and
  // transposition-table lookups.
  public int ab(int alpha, int beta) {
    int besti, i, j, h, k, l, val, score;
    int x, v, work;
    int nav, av[] = new int[8];
    long poscnt;
    int side, otherside;

    nodes++;
    if (plycnt == 41)
      return DRAW;
    side = (otherside = plycnt & 1) ^ 1;
    for (i = nav = 0; ++i <= 7;) {
      if ((h = height[i]) <= 6) {
        if (wins(i, h, 3) || colthr[columns[i]] != 0) {
          if (h + 1 <= 6 && wins(i, h + 1, 1 << otherside))
            return LOSE;
          av[0] = i;
          while (++i <= 7)
            if ((h = height[i]) <= 6
                && (wins(i, h, 3) || colthr[columns[i]] != 0))
              return LOSE;
          nav = 1;
          break;
        }
        if (!(h + 1 <= 6 && wins(i, h + 1, 1 << otherside)))
          av[nav++] = i;
      }
    }
    if (nav == 0)
      return LOSE;
    if (nav == 1) {
      makemove(av[0]);
      score = -ab(-beta, -alpha);
      backmove();
      return score;
    }
    if ((x = transpose()) != ABSENT) {
      score = x >> 5;
      if (score == DRAWLOSE) {
        if ((beta = DRAW) <= alpha)
          return score;
      } else if (score == DRAWWIN) {
        if ((alpha = DRAW) >= beta)
          return score;
      } else
        return score;
    }
    poscnt = posed;
    l = besti = 0;
    score = Integer.MIN_VALUE;
    for (i = 0; i < nav; i++) {
      for (j = i, val = Integer.MIN_VALUE; j < nav; j++) {
        k = av[j];
        v = history[side][height[k] << 3 | k];
        if (v > val) {
          val = v;
          l = j;
        }
      }
      j = av[l];
      if (i != l) {
        av[l] = av[i];
        av[i] = j;
      }
      makemove(j);
      val = -ab(-beta, -alpha);
      backmove();
      if (val > score) {
        besti = i;
        if ((score = val) > alpha && (alpha = val) >= beta) {
          if (score == DRAW && i < nav - 1)
            score = DRAWWIN;
          break;
        }
      }
    }
    if (besti > 0) {
      for (i = 0; i < besti; i++)
        history[side][height[av[i]] << 3 | av[i]]--;
      history[side][height[av[besti]] << 3 | av[besti]] += besti;
    }
    poscnt = posed - poscnt;
    for (work = 1; (poscnt >>= 1) != 0; work++)
      ;
    if (x != ABSENT) {
      if (score == -(x >> 5))
        score = DRAW;
      transrestore(score, work);
    } else
      transtore(score, work);
    return score;
  }
}

/*
 * KADRE Benchmark Project
 * Search
 */
package edu.usc.softarch.kadre.bench.search;

public class TransGame extends Game {

  public int NSAMELOCK;
  public int STRIDERANGE;
  public int INTMODSTRIDERANGE;
  public int ABSENT;

  public int[] ht;   // table locks
  public byte[] he;  // table entries: score << 5 | work

  private int stride;
  private int htindex, lock;

  protected long posed;
  protected long hits;

  public TransGame() {
    NSAMELOCK = 0x20000;
    STRIDERANGE = (TRANSIZE / PROBES - NSAMELOCK);
    INTMODSTRIDERANGE = (int) ((1L << 32) % STRIDERANGE);
    ABSENT = 128;
    ht = new int[TRANSIZE];
    he = new byte[TRANSIZE];
  }

  public void emptyTT() {
    int i, h, work;
    for (i = 0; i < TRANSIZE; i++)
      if ((work = (h = he[i]) & 31) < 31)
        he[i] = (byte) (h - (work < 16 ? work : 4));
    posed = hits = 0;
  }

  public double hitRate() {
    return posed != 0 ? (double) hits / (double) posed : 0.0;
  }

  public void hash() {
    int t1, t2;
    long htemp;
    t1 = (columns[1] << 7 | columns[2]) << 7 | columns[3];
    t2 = (columns[7] << 7 | columns[6]) << 7 | columns[5];
    htemp = t1 > t2 ? (long) (t1 << 7 | columns[4]) << 21 | t2
                    : (long) (t2 << 7 | columns[4]) << 21 | t1;
    lock = (int) (htemp >> 17);
    htindex = (int) (htemp % TRANSIZE);
    stride = NSAMELOCK + lock % STRIDERANGE;
    if (lock < 0) {
      if ((stride += INTMODSTRIDERANGE) < NSAMELOCK)
        stride += STRIDERANGE;
    }
  }

  public int transpose() {
    hash();
    for (int x = htindex, i = 0; i < PROBES; i++) {
      if (ht[x] == lock)
        return he[x];
      if ((x += stride) >= TRANSIZE)
        x -= TRANSIZE;
    }
    return ABSENT;
  }

  public String result() {
    int x;
    return (x = transpose()) == ABSENT ? "n/a" : result(x);
  }

  public String result(int x) {
    return "" + "##<=>+#".charAt(4 + (x >> 5)) + "(" + (x & 31) + ")";
  }

  public void transrestore(int score, int work) {
    int i, x;
    if (work > 31)
      work = 31;
    posed++;
    hash();
    for (x = htindex, i = 0; i < PROBES; i++) {
      if (ht[x] == lock) {
        hits++;
        he[x] = (byte) (score << 5 | work);
        return;
      }
      if ((x += stride) >= TRANSIZE)
        x -= TRANSIZE;
    }
    transput(score, work);
  }

  public void transtore(int score, int work) {
    if (work > 31)
      work = 31;
    posed++;
    hash();
    transput(score, work);
  }

  public void transput(int score, int work) {
    for (int x = htindex, i = 0; i < PROBES; i++) {
      if (work > (he[x] & 31)) {
        hits++;
        ht[x] = lock;
        he[x] = (byte) (score << 5 | work);
        return;
      }
      if ((x += stride) >= TRANSIZE)
        x -= TRANSIZE;
    }
  }

  public String htstat() {
    int total, i;
    StringBuffer buf = new StringBuffer();
    int works[];
    int typecnt[];
    works = new int[32];
    typecnt = new int[8];
    for (i = 0; i < 32; i++)
      works[i] = 0;
    for (i = 0; i < 8; i++)
      typecnt[i] = 0;
    for (i = 0; i < TRANSIZE; i++) {
      works[he[i] & 31]++;
      if ((he[i] & 31) != 0)
        typecnt[4 + (he[i] >> 5)]++;
    }
    for (total = i = 0; i < 8; i++)
      total += typecnt[i];
    if (total > 0)
      buf.append("store rate = " + hitRate() + "\n "
          + typecnt[4 + LOSE] / (double) total + " < "
          + typecnt[4 + DRAWLOSE] / (double) total + " = "
          + typecnt[4 + DRAW] / (double) total + " > "
          + typecnt[4 + DRAWWIN] / (double) total + " + "
          + typecnt[4 + WIN] / (double) total + "\n");
    for (i = 0; i < 32; i++) {
      buf.append(works[i]);
      buf.append((i & 7) == 7 ? '\n' : '\t');
    }
    return buf.toString();
  }
}

/*
 * KADRE Benchmark Project
 * Search
 */
package edu.usc.softarch.kadre.bench.search;

public class Game {

  public int TRANSIZE;
  public int PROBES;
  public int REPORTPLY;
  public int UNK;
  public int LOSE;
  public int DRAWLOSE;
  public int DRAW;
  public int DRAWWIN;
  public int WIN;
  public int EMPTY;
  public int BLACK;
  public int WHITE;
  public int EDGE;

  public String[] startingMoves;

  public int[] colthr;
  public boolean[] colwon;
  public int[] moves;
  public int plycnt;
  public int[] rows;
  public int[] dias;
  public int[] columns;
  public int[] height;

  public Game() {
    colthr = new int[128];
    for (int i = 8; i < 128; i += 8) {
      colthr[i] = 1;
      colthr[i + 7] = 2;
    }
    colwon = new boolean[128];
    for (int i = 16; i < 128; i += 16)
      colwon[i] = colwon[i + 15] = true;

    TRANSIZE = 1050011;
    PROBES = 8;
    REPORTPLY = 8;
    UNK = -4;
    LOSE = -2;
    DRAWLOSE = -1;
    DRAW = 0;
    DRAWWIN = 1;
    WIN = 2;
    EMPTY = 0;
    BLACK = 1;
    WHITE = 2;
    EDGE = 3;
    startingMoves = new String[] { "444333377", "44433337" };

    moves = new int[44];
    rows = new int[8];
    dias = new int[19];
    columns = new int[8];
    height = new int[8];
    reset();
  }

  public void reset() {
    plycnt = 0;
    for (int i = 0; i < 19; i++)
      dias[i] = 0;
    for (int i = 0; i < 8; i++) {
      columns[i] = 1;
      height[i] = 1;
      rows[i] = 0;
    }
  }

  public String toString() {
    StringBuffer buf = new StringBuffer();
    for (int i = 1; i <= plycnt; i++)
      buf.append(moves[i]);
    return buf.toString();
  }

  // Detects four-in-a-row through bitmask shifts on rows and diagonals.
  public boolean wins(int n, int h, int sidemask) {
    int x, y;
    sidemask <<= (2 * n);
    x = rows[h] | sidemask;
    y = x & (x << 2);
    if ((y & (y << 4)) != 0)
      return true;
    x = dias[5 + n + h] | sidemask;
    y = x & (x << 2);
    if ((y & (y << 4)) != 0)
      return true;
    x = dias[5 + n - h] | sidemask;
    y = x & (x << 2);
    return (y & (y << 4)) != 0;
  }

  public void backmove() {
    int mask, d, h, n, side;

    side = plycnt & 1;
    n = moves[plycnt--];
    h = --height[n];
    columns[n] >>= 1;
    mask = ~(1 << (2 * n + side));
    rows[h] &= mask;
    dias[5 + n + h] &= mask;
    dias[5 + n - h] &= mask;
  }

  public void makemove(int n) {
    int mask, d, h, side;

    moves[++plycnt] = n;
    side = plycnt & 1;
    h = height[n]++;
    columns[n] = (columns[n] << 1) + side;
    mask = 1 << (2 * n + side);
    rows[h] |= mask;
    dias[5 + n + h] |= mask;
    dias[5 + n - h] |= mask;
  }
}
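A closing observation on Appendix A as a whole: every benchmark exposes the same narrow lifecycle (construct, size or initialize, run, validate, optionally tidy up), which is presumably what lets the KADRE tooling drive them uniformly as decomposition subjects. A hypothetical driver (the BenchDriver class is mine, not part of the benchmark suite) exercising two of them through that shared surface:

import edu.usc.softarch.kadre.bench.crypt.Crypt;
import edu.usc.softarch.kadre.bench.sparse.Sparse;

// Hypothetical uniform harness; the sizes match the ones each listing's
// own main() method uses.
public class BenchDriver {
  public static void main(String[] args) {
    Crypt c = new Crypt();
    c.initialize(500);                   // allocate buffers, derive key schedules
    c.run();                             // the kernel under measurement
    c.validate();                        // self-check against reference behavior

    Sparse s = new Sparse();
    s.initialize(50000, 50000, 250000);
    s.run(200);
    s.validate();
  }
}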
Abstract
Scientists today increasingly rely on computers to perform simulations of the physical world that would be impractical (as in drug discovery), impossible (nano-scale fracture dynamics), or unsafe (nuclear physics) to carry out physically. This type of in silico experimentation has taken its place alongside in vitro and in vivo scientific experiments in helping scientists understand the world around us.