TRANSPARENT OBJECT SHARING IN FEDERATED DATABASE SYSTEMS

by

Douglas Fang

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Science)

August 1992

Copyright 1992 Douglas Fang

Dedication

This dissertation is dedicated to the memory of my beloved father, John Ta-Chuan Fang, and to my loving family, whose support gave me the strength to succeed: Florence Fang, my mother; James Fang, my oldest brother; and Ted Fang, my older brother.

Acknowledgments

I would like to express my deepest gratitude to my advisor, Prof. Dennis McLeod, for his invaluable guidance and friendship. I would like to sincerely thank Prof. Shahram Ghandeharizadeh for all his enthusiasm and stimulating discussions. I would also like to thank my other committee members, Profs. Victor Li, Deborah Estrin, and Peter Danzig.

For providing such a wonderful forum for discussing my ideas, I am truly indebted to my dear friends in the Remote Exchange research group: Joachim Hammer, Antonio Si, and K.J. Byeon.

To all of the senior networking lab folk past and present, I would like to express my thanks for your friendship and camaraderie: Abhijit Khale, Lee Breslau, Louie Ramos, Steve Hotz, Katia Obraczka, John Noll, Danny Mitzel, Ron Cocchi, and Sugih Jamin. A special thanks goes to all my office mates and friends from the department who put up with me for all these years: Pei-Wei Mi, Dave Wilhite, Lorna Zorman, Gary Frenkel, and Mark Goldstein.

Contents

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
1 Introduction
  1.1 Motivation and Solution
  1.2 Running Example
  1.3 Guide to Rest of Thesis
2 Related Work
3 Object Database Context and Sharing Patterns
  3.1 Generic Functional Object Data Model
  3.2 Sharing Patterns
4 A Unified Framework Based on Function Sharing
  4.1 Taxonomy of Function Sharing
    4.1.1 Stored Functions
    4.1.2 Computed Functions
  4.2 Relating Taxonomy to Sharing Patterns
  4.3 Discussion
5 Mechanism for Seamless Interconnection
  5.1 Providing Database Transparency
    5.1.1 Instance Sharing
    5.1.2 Type Sharing
    5.1.3 Behavior Sharing
  5.2 Summary of Object Constructs
  5.3 Discussion
6 Prototype Implementation
  6.1 Iris Prototype
  6.2 Omega Prototype
  6.3 Discussion
7 Performance Evaluation
  7.1 Design of the Experiments
  7.2 The Benchmark Database
  7.3 The Queries
  7.4 Organization of Experiments
  7.5 Heterogeneous Configuration
  7.6 Homogeneous Configuration
  7.7 Discussion
8 Conclusion
  8.1 Summary of Results
  8.2 Research Contributions
  8.3 Directions for Future Research
    8.3.1 Performance Enhancements
    8.3.2 Remote Exchange Research Project
Appendix A: Measurements for 1,000 Object Experiments
Appendix B: Measurements for 10,000 Object Experiments
Appendix C: Measurements for 100,000 Object Experiments

List of Tables

7.1 Functions Defined on the Root of the USC Benchmark Type Lattice
7.2 Response Times for Query 1
7.3 Response Times for Query 2
7.4 Response Times for Query 3

List of Figures

1.1 Collaborating Researcher Example
2.1 Heterogeneous Database Architecture
2.2 Federated and Multidatabase Architecture
3.1 FOOM Database Example
3.2 Sharing Pattern Example
4.1 Function Sharing Taxonomy
4.2 Two Example Component Database Schemas
5.1 Sharing Instance Objects
5.2 Pseudo-Code for Import_Instance Procedure
5.3 Sharing Type Objects
5.4 Pseudo-Code for Import_Meta Procedure
5.5 Pseudo-Code for Import_Type Procedure
5.6 Sharing Function Objects
7.1 Hit Ratios for Query 1
7.2 Hit Ratios for Query 2
7.3 Exporter's Hit Ratio for Query 3
7.4 Time Spent in Omega_eval() as Function of Database Size

Abstract

An approach and mechanism to support the seamless interconnection of federated database systems is described. In the context of autonomous but interconnected database systems, it is necessary to accommodate the efficient sharing and transmission of information across database system boundaries. Towards achieving the goal of seamless interconnection, it is desirable to provide database transparency by insulating a given database system from unnecessary detail of remote database systems, allowing users to utilize local tools for database management. Based on an object-oriented approach, we present a mechanism that supports the sharing of information units (objects) at various levels of abstraction as well as the sharing of behavior. An experimental prototype system has been constructed using the Iris and Omega database management systems in order to demonstrate, refine, and evaluate the sharing mechanism.

Chapter 1

Introduction

Sharing in a federation of database components requires a sharing mechanism that respects the autonomy of the individual components. We first describe the main constraints imposed by this environment, followed by a flexible and efficient sharing mechanism that satisfies these requirements. In order to illustrate the approach, we introduce the collaborating researcher example, which is used throughout the rest of this thesis. We finish the chapter with an overview of the rest of this thesis.

1.1 Motivation and Solution

Traditional distributed and heterogeneous database systems have concentrated on making decentralized systems appear logically centralized. That is, the individual component database systems are combined into a unified (virtual) database. This is at some cost to the autonomy of the components.
Typically, strict functional requirements are placed upon components with regard to the format of stored data, the contents and format of the data dictionary, and the data manipulation and definition languages. Recently, though, the need for individual components to retain their own autonomy has been recognized [24, 28, 31].

The general problem of sharing in this federated [14] environment, while respecting the autonomy of the individual components, can be broken down into three areas:

1. the discovery and identification of relevant information in the federation.

2. the resolution of the similarities and differences ("semantic heterogeneity" [23]) in modeling of information.

3. the efficient realization and implementation of a mechanism for the sharing and transmission of information between components.

In this thesis, we specifically address the third area of sharing in federated database systems. One of the novel aspects of our approach is to import remote objects directly into existing databases. To our knowledge, no other work in the literature has taken this approach. In previous approaches, a new level of abstraction is imposed upon the user (either an entirely new system, or a new layer is built on top of an existing one). This approach falls short in human terms because it precludes use of the existing database tools and application programs that the user normally uses for viewing and manipulating data.

Footnote: Our work is part of the larger ongoing Remote-Exchange project here at USC, which was created to address all three of these issues [8].

One of the main difficulties in supporting a sharing mechanism for a federated system of autonomous components is that the database software on each component is developed independently. In particular, supporting existing components whose database management system (DBMS) is not designed to operate in this distributed environment remains to be solved. Any solution involving the modification or rewriting of any of a component's DBMS software (e.g., the query processor) is largely unacceptable [14, 20].

In order to truly support the autonomy of component databases and their users, sharing should be both location and database transparent. Location transparency relieves the user of the responsibility for keeping track of the exact location (e.g., machine) of remote data. Database transparency can be viewed as an extension of location transparency where remote data is logically incorporated directly into the local database (i.e., users view both local and remote data as local data). In addition to allowing users to manipulate remote data using local DBMS tools and application programs, database transparency provides users with the flexibility to structure and integrate remote data autonomously and independently, instead of imposing a single, global structure that all components must adhere to for viewing remote data. In order to maintain this transparency, efficient access to remote data is critical. Otherwise, remote data will not appear transparent to users, since access to remote data will be much slower than access to local data.

In summary, a mechanism designed to support sharing in federated database systems should: 1) respect the autonomy of pre-existing components (e.g., by not imposing any modifications to a component's existing software), 2) support database transparency, and 3) minimize the overhead of remote access in order to be practically feasible.
The mechanism we present in this thesis is based on an object-oriented approach to sharing. While the focus of this thesis applies the approach to a federation of object-oriented components, this approach can also serve as the basis for a common model/language in a federation of components with heterogeneous data models. In addition to supporting the transparent sharing of traditional kinds of data, our approach supports the sharing of behavioral objects. Since sharing occurs at the object level, the participants only share what is needed. As we will show in this thesis, the sharing patterns resulting from this approach are both flexible and efficient.

Footnote: In this case, some base-level common inter-component communication forum is required for any sharing to occur; the core object database model used here is a good candidate for such a forum. This topic is further discussed in Chapter 8.

[Figure 1.1: Collaborating Researcher Example — Researcher-A and Researcher-B, each with their own database (DB1, DB2) accessed through local DB queries.]

1.2 Running Example

Consider the following scenario of two collaborating researchers. Two researchers, Researcher-A and Researcher-B, maintain separate databases of journal and conference publications. Each researcher uses a different underlying schema to model this information. Difficulties arise when Researcher-B wants to include a particular publication of Researcher-A's in his/her own database for viewing purposes (see Figure 1.1). In traditional approaches, Researcher-B must first structurally integrate [3, 19, 29] the two researchers' representations of publications (i.e., conceptual schemas). Researcher-B would then have to import all of the publications that Researcher-A exports rather than just the single publication in which s/he is interested. Not only is this level of granularity coarse, it is potentially inefficient since extraneous data may also be retrieved. Further, Researcher-A must typically keep multiple formatted versions of the same publication, such as simple text, postscript, and dvi formats, so that Researcher-B can view it depending on whatever format his/her local system supports.

In contrast, our approach supports sharing at both coarse and fine levels of granularity. In our approach, for example, Researcher-B can selectively import the exact publication s/he is interested in. Further, since Researcher-A encapsulates publication objects by a set of methods, Researcher-B need not be aware of their internal representation. In particular, Researcher-A might supply the method for displaying his/her own publication objects. This is advantageous for both Researcher-A and Researcher-B. Researcher-A only needs to manage one format of the publication (e.g., dvi). Researcher-B, in turn, uses the method provided by Researcher-A (e.g., xdvi) for displaying the publication. More importantly, by using our approach, Researcher-A's publication appears completely local to Researcher-B. Researcher-B can then manipulate this publication using the same local DBMS tools and application programs that s/he has invested much time and effort in learning and developing.

1.3 Guide to Rest of Thesis

The remainder of this thesis is organized as follows. In Chapter 2, we compare and contrast previous work in heterogeneous database, federated database, and multidatabase systems with respect to our approach.
In Chapter 3, we introduce a generic functional object-oriented data model (FOOM) that we use in this thesis for representing conceptual database models and to illustrate our techniques for seamless interconnection. We also describe the sharing patterns that result from our object-oriented approach. Sharing occurs at the object level; since three distinct concepts are all modeled as objects (instance, type, and behavior), three different sharing patterns naturally arise.

In Chapter 4, we take a different perspective from the object-level sharing approach that produced the sharing patterns described in Chapter 3. In this chapter, we present the generalized concept of function sharing, which we use as a framework to unify the three object level sharing patterns.

In Chapter 5, we describe the object-oriented constructs which, in conjunction with the function sharing approach described in Chapter 4, form the basis for supporting database transparency. It is important to note that these constructs are not implementation specific, as they are based on commonly accepted object-oriented principles [2]. Hence, any component supporting the object-oriented paradigm will support these constructs. Finally, we present our mechanism which uses these constructs to provide database transparency.

In Chapter 6, we describe the experimental testbed of interconnected autonomous components that we have implemented for demonstrating and refining the sharing mechanism described in Chapter 5. In particular, we describe the implementation of our mechanism in the Iris [10] and Omega [12] database systems.

In Chapter 7, we describe the results of the experiments we performed in order to evaluate the sharing mechanism implemented in Chapter 6 using the USC benchmark [11]. There are two major results from this evaluation: 1) we demonstrate the feasibility of our mechanism using existing object-oriented technology and 2) we analyze the performance overhead of our mechanism through experimental measurements.

Footnote: The appendix contains a more comprehensive listing of our experimental measurements.

Finally, in Chapter 8, we summarize the results and research contributions of this work. We also use this chapter to discuss ongoing and future research opportunities that can be built based on the work presented in this thesis.

Chapter 2

Related Work

Previous work in the area of distributed and heterogeneous database systems has proposed a variety of architectures for sharing in a distributed environment. In this chapter, we discuss these approaches with respect to the issues of pre-existing components and transparent access of remote data by the users of these systems.

[Figure 2.1: Heterogeneous Database Architecture — component databases integrated into a global schema held in a central Meta-Database, with user views (View 1, View 2) and front-ends (e.g., SQL and IMS front-ends).]

Research in the area of heterogeneous database systems began over a decade ago [9, 29]. The term heterogeneous database was originally used to distinguish work which included database model and conceptual schema heterogeneity from work in distributed databases, which addressed issues solely related to distribution [6]. Recently, there has been a resurgence in the area of heterogeneous database systems, and the field has evolved to include pre-existing database components.

Footnote: The term distributed database is used here as it has been mainly used in the literature, denoting a relatively tightly coupled, homogeneous system of logically centralized, but physically distributed component databases.
It is with respect to pre-existing components that we consider heterogeneous database systems here; heterogeneity issues are not the focus of this work. Systems using this architecture are characterized by the (possibly partial) integration of several component database schemas into one centralized global schema [25, 29, 30, 33] (see Figure 2.1a). In this system, different user views can be defined. However, they must all be derived from the same global schema. Usually there is some sort of database (labeled Meta-Database in Figure 2.1a) associated with the central node to store the global schema itself and auxiliary information about the various components.

While this approach supports pre-existing component databases, it falls short in terms of flexible sharing patterns; the integration process is very expensive and tends to be difficult to change. As with centralized approaches, the central node tends to be the bottleneck of the system, and the system as a whole is unresilient during failures of this central node. Furthermore, in human terms there is a new layer of abstraction that the user must learn in order to pose queries on the global schema. To help alleviate this transition, various front ends can be placed on top of the global schema to emulate the user's local environment (see Figure 2.1b). Thus, it is possible to allow users to pose queries in terms of a data model (e.g., relational) similar to his/her own through these front-ends. Nevertheless, the user must alternate between his/her own local database and this new "virtual" database (distinct from his/her own local database) whenever s/he wants to consider any remote sharing. Orion-2 [16], for instance, uses the notions of a private database and a global shared database. The users then explicitly pose their queries against the private database, the shared database, or the combination of private and shared databases.

It is possible to observe some similarities between the heterogeneous database approach and the work presented in this thesis. In some sense, the central node with the global schema can be considered to be a local database system that is incorporating remote data from external systems. However, there are three important differences. First, the global schema holds no local data. Rather, it is the (possibly partial) integration of all the remote schemas. There are no instantiations of local or remote data other than meta-data (i.e., schema and component information). Second, the database system used for constructing and maintaining the global schema is a tailored system designed specifically for use in a distributed environment. From this perspective, the heterogeneous database approach does not address the issue of existing database systems. Third, sharing patterns are not flexible. Importing individual instances or behavioral objects is not possible. Sharing patterns are usually restricted to high level, set oriented queries that are passed over the network (e.g., expressed in SQL).

The federated architecture proposed by Heimbigner and McLeod [14] is a loosely coupled collection of database systems stressing autonomy and flexible sharing patterns through inter-component negotiation (see Figure 2.2a).
[Figure 2.2: Federated and Multidatabase Architecture. Figure 2.2a: Federated Architecture — components connected through a federal dictionary. Figure 2.2b: Multidatabase Architecture — component databases (DB1, DB2, DB3) accessed through a multidatabase language interface.]

Rather than a static single global schema, the federated architecture allows multiple import schemas, which are stored in the federal dictionary. The rationale here is that the resolution of conflicts (e.g., structural, naming, and scale) during the schema integration process is a subjective, application dependent process. Accommodating multiple integrations (i.e., import schemas) allows flexibility and autonomy. Furthermore, once a particular import schema is chosen, data is retrieved directly from the exporter and not indirectly through some central node as in the heterogeneous database architecture.

Footnote: These are sometimes referred to as federated schemas.

The multidatabase architecture proposed by Litwin and Abdellatif [20] is in a sense similar to the federated architecture (see Figure 2.2b). The observation here is that the integration process is too complicated and ultimately infeasible. In this architecture there is no global schema; the integration process is simply not supported. Rather, emphasis is placed on the interoperability among component databases based on a flexible, common multidatabase language. In this approach, the user is responsible for keeping track of the various databases and their schemas in order to navigate and manipulate data. In addition, the user utilizes a new multidatabase language. For example, in MRSDM [20] the multidatabase language provided, MRDS (based on the tuple calculus), introduces the notions of multiple queries, multiple identifiers, and semantic variables. These additional constructs are intended as tools for helping users to manipulate relations with multiple semantic interpretations. The result, however, is that the user operates at a new level of abstraction and must learn how to use these new tools.

Our work can be viewed as an extension of the federated architecture that supports a higher degree of transparency, additional sharing capability, and efficient implementation. As in the federated approach, we allow components to directly establish their own sharing patterns with other components without the use of a centralized global schema. In addition, objects at different levels of abstraction can be selectively imported into a local component. The resulting database transparency preserves the functionality of existing database tools and applications.

Chapter 3

Object Database Context and Sharing Patterns

In this chapter, we present a generic functional object data model (FOOM) which supports the usual object-oriented constructs. We use this model to represent the conceptual schemas used in this thesis. In addition, FOOM serves as a reference point from which we can describe our techniques for transparent sharing. We also describe three sharing patterns (instance, type, and behavior) that naturally arise in this object database context and illustrate them using FOOM.

3.1 Generic Functional Object Data Model

The conceptual database model employed in this work is based on the functional database models proposed in Daplex [27], Iris [10], and Omega [12]. This functional object-oriented model (FOOM) contains features common to most semantic [1, 13, 15] and object-oriented database models [2] such as GemStone [22], Orion [17], and O2 [18].
In particular, FOOM supports complex objects (aggregation), type membership (classification), subtype to supertype relationships (generalization), inheritance of functions (attributes) from supertype to subtypes, run-time binding of functions (method override), and user-definable functions (methods).

Everything in FOOM, including meta-data, is modeled as objects. There are three primary kinds of objects: instances, types, and functions. Functions operate on instances, types, and possibly on other functions. Types constrain the objects on which functions can operate. Types also serve to encapsulate objects by defining the set of functions which operate on members of the type (as in abstract data types). Instances are classified into types and represent real world entities or values. All objects have a unique system generated object identifier (OID).

Functions are used to model attributes (or, more generally, inter-object relationships). These functions can be broken down into two categories: stored and computed. Stored functions loosely correspond to the traditional notion of stored attributes. Computed functions can be used to calculate values instead of simply retrieving stored values from the database.

Figure 3.1a represents the conceptual schema of Researcher-A's database, which was originally introduced in the collaborating researchers example. The figure serves to illustrate the simple diagrammatic notation we will use for depicting conceptual schemas in FOOM. In our diagrammatic notation, instance and type objects are depicted as bubbles. For type objects, the name of the type is placed within the bubble. There are four types depicted in Figure 3.1a: Publications, Conferences, Journals, and Tutorials. Each type has a set of functions defined on it. The signature of each of these functions is placed immediately above the type. The input arguments of each of these functions are instance objects of the type upon which the function is defined. Hence, the input argument type of functions is not shown in their signatures; it is assumed to be of the type on which the function is defined.

Footnote 1: Conferences refers to conference papers, Journals denotes journal publications, and Tutorials refers to journal papers that are tutorials.

Footnote 2: The function is also defined on its subtypes through inheritance.

[Figure 3.1: FOOM Database Example. Figure 3.1a: Researcher-A's conceptual schema — the type Publications with functions Title(): String, View(), and Text_Body(): String, and subtypes Conferences (Location(): String), Journals (Issue#(): Integer), and Tutorials. Figure 3.1b: Meta-data in FOOM — the root type Entity with subtypes System-Types and User-Types; Types carries Show_Inst(), Show_Sub(), and Name(): String, and Functions carries In_Arg(): OID, Out_Arg(): OID, S_or_C(): Bool, and Name(): String.]

Hence, in Figure 3.1a, the functions defined on the type Publications are Title(), View(), and Text_Body(). Title() is applied to an instance of type Publications and returns the ASCII string corresponding to the title of the publication. Similarly, Text_Body() returns a string representing the actual body of text. View() is a computed function used to display instances of Publications and does not return a value in this example.
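To make the model concrete, the following sketch (written in Python purely for illustration; this is not the notation of FOOM or of the prototype systems, and the class names Type and Function are hypothetical) shows one way the schema of Figure 3.1a could be encoded: each type carries the functions defined on it, each function is marked as stored or computed, and subtypes inherit the functions of their supertypes.

    # Illustrative sketch only: a toy encoding of the FOOM schema in Figure 3.1a.
    class Function:
        def __init__(self, name, out_type, stored=True):
            self.name = name          # e.g., "Title"
            self.out_type = out_type  # result type of the function
            self.stored = stored      # stored (attribute-like) vs. computed

    class Type:
        def __init__(self, name, functions=(), supertype=None):
            self.name = name
            self.local_functions = list(functions)
            self.supertype = supertype
            self.instances = []       # members of this type

        def functions(self):
            # Functions defined on a type are inherited by its subtypes.
            inherited = self.supertype.functions() if self.supertype else []
            return inherited + self.local_functions

    # Researcher-A's conceptual schema (Figure 3.1a).
    publications = Type("Publications", [
        Function("Title", "String"),
        Function("Text_Body", "String"),
        Function("View", None, stored=False),   # computed; displays a publication
    ])
    conferences = Type("Conferences", [Function("Location", "String")], supertype=publications)
    journals = Type("Journals", [Function("Issue#", "Integer")], supertype=publications)
    tutorials = Type("Tutorials", [], supertype=journals)

    # Instances of Conferences see Title(), Text_Body(), View(), and Location().
    print([f.name for f in conferences.functions()])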
Two kinds of inter-object relationships are explicitly modeled in FOOM and have corresponding diagrammatic notations: the supertype to subtype (with inheritance) interclass relationship and the type membership relationship. All other inter-object relationships are modeled through functions. The supertype to subtype relationship is depicted with thick dark lines from supertype object to subtype object. In Figure 3.1a, Conferences and Journals are subtypes of the type Publications. Further, through inheritance, not only do instances of Conferences have the Location() function defined on them, but they inherit the functions of their supertype Publications. Hence the functions Title(), View(), and Text_Body() are also defined on instances of Conferences. Type membership is depicted with thin dotted lines from type objects to their members (i.e., instances). In Figure 3.1a, the Journals type has four instances: three directly created as Journals and one from the Tutorials subtype.

Meta-data information is also modeled in FOOM (see Figure 3.1b). Entity is the root of the type hierarchy and has as its subtypes System-Types and User-Types. System-Types in turn defines two subtypes, Types and Functions. Note that all types are instances of the type Types, as depicted by the thin dotted lines from Types to all other type objects. In fact, all of the types in Figure 3.1a are also instances of the type Types. The root of the hierarchy starting at Publications is also shown in Figure 3.1b, where the remainder of it is as shown in Figure 3.1a. Types has three functions, Show_Inst(), Show_Sub(), and Name(), defined on it. Show_Inst() and Show_Sub() return the OIDs of the instances and subtypes of a type, respectively. Name() simply returns the ASCII string of a type. All functions are instances of the type Functions and have a corresponding set of functions defined on them. In_Arg() and Out_Arg() return the OID of the types of the input and output arguments of the function. S_or_C() returns true or false depending on whether the function is stored or computed. Name() simply returns the name of the function. These "meta-functions" may vary depending on the actual implementation but serve to illustrate the kinds of functions that might be defined on the meta-data (i.e., Functions and Types). Notice that the function Name() is overloaded; both the type Types and the type Functions have a function called Name() defined on them. This function overloading is not ambiguous since each function is distinguished by its input argument type. More complex overloading ambiguities can be resolved using dynamic binding semantics (see Chapter 5). Also, to make conceptual schemas more legible, we do not always show all the functions defined on types, nor do we draw all the thin dotted lines denoting type membership.

3.2 Sharing Patterns

Since every object is treated uniformly in FOOM, it is natural to investigate the sharing of individual objects (instances), meta-data objects (e.g., types), and behavioral objects (functions). Returning to the previous example of collaborating researchers, recall that Researcher-B would like to import some specific publications from Researcher-A. This situation corresponds to instance sharing.
In order to illustrate this more clearly, Figure 3.2a depicts the conceptual schema of Researcher-B. Notice that Researcher-B models publications differently than Researcher-A (Figure 3.1a). Imported instance objects are denoted by the hashed bubbles in Figure 3.2a. Hence, Researcher-B now sees four instances of ACM-papers where originally there were only two, and three instances of IEEE-papers where originally there were only two. Even though Figure 3.2a depicts the remote instances differently than local ones, they all appear local (i.e., transparent) to Researcher-B in our approach.

[Figure 3.2: Sharing Pattern Example. Figure 3.2a: Sharing Instance Objects — Researcher-B's conceptual schema (Research-Papers, with functions Title(): String, View(), Text_Body(): String, and Pub_Date(): String, and subtypes IEEE-papers and ACM-papers with SIG(): String), showing imported remote instances. Figure 3.2b: Sharing Type Objects.]

Existing approaches have concentrated on what corresponds closely to type sharing [30, 34]. These systems focus on the sharing of entire collections (e.g., relations), rather than members of the collection (e.g., tuples). By contrast, importing a remote type in FOOM naturally leads to importing the subtree of the remote type hierarchy rooted at that type. Again, using the collaborating researchers example, assume Researcher-B would also like to have access to all of Researcher-A's journal publications. Figure 3.2b illustrates this situation. In this case, Researcher-B also imports the Tutorials type by virtue of the supertype-subtype relationship. The principal advantage of this kind of sharing is that it allows Researcher-B to use Researcher-A's database without integrating Researcher-A's conceptual schema with his/her own conceptual schema; Researcher-B simply uses the part of Researcher-A's original schema rooted at Journals. Another way of looking at this is that Researcher-B can now move from his/her own context to Researcher-A's context. Researcher-B can even add or delete instances from his/her own context of Researcher-A (note that one of the instances of Journals in Figure 3.2b is local) without really adding or deleting it in Researcher-A's database. This is a useful feature when the autonomy constraints of Researcher-A prohibit any addition or deletion of its instances. To some degree, instance sharing can be viewed as the complementary situation; instead of integrating local instances into a remote component context, the goal is to integrate remote instances into the local component context.

In addition to simply sharing objects representing information units, it is also possible to share behavioral objects. The importer may then access services not provided by his/her local system. For example, in the collaborating researchers scenario, assume Researcher-B has a dvi previewer method but no latex compiler on his/her local system. Further, Researcher-A has latex available by virtue of a function, latex(), which takes ASCII text with latex commands as input and outputs dvi format. In this case, our approach allows Researcher-B to share the function latex(). Sharing functions in this way is very natural in FOOM because functions are represented as objects.
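As a rough illustration of behavior sharing (a sketch only; the helper call_remote_function and the host and database names are hypothetical and are not the mechanism defined in Chapter 5), importing Researcher-A's latex() function would let Researcher-B format a locally stored text body even though no latex compiler is installed locally:

    # Hypothetical sketch of behavior sharing: Researcher-B invokes the imported
    # latex() function, which executes on Researcher-A's component.
    def call_remote_function(host, db, function_name, argument):
        """Stand-in for the remote invocation used when an imported function
        executes on the exporting component."""
        raise NotImplementedError("network transport is outside this sketch")

    def latex(text_body):
        # Imported behavioral object: executes on Researcher-A's component and
        # returns the publication in dvi format, which Researcher-B's local
        # dvi previewer (e.g., xdvi) can then display.
        return call_remote_function("Res-A-Host", "Res-A-DB", "latex", text_body)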
Chapter 4

A Unified Framework Based on Function Sharing

Abstractly, an object consists of one common entity (identified by its unique OID) that serves to unite the values of its component properties. Functions are applied to objects in order to access values associated with an object (i.e., its state). Using this perspective of the relationship between objects and functions, we consider the effects of distribution on both objects and the functions that operate on them independently. We present a taxonomy of the various cases and relate this taxonomy to our original goal of object level sharing.

4.1 Taxonomy of Function Sharing

Let us assume the existence of a function F, which can be shared among components of a federation; without loss of generality, assume F takes as input the argument a. The argument type can be a literal (i.e., Integer, String, ...) or a user-defined type such as Research-Papers, for example. Sharing takes place on a component-pairwise basis, meaning that F is exported by a component C1 and imported by a component C2. The importing component is called the local database, while the exporting component is called the remote database. There are several ways components C1 and C2 can share the service provided by F, depending upon the location where F executes and upon where its input argument a resides (i.e., there are two degrees of freedom). At this level of abstraction there are four distinct function-argument combinations:

• local function - local argument
• local function - remote argument
• remote function - local argument
• remote function - remote argument

Footnote 1: Since the argument can be a complex unit of information, this is not a limitation; multiple arguments can be handled by an obvious extension of our approach.

In Chapter 3, we noted that functions could be further differentiated as either stored or computed functions. At this (finer) level of granularity, we can now distinguish between a total of eight different sharing scenarios.

Footnote 2: We now have three degrees of freedom.

In Figure 4.1, Local refers to the domain of the local database while Remote refers to the domain of a remote database. Local objects are those that belong to the local database, while remote objects belong to a remote database.

[Figure 4.1: Function Sharing Taxonomy — stored and computed functions crossed with the location (local or remote) of the function and of its argument.]

Below we consider each of the sharing situations shown in Figure 4.1. We first focus on stored functions, and then turn our attention to computed functions.
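As a quick overview, the eight scenarios can be enumerated as the cross product of the three degrees of freedom; the labels below are an informal paraphrase of the cases discussed in the remainder of this section and in Section 4.2 (a sketch only, not a reproduction of Figure 4.1):

    # Informal enumeration of the eight sharing scenarios.
    from itertools import product

    for kind, func_loc, arg_loc in product(("stored", "computed"),
                                           ("local", "remote"),
                                           ("local", "remote")):
        if func_loc == "local" and arg_loc == "local":
            note = "base case: everything in the local component"
        elif func_loc == "remote" and arg_loc == "remote":
            note = "evaluated at the exporter; basis for instance sharing"
        elif func_loc == "local" and arg_loc == "remote":
            note = "local state/processing given to a remote object"
        elif kind == "computed":  # remote computed function, local argument
            note = "remote service applied to local data; basis for behavior sharing"
        else:                     # remote stored function, local argument
            note = "of limited use: a stored function is meaningful only in its own component"
        print(f"{kind:9s} function | function {func_loc:6s} | argument {arg_loc:6s} -> {note}")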
Bvr n - m t f u u t t p i 1 u n d d m e d b a v i N r o t m s U n t c &<vkw n m n tv tv a r g u m e n t function's. Local Local Remote Remote h a s t * i » i s t * I I S * l o l l « l i I H l» > t »U% l u L . l l ) > , ) ) • > ' l . u W 1 i l l j u t l u l . j . H v hasis lor hcljayjor shai ms* basis lor a is tan* o 1 m l sharing U l l i o l c l * s |l . l V I « > l O i l J l X . i l ! p l V K e s M lk L * ( k x u u i u o l i l y Figure 4.1: Function Sharing Taxonomy Issue#() : Integer o o 0 o o o Tutorials o Researcher-A’s Conceptual Schema Lesend Supertype M embership a * Remote < Z 2 > Title(): String View() Text_B odyO: String Pub_Date(): String R e s e a r c h - P a p e r s IEEE-papers A C M - p a p e r s o CD CD ° O Researcher-B* s Conceptual Schema Figure 4.2: Two Example Component Database Schemas | i i 3. Remote function - Local object \ This situation is somewhat meaningless, since stored functions only have a. meaning in the local context of the component in which they were initially1 created. For example, if Researcher-B, who does not have the IssuejfcQ func-. tion defined in his/her schema, wants to see the issue number of all his/her' R esearch-Papers, Researcher-B would first need to create the new func tion, Is$ue#(), and then populate it with the corresponding values rather! than being able to use Researcher-B’s Issue# function. However, when look-: ing at this situation from Researcher-A’s point of view, one can argue that this situation is merely the mirror image of the second case (Local function Remote object). Rather than executing Researcher-A’s function remotely in: Researcher-B’s database, one can integrate Researcher-B’s R esearch-Papersj into Researcher-A’s schema (e.g., create a new subtype, Research-Papers,j of Journal-Papers and populate it with Researcher-B’s R esearch-Papers): and execute the IssuejfcQ function locally. Still, the a value for Issu e # Q , would have to be created in Researcher-A’s database for each instance ofi R esearch-Papers. ; I i I I 21; 4. Remote function - Remote object This case, like the first case, is a base case; both the state of the object and the execution of the function are in the same component (e.g., Researcher-j A). The difficulty here lies in providing database transparency. The nextj chapter addresses this issue. We point out here that this situation forms th e 1 j basis of the instance sharing pattern described in Chapter 3. For example,! : the remote instance, o, would appear local and have its original values if oV 1 T itle () and Pub-DateQ functions were the original remote functions defined; I . ' ! in Researcher-A’s database. I I ! i ! | 4.1.2 C om p u ted Functions j | The four situations for the sharing of computed functions is can be broken down as: I I i j 1. Local function - Local object As in the case of stored functions, this is the base case. Computed function J- as well as its argument (a) reside in the local component and the execution is local. | 2. Local function - Remote object This situation can be reduced to the base case described in case 1. For ex-: ample, Researcher-B can use his/her own locally defined ViewQ computed function to view the the remote instance o. I i 1 I 3. Remote function - Local object ■ This is the reverse of the previous case: the function executes remotely and1 j the input argument is supplied from the local database. 
In effect, the remotei database is providing a non-local “service.” Intuitively, this is the most usefulj scenario from Researcher-B’s perspective and forms the basis for behavior| ' . . 1 | sharing. For example, if Researcher-B did not have a ViewQ function, then; s/he could use the ViewQ defined in Researcher-A’s database. ; I 4. Remote function - Remote object I This situation is similar to the first case (Local function - Local object) ini that both the state of the object and execution of the function are in thej I same component. For example, Researcher-B views one of Researcher-A’s C onference-Papers using Researcher-A’s original ViewQ function. 1 In the examples above, the functions have returned a literal type (e.g., the TitleQ ' function returns a String). However, functions with signatures involving abstract] (user-defined) types can also be shared. In this case, both the input and output argument types must be defined locally; if they are not, their m eta-data must bei I ^ I imported beforehand. The location of the result argument is determined by the] I location where the function executes. I i 14.2 Relating Taxonomy to Sharing Patterns 'in this section, we show how the taxonomy just presented can be used to unify the ■original notions of instance, type, and behavior sharing introduced in Chapter 3., :We proceed by explaining how each of these sharing patterns can be implemented; using a case presented in the taxonomy. 1 1 Conceptually, in instance sharing, a remote instance object is imported directly j ; into a local type. This remote instance behaves in the same manner as a local I instance object from the user’s perspective. However, the actual state of the remote! : instance exists in the remote component database; retrieval of any state of th e : j remote object is done by accessing the remote database transparently. Hence, access* jto remote instance objects corresponds to the Remote function - Remote object j [situations described in Figure 4.1.3 I ! Sharing behavioral objects corresponds to sharing a computed function that! (exists on a remote component. Intuitively, when an instance object is imported, only J idata is being shared. On the other hand, importing a behavioral object gives thej importer access to services not provided by his/her local system. This corresponds] the Remote function - Local object situation in the taxonomy for computed functions, j i Type sharing consists of two aspects: 1) sharing the m eta-data 2) sharing all the] (instances of a type. The first aspect does not relate to our taxonomy. The second| ; aspect is handled by instance level sharing. j ! _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ i 1 3Note that this applies to both stored or computed functions. ; 23 One useful sharing pattern predicted by our taxonomy that is not directly cov- i iered by instance or behavior sharing, is the im portant case of Local Function -j i ' . . . . ' | Remote Object. Among other things this situation allows users to add additional; 1 state to remote objects without modification of the exporting database, thereby pre-| serving the autonomy of the exporter. This ability to create local state for remote! I E j objects is achieved automatically from the way we implement instance level sharing I ; and corresponds to simple local database access. It also allows a remote object to [ j _ I | be customized to the environment of the local component database. ! 
i 4.3 D iscussion In more practical terms, our taxonomy of eight different sharing patterns can be’ reduced to two “most interesting” cases: (1) executing an imported function on a local argument; this corresponds the Remote function - Local object situation; and! (2) executing a local function on an imported (shared) argument; this corresponds to> the Local function - Remote object situation. The first case only applies to computed! functions and can be described as reusing a previously defined function from another! component in the federation. This is a principal reason why components would j want to share behavior: code reuse. The second case applies to both computed andj stored functions and can be described as extending the “characteristics” of a remote: object with added functionality while at the same time respecting the autonomy of j the originating site. In this case, the structure of an object (type) is shared by otherj components which will not be able to see or modify the original object. I In our discussion of function sharing, we have stressed separation of the location! where the function executes from the location where the data resides. However,! dn order to achieve database transparency, this separation of function execution) and argument location should be completely transparent. In the next chapter wej present our mechanism for achieving database transparency for the instance, type,: and behavior sharing patterns. j Chapter 5 Mechanism for Seamless Interconnection I I i i I In this chapter, we describe the details of our mechanism for providing database! I transparency. Our approach is based on the function sharing taxonomy presented in. the previous chapter to implement the instance, type, and behavior sharing patterns/ In order to provide transparency, we make use of commonly accepted object-oriented! constructs. | i l ____________________________________________________________________________ , I j I I I 25 [5.1 Providing Database Transparency j | The federated environment we are considering places two significant restrictionsj ion our mechanism: no global schema and no global OID space. One of the main[ ' drawbacks of a global schema is that it only provides for a single view of shared | data. Our approach maximizes flexibility by allowing remote objects to be selec tiv ely imported directly into a component database. Lack of a global OID spacej ; leads to obstacles to providing the seamless operation of local database tools and. ! applications. The purpose of OIDs are to give each object a (system-wide) logically i | unique identifier. Since each component is autonomous, there is no way to guaranteej ' that OIDs generated independently on separate components will be globally unique. | Previous approaches simply construct unique global OIDs by concatenating a local! OID and the database identifier uniquely identifying a site [16, 21]. While this ap-| proach guarantees globally unique OIDs, it is not a tolerable solution since existing! I component’s database software would need to be modified in order to interpret the; global OIDs. Our approach is to create local surrogates for remote objects. By using surrogates, differences between remote representations of objects (e.g., object' identifiers) can be masked out (made transparent) so that the local DBMS is able to interpret and manipulate remote objects as usual through the surrogates. When it' is necessary to retrieve the actual state of a remote object, using surrogates alone is' 1 . . . 1 not sufficient. 
Our mechanism exploits the extensible nature of the object-oriented; paradigm (e.g., FOOM) by rewriting the functions encapsulating surrogate objects] as computed functions which then retrieve the results of applying the function on1 I . . 1 1 the remote component where the object is actually stored. This mechanism does not: 1 j . require modification to existing DBMS software. We simply provide computed func-: j tions which override the stored functions at run-time. In order for the overloading ■ to perform correctly, function naming, binding, and placement in the type hierarchy are critical. Below, we discuss in more detail our technique for placing objects and1 overloading functions so as to preserve the seamless operation of a component. i E ntity Tit4«(); 3 » trto & \ V ie w O - \ Ik ! - & • t « lv < k S i Datvt), S (j n i> ' i . t v «. IK K I^IU iik i h [ A’ C j i v , L e g en d Supertype M embership o U ser’s View Remote * Figure 5.1: Sharing Instance Objects 5.1.1 Instance Sharing < i The key to incorporating remote instance objects seamlessly into a local component lies in determining which local component functions the remote instance partici-' pates, and then insuring that these functions operate transparently when surrogate; 1 objects are involved. Below we first consider the two categories of functions in j which instance objects participate and present our mechanism for preserving trans parency for each category. Next, we discuss how we handle remote instances that have additional or missing functions defined on them by the remote component. ’ Functions in which instance objects participate fall into two categories: functions, that operate on the actual type object of which the instance is a member (e.g.,| i Show-Inst()), and functions that are defined directly by the types of which the' ! instance is a member (e.g, Title (J). Hence, we must make sure that both of these! ! kinds of functions operate correctly on surrogate instance objects. , ! Type level functions (e.g., Show.Inst()) execute properly since surrogates are' created as members of that very type. For example, in Figure 5.1 if the type level' function Show-InstQ was invoked on IE E E -p a p e rs, the function would return two’ procedure Import_Instance(Rem-OID, Rem-DB, Rem-Host, Loc-Type-Name, n_funcs, Local-f-names, Remote-f-names, F-signatures) begin var i: integer; if (“R ’*-Loc~type-name does NOT exist in local Database) { '“ R ” ~L.oc-type-na.tne refers to the surrogate type w hose nam e is Loc-type-nam e preceded w ith an “ R "} Create type “R”-Loc-type-name as subtype of Surrog&te-Types for i a s 1 to n_fun.cs Create a computed function. Loc-f-names[i] with signature F-signaturesfi] which will make RPC call to Remote-f-names[i] end; {for} end; { if} Create new instance object, inst, of type Loc-Type-Name Add inst to the type “R”-Loc-Type-Name {inst is now a m em ber o f both types} R_OID(inst) = Rem-OID; DB_.Name(inst) = Rem-DB; Host_Name(inst) = Rem-Host; end; {Import-Instance} Figure 5.2: Pseudo-Code for Import_Instance Procedure j i ! OIDs: one local instance and one surrogate. In general, all type level functions wilh have their transparency preserved by using surrogates. j For instance level functions, however, we need to introduce a surrogate type ini order to preserve transparency. The purpose of this new surrogate type is to overridej the local functions which surrogates inherit from the local type of which they are a! member. 
By additionally creating the surrogate as a member of this surrogate type,| I the functions which the surrogate instance originally inherited are overridden. This; approach is illustrated in Figure 5.1. The large gray bubble in the diagram, serves toj indicate Researcher-B’s perspective of the conceptual schema; s/he is unaware of thej other types in the diagram. There are two thin dotted arrows from R -IE E E -p ap erS | (“R” for remote) and IE E E -p a p e rs to the surrogate instance. This indicates that' surrogate instances (i.e., remote instances) of IE E E -p a p e rs are created as members! i ___ ___ ] of both IE E E -p a p e rs and R -IE E E -p a p e rs. Thus these (remote) instances inherit; j functions multiply from both R -IE E E -p a p e rs and IE E E -p ap e rs; any duplicatelyj 1 named function from the two types is overridden by the function defined by R - ; IE E E -p a p e rs. Surrogates are created as members of both types simultaneously : because we need a surrogate to sometimes act as a member of IE E E -p a p e rs (for 28 type level functions such as Show-Inst(J) and sometimes as a member of R -IE E E - p a p e rs (for instance level functions such as Title (J). \ Figure 5.2 shows the pseudo code for ImportJnstanceQ which is the procedure for creating surrogates for remote instances. Here the naming convention used for variables in ImporVlnstanceQ is to precede variables pertaining to remote informa-, tion with “Rem” and variables pertaining to local information with “Loc.” The first! three input parameters to Import.InstanceQ supply information about the remote! OID, database and host respectively of the instance being imported. The next; param eter, Loc-Type-Name, is the name of the local type (e.g., IE E E -p ap ers)i in which to make the surrogate a member. The n_funcs param eter represents th e’ number of functions that will be created for the surrogate. In the example shown in Figure 5.1, there are three functions: TitleQ, ViewQ, and Pub-Date() so n_funcs in this case is 3. The Local-f-names, Remote-f-names,’ and F-signatures parameters are arrays of length n_funcs which correspond to th e1 I local name, remote name and signature of the functions that will be created. In thej example, Local-f-names, Remote-f-names are the same (i.e., an array with elementsj ! TitleQ, ViewQ and Pub-DateQ). ImportdnstanceQ first checks to see if a surro-j j gate type has already been created for Loc-Type-Name (e.g., R -IE E E -p a p e rs). Ifj it does not already exist then the surrogate type must be created along with the; computed functions which are used to override the local stored functions. Then; the object is created as a member of Loc-Type-Name and the surrogate type (de-> noted R-Loc-Type-Name in Irnport_InstanceQ). Finally, the R-OIDQ, DB-NameQ and HosLNameQ functions defined on R-Loc-Type-Name (e.g., R-IEEE-papers) are supplied. In Import-InstanceQ, the param eter n Junes represents the number of remote, functions defined on the remote object. In general, this number will not be the same as the actual number of local functions defined on members of the type Loc- Type-Name. There are several different ways of handling this situation; we supply ! mechanisms to support these alternatives. In the situation where Loc-Type-Name i j has additional functions not defined on the remote instance, local queries applied I j to remote instances involving these additional functions will simply return null val- t j ues which are discarded in the result of the query. 
For example, Pub_Date() is not defined in the surrogate type R-IEEE-papers in Figure 5.1. Local queries involving Pub_Date() simply return null values for the local surrogate's Pub_Date() value. When the results of queries are passed back to the user, these null values are discarded. However, it is entirely possible to assign values to local functions which are not defined remotely.1 This is ideal for situations where the autonomy of a remote component restricts the creation or modification of functions on the instances it is exporting. The result is that an imported remote instance can have additional state not originally or currently found on the component exporting it. This can be extremely useful in allowing a local component to integrate remote objects flexibly and independently into its local databases. Hence, in the example, it is quite permissible to set the value of Pub_Date() locally to be "1/1/92."

1 This corresponds to the case of Local function - Remote object in the taxonomy presented in Chapter 4.

The reverse situation, where Loc-Type-Name does not have some of the functions defined remotely on Rem-Type-Name, is slightly more complicated, and is dependent upon the signatures of the functions on the remote component. Most of the functions presented in the examples return literal values such as strings or integers. Such literal types do not have OIDs associated with them; they form the atomic types upon which other user-defined abstract types can be built. In addition to functions returning literal types, there are some functions which return abstract objects, as represented by OIDs. An example of a function which returns an abstract object is Author(): Publications -> Persons. Author() takes in an instance of type Publications and returns an instance of type Persons. When the signatures of additional functions defined by a remote component on an instance return simple atomic types, queries against the local instances return null values (analogous to the previous situation of missing functions), which are then discarded from the results. However, problems occur when the result type of the function is a non-atomic type that is not defined locally. For example, when trying to import the function Author(), two problems arise. First, the type Persons is not defined locally. Second, the overridden function will return a remote OID that the local system cannot interpret. The solution to the first problem is to create a local Persons type by importing the meta-data from the remote component. For the second problem, a local surrogate must be created for the person who is the author of the publication instance being imported.

[Figure 5.3 (diagram, two panels): the local type Research-Papers, whose members are encapsulated by Title(): String, View(), Text_Body(): String, and Pub_Date(): String, with subtypes IEEE-papers and ACM-papers (with functions such as SIG(): String and Issue#(): Integer). Figure 5.3a shows the result of Import_Meta(Journals); Figure 5.3b shows the result of Import_Type(Journals).]

Figure 5.3: Sharing Type Objects

The approach we have just described in Import_Instance() exploits the feature in FOOM that allows objects to be members of more than one type.2 We briefly describe here an alternative approach, which we find inferior due to transparency considerations. The key to providing transparency involves the overriding of functions.
In Figure 5.1, we accomplished this by creating the type R-IEEE-papers and creating surrogates as members of both IEEE-papers and R-IEEE-papers. An alternative approach might be to create R-IEEE-papers as a subtype of IEEE-papers and create the local surrogates as instances of R-IEEE-papers. One major drawback of this approach is that remote instances are no longer transparent members of IEEE-papers; the user is now made aware of the R-IEEE-papers subtype. A second drawback is that in this approach surrogate types appear scattered throughout the user's meta-data. In the approach we support, all remote surrogate types are contained in a single branch of the type hierarchy under Surrogate-Types.

2 This is a feature of the multi-type membership construct that is described in Section 5.2.

5.1.2 Type Sharing

There are two aspects to sharing type objects (see Figure 5.3). The first is to share only the meta-data associated with the type being imported. The second aspect is to also include all the members and subtypes of the type being imported.

The meta-data associated with a type consists of the signatures of the functions which serve to encapsulate the members of that type. Hence, all that is needed to import the meta-data of a remote type is to create a local surrogate type with the same functions. The only real complication occurs when one of the functions has a signature whose result type is not known locally. Conceptually, it is not difficult to apply this simple rule recursively to create a surrogate type for the closure of this type; however, the entire closure may not necessarily be required. Figure 5.4 presents the pseudo-code for the procedure Import_Meta(), which specifies the meta-data importation process for type objects.

procedure Import_Meta(Rem-Type-Name, Loc-Super-Name, Rem-DB, Rem-Host)
begin
  var Rem-F-Names: array of Func_Names;
      Rem-F-Signatures: array of Signatures;
      i, n_funcs: integer;
  n_funcs = Get_Rem_Funcs(Rem-Host, Rem-DB, Rem-Type-Name, Rem-F-Names);
    {Get_Rem_Funcs() returns the number of functions defined on a type
     and places the names of the functions in Rem-F-Names}
  for i = 1 to n_funcs
    Rem-F-Signatures[i] = Get_Signature(Rem-Host, Rem-DB, Rem-F-Names[i]);
      {Get_Signature() returns the signature of a function}
  end; {for}
  Create local type named Rem-Type-Name as subtype of Loc-Super-Name
    using Rem-F-Names, Rem-F-Signatures
end; {Import_Meta}

Figure 5.4: Pseudo-Code for Import_Meta Procedure

In addition to the host and database parameters, Import_Meta() takes as input the name of the remote type and the local supertype for determining where to place the new type in the local hierarchy. The name of the type created in the local database is the same as that of the remote type. Import_Meta() creates only a single type, without any of the remote instances or subtypes. Figure 5.3a illustrates the result of Import_Meta(Journals, Research-Papers, Res-A-DB, Res-A-Host), where Res-A-DB and Res-A-Host are the database and host name, respectively, of Researcher-A's database.

To support the second aspect of type object importation, we must also import the remote members and subtypes of the type. This kind of type sharing is illustrated in Figure 5.3b. To accomplish this, we essentially use the same paradigm as described in the previous case of instance sharing, using both Import_Instance() and Import_Meta(). Import_Type() takes as input the same arguments as Import_Meta(). In addition to creating a local type, Import_Type() creates a local surrogate for each remote instance and then recursively calls itself to create all subtypes and the instances of those subtypes. Figure 5.3b illustrates the result of Import_Type(Journals, Research-Papers, Res-A-DB, Res-A-Host). The pseudo-code for Import_Type() is shown in Figure 5.5.

procedure Import_Type(Rem-Type-Name, Super-Type-Name, Rem-DB, Rem-Host)
begin
  var Imm-Sub-Types: array of Type_Names;
      i, n_insts, n_i_subtypes: integer;
  Import_Meta(Rem-Type-Name, Super-Type-Name, Rem-DB, Rem-Host);
  n_insts = Get_Num_Of_Inst(Rem-DB, Rem-Host, Rem-Type-Name);
    {Get_Num_Of_Inst() returns the number of instances in a type}
  for i = 1 to n_insts
    {For brevity and clarity we do not show all the steps for retrieving
     the parameters of Import_Instance(), nor do we show all of the
     parameters in the call.}
    Import_Instance();
  end; {for}
  n_i_subtypes = Get_Imm_Sub_Types(Rem-DB, Rem-Host, Rem-Type-Name, Imm-Sub-Types);
    {Get_Imm_Sub_Types() returns the number of immediate subtypes of a type
     and places their names in Imm-Sub-Types}
  for i = 1 to n_i_subtypes
    Import_Type(Imm-Sub-Types[i], Rem-Type-Name, Rem-DB, Rem-Host);
  end; {for}
end; {Import_Type}

Figure 5.5: Pseudo-Code for Import_Type Procedure

5.1.3 Behavior Sharing

The goal of supporting the sharing of behavioral objects is to allow a component to utilize remote services that may not be available locally. This corresponds to sharing computed functions.3

As in instance sharing, meta-information containing the location (e.g., remote OID and remote component name) of the remote function object being imported must be stored locally. However, by contrast with the case of instance sharing, this meta-information is associated directly with the function being imported. In instance sharing, the meta-information is kept indirectly for remote functions such as Title() and View() via the surrogate instance object (see Figure 5.1). Thus, in our implementation we distinguish between two kinds of "remote" functions: those implicitly defined by importing instance objects and those directly imported when behavioral objects are shared. Figure 5.6 shows our mechanism for incorporating this meta-information for sharing behavioral objects. We exploit the fact that meta-data is also represented using our functional object-base model. Imported functions are created as instances of the type Remote-Functions and can thus store and access the additional location meta-information required to execute the imported function.

We now present a slightly different example from Figure 5.1. In this scenario (Figure 5.6), the View() function is imported from some remote component. In addition, R-IEEE-papers no longer supplies a View() function. Hence both the remote and local instances of IEEE-papers use the same imported View() function for displaying research papers. This is evident in Figure 5.6 from the absence of the View() function from the type R-IEEE-papers and the addition of a new (boldfaced) View() function defined on Research-Papers.4

In order to explain how the View() function works, we must first explain how our implementation addresses the issue of side-effects.
By side-effect we mean two things:5 (1) any kind of implicit input, other than the input argument, that is necessary to compute the result, and (2) any modification to the state of the database where the function executes, other than to the input argument. For functions whose arguments are literals, this simply requires that the function being imported compute its result value solely from its input argument, without modifying any database state (e.g., Fibonacci()). Functions whose input arguments are non-literals pose additional difficulties. In this case, the input argument is the OID of an instance. The problem then lies in determining what information a computed function accesses in order to compute its results. Strictly applying our definition of side-effects would restrict computed functions on non-literals to solely accessing and then manipulating the input OID. But realistically, a computed function must be able to access some state of the object corresponding to the OID when computing its result. In our implementation, we take the position that the only state a computed function can access is that reached through the functions that serve to encapsulate the object. In other words, the only state the computed function can possibly access is that provided by the functions defined on the types of which the instance is a member. In the case of the View() computed function, the only functions that View() needs to access are Title(), Text_Body(), and Pub_Date().

3 This corresponds to the computed function case of Remote function - Local object in the taxonomy presented in Chapter 4.
4 The bold font is used to indicate that the View() function is imported and no longer local.
5 This extends the more traditional definition given by the programming language community, which defines a side-effect as "the modification of a data object that is bound to a non-local variable."

[Figure 5.6 (diagram): the user's view with Research-Papers, IEEE-papers, and ACM-papers; the surrogate types define R_OID(): String, R_DBNAME(): String, and R_HOST(): String; the legend distinguishes entities, supertype links, membership links, the user's view, and remote objects.]

Figure 5.6: Sharing Function Objects

Having determined what information a computed function on a non-literal type can access, a problem arises when trying to execute such a function remotely on a local object. The problem occurs when supplying local arguments to a remote computed function. Although we know the computed function is limited to accessing only the functions that encapsulate the instance, we do not know exactly which ones it needs. Even if we pass all the possible values the computed function can access, the computed function would have to be written in such a way as to retrieve these arguments from the network and not from the local database. This would be undesirable and contrary to our goal of relieving the computed function writer of needing to know where the data on which it operates is located.

The best approach to this problem in our autonomous environment is to allow computed function writers to define functions without concern as to whether the function is to be exported. Thus, in our implementation, whenever a remotely executing computed function needs state from the local database, it performs a callback to the local database to retrieve that state. In particular, the local database that imports remote objects can be viewed essentially as a client, and the exporter providing the objects as a server.
During instance sharing, the local database simply operates as a client and makes RPC requests to the server. For behavior sharing, however, when the callback mechanism is used, the local database must in addition operate as a server to accept the callback requests. Computed functions can be written using any programming language that can be compiled to re-entrant object code. This object code is then dynamically linked into the database management system kernel when the computed function is accessed. In our prototype using the Omega database management system, a computed function accesses the local database through omega_eval(), which has two parameters: an argument and the function that is to be applied to the argument.6

6 Omega_eval() is described in more detail in Section 6.2.

We can now consider how the callback mechanism works transparently and allows computed functions to be written uniformly, without regard as to whether the function is to be exported. Consider again the example using the View() computed function. Suppose that the imported View() function retrieves the Text_Body() of an instance (say, in LaTeX format), computes the dvi-formatted version, and displays the formatted version through a dvi previewer (e.g., xdvi). When the user of the local database invokes the View() function on a local object, the View() function is passed the local OID of a Research-Papers instance. The View() on the remote server makes a call to omega_eval() to retrieve the Text_Body(); omega_eval() recognizes that the OID passed in as its argument is not local and performs a callback to the server of the local database that invoked the View() function. Since the local server recognizes the OID as a local OID, it performs the request and passes back the Text_Body() to the remote server, which can then complete its computation and display the results on the local database monitor.

Functions whose signatures return complex/abstract data types can theoretically also be imported. In this case the result type must be defined locally; if it is not, its meta-data can be imported using Import_Meta().

5.2 Summary of Object Constructs

In this section we summarize the key constructs we used to implement the seamless importation and exportation of objects. We discuss both their functionality and why they are needed in order to support transparent sharing of objects. The following explains how these constructs are used in our sharing mechanism.

Object identity is a common feature found in object-oriented systems. This feature refers to the ability of a system to automatically generate logically unique identifiers for each object in the system. Object identifiers (OIDs) serve as unique handles to access, refer to, and manipulate objects. OIDs impact federated environments because each autonomous component uses its own OID-generating scheme; there is no global OID space. If a global OID space were used, each component's query engine would have to be modified in order to interpret (global) OIDs. This is unacceptable in the federated environment we are considering. Our approach creates local surrogates for remote objects and stores a remote object's OID as an attribute of the surrogate, which can later be used to access the remote object in queries.
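OID locality is also what drives the callback mechanism of Section 5.1.3: an evaluator handed an OID minted by another component cannot interpret it and must forward the request to the component that owns the object. The following C-style sketch is purely illustrative, under the assumption of hypothetical helpers (oid_generated_here(), apply_local(), callback_to()); it is not the omega_eval() implementation described in Section 6.2.

/* Illustrative sketch of evaluation with a callback for foreign OIDs.
   All names below are hypothetical stand-ins.                         */

typedef struct {
    char host[64];    /* component that issued the current request */
    char db[64];
} component_id;

extern int   oid_generated_here(long oid);   /* was this OID minted locally? */
extern char *apply_local(const char *func, long oid);
extern char *callback_to(const component_id *origin,
                         const char *func, long oid);

char *eval_with_callback(const component_id *origin,
                         const char *func, long oid)
{
    if (oid_generated_here(oid))
        return apply_local(func, oid);   /* ordinary local evaluation */

    /* Foreign OID: e.g., an importer's local object passed to an
       imported View() executing here; resolve it by calling back to
       the component that owns the object.                            */
    return callback_to(origin, func, oid);
}

Because the locality test is made inside the evaluator, the writer of a computed function never needs to know where its argument actually resides.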
' C omputed functions are user defined functions which allow users to create theirj I own programs written in a turing complete programming language. This feature j [enables a DBMS to extend and tailor its functionality beyond those originally in- j |tended. The remote access needed in a federated environment is a good example^ l ' | of a functionality that may not have been originally provided by a component’s lo-' jcal DBMS. Furthermore, we can exploit the natural extension of writing computed: functions as Remote Procedure Calls (RPC) [5] for accessing remote data. Overloading allows multiple functions to have the same name. Hence, func-i tions are uniquely identified by the combination of both their name and signature. [ | This feature is useful in situations where overriding methods with the same nam e: j is needed. For example, inherited functions from a supertype may have to be en- I hanced to satisfy the requirements of members of a more specialized type. Database (transparency can be achieved by overloading local stored functions with computed [functions which access remote data. j M ulti — type membership allows a single object to become a member of more j than one type. This is a more generalized notion than the subtype to supertype | relationship where objects that are members of the subtype are also members of' I the supertype. In addition to being members of all its supertypes, objects can be; J members of types which are not related through the subtype-supertype relationship. I For example, the Chiti-Chiti-Bang-Bang object may be a member of both the type A irc ra ft and A u to m o b ile. A general multi-type membership facility like this pi'ovides for a clean mechanism of overloading local functions to remote procedure calls as we explained in Section 5.1.1. 5 Dynamic Binding is a strategy for resolving overloaded functions. The ambi guity arises when an object is a member of more than one type. The same function name may be defined on more than one type of which the object is a member. In order to resolve overloaded functions, it is necessary to determine which type’s function to apply. The semantics of dynamic binding in traditional programming languages is to bind variables to the most recent type which it was used as. We | extend these semantics to object databases. For the case of the subtype-supertype j relationship, we interpret the most specialized type as the most recent. For the [ . 1 I multi-type membership, we interpret the type which was most recently “added” to: 1 an object as its most recent. 5.3 Discussion The basic technique our mechanism uses to import data (i.e., stored functions) is to overload stored functions with computed functions that retrieve remote data byj make R P C calls to the component with the data. Dynamic binding is needed resolve j the overloaded functions correctly. Sharing behavioral objects (i.e., computed func-| ' . . I tions) causes additional problems with side effects. We applied the object-oriented principle of encapsulation to limit the kinds of information the a function could ac-| cess. However, we were confronted with the problem of determining precisely what information a given computed would access and how to supply that information without placing restrictions on how computed functions are written. We solved this problem using a callback scheme which allows computed function writers to write computed functions completely unaware of where they might execute. Our resulting ! 
approach decouples the location of the function and where it executes from the data, ■upon which it operates. j I : Chapter 6 i Prototype Implementation We currently have implemented an experimental testbed of interconnected Iris and; ! Omega components. In our testbed, a component consists of a database and is! ! i j associated with a machine where the database physically resides.1 A component; 1 can participate in the federation in two fashions. The first is as an im p o rte r of! i I information from other components. In this case, objects from other components! are seamlessly integrated with the local objects in this component’s database. Thel other way in which a component can operate in the federation is as a repository, of information. In this case, the component acts as an e x p o rte r from which other i | components can import information. In this chapter, we discuss the implemen- j tation details of both exporter and importer functionality in our Iris and Omega1 | prototypes. } i I 1For the sake if simplicity, we will assume that each component exports or imports only a single! database and that each component resides on a single machine. > I 40 6.1 Iris Prototype For our Iris prototype, we used an early version of the Iris database management j system (DBMS) that was developed at Hewlett Packard LaboratoriesflO]. Iris is, 1 based on a functional object-oriented data model which supports the usual object- oriented constructs including the ones in Section 5.2. The data manipulation lan guage (DML) provided with Iris is a variant of SQL called OSQL. In the following,| we discuss key stages involved in implementing the sharing mechanism described in. Chapter 5 for an Iris based component. 1 There were three major aspects to our Iris implementation. First, our approachj ! relies on overriding stored functions with computed functions. W riting a separatel I computed function for each overloaded stored function would have been tedious and inefficient. In addition, the overhead in dynamically loading each function could have been very expensive. We eventually wrote a single foreign function! which would apply a remote function to a remote OID, given the host and database! names upon which the function and instance exist. This function, r^ievalQ (remote! iris-eval), was written using the SUN Remote Procedure Call (RPC) protocol [32]. All other computed functions for accessing remote data were derived from it. The second aspect related to the manner in which Iris binds overloaded functions. Iris, uses compile-time (early) binding of types to the variables being quantified over inj i the for each clause of an OSQL query. For example, if the query: I ! s e l e c t T itle (r -p a p er ) 1 fo r each Research-Papers r-paper; i is posed against the database of Figure 5.1, the variable r-paper would be bound to| I i ! the type R e se a rc h -P a p e rs. Hence, the same Title() function would be applied to| | | every object r-paper, the one originally defined on R e se a rc h -P a p e rs. However, we needed the system to use dynamic binding semantics and apply the Title () function1 defined on R -IE E E -p a p e rs to those instances which are remote and apply the; ! Title() function defined on IE E E -p a p e rs (inherited from R e se a rc h -P a p e rs) to[ , those instances which are local. Fortunately, since Iris is an extensible system we can override this functionality; indeed, Iris provides a function latebind which uses the needed semantics. 
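For concreteness, a client-side routine in the spirit of r_ieval() could be structured as shown below. This is only a sketch: the program and version numbers, the argument and result structures, and the stub name r_ieval_1() stand in for whatever the rpcgen interface definition actually declares, and are not taken from the prototype source.

/* Hypothetical sketch of a generic "apply a remote function to a remote
   OID" client, in the spirit of r_ieval(); not the actual prototype code. */
#include <rpc/rpc.h>
#include <string.h>

#define REMOTE_EVAL_PROG 0x20000099   /* made-up RPC program number */
#define REMOTE_EVAL_VERS 1            /* made-up version number     */

/* Argument/result structures and the client stub are assumed to be
   generated by rpcgen from an interface definition; names are invented. */
typedef struct { char *db; char *func; char *oid; } eval_args;
typedef struct { char *value; } eval_result;
extern eval_result *r_ieval_1(eval_args *args, CLIENT *clnt);

char *remote_apply(const char *host, const char *db,
                   const char *func, const char *oid)
{
    CLIENT      *clnt;
    eval_args    args;
    eval_result *res;
    char        *value = NULL;

    clnt = clnt_create(host, REMOTE_EVAL_PROG, REMOTE_EVAL_VERS, "tcp");
    if (clnt == NULL)
        return NULL;                      /* exporter unreachable */

    args.db   = (char *) db;
    args.func = (char *) func;
    args.oid  = (char *) oid;

    res = r_ieval_1(&args, clnt);         /* one remote evaluation */
    if (res != NULL && res->value != NULL)
        value = strdup(res->value);       /* copy before releasing the handle */

    clnt_destroy(clnt);
    return value;
}

Each of the overriding computed functions created for a surrogate type can then be a thin wrapper around one such routine, which keeps the per-function code small and concentrates the RPC handling in a single place, much as the prototype derives all of its remote-access functions from r_ieval().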
Finally,we needed a server which would service RPC requests to retrieve data from an Iris database.2 Instead of creating one session for each! request, we implemented the server to only open a session for the very first request, after which all other requests are handled with the same session but as a separate transaction for each request. After a certain period of non-use, the server times out, s 'closes the session and retunrs to the initial state. The only functionality required; ^from the server is the ability to: 1) receive a RPC request, 2) apply a function to 1 ■ an argument, and 3) return the results to the requester. The resulting simplicity ofi ! . 1 I this interface greatly facilitates the addition of components to the federation. i I ; i ! 1 6.2 Omega Prototype I ; 0mega[12] is a parallel object-based system constructed using a relational file sys- i [tem (the Wisconsin Storage Structure[7]) Similar to Iris, it is based on a functional data model[27] and supports OSQL at its DML. We used a standalone single-node, configuration of Omega for the testbed prototype. This configuration allows evalu ation and testing of the prototype to be much simpler. In the following, we present the implementation of Omega.3 as it relates to the constructs in Section 5.2. In addition, this overview will prove useful when we interpret the results of the exper-( t iments performed in Chapter 7. | ! The physical implementation of Omega is organized around three types of files:! 11) a meta-data file, 2) a Sub-object Directory (SubDir) file, and 3) a t-file for each typej in the system. Omega represents a type as a record in the m eta-data file. This record; j contains: the name of a type, its immediate subtypes (supertypes), the identity of■ [processors that contain its fragments, relevant declustering information, number of1 I • I j functions, name of each of its functions, internal organization of a function, available’ | indices on a function, etc. ! Omega represents an object as a collection of sub-objects. A sub-object repre-; i i sents the membership of an object in a type. The system supports a t-file for each \ type in the system and groups its instances (each instance is a sub-object of an ; object) together in that file. An object is represented as a record in the SubDir file.; * i 1 ! ; 1 2Rpcgen [32] was utilized to simplify the construction of both the client stub and server, and, to facilitate access to other RPC services. ! 3For a more detailed description see [12]. j 42 j This record maintains a set of (t-file,SID) pairs where a t-file corresponds to a type! and SID is a physical pointer to the sub-object that represents the membership of, this object in that type. The physical location of this record constitutes its identifier that uniquely distinguishes this object from the other objects in the system (OID). The organization of the SubDir file makes it possible to support the multi-type' membership construct. In addition, the SubDir file provides an efficient way for; implementing dynamic binding. In order to support dynamic binding semantics, the' exact order in which an object becomes a member of a type must be maintained. This is done by ordering the set of (t-file,SID) pairs for an object from left to right.; Each time a new type is added to the object, the new (t-file,SID) pair is simply; appended to the rightmost position of the object’s record in the SubDir file. ! i Omega supports stored and computed functions. Evaluating the value of a func-' ! . . . . . 
tion for a particular argument depends on its implementation. Retrieving the value of a stored function requires a record lookup in the corresponding t-file. Obtaining the value of a computed function, on the other hand, requires first dynamically loading the program into memory and then executing it. In order to abstract out the implementation of a function, the all-purpose function omega_eval() was created. This function takes as arguments the name of a function and the function's argument; it applies the function to the argument and returns the result. Dynamic binding is implemented in omega_eval() by scanning the SubDir record for an object from right to left; the first type which has the required function is the one whose function is applied to the object.

The Omega importer was constructed exactly as described in Chapter 5, using the same RPC stub generated for the Iris component. Similarly, the Omega exporter was constructed using a server, similar to the one in the Iris prototype, that makes direct calls to omega_eval().

6.3 Discussion

In this section, we take the opportunity to discuss some of the more interesting details of our Omega implementation. The original version of Omega did not use the omega_eval() abstraction. Rather, computed functions were not supported and all functions were implemented as stored functions. In addition, the code for retrieving stored function values was very cumbersome when evaluating nested function applications. The omega_eval() abstraction significantly simplified the coding of such nested applications (through recursion) and allowed for a unified and transparent implementation of functions, stored and remote. In addition, it allowed computed function writers to access the database via a clean omega_eval() interface. However, the omega_eval() implementation suffered from the garbage collection problems common to functional languages. The problems arise from the repeated allocation and deallocation of the arguments and results of omega_eval(). Performing this memory management was a significant portion of the time to process queries. A future implementation should allocate and deallocate memory in larger chunks so as to minimize this overhead.

Chapter 7

Performance Evaluation

In this chapter, we characterize the overhead of our seamless interconnection mechanism using the USC benchmark[11]. This evaluation has two main goals: 1) to demonstrate the feasibility of our mechanism using existing object-oriented technology, and 2) to gain insight into the performance overhead of our mechanism through experimental measurements.

Function Name                 Return Type                  Range of Values    Order
Unique IDentifier (UID)       Integer                      0-(MAXOBJS - 1)    Sequential
Shuffled UID (SH-UID)         Integer                      0-(MAXOBJS - 1)    Random
Unique Float (UF)             Float                        0-(MAXOBJS - 1)    Random
Unique String1 (USTR1)        String (52 bytes)            '**...*UID'        Sequential
Shuffled String2 (SH-USTR2)   String (52 bytes)            '**...*SH-UID'     Random
one-percent                   Integer                      0-99               Random
ten-percent                   Integer                      0-9                Random
even                          Integer                      0,2,4,...,198      Random
odd                           Integer                      1,3,5,...,199      Random
one-to-one                    LVO-T-ROOT (Object-valued)   -                  Random
one-to-five                   LVO-T-ROOT (Object-valued)   -                  Random
one-to-0.1p                   LVO-T-ROOT (Object-valued)   -                  Random

Table 7.1: Functions Defined on the Root of the USC Benchmark Type Lattice

7.1 Design of the Experiments

For these experiments we used the Iris and Omega prototypes described in Chapter 6.
From our initial series of experiments, we observed a wide variation in the relative1 J overhead of our mechanism between the Iris and Omega prototypes. In order to! J study these factors more systematically, we instrumented the Omega prototypes! ; and conducted a second series of experiments using only Omega components.1 j j In the following, we provide a brief description of the USC synthetic database.! , I jNext, we present the results of our experiments for: 1) queries that process the' J instances of a type and 2) queries that reference a fixed set of functions . For the; ! first queries, we controlled the percentage of the instances that are imported from a' i t remote site (horizontal partitioning), while for the second queries we controlled thel j number of functions imported from a remote site (vertical partitioning). We finish; this chapter by analyzing these results and presenting our main conclusions about j factors affecting the performance of our mechanism. I j 17.2 The Benchmark Database i We used a modified version of the USC benchmark for this experimental study. The. j database consisted of a single type, LVO-T-ROOT containing MAXOBJS instances. I I ____________________________________ 1We could not instrument the Iris prototype because we did not have access to its source code. j Table 7.1 contains the functions defined on LVO-T-ROOT. The first nine func-j | tions can be used to model queries with a wide range of selectivity factors. The' I name of each function reflects its range of values. UIDQ , for example, is an inte-> Iger ranging between [0 - MAXOBJS-1] and is assigned sequentially during object; | creation time. One-to-one() on the other hand, is a self-referencing, single valued j function which maps one object to another randomly selected object which is unique and different than itself.2 The same database was used for importer and exporter components in our ex periments. In addition, the importer database constructs a remote type hierarchyi I according to our mechanism described in Chapter 5. Hence, the type R-LV0-T-| | ROOT is created with the same named functions as LVO-T-ROOT that are actually! computed functions which make RPC requests to the exporter. However, functions! which return object-values were not created on R-LVO-T-ROOT. One-to-one() forj example, is defined only on the LVO-T-ROOT otherwise, an object identifier re-j turned from a remote system would be undefined locally. | 7.3 The Queries i I i ’ We used three different queries to quantify the overhead of our mechanism. The- I first two queries are used to evaluate the scaling of our instance level sharing mech-: j anism (logical horizontal partitioning) as a function of: 1) the fraction of instances^ imported, and 2) the size of the database. The third query is designed to quantify ^ the overhead of sharing remote stored functions (logical vertical partitioning). Queries in OODBMS’ generally fall into two categories: associative queries rang-; ing over one or more sets and navigational queries. The first category of associative| [queries resemble queries found in traditional relational database systems. Query 1 < 5 jis used to model this class of queries. It retrieves a fixed percentage of objects that: ; satisfy a certain selection predicate. By varying the percentage of LVO-T-ROOT ! objects that are remote, we can evaluate the overhead of sharing remote instances. ■ This query retrieves 10% of objects from the database. 
For example, when th e : database consists of 100,000 objects (MAXOBJS 100,000), Query 1 is as follows: ; i _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ; ; 2 A complete explanation of each function can be found in [11]. s e l e c t UID(p) fo r each LVO-T-ROOT p where UID(p) < 10000; i | | Navigational queries traverse the subcomponents of a complex object and typi-j ; cally result in expensive random disk accesses. Query 2 below is used to evaluate our| 1 mechanism when processing this class of queries. We used the one-to-one() function| , to model queries that traverse subcomponents of a complex object. As in Query 1J ; we vary the the percentage of remote objects in the experiments. Once again, we. I control the selectivity factor of this query to 10% of objects in the database. When: | MAXOBJS is 100,000, Query 2 is as follows: j r I I I s e le c t UID(p) fo r each LVO-T-ROOT p j where U ID (one-to-one(p)) < 10000 I The last query, Query 3, projects out each function defined on the type LVO-T- ROOT with 100% selectivity. For this query, we vary the number of functions which: ! are remote. First, UIDQ is made remote, then U S T Rl(J and finally SH-USTR2Q.. t : s e le c t UID(p), USTRl(p), SH_USTR2(p) i ' fo r each LV0-T-R00T p; ■ 1 J 7.4 Organization of Experiments ; i The hardware platform used for these experiments consisted of two HP 9000/720! workstations, each with 16 Megabytes of main memory and ah HP 9000/834 work station with 32 Megabytes of main memory. The workstations were physically j connected using Ethernet (10 megabits/second). In order to systematically charac- ! terize the behavior of the system, we conducted our experiments on two separate J configurations. The first configuration consisted of an Iris im porter and an Omega j exporter.3 The purpose of this configuration was to demonstrate the feasibility of | 3The complement configuration of Iris exporter and Omega importer was also evaluated. These ; results were eliminated because they provide no additional observations as compared to the second configuration. our mechanism and to gain an intuitive understanding of its overhead. The second: configuration which consisted of Omega as both the importer and exporter. T h is, configuration provided a far more accurate environment for quantifying this over-! head of our mechanism as we had access to Omega’s source code and were able ! instrum ent it through simple modifications. j ! j The experiments for both configurations were conducted in the following manner, j I I ; I ! 1. importer and exporter databases are created. The object values for the one-to- j one() function are randomized. In addition, on the importer, the remote typej hierarchy is created. The generated values for R_OID() are random; objects j t on the exporter site. I 2. The response time of Query 1 and Query 2 are measured as a function of th e , i percentage of remote instances (10%, 20%,40%, 80%, and 100%). | f I I 3. The response time of Query 3 is measured as function of the number of remote; functions. I < This entire process was repeated for three different database sizes: 1,000, 10,000, ’ | and 100,000 object databases. Below, we characterize the performance of our constructs for the three types of queries. 
Aside from the percentage of remote and local instances, we observe that two other factors impact the overall response time of the system: 1) the percentage of buffer pool hits observed by both the exporter and the importer, and 2) the total number of object retrieval requests.4

4 This corresponds to the number of omega_eval() calls that were made.

7.5 Heterogeneous Configuration

This configuration consisted of an Iris (version DPP 4.0) importer running on an HP 9000/834 workstation and an Omega exporter running on an HP 9000/720 workstation. The goal of this experiment was to demonstrate that the local data manipulation language (DML) of Iris was preserved while providing transparent access to the remote objects that resided on the Omega exporter. To this end, the experiment was quite successful. However, since this version of Iris was an early prototype, we had some difficulties creating large databases and making accurate measurements. Nevertheless, the results of running the queries showed that the overhead for remote access was approximately 25%, making remote accesses quite comparable to local accesses. As we will see in the next section, this overhead depends on a number of factors which could not be fully quantified by this experiment due to our lack of access to the Iris source code. However, our coarse-grain measurements for this configuration indicate that the overhead of our mechanism was very small relative to the local database access performed by Iris, because local access is relatively slow. The high local access times for Iris resulted from the use of the latebind() function. Latebind() is a computed function which must search the meta-data every time an attribute of an object is accessed. If we had used the default early binding semantics of Iris, local access time would have decreased significantly; however, remote transparency would no longer have been possible.

In the next section, we analyze the overhead of our mechanism more precisely, using the configuration consisting of Omega as both importer and exporter.

7.6 Homogeneous Configuration

In this configuration, Omega components were used as both the importer and the exporter. The sizes of the exporter's database for the 1,000, 10,000, and 100,000 object databases were 600 Kilobytes, 6 Megabytes, and 60 Megabytes, respectively. The size of the importer's database, however, increases as the percentage of remote instances increases; one surrogate must be created for each imported remote instance. When the importer contains 100% remote instances, the database size is roughly double that of the exporter, at 1 Megabyte, 10 Megabytes, and 100 Megabytes, respectively, for the 1,000 object, 10,000 object, and 100,000 object databases. For these experiments, the Omega components were configured with a four kilobyte disk page and a 1 megabyte buffer pool.

QUERY 1

         1,000 Objects   10,000 Objects   100,000 Objects
0%       0.98 sec        14.23 sec        171.80 sec
10%      3.42 sec        59.25 sec        983.54 sec
20%      4.28 sec        92.24 sec        1,686.13 sec
40%      5.85 sec        159.59 sec       3,058.80 sec
80%      8.91 sec        288.63 sec       5,727.04 sec
100%     10.35 sec       346.64 sec       7,045.40 sec

Table 7.2: Response Times for Query 1.

Table 7.2 shows the response time of Query 1 for the three database sizes as a function of the percentage of remote objects. The results demonstrate a relatively linear increase in response time as a function of the percentage of remote objects. For example, in the 100,000 object database column, the increase in response time between 20% and 40%, and between 80% and 100%, is roughly 1,300 seconds (i.e., the slope is constant). The observed response times are not precisely linear because the buffer pool hit ratio on the exporter increases as the percentage of remote objects increases. This causes a decrease in the average object retrieval time for remote objects. This factor is ineffective for the 100,000 object database because its size is significantly larger than that of the buffer pool on the exporter. Consequently, the exporter's buffer pool hit ratio remains constant at 53% for the different percentages of remote objects due to its limited size. Figure 7.1a shows the constant hit ratios observed for the importer and exporter for the 100,000 object database. The hit ratio on the importer remains constant at over 98% due to the sequential nature of the accesses on the importer.

In this experiment, the response time increased by a factor of 20 as a function of the size of the database (instead of a factor of 10). This observation is explained by the different buffer pool hit ratios observed for the various database sizes. Figure 7.1b shows that with a 1,000 object database the exporter observed a 99% buffer hit ratio, while with a 100,000 object database the hit ratio is reduced to 53%.

[Figure 7.1 (two plots): (a) importer and exporter buffer pool hit ratios versus the percentage of remote objects for Query 1 on the 100,000 object database; (b) the same ratios versus database size (in thousands of objects).]

Figure 7.1: Hit Ratios for Query 1

QUERY 2

         1,000 Objects   10,000 Objects   100,000 Objects
0%       1.97 sec        237.41 sec       6,496.11 sec
10%      4.44 sec        251.05 sec       7,165.53 sec
20%      5.23 sec        301.05 sec       7,772.54 sec
40%      6.71 sec        411.71 sec       9,053.62 sec
80%      10.04 sec       531.11 sec       11,374.07 sec
100%     11.11 sec       563.25 sec       12,578.91 sec

Table 7.3: Response Times for Query 2.

Table 7.3 shows Query 2's response time with respect to the percentage of remote objects. These results seem to confirm our initial intuition that the randomness created by the navigational aspect of this query causes a dramatic increase in the response time of the system. For the 100,000 object database, the response time when every object is local increased from 170 seconds for Query 1 to 6,500 seconds for Query 2. For this case (0% remote), our measurements indicated that the average per-object lookup time increased from 1 ms to 30 ms as the hit ratio decreased from 98.7% for Query 1 to 80.0% for Query 2 on the importer. In addition to the hit
In addition, to the hit I ration, the sheer size of the 100,000 object database (117 Megabytes) causes the* ' database to be physically spread over more disk tracks which incurred an increased; 52 Query 2 for 100K Database Query 2 0.95 0.9 Importer 0.85 _ o (2 0.75 K 0.7 0.65 0.6 0.55 Exporter 0.5 100 % Remote Objects 1 0.95 0.9 Importer 0.85 0.8 (2 0.75 0.7 Exporter 0.65 0.6 0.55 0.5 0 Size of Databases(in thousand objects) 50 100 (a) (b) I Figure 7.2: Hit Ratios for Query 2 : I i 1 ; penalty in the seek time. The percentage overhead associated with our transparency; i mechanism exhibits an equally striking decrease. For example, in Query 1 the re-1 I sponse time of the system increased by a factor of 18 when the percentage of remote! instances/objects was increased from 0% to 40% (170 sec to 3,000 sec), however, for Query 2 the response time increased only by a factor of 1.4 (6,500 sec to 9,000 sec). The average lookup time per object for local objects on the importer decreases1 steadily from 31.5 ms to 10.1 ms as a function of the percentage of remote objects, j There are two reasons for this. First, the randomness associated with the navigation! • of objects is partially moved to the exporter. Second, the percentage of buffer pool1 hits observed at the importer increases as Figure 7.2a shows. The exporter is not j affected (the hit ratio stays at 52%) by any additional randomness since all of J its accesses are originally random. The im porter’s buffer pool hit ratio increases ; because for each remote function, it looks up the R J J ID (), R .D B N A M E {), and! ! 1 i I I 5 3 1,000 Objects 10,000 Objects 100,000 Objects 0 Functions 1 Functions 2 Functions 3 Functions 4.36 sec 13.25 sec 19.61 sec 26.94 sec 46.09 sec 382.17 sec 441.02 sec 514.28 sec 456.75 sec 6,228.30 sec 6,931.03 sec 7,487.53 sec Table 7.4: Response Times for Query 3. R -H O STQ of this object. For a given object, these three (stored) functions are, physically clustered on the same disk page. Consequently, the percentage of buffer pool hit ratios increases as a function of the number of remote instances. j Figure 7.2b shows that as the number of objects in the database is increased from 1,000 objects to 100,000 objects, the exporter’s buffer the hit ratio drops froim over 90% to just over 50%. This is decrease is to be expected since we kept the; buffer pool size constant throughout all the experiments. Q U E R Y 3 • Table 7.4 shows the response time of our configuration as a function of the remote' functions (attributes). In this experiment, the response tim e increases significantly from no remote functions to 1 remote function. This is due to the significant increase1 in the number of RPC calls performed by the importer to access the exporter’s data.' In fact, there is at least one RPC call for each object. Table 7.4 exhibits only modest increases in response time as the number of remote functions increases. These slight increases are due to a higher buffer pool hit ratio on the exporter. Figure 7.3 shows that for the 100,000 object database, the hit ratio increases from 51.8% for 1 remote function to 83.9% for all remote functions. The increased hit ratio is reflected ini the average per function access on the exporter as it increases from 26.1 ms for l< remote function to only 9.1 ms when all functions are remote. The increased hit ratio is due to the clustering of function values for the same object on the same’ page of the exporter. : 5 4 ; H it Ratio Exporter Behavior for Query 3 Database of size IK 0.9 ■ ? 
> .© - - - - - - 0.8 .--''Database of size 1 O K 0.7 Database of size 100K 0.6 0.5 0.4 0.3 0.2 0.1 0.5 2.5 1.5 # of Functions Figure 7.3: Exporter’s Hit Ratio for Query 3 Query 1 for 100K Database Query 2 for 100K Database 7000 7000 6000 6000 Importer 5000 5000 I 1 4000 £ 4000 03 5 o 3000 O 3000 Exporter Exporter 2000 2000 1000 1000 Importer 50 % Remote Objects 100 0 50 % Remote Objects 100 0 ; (a) (b) Figure 7.4: Time Spent in Omega_eval() as Function of Database Size 7.7 Discussion 1 From our simple, but rigorous experiments, we make three im portant observations.1 i 1 1 First, our instance level sharing mechanism scales linearly with increasing numbers: i ! of remote objects for relational style associative queries over sets. Second, this mechanism scaled sub-linearly for object-oriented style navigational queries. Third, the overhead of our mechanism is determined mostly by the I/O subsystems of the participating components. 1 Our first observation arises from the results of Query 1 and Query 2. As the. j percentage of remote instances/objects increases, the slope for Query 1 remains | constant at about 650 sec/per-cent-remote. Figure 7.4a shows the time spent in omega_eval() for Query 1 on both the importer and exporter. The sum of these: : two times accounts for over 90% of the overall response time (see Appendix C for the exact breakdown of the total times). Both of these times increase linearly (al-| though the exporter has a larger slope) which results in the linear behavior of the I I overall response time. Query 2 exhibited a sublinear behavior due to the random j i I/O ’s performed at each site. In particular, Figure 7.4b shows that the tim e spent j jin omega_eval() for the importer decreases as the percentage of remote objects in-J j creases. As explained previously, this is due to the increased buffer hit pool ratio on! : the importer. In our experiments, each remote instance object is chosen randomly! ' which results in a poor buffer pool hit ratio on the exporter. However, with typel i _ _ i j level sharing, all the instance objects of a type are imported. These objects would; I j tend to be clustered together thereby increasing the buffer pool hit ratio on th e1 I exporter and lowering the total response time. The second observation we made about our mechanism being I/O bound is based on the fact that on the average 95% of the response time was due to the object retrieval requests in omega_eval(). Of the total time spent in omega_eval(), gprof indicated that only 15% of this time was spent on active CPU. This should1 | come as no surprise since the CPU rating of our hardware platform is 60 MIPS.' i ! | Further, this shows that the network overhead is not the real bottleneck (the disk, i access tim e is the real bottleneck). The network service time contributed to only' I a fraction of the object retrieval time and remained constant in all experiments.' Appendices A, B, and C report the exact measurements we observed for the 1,000, 10,000, and 100,000 object databases. Chapter 8 I I Conclusion In this chapter, we summarize the major results and research contributions presentedi in this thesis. We also present current work and future research opportunities based; i on our approach and mechanism for sharing in federated database systems. ; 8.1 Summary of Results j i In this thesis, we have introduced the concept of object level sharing in a federated database system. 
In addition to supporting the sharing of informational units a t1 various levels of abstraction, this approach supports the sharing of behavior. In or-j j der to provide database transparency, we have presented a sharing mechanism based, ^ on fundamental object-oriented constructs. The constructs used in the mechanism ; are commonly found in most existing object-oriented systems[10, 12, 17, 18, 22]. Hence, our mechanism can be simply implemented in these systems without mod ifying existing DBMS software. We have demonstrated this by implementing our; i i 1 mechanism on the Iris and Omega research prototypes. Finally, we have evaluated! i ' the overhead of our mechanism and have shown that it behaves linearly. This is a, ! direct consequence of the fundamental linear behavior exhibited by our mechanism. 1 i8.2 Research Contributions i i The results of this research will have both direct and practical impact on information sharing in a federated environment: l i ( s • Existing Components: Throughout this thesis we have paid careful attentionj to the requirement that there should be no modification to existing DBMS; software. As a result, our approach requires no modification to the query! processor or any other part of the local DBMS software. In particular, we do ; not assume a standard global OID space of which each component must be aware. Not only does our approach support the existing DBMS software, it' likewise supports existing application programs developed by the users. ; i ! • Sharing Patterns: Our approach introduces new sharing patterns not found^ I in other systems. We support the sharing of objects at various levels of gran-, \ ularity and abstraction. In addition, we support the sharing of behavioral i 1 ] objects. These sharing patterns may be established dynamically from multi- 1 pie sources and are determined individually by each component; there is no global schema. # Database Transparency: Another feature of our approach not found in other| systems is database transparency. By providing this level of transparency,! users are more productive since learning a new language or moving to a new i environment is not a pre-requisite to sharing information. Furthermore, we I demonstrated that our approach was well behaved with respect to performance issues. j • Decoupling of Data and Behavior: We have considered the importance of; decoupling the location of (persistent) data and the location of the functions1 that operate on data in a distributed environment. Traditional approaches^ , inextricably link the location of the data and the execution of the operation, i ; i ! • Common Data Model: The results of this work also impacts the area of hetero geneous database systems with respect to data model and conceptual schema heterogeneity. One of the critical factors determining the success of these systems is the choice of common data model (CDM) used for integrating com- ! ponent database systems['26j. We expect that our experiences with FOOM I and the specific object constructs we used will provide insight into the design I and functionality of future CDM’s. , I j 1 8.3 Directions for Future Research 8.3.1 P erfo rm a n ce E n h a n cem en ts I i It is im portant to note that the experiments in the evaluation section of this thesis1 : were designed to quantify the overhead of our mechanism in a standard environment, j There are several ways of improving the performance of our mechanisms. First, there| j is an inherent parallelism that exists in a federated environment. 
In the experiments' j we conducted, only one exporting component existed in the federation. If there' i were n exporting components, the same number of objects that existed on only | one component would then be distributed among n components thereby increasing W j the hit ratio on each exporting component. Further, by using asynchronous or, ! multi-threaded RPC requests, the results can be retrieved from the n components ; in parallel. Finally, we present here a mechanism for high level caching of remote1 objects into the local database. We observe that access to a remote object is always, more costly than access to a local object.1 Hence, if a remote object is “cached” inj the local database as a local object, its access times will decrease. This mechanism] 'relies on the ability to dynamically bind instance objects to types. Surrogates are ! I created as both members of a remote type (e.g., R_LVO_ROOT) and members of ai j local type (e.g., LVCLROOT). Currently surrogate objects are bound such that thej ! functions defined on the remote type (e.g., R_LVO_ROOT) have precedence over the] ■local type (e.g., LVCLROOT). The functions defined on the local type are defined1 and allocated but simply never used. In order to “cache” the values of the remote functions, we simply set the values of the local functions correctly, and re-bind the ■ surrogate objects so that the local type (e.g., LVCLROOT) has precedence. The. 1 strategy for setting and updating the values of the local functions is determined| i ; j by an appropriate caching policy. We are in the process of studying the impact of] [various caching policies for this high level object caching scheme in reducing the' I j amount of remote accesses, i i | 8.3.2 R e m o te E xch an ge R esearch P r o je c t We have laid the foundations for further work to be carried out in the Remote I Exchange project. Current directions are: l • Resource Discovery: This work addresses the need to discover relevant infor-i i m ation in the federation. Due to the uncertainty and possible inconsistencies! ] inherent in the federated environment there will be a heavy dependency on. | user feedback. Thus, an im portant area of this work lies in addressing User1 i Interface issues as well. We have constructed a rudimentary discovery tool to browse Exporter databases. I I j • Semantic Resolution and Integration: Once the information to be shared is discovered, the problem of resolving conflicts must be solved. Semantic het erogeneity can appear at various levels of granularity and abstraction[23]; we I _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ' 1This is clue to the fact that in addition to the cost of remote access, a local access must still' ■ always be done in order to retrieve the remote oid, the database name and remote host. are taking an approach based on behavioral object equivalence to resolve thesej conflicts. • Security: The current prototypes and testbed provide a good basis to ex plore security and authorization issues in a federated systems; currently, the Exporter simply exports his entire database. ! • Updates: Currently, our only support for updates results from the ability to: create local (stored) functions for remote objects. This has the overall benefit' of allowing the local component to create local state for remote objects that! ( completely conform to the local system’s mechanism for updates. 
8.3.2 Remote Exchange Research Project

We have laid the foundations for further work to be carried out in the Remote Exchange project. Current directions are:

• Resource Discovery: This work addresses the need to discover relevant information in the federation. Due to the uncertainty and possible inconsistencies inherent in the federated environment, there will be a heavy dependency on user feedback; an important part of this work therefore lies in addressing user interface issues as well. We have constructed a rudimentary discovery tool to browse Exporter databases.

• Semantic Resolution and Integration: Once the information to be shared is discovered, the problem of resolving conflicts must be solved. Semantic heterogeneity can appear at various levels of granularity and abstraction [23]; we are taking an approach based on behavioral object equivalence to resolve these conflicts.

• Security: The current prototypes and testbed provide a good basis to explore security and authorization issues in a federated system; currently, the Exporter simply exports his entire database.

• Updates: Currently, our only support for updates results from the ability to create local (stored) functions for remote objects. This has the overall benefit of allowing the local component to create local state for remote objects that completely conforms to the local system's mechanism for updates. In general, however, the object-oriented paradigm should lend itself nicely to supporting more generalized updates through the notion of encapsulation.

Appendix A

Measurements for 1,000 Object Experiments

For each query, the measurements are broken down into four tables.

The Summary table gives the observed response time (resp-time), the total number of calls to omega_eval() on the Importer (t-ocalls), the total time spent in omega_eval() on the Importer (t-otime), and the total remote time (rem-time). T-otime includes rem-time; subtracting rem-time from t-otime gives the time spent locally (l-otime) in omega_eval().

The Importer table shows the number of local calls to omega_eval() (l-ocalls), the local time spent in omega_eval() (l-otime), the average time per call (time/call), the number of pages read from disk (reads), the number of page writes to disk (writes), the number of buffer pool hits (hits), and the hit ratio (ratio).

The Network table shows the number of remote calls (r-calls), the time spent in the network for sending and receiving (n/w-time), and the average network time per call (time/call).

The Exporter table shows the number of remote calls to omega_eval() on the Exporter (r-ocalls), the remote time spent in omega_eval() (r-otime), the average time per call (time/call), the number of pages read from disk (reads), the number of page writes to disk (writes), the number of buffer pool hits (hits), and the hit ratio (ratio).
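As a worked check of this decomposition (the figures are read directly from the Query 1 tables below, not new measurements): l-otime = t-otime - rem-time; at 10% remote objects, 2.58 - 1.098 = 1.482, which matches the l-otime entry in the corresponding Importer table.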
QUERY 1 — 1,000 Object Database

Summary
          resp-time   t-ocalls   t-otime   rem-time
  0%           0.98      1,100      0.51      0
  10%          3.42      1,430      2.58      1.098
  20%          4.28      1,759      3.44      1.529
  40%          5.85      2,415      5.04      2.267
  80%          8.91      3,755      8.28      3.864
  100%        10.35      4,416      9.81      4.585

Importer
          l-ocalls   l-otime   time/call   reads   writes     hits   ratio
  0%         1,100     0.51      0.4 ms       99       11    6,698    .985
  10%        1,321     1.482     1.1 ms      160       11    8,402    .981
  20%        1,540     1.911     1.2 ms      181       11   10,134    .982
  40%        1,976     2.773     1.4 ms      185       11   13,622    .986
  80%        2,871     4.416     1.5 ms      185       11   20,777    .991
  100%       3,312     5.225     1.5 ms      185       11   24,304    .992

Network
          r-calls   n/w-time   time/call
  10%         109      0.584      5.3 ms
  20%         219      0.887      4.0 ms
  40%         439      1.498      3.4 ms
  80%         884      2.776      3.1 ms
  100%      1,104      3.381      3.0 ms

Exporter
          r-ocalls   r-otime   time/call   reads   writes     hits   ratio
  10%          109     0.514      4.7 ms      86        0      371    .811
  20%          219     0.642      2.9 ms      95        0      802    .894
  40%          439     0.769      1.7 ms      96        0    1,681    .945
  80%          884     1.088      1.2 ms      96        0    3,461    .973
  100%       1,104     1.204      1.1 ms      96        0    4,341    .978

QUERY 2 — 1,000 Object Database

Summary
          resp-time   t-ocalls   t-otime   rem-time
  0%           1.97      2,098      1.40      0
  10%          4.44      2,471      3.95      1.13
  20%          5.23      2,739      4.81      1.54
  40%          6.71      3,358      6.41      2.18
  80%         10.04      4,733      9.89      3.83
  100%        11.11      5,372     10.81      4.58

Importer
          l-ocalls   l-otime   time/call   reads   writes     hits   ratio
  0%         2,098     1.40      0.6 ms       97        4   10,686    .991
  10%        2,344     2.82      1.2 ms      145        4   12,714    .988
  20%        2,526     3.27      1.3 ms      169        4   14,236    .988
  40%        2,936     4.23      1.4 ms      181        4   17,712    .989
  80%        3,856     6.06      1.5 ms      182        4   25,461    .992
  100%       4,279     6.23      1.4 ms      183        4   24,588    .992

Network
          r-calls   n/w-time   time/call
  10%         127      0.629      4.9 ms
  20%         213      0.861      4.0 ms
  40%         422      1.454      3.4 ms
  80%         877      2.774      3.1 ms
  100%      1,093      3.400      3.1 ms

Exporter
          r-ocalls   r-otime   time/call   reads   writes     hits   ratio
  10%          127     0.501     3.94 ms      76        0      453    .856
  20%          213     0.679     3.18 ms      91        0      782    .895
  40%          422     0.726     1.72 ms      96        0    1,613    .943
  80%          877     1.056     1.20 ms      96        0    3,433    .972
  100%       1,093     1.180     1.08 ms      96        0    4,297    .978

QUERY 3 — 1,000 Object Database

Summary
            resp-time   t-ocalls   t-otime   rem-time
  0 Func         4.36      4,000      2.62      0
  1 Func        13.25      7,000     11.47      4.56
  2 Func        19.61     10,000     18.37      8.10
  3 Func        19.79     10,000     18.51      8.20
  4 Func        26.94     13,000     25.85     11.85

Importer
            l-ocalls   l-otime   time/call   reads   writes     hits   ratio
  0 Func       4,000      2.62      0.6 ms      98        0   21,993    .995
  1 Func       5,000      6.91      1.1 ms     183        0   36,908    .995
  2 Func       8,000     10.27      1.2 ms     183        0   51,907    .996
  3 Func       8,000     10.31      1.2 ms     183        0   51,908    .996
  4 Func      10,000     14.00      1.4 ms     183        0   66,904    .997

Network
            r-calls   n/w-time   time/call
  1 Func      1,000       3.06      3.0 ms
  2 Func      2,000       5.87      2.9 ms
  3 Func      2,000       5.85      2.9 ms
  4 Func      3,000       8.95      2.9 ms

Exporter
            r-ocalls   r-otime   time/call   reads   writes     hits   ratio
  1 Func       1,000      1.50      1.5 ms      96        0    3,925    .976
  2 Func       2,000      2.23      1.1 ms      96        0    7,925    .988
  3 Func       2,000      2.35      1.1 ms      96        0    7,925    .988
  4 Func       3,000      2.90      0.9 ms      96        0   11,925    .992

Appendix B

Measurements for 10,000 Object Experiments

For each query, the measurements are broken down into four tables.

The Summary table gives the observed response time (resp-time), the total number of calls to omega_eval() on the Importer (t-ocalls), the total time spent in omega_eval() on the Importer (t-otime), and the total remote time (rem-time). T-otime includes rem-time; subtracting rem-time from t-otime gives the time spent locally (l-otime) in omega_eval().

The Importer table shows the number of local calls to omega_eval() (l-ocalls), the local time spent in omega_eval() (l-otime), the average time per call (time/call), the number of pages read from disk (reads), the number of page writes to disk (writes), the number of buffer pool hits (hits), and the hit ratio (ratio).
The Network table shows the number of remote calls (r-calls), the time spent in the network for sending and receiving (n/w-time), and the average network time per call (time/call).

The Exporter table shows the number of remote calls to omega_eval() on the Exporter (r-ocalls), the remote time spent in omega_eval() (r-otime), the average time per call (time/call), the number of pages read from disk (reads), the number of page writes to disk (writes), the number of buffer pool hits (hits), and the hit ratio (ratio).

QUERY 1 — 10,000 Object Database

Summary
          resp-time   t-ocalls   t-otime   rem-time
  0%          14.23     11,000      7.94      0
  10%         59.25     14,373     49.64     30.52
  20%         92.24     17,742     81.99     57.31
  40%        159.59     24,602    149.93    116.08
  80%        288.63     37,962    278.75    227.51
  100%       346.64     44,716    339.51    282.31

Importer
          l-ocalls   l-otime   time/call   reads   writes      hits   ratio
  0%        11,000      7.94      0.7 ms     884       17    66,725    .986
  10%       13,248     19.12      1.4 ms   1,664       17    83,931    .980
  20%       15,490     24.68      1.5 ms   2,002       17   101,541    .980
  40%       20,085     33.85      1.6 ms   2,303       17   137,935    .983
  80%       29,010     51.24      1.7 ms   2,501       17   209,082    .988
  100%      33,537     57.20      1.7 ms   2,505       19   245,225    .989

Network
          r-calls   n/w-time   time/call
  10%       1,125       3.87      3.4 ms
  20%       2,252       7.17      3.1 ms
  40%       4,517      13.67      3.0 ms
  80%       8,952      26.59      2.9 ms
  100%     11,179      33.11      2.9 ms

Exporter
          r-ocalls   r-otime   time/call    reads   writes      hits   ratio
  10%        1,125     26.65     23.6 ms    1,353        0     3,168    .700
  20%        2,252     50.14     22.2 ms    2,610        0     6,419    .710
  40%        4,517    102.41     22.6 ms    5,091        0    12,998    .718
  80%        8,952    200.92     22.4 ms   10,159        0    25,670    .716
  100%      11,179    249.20     22.2 ms   12,740        0    31,997    .715

QUERY 2 — 10,000 Object Database

Summary
          resp-time   t-ocalls   t-otime   rem-time
  0%         237.41     20,920    228.06      0
  10%        251.05     24,384    239.47     31.03
  20%        301.05     27,448    286.13     58.93
  40%        411.71     34,232    388.24    126.89
  80%        531.11     47,437    493.92    245.03
  100%       563.25     54,156    521.66    305.31

Importer
          l-ocalls   l-otime   time/call    reads   writes      hits   ratio
  0%        20,920    228.06     10.9 ms   13,902        7    93,144    .870
  10%       23,239    208.44      8.9 ms   14,636        7   111,933    .884
  20%       25,285    227.20      8.9 ms   15,053        7   128,874    .895
  40%       29,824    261.35      8.7 ms   15,702        7   166,488    .913
  80%       38,620    248.89      6.4 ms   15,764        7   240,816    .938
  100%      43,117    216.35      5.0 ms   15,248        8   279,255    .948

Network
          r-calls   n/w-time   time/call
  10%       1,145       4.15      3.6 ms
  20%       2,163       6.65      3.0 ms
  40%       4,408      14.09      3.1 ms
  80%       8,817      25.75      2.9 ms
  100%     11,039      31.87      2.8 ms
Exporter
          r-ocalls   r-otime   time/call    reads   writes      hits   ratio
  10%        1,145     26.88     23.4 ms    1,325        0     3,276    .712
  20%        2,163     52.28     24.1 ms    2,543        0     6,130    .706
  40%        4,408    112.80     25.5 ms    5,404        0    12,249    .693
  80%        8,817    219.28     24.8 ms   10,921        0    24,368    .690
  100%      11,039    273.44     24.7 ms   13,630        0    30,547    .691

QUERY 3 — 10,000 Object Database

Summary
            resp-time   t-ocalls   t-otime   rem-time
  0 Func        46.09     40,000     29.11      0
  1 Func       382.17     70,000    365.38    292.64
  2 Func       441.02    100,000    429.12    322.56
  3 Func       443.13    100,000    430.82    323.94
  4 Func       514.28    130,000    504.50    364.19

Importer
            l-ocalls   l-otime   time/call   reads   writes      hits   ratio
  0 Func      40,000     29.11      0.7 ms     798        0   219,793    .996
  1 Func      60,000     72.74      1.2 ms   1,633        0   368,958    .995
  2 Func      80,000    105.56      1.3 ms   1,633        0   518,957    .996
  3 Func      80,000    106.88      1.3 ms   1,633        0   518,958    .996
  4 Func     100,000    140.31      1.4 ms   1,633        0   668,954    .997

Network
            r-calls   n/w-time   time/call
  1 Func     10,000      30.74      3.1 ms
  2 Func     20,000      58.71      2.9 ms
  3 Func     20,000      59.03      2.9 ms
  4 Func     30,000      88.59      2.9 ms

Exporter
            r-ocalls   r-otime   time/call    reads   writes      hits   ratio
  1 Func      10,000    261.90     26.1 ms   12,625        0    27,396    .684
  2 Func      20,000    263.85     13.1 ms   12,634        0    67,387    .842
  3 Func      20,000    264.91     13.2 ms   12,634        0    67,387    .842
  4 Func      30,000    275.60      9.1 ms   12,634        0   107,387    .894

Appendix C

Measurements for 100,000 Object Experiments

For each query, the measurements are broken down into four tables.

The Summary table gives the observed response time (resp-time), the total number of calls to omega_eval() on the Importer (t-ocalls), the total time spent in omega_eval() on the Importer (t-otime), and the total remote time (rem-time). T-otime includes rem-time; subtracting rem-time from t-otime gives the time spent locally (l-otime) in omega_eval().

The Importer table shows the number of local calls to omega_eval() (l-ocalls), the local time spent in omega_eval() (l-otime), the average time per call (time/call), the number of pages read from disk (reads), the number of page writes to disk (writes), the number of buffer pool hits (hits), and the hit ratio (ratio).

The Network table shows the number of remote calls (r-calls), the time spent in the network for sending and receiving (n/w-time), and the average network time per call (time/call).

The Exporter table shows the number of remote calls to omega_eval() on the Exporter (r-ocalls), the remote time spent in omega_eval() (r-otime), the average time per call (time/call), the number of pages read from disk (reads), the number of page writes to disk (writes), the number of buffer pool hits (hits), and the hit ratio (ratio).
QUERY 1 — 100,000 Object Database

Summary
          resp-time   t-ocalls    t-otime   rem-time
  0%         171.80    110,000     100.28      0
  10%        983.54    142,891     860.71     628.46
  20%      1,686.13    175,962   1,549.81   1,259.81
  40%      3,058.80    242,029   2,917.29   2,535.99
  80%      5,727.04    373,846   5,588.66   5,044.71
  100%     7,045.40    439,832   6,909.33   6,305.96

Importer
          l-ocalls   l-otime   time/call    reads   writes        hits   ratio
  0%       110,000    100.28      0.9 ms    8,696      114     667,087    .987
  10%      131,923    232.25      1.7 ms   16,277      114     834,903    .980
  20%      153,968    290.00      1.8 ms   19,438      115   1,008,199    .981
  40%      198,017    381.30      1.9 ms   21,939      114   1,357,987    .984
  80%      285,881    543.95      1.9 ms   24,248      114   2,058,632    .988
  100%     329,874    603.37      1.8 ms   24,447      115   2,410,370    .989

Network
          r-calls   n/w-time   time/call
  10%      10,968      32.00      2.9 ms
  20%      21,976      63.16      2.8 ms
  40%      44,012     126.48      2.8 ms
  80%      87,965     256.56      2.9 ms
  100%    109,958     311.44      2.8 ms

Exporter
          r-ocalls    r-otime   time/call     reads   writes      hits   ratio
  10%       10,968     596.46     54.3 ms    20,481        0    23,427    .533
  20%       21,976   1,196.65     54.4 ms    41,103        0    46,837    .532
  40%       44,012   2,409.51     54.7 ms    81,960        0    94,124    .534
  80%       87,965   4,788.15     54.4 ms   164,036        0   187,860    .533
  100%     109,958   5,994.52     54.5 ms   205,174        0   234,694    .533

QUERY 2 — 100,000 Object Database

Summary
           resp-time   t-ocalls    t-otime   rem-time
  0%        6,496.11    210,066   6,376.82      0
  10%       7,165.53    242,331   6,750.60     633.31
  20%       7,772.54    275,656   7,070.35   1,298.44
  40%       9,053.62    342,963   7,785.31   2,631.95
  80%      11,374.07    472,561   9,175.92   5,189.24
  100%     12,578.91    539,908   9,886.87   6,562.34

Importer
          l-ocalls    l-otime   time/call     reads   writes        hits   ratio
  0%       210,066   6,376.82     30.3 ms   206,810       43     869,366    .807
  10%      231,579   6,117.30     26.4 ms   208,012       43   1,050,254    .834
  20%      253,806   5,771.91     22.7 ms   209,057       40   1,236,999    .855
  40%      298,675   5,153.36     17.2 ms   210,339       41   1,614,676    .884
  80%      385,524   3,986.68     10.3 ms   211,034       45   2,345,361    .917
  100%     429,931   3,324.53      7.7 ms   209,158       43   2,726,027    .928

Network
          r-calls   n/w-time   time/call
  10%      10,752      34.72      3.2 ms
  20%      21,850      66.24      3.0 ms
  40%      44,288     133.49      3.0 ms
  80%      87,037     256.83      2.9 ms
  100%    109,977     339.34      3.0 ms

Exporter
          r-ocalls    r-otime   time/call     reads   writes      hits   ratio
  10%       10,752     598.59     55.6 ms    20,353        0    22,695    .527
  20%       21,850   1,232.20     56.3 ms    41,848        0    45,588    .521
  40%       44,288   2,498.46     56.4 ms    84,874        0    92,314    .520
  80%       87,037   4,932.41     56.6 ms   167,120        0   181,064    .520
  100%     109,977   6,223.00     56.5 ms   211,287        0   228,657    .519

QUERY 3 — 100,000 Object Database

Summary
            resp-time    t-ocalls    t-otime   rem-time
  0 Func       456.75     400,000     290.39      0
  1 Func     6,278.30     670,057   6,100.21   5,411.86
  2 Func     6,931.03     940,114   6,767.55   5,773.52
  3 Func     6,879.58     940,114   6,736.41   5,736.73
  4 Func     7,487.53   1,210,171   7,360.35   6,089.80

Importer
            l-ocalls    l-otime   time/call    reads   writes        hits   ratio
  0 Func     400,000     290.39      0.7 ms    7,813        0   2,157,869    .996
  1 Func     580,038     688.35      1.1 ms   16,148        0   3,499,819    .995
  2 Func     760,076     994.03      1.3 ms   16,148        0   4,850,103    .996
  3 Func     760,076     999.68      1.3 ms   16,148        0   4,850,104    .996
  4 Func     940,114   1,270.55      1.3 ms   16,148        0   6,200,385    .997

Network
            r-calls   n/w-time   time/call
  1 Func     90,019      269.4      2.9 ms
  2 Func    180,038     533.65      2.9 ms
  3 Func    180,038     523.21      2.9 ms
  4 Func    270,057     791.71      2.9 ms

Exporter
            r-ocalls    r-otime   time/call     reads   writes      hits   ratio
  1 Func      90,019   5,142.46     57.1 ms   173,280        0   186,832    .518
  2 Func     180,038   5,250.87     29.1 ms   173,280        0   546,908    .759
  3 Func     180,038   5,213.52     28.9 ms   173,280        0   546,908    .759
  4 Func     270,057   5,298.09     19.6 ms   173,280        0   906,984    .839

Reference List
[1] H. Afsarmanesh and D. McLeod. The 3DIS: An Extensible, Object-Oriented Information Management Environment. ACM Transactions on Office Information Systems, 7:339-377, October 1989.

[2] M. Atkinson, et al. The Object-Oriented Database System Manifesto. In Proceedings of the 1st Intl. Conf. on Deductive and Object-Oriented Databases. Kyoto, Japan, December 1989.

[3] C. Batini, M. Lenzerini, and S. Navathe. A Comparative Analysis of Methodologies of Database Schema Integration. ACM Computing Surveys, 18(4):323-364, 1986.

[4] E. Bertino, G. Pelagatti, and L. Sbattella. An Object-Oriented Approach to the Interconnection of Heterogeneous Databases. In Proceedings of the Workshop on Heterogeneous Databases. NSF, December 1989.

[5] A. Birrell and B. Nelson. Implementing Remote Procedure Calls. ACM Transactions on Computer Systems, 3(1), February 1985.

[6] S. Ceri and G. Pelagatti. Distributed Databases: Principles and Systems. McGraw-Hill, 1984.

[7] H. T. Chou, D. J. DeWitt, R. Katz, and T. Klug. Design and Implementation of the Wisconsin Storage System. Software Practices and Experience, 15(10), 1985.

[8] D. Fang, J. Hammer, D. McLeod, and A. Si. Remote-Exchange: An Approach to Controlled Sharing among Autonomous, Heterogeneous Database Systems. In Proceedings of the IEEE Spring Compcon, San Francisco. IEEE, February 1991.

[9] A. Ferrier and C. Stangret. Heterogeneity in the Distributed Database Management System SIRIUS-DELTA. In Proceedings of the International Conference on Very Large Databases. VLDB Endowment, 1983.

[10] D. Fishman, D. Beech, H. Cate, E. Chow, T. Connors, T. Davis, N. Derrett, C. Hoch, W. Kent, P. Lyngbaek, B. Mahbod, M. Neimat, T. Ryan, and M. Shan. Iris: An Object-Oriented Database Management System. ACM Transactions on Office Information Systems, 5(1):48-69, January 1987.

[11] S. Ghandeharizadeh, V. Choi, and G. Bock. Benchmarking Object-based Constructs. Technical Report USC-CS, Computer Science Department, University of Southern California, Los Angeles, CA 90089-0781, 1992.

[12] S. Ghandeharizadeh, V. Choi, C. Ker, and K. Lin. Omega: A Parallel Object-based System. Technical Report USC-CS, Computer Science Department, University of Southern California, Los Angeles, CA 90089-0781, September 1991.

[13] M. Hammer and D. McLeod. Database Description with SDM: A Semantic Database Model. ACM Transactions on Database Systems, 6(3):351-386, September 1981.

[14] D. Heimbigner and D. McLeod. A Federated Architecture for Information Systems. ACM Transactions on Office Information Systems, 3(3):253-278, July 1985.

[15] R. Hull and R. King. Semantic Database Modeling: Survey, Applications, and Research Issues. ACM Computing Surveys, 19(3):201-260, September 1987.

[16] W. Kim, N. Ballou, J. Garza, and D. Woelk. A Distributed Object-Oriented Database System Supporting Shared and Private Databases. ACM Transactions on Office Information Systems, 9(1):31-51, January 1991.

[17] W. Kim, J. Banerjee, H. T. Chou, J. F. Garza, and D. Woelk. Composite Object Support in an Object-Oriented Database System. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications, pages 118-125, 1987.

[18] C. Lecluse, P. Richard, and F. Velez. O2, an Object-Oriented Data Model. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM SIGMOD, June 1988.
[19] Q. Li, D. Fang, and D. McLeod. An Approach to Partial Integration for Distributed, Autonomous, Object Databases. In Proceedings of the Australian Database - Information Systems Conference. Kensington, Australia, February 1991.

[20] W. Litwin and A. Abdellatif. Multidatabase Interoperability. IEEE Computer, 19(12), December 1986.

[21] P. Lyngbaek and D. McLeod. Object Management in Distributed Information Systems. ACM Transactions on Office Information Systems, 2(2):96-122, April 1984.

[22] D. Maier, J. Stein, A. Otis, and A. Purdy. Development of an Object-Oriented DBMS. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications, pages 472-482. ACM, 1986.

[23] D. McLeod, D. Fang, and J. Hammer. The Identification and Resolution of Semantic Heterogeneity. In Proceedings of the International Workshop on Interoperability in Multidatabase Systems. Kyoto, Japan, April 1991.

[24] P. Scheuermann, et al. Report on the Workshop on Heterogeneous Database Systems. Sigmod Record, 19(4):23-32, December 1990.

[25] M. Shan. Unified Access in a Heterogeneous Information Environment. IEEE Office Knowledge Engineering, 3(2):35-42, August 1989.

[26] A. Sheth and J. Larson. Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases. ACM Computing Surveys, 22(3):183-236, September 1990.

[27] D. Shipman. The Functional Data Model and the Data Language DAPLEX. ACM Transactions on Database Systems, 2(3):140-173, March 1981.

[28] A. Silberschatz, M. Stonebraker, and J. Ullman. Database Systems: Achievements and Opportunities. Sigmod Record, 19(4):6-23, December 1990.

[29] J. Smith, P. Bernstein, U. Dayal, N. Goodman, T. Landers, K. Lin, and E. Wong. Multibase: Integrating Heterogeneous Distributed Database Systems. In Proceedings of the National Computer Conference, pages 487-499. AFIPS, June 1981.

[30] M. Stonebraker and E. Neuhold. A Distributed Database Version of INGRES. In Proceedings of the Berkeley Workshop on Distributed Data Management and Computer Networks, pages 19-36. University of California, Berkeley, May 1977.

[31] M. Stonebraker, L. Rowe, B. Lindsay, J. Gray, M. Carey, M. Brodie, P. Bernstein, and D. Beech. Third-Generation Database System Manifesto. Sigmod Record, 19(3):31-44, September 1990.

[32] Network Programming Guide. Sun Microsystems, 1988.

[33] T. Templeton, et al. Mermaid: A Front-End to Distributed Heterogeneous Databases. In Proceedings of the IEEE, pages 695-708, 1987.

[34] R. Williams. R*: An Overview of the Architecture. Technical Report RJ3325, IBM, February 1981.