Resolving Semantic Heterogeneity in a Federation of Autonomous, Heterogeneous Database Systems

by Joachim Hammer

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Computer Science)

August 1994

Copyright 1994 Joachim Hammer

UMI Number: DP22882. All rights reserved.

INFORMATION TO ALL USERS: The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.

Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346.

UNIVERSITY OF SOUTHERN CALIFORNIA, THE GRADUATE SCHOOL, UNIVERSITY PARK, LOS ANGELES, CALIFORNIA 90007

This dissertation, written by Joachim Hammer under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Dedication

To my parents, Renate and Peter, my brother Christian, and Kathryn Stuart.

Every thought which genius and piety throw into the world alters the world.
-- Ralph Waldo Emerson

Acknowledgements

I wish to express my gratitude to the many helpful people at the University of Southern California where this dissertation was written. I have been fortunate to be associated with so many competent and intellectually stimulating individuals who helped me grow into the person I am today.
I deeply regret that I cannot acknowledge all who assisted.

First and foremost I would like to thank Dennis McLeod, my research advisor and “mentor”, for his continuing and enthusiastic support, advice, and friendship during my graduate years. His insight, high standards of scholarship, and intellectual integrity contributed substantially to the results presented in this dissertation. I was very fortunate to work under his guidance during the past five years.

I am also grateful to the other members of my qualifying examination and dissertation committees, Ellis Horowitz, Richard Hull, Bob Neches, and Paul Watkins, for serving on these committees and for their helpful comments and discussions while this work was in progress.

Bob Neches, Brian Harp, and numerous other people from the SHELTER group at ISI supported me with their resources during the testing and prototyping stages of my research. Thank you for providing such a helpful and stimulating environment.

I sincerely thank all the members, old and new, of Professor Estrin’s Network and Distributed Systems Laboratory at USC, which provided a home as well as network connections for one of our database machines for nearly five years. This community created a friendly and helpful research environment which greatly strengthened the outcome and quality of this work.

Last, but not least, my friends and colleagues, namely Antonio Si, Jonghyun Khang, K.J. Byeon, and Doug Fang, deserve my special thanks for their endurance and their helpful input during our countless research discussions, which helped me prepare for and overcome the many obstacles along the way. I enjoyed the camaraderie that developed while sharing an office with you.

Finally, I would also like to thank George Mullen for making all this possible by providing me with the chance to attend the University of Rochester, my undergraduate alma mater. His memory will provide the inspiration to move on to whatever challenges may follow.
Contents

Dedication
Acknowledgements
List Of Tables
List Of Figures
Abstract
1 Introduction
  1.1 Next Generation Database Technology
  1.2 About this Dissertation
  1.3 Some Guidelines for the Reader
2 Related Research
  2.1 View Integration
  2.2 Database Integration
    2.2.1 Database Integration Based on Structural Knowledge
    2.2.2 Database Integration Using Behavior
  2.3 Heterogeneous Database Systems
  2.4 Object Sharing
3 The Federated Database System Context
  3.1 A Sharing Scenario
  3.2 Semantic Heterogeneity in Federated Database Systems
    3.2.1 A Spectrum of Semantic Heterogeneity
    3.2.2 Causes for Schema Diversity
4 A Perspective on Database Interoperability
  4.1 Operational Phases
  4.2 Resource Discovery and Identification
  4.3 Resolution of Semantic Heterogeneity
  4.4 Sharing and Transmission
5 The Object Database Model
  5.1 Core Object Data Model
  5.2 Relationships Among Objects
    5.2.1 Common Concepts
    5.2.2 Related Concepts
6 A Mechanism for Semantic Heterogeneity Resolution
  6.1 Earlier Approaches
    6.1.1 Basic Structural Equivalence
    6.1.2 Basic Behavioral Equivalence
  6.2 A Three-Pronged Approach to Relative Object Equivalence
    6.2.1 Remote Sharing Language
    6.2.2 Local Lexicon
    6.2.3 Semantic Dictionary
  6.3 Resolving Object Relationships
7 Unification of Remote and Local Information
  7.1 Eliminating Inconsistencies
  7.2 Unification
8 Inter-Component Behavior Sharing
  8.1 Behavioral Objects in CODM
  8.2 The Sharing of Behavior
    8.2.1 Stored Functions
    8.2.2 Computed Functions
  8.3 Observations on the Practical Use of Behavior Sharing
9 Experimental Prototype Implementations
  9.1 Semantic Heterogeneity Resolution in the SHELTER Testbed
  9.2 Schema Unification in the Remote-Exchange Testbed
    9.2.1 The Prototype
    9.2.2 Sharing and Transmission
      9.2.2.1 Instance Level Sharing Implementation
      9.2.2.2 Function Level Sharing Implementation
10 Conclusion
  10.1 Summary
  10.2 Results
  10.3 Future Work
Reference List
Appendix A Remote-Sharing Language RSL
Appendix B SHELTER Prototype
  B.1 Semantic Dictionary
  B.2 Travel Agency A
  B.3 Travel Agency B
  B.4 Travel Agency E

List Of Tables

6.1 Remote sharing language commands.
6.2 Relationship descriptors.
6.3 Contents of a local lexicon.

List Of Figures

3.1 A vision of world-wide interoperability.
3.2 A loosely interconnected federation of travel agencies.
3.3 A conceptual overview of the federation and its components.
3.4 Semantic heterogeneity in FOTA: Example 1.
3.5 Semantic heterogeneity in FOTA: Example 2.
3.6 Semantic heterogeneity in FOTA: Example 3.
4.1 Interoperability in a federation of databases.
4.2 Three partial conceptual schemas for travel agencies A, E, and D.
4.3 Component A’s conceptual schema augmented with remote information.
6.1 A partial view of three local lexica.
6.2 The semantic dictionary.
6.3 Sharing architecture and the various interactions among its components.
6.4 Resolution of object relationships.
7.1 Unification of two related objects.
7.2 Unification of several inter-related objects: Example 1.
7.3 Unification of several inter-related objects: Example 2.
8.1 Eight different situations for sharing of behavior.
8.2 Stored functions in two component databases.
8.3 Stored and computed functions in two component databases.
9.1 The SHELTER development environment.
9.2 The Remote-Exchange sharing interface.
9.3 Sharing instance objects.
9.4 Sharing function objects.

Abstract

We present an approach to accommodating semantic heterogeneity in a loosely-coupled federation of interoperable, autonomous, heterogeneous databases. Such federations attempt to support operations for partial and controlled sharing of data with minimal effect on existing applications. A major problem that arises frequently when attempting to collaborate among heterogeneous database systems is the occurrence of representational differences that exist among related data objects in different component systems. Within the context of such a federation, components that are willing to share related, relevant information made available by other members of the collection will have to resolve all existing representational differences before they can unify related foreign data with their own local data. We will term representational differences among related data objects semantic heterogeneity.

In this dissertation we describe a mechanism for identifying and resolving semantic heterogeneity while at the same time honoring the autonomy of the databases that participate in the federation.
By resolving semantic heterogeneity we mean two things:

1. Determine as precisely as possible the relationships between objects that model similar information in different components, and
2. Detect possible conflicts in their structural representations that pose problems during the unification with local objects when sharing information.

A minimal, common data model is introduced as the basis for describing sharable information, and a three-pronged facility for determining the relationships between objects is developed. Our approach serves as a basis for the sharing of related concepts through partial schema unification without the need for a global view of the data that is stored in the different components. The mechanism presented here can be seen in contrast with more traditional approaches such as “integrated databases” or “distributed databases”. An experimental prototype implementation has been constructed within the framework of the Remote-Exchange experimental system at USC.

Chapter 1
Introduction

The imminent combination of computing and telecommunications has radically changed our vision of world-wide computing. In this new vision, problems or questions posed by one or more agents (e.g., humans or computers) can be solved as automatically and transparently as possible. However, this requires that the necessary computing resources (e.g., programs, information bases, knowledge bases, etc.) are identified and caused to interact cooperatively to effectively and efficiently solve problems. Thus the dominant computer paradigm will involve large numbers of heterogeneous information systems (ISs), such as computers (from mainframe to personal), information-intensive applications (multimedia), and data (files and databases) distributed over large computer/communication networks.
Networks consisting of heterogeneous ISs are seen as the next generation of information processing systems, and the goal is that any computing resource connected to the network should be able to transparently and efficiently utilize any other. Such interconnected environments supporting the sharing of resources are called intelligent and cooperative information systems (ICISs) [50, 51, 58], and the design, construction, use, and evolution of such systems within the above paradigm will require sophisticated technologies from many different areas of computer science.

As a result of rapid technological development in the computer software and hardware, and telecommunication industries, I believe a true revolution in the delivery of information, transactional, and telecommunication services may be upon us. These new technologies expand access to data requiring capabilities for assimilation and analysis that greatly exceed what we now have in hand. Without intelligent processing, these advances will provide only a minor benefit to the users, who will be swamped with ill-defined data of unknown origin. New high-speed networks, for example, could give us access to the Library of Congress, containing nearly 100 million books. However, without the appropriate tools (e.g., intelligent information retrieval systems) to access, search, and exchange this wealth of information, the desired data could be hard or even impossible to obtain. Wiederhold [73] proposes the use of mechanisms that will use the context knowledge to mediate among systems to provide for meaningful data exchange. Other examples of intelligent tools that are needed to facilitate user access and processing include presenting an integrated view of multiple information systems, intentional queries, presenting functionality through graphic, visual, linguistic or other support, etc.
1.1 Next Generation Database Technology

One of the areas in which collaboration among heterogeneous information systems is desirable is in the area of data/knowledge bases. For example, in the scientific community, in order to keep pace with the wealth of newly discovered knowledge, researchers continue to specialize in smaller and smaller areas. Therefore, the quality and progress of scientific endeavors depends on the researcher’s ability to efficiently store large quantities of mostly heterogeneous data and, even more importantly, to share and exchange this knowledge with his/her colleagues. Such environments, consisting of a collection of data/knowledge bases and their supporting systems, and in which it is desired to accommodate the controlled sharing and exchange of information among the collection, are extremely common in, but not exclusively confined to, the scientific community. Cooperative work, computer-based manufacturing, scientific databases, and traditional data processing are only a few of the environments where collaboration among autonomous, heterogeneous components is desired. It is obvious that the next generation of database systems must keep pace with this current trend in order to ensure support for large inter-disciplinary projects such as the Human Genome Project and Informatics (HGI) [20] (footnote 1).

In light of these observations, participants in the 1990 Workshop on Future Database Systems Research sponsored by the National Science Foundation (NSF) have identified the following goal as one of the key areas in future database research [68]:

    The creation of an environment that allows for the controlled sharing and exchange of information among autonomous, heterogeneous databases.

This is a very ambitious goal and its complete realization depends on the realization of several important sub-goals.
In order to support this kind of information sharing, a mechanism is needed to make selected data objects from one database “usable” by other databases of similar or different constitution. For example, the respective databases may differ in their database management system (DBMS), their data model (DM), or their conceptual schema; in this research we are not concerned with differences in hardware or operating systems. Such a collection of cooperating but heterogeneous, autonomous component database systems (DBSs) may be termed a federated database system (FDBS) or federation for short [25, 27]. A key characteristic of a federation is the cooperation among independent systems, which is reflected by controlled and sometimes limited integration of its autonomous components. This kind of cooperation is often referred to as interoperability. To this extent, an FDBS provides an explicit interface to its database components that facilitates sharing while at the same time allowing each component to function independently.

A key aspect of making data “usable” across heterogeneous databases involves (partial) schema unification, and it is imperative to develop methods that can perform this task with as little human intervention as possible. In this dissertation, we use the term “unification” to denote the process of combining different (meta-)data objects together. While the term “(schema) integration” prevails as the predominant expression for activities of this kind, we stress here partially uniting meta-data specifications. By meta-data we mean objects that represent the structural part of a database (i.e., the conceptual schema) as opposed to “factual” data which represents the contents of the database. Rather than unifying complete schemas with each other, a process that is both costly and difficult, partial schema unification is concerned with uniting non-local objects (footnote 2) that are selected on demand by components in the federation. One of the fundamental problems that must be solved before one can successfully unify schemas (or parts thereof) is that of semantic heterogeneity or semantic diversity. In the database context, this heterogeneity refers to differences in the meaning and use of data that make it difficult to identify the various relationships that exist between similar or related objects in different components.

The goal of this dissertation is to demonstrate a new, modular facility for resolving semantic heterogeneity in a federation of autonomous, heterogeneous, object-based databases in order to achieve interoperability. By resolving semantic heterogeneity we mean two things: (1) determine the relationships between objects that model similar information, and (2) detect possible conflicts in their representations that pose problems during the unification of shared data.

Footnote 1: This project is a major scientific venture under the auspices of both the National Institute of Health (NIH) and the Department of Energy (DOE).

1.2 About this Dissertation

The remainder of this dissertation is organized as follows: In Chap. 2, we review related work. In Chap. 3, we introduce the context for this research. Section 3.1 describes a typical sharing scenario in a federated database environment where individual components can share and exchange objects. Section 3.2 gives a detailed definition of semantic heterogeneity and other important terms used in this research. A spectrum for heterogeneity is provided and the most common causes for schema diversities are discussed using examples. Chapter 4 describes the different steps that are needed to achieve interoperability between individual components. We consider the use of an intelligent sharing advisor that will locate relevant information sources in other components. In Chap. 5, we provide the object context in which we have couched our research.
In order for any sharing to take place among heterogeneous components, the components must agree on some common model for describing shared data. The model used in this work is called the Common Object Data Model (CODM), which is a simplified version of the Kernel Object Data Model (KODM) from Remote-Exchange [15]. In this chapter, we also present an overview of the different ways in which (meta-)data objects (i.e., type objects) can be related to each other depending upon the relationships of the real-world concepts they represent. In Chap. 6, we show in detail how our approach to resolving semantic heterogeneity operates. Specifically, we present a strategy for determining the relationship between two objects which is based on three separate mechanisms (parts) for gathering structural as well as semantic information about the objects in question. In Chap. 7, we investigate three scenarios for unifying foreign objects with local objects once ambiguities between the objects have been resolved. In this chapter, we limit our discussion to the unification of type and instance objects. The unification of behavioral objects requires additional work and is treated in detail in Chap. 8. Here we present a comprehensive mechanism for the transparent sharing of methods. Chapter 9 contains a description of two experimental prototypes that were used to prove the feasibility and correctness of our approach. Finally, Chapter 10 contains concluding observations with a critical evaluation of our results and their potential impact.

Footnote 2: In this context and throughout the remainder of this dissertation, the term “object” may refer to type objects, function objects (also termed methods or behavioral objects), and instance objects.

1.3 Some Guidelines for the Reader

In order to facilitate the reading, we would like to establish the following terminological conventions.
The term “data” can refer to meta-data such as types, behavioral data such as methods and functions, and factual data such as the actual instances that make up the state of a database. The term “object” generally refers to type objects, function objects, and instance objects unless explicitly stated otherwise (e.g., instance object). In all pictures and tables, boldface type will be used for type objects and concepts, and italicized type will indicate function objects. Program code as well as screen output will be typeset in courier font.

Chapter 2
Related Research

Projects and prototype development efforts to integrate heterogeneous database schemas started in the late 1970s and early 1980s, mostly focusing on providing methodologies for relational database design. Their goals are to produce an abstract, global conceptual description of a proposed database application based on different perspectives and expectations of the users. Subsequent research efforts started to focus on the integration of object-based databases or parts thereof in order to obtain a global view of related information in different domains. The term schema integration, as it is used in most of the reviewed literature, includes semantic heterogeneity resolution, merging, and restructuring. As of today, research efforts have brought about a variety of different sharing architectures utilizing different approaches and mechanisms for integrating heterogeneous databases. Depending on the degree of integration of the participating component schemas, different mechanisms to resolve semantic heterogeneity are utilized, ranging from manual resolution (before the sharing phase) to semi-automated resolution (during the sharing phase).

2.1 View Integration

Early integration work focused on algorithms and (manual) procedures to merge individual, user-oriented schemata into a single global conceptual schema.
Views are typically designed for rather narrow domains and thus exhibit a great deal of redundancy in their vocabulary. View integration mainly performs structural transformations from one conceptual schema to another, and the process of disambiguation (viz. solving semantic heterogeneity) is much more straightforward than in any other context. There are several different approaches to view integration that can be distinguished by the set of operators used to perform the transformation [4, 13, 26, 53, 54, 55, 65].

In their 1986 survey paper, Batini et al. [5] investigated twelve of these methodologies and compared them on the basis of five commonly accepted integration activities (pre-integration, comparison of schemas, conforming of schemas, merging, restructuring). However, most of the approaches examined in this survey do not directly address the problem of resolving semantic heterogeneity described earlier. Although the problem of achieving semantic interoperability has been studied extensively in view integration using the relational and semantic data models, (partial) integration of multiple heterogeneous object-based databases is still in its infancy.

2.2 Database Integration

Schema integration of heterogeneous databases poses substantially new problems. Since names of objects are used much more inconsistently in multiple heterogeneous databases, previous approaches taken from view integration can only be partially applied to the comparison and integration of heterogeneous databases. Another problem that must be addressed when integrating schemas is that of inter-database object correspondences. Work in this area has focused on inter-database object reference by description [62] and object identification via language constructs [35]. Based on these earlier results, there are now several efforts focusing on database schema integration, with varied approaches to handling the problem of semantic heterogeneity.
2.2.1 Database Integration Based on Structural Knowledge

One common approach to conflict resolution is to reason about the meaning and resemblance of heterogeneous objects in terms of their structural representation [40]. However, one can argue that any set of structural characteristics does not sufficiently describe the real-world meaning of an object, and thus their comparison can lead to unintended correspondences or fail to detect important ones. Other promising methodologies that were developed include heuristics to determine the similarity of objects based on the percentage of occurrences of common attributes [26, 55, 65]. More accurate techniques use classification for choosing a possible relationship between classes [61].

Whereas most of these methods primarily utilize schema knowledge, techniques utilizing semantic knowledge (based on real-world experience) have also been investigated. Siegel and Madnick [67] use a rule-based approach to resolving semantic conflicts. In their approach they present methods for comparing rules that describe the application’s semantic view and the database meta-data definition. The database manager maintains the results of these comparisons for use in query processing. Sheth et al. [63] introduce the concept of semantic proximity in order to formally specify various degrees of semantic similarities among related objects in different application domains based on the real-world context in which these objects are used. Fankhauser et al. [17] present an approach that utilizes fuzzy and possibly incomplete real world knowledge for resolving semantic and representational discrepancies. In their methodology, class definitions, or more generally schemas, are disambiguated by matching unknown terms with concepts in an interconnected knowledge base.
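As an illustration only, the attribute-overlap heuristics mentioned in this section might be sketched as follows. This is a minimal sketch, not the method of any of the cited works: it assumes that "percentage of common attributes" is read as the ratio of shared attribute names to all attribute names (a Jaccard-style measure), and the function name and the travel-agency attribute lists are hypothetical.

```python
def attribute_overlap(attrs_a, attrs_b):
    """Fraction of attribute names shared by two type definitions.

    Assumption: similarity is |A intersect B| / |A union B| over attribute
    names; the heuristics cited in the text may weight attributes differently.
    """
    a, b = set(attrs_a), set(attrs_b)
    union = a | b
    if not union:
        # Two empty definitions carry no evidence of similarity.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical type definitions from two travel-agency schemas.
flight_a = ["flight_no", "origin", "destination", "fare"]
flight_b = ["flight_no", "origin", "destination", "price", "carrier"]

score = attribute_overlap(flight_a, flight_b)
print(score)
```

In this made-up example, "fare" and "price" plausibly denote the same real-world concept but are counted as different attributes, which reflects the limitation noted above: a purely structural comparison can fail to detect important correspondences.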
A similar methodology is suggested by Heiler and Ventrone [72] that uses so-called enterprise models to provide an application context for describing unknown terms in the schemas of the participating components.

2.2.2 Database Integration Using Behavior

A different approach uses behavior to solve domain and schema mismatch problems (see Kent [34]). Domain and schema mismatch are two important semantic integration problems for interoperating heterogeneous databases. The domain mismatch problem generally arises when some commonly understood concept, for example money, is represented differently in different databases (e.g., U.S. dollars vs. English pounds). Schema mismatch arises when similar concepts are expressed differently in the schema (e.g., a relationship that is modeled as one-to-one in one schema and one-to-many in another). Kent proposes to use an object-oriented database programming language to express mappings between these common concepts that allow a user to view them in some integrated way. It remains to be seen if a language that is sophisticated enough to meet all of the requirements given by Kent in his solution can be developed in the near future.

Despite the many problems that researchers face when trying to resolve semantic heterogeneity in object-based databases, there is a variety of promising efforts in the making. Kim et al. [38] provide the most up-to-date enumeration and classification of techniques for resolving representational conflicts arising within the context of object-based schema integration.

2.3 Heterogeneous Database Systems

Research on building complete architectures that support the sharing of information among heterogeneous database systems began only a decade ago [18, 69].
The term “heterogeneous database system” (HDBS) was originally used to distinguish work which included database model and conceptual schema heterogeneity from work in “distributed databases” which addressed issues solely related to distribution [10]. [Footnote 1: The term distributed database is used here as it has been mainly used in the literature, denoting a relatively tightly coupled, homogeneous system of logically centralized, but physically distributed component databases.] Recently, there has been a resurgence in the area of designing heterogeneous database systems. This work may be characterized by the different levels of integration of the component database systems and by different levels of global (federation) services. In Mermaid [71], for example, which is considered a tightly coupled HDBS, component database schemas are integrated into one centralized global schema with the option of defining different user views on the unified schema. While this approach supports pre-existing component databases, it falls short in terms of flexible sharing patterns. Due to the tight integration of conceptual schemas, no mechanism for dynamically resolving semantic heterogeneity is needed. The federated architecture proposed in [27], which is similar to the multidatabase architecture of [46], involves a loosely coupled collection of database systems, stressing autonomy and flexible sharing patterns through inter-component negotiation. Rather than using a single, static global schema, the loosely coupled architecture allows multiple import schemas, enabling data retrieval directly from the exporter and not indirectly through some central node as in the tightly coupled architecture. Examples of loosely coupled FDBSs are MRSDM [45], Omnibase [60], and Calida [31]. Most of the above architectures do not explicitly address the problem of resolving semantic heterogeneity.
The architects rely on manual resolution techniques or the existence of integration mechanisms mentioned earlier.

2.4 Object Sharing

Besides work on resolving semantic heterogeneity, much effort has been spent on the problem of inter-component object sharing. This involves the placement of objects such as instances, types, and methods in the best possible location in the existing schema of the importing component. Sharing of methods, also termed behavior sharing, has attracted most of the attention due to the particular difficulties that are involved. At the top level, there are two distinct aspects of behavior sharing: (1) the remote execution of behavioral objects, and (2) the location of the actual information units upon which behavioral objects operate. Research in the area of distributed programming languages has of course addressed issues of the remote execution of behavioral objects (which may also be termed operations, methods, or functions) [44, 70]. Work in the area of database systems, on the other hand, predominantly focuses on the manipulation of information units; this is related to the second main aspect of behavior sharing. Research on object-oriented database systems has approached the problem of supporting behavior in the database itself [3, 19, 39, 41, 49]. When these systems are extended to a distributed environment, it is the location of the (persistent) data that determines the location of the remote execution of the functions [19, 39, 49]. In the approach used in this research, for example, methods are implemented without explicit knowledge of where they will be executed or where the data will reside.

Chapter 3
The Federated Database System Context

As stated in the introduction, the trend towards decentralization of computing that has occurred over the past decade has accelerated the need for effective principles, techniques and mechanisms to support information sharing and exchange among distributed heterogeneous databases.
Ideally, we would like to have a single, world-wide database system, from which not only scientists but every user can obtain information on any topic covered by data that has been made available. Assuming that global weather data has been made available, an agent in Capstat (e.g., a travel agency booking a trip for one of its customers) would be able to access the predicted air and water temperatures for the Los Angeles area between March 10 and March 14, 1994, for example (see Fig. 3.1). While such a system is still many years away, we must first start with a much smaller and more restricted version that will help drive the technology needed for world-wide interconnection of information.

[Figure 3.1: A vision of world-wide interoperability.]

3.1 A Sharing Scenario

Imagine the following scenario involving the travel business. Several travel agencies located in different cities of the country decide that it would be mutually beneficial to join forces and form a nation-wide Federation Of Travel Agencies (FOTA). The goal of FOTA is to share and exchange travel related information between the individual agencies in order to stay competitive and keep up to date with the fast pace in the world of business and pleasure travel. However, each travel agency wants to retain autonomy over its own database with respect to organization and administration. Therefore, when a component agrees to join the federation it will keep its own local DBMS together with its original conceptual schema. The main advantage of this is that the costly and inefficient process of restructuring the component’s existing data is avoided. Furthermore, there is no need for travel agencies to retrain their employees on a new system. We may assume that since the agencies are located in different parts of the country, the contents of their databases reflect the different travel habits of their customers.
Thus FOTA, depicted in Figure 3.2, is an example of a loosely coupled collection of heterogeneous database systems, or FDBS, as described in Section 2. Components of FOTA keep track of the following kinds of information: rental cars, airline information, train information, pleasure cruise information, sight-seeing information such as museums and historic buildings, local entertainment such as shows and events, and hotel information. We also assume that each travel agency will agree to use an object-based database model at the federation interface. Components may update their own local schema at any time, and it is the responsibility of an importing component to obtain access to new information.

Figure 3.3 shows a snapshot of the information stored in the federation at a given instant during its lifetime. As mentioned before, the content and organization of each database differs from travel agency to travel agency. We can see, for example, that travel agency B in Miami is the only component with information on pleasure cruises, due to the heavy demand and the large number of cruises departing from and arriving in that area. All components contain data on hotels for the specific area to which the travel agency is primarily catering. With the exception of some large chains such as Holiday Inn or Hilton, hotel information is relatively localized and not as readily available in the remaining parts of the country. For example, travel agency E, which is located in Washington, has data on hotels in the Northeastern U.S., whereas agency A in Los Angeles represents “places-to-stay” in California. All components contain national airline and train information.

It is important to note that an additional difficulty in finding a solution to the problem of achieving interoperability in FOTA and other similar federations stems from the conflicting nature of sharing and autonomy.
On one hand, a travel agency would like to share information with other components of FOTA. On the other hand, the same component would also like to exercise some degree of control over the sharing process, e.g., control over the information it is willing to “export” to the other components. Since the focus of our work is on a solution to resolving semantic heterogeneity, several other important issues such as security, i.e., access control, and automatic update of shared data are not dealt with in their entirety here. In consequence, we assume that all the information stored in a specially “marked” section of a travel agent’s database, called the export schema, is available to every other travel agency in the federation.

[Figure 3.2: A loosely interconnected federation of travel agencies. The figure shows travel agencies in New York, Chicago, Washington, Miami, and Los Angeles, each with its own schema, connected through a network.]

[Figure 3.3: A conceptual overview of the federation and its components.]

3.2 Semantic Heterogeneity in Federated Database Systems

One of the central problems of interoperability that must be addressed in order to support the sharing of information among a collection of autonomous, heterogeneous databases is semantic heterogeneity.
By this we mean variations in the manner in which data is specified and structured in different components. Semantic heterogeneity is a natural consequence of the independent creation and evolution of autonomous databases which are tailored to the requirements of the application system they serve. For the remainder of this dissertation, we will term the problem of overcoming semantic heterogeneity in order to enable information sharing among autonomous, heterogeneous databases the heterogeneity problem in federated databases. Before we can present a solution to the heterogeneity problem, it is useful to examine the different kinds of semantic heterogeneity that may occur.

3.2.1 A Spectrum of Semantic Heterogeneity

Within the context of a federation of loosely coupled components, we can identify a spectrum of heterogeneity based on the following levels of abstraction:

1. Meta-data language (conceptual database model): The components may use different collections of, and techniques for combining, the structures, constraints, and operations used to describe data. For example, different data models provide different structural primitives (e.g., records in the relational model versus objects in the object-based models) and operations (QUEL versus OSQL) that contribute to semantic heterogeneity. [Footnote 1: OSQL is an object-based dialect of SQL, used as the DDL/DML in IRIS [19].]

2. Meta-data specification (conceptual schema): While the components share a common meta-data language (conceptual database model), they may have independent specifications of their data (varied conceptual schemas). For example, this refers to the different schemas used by the members of FOTA.

3. Object comparability: The components may agree upon a conceptual schema, or more generally agree upon common subparts of their schemas; however, there may be differences in the manner in which information facts are represented [33].
This variety of heterogeneity also relates to how information objects are identified and to the interpretation of atomic data values as denotations of information modeled in a database (e.g., naming). Looking ahead to travel agency D’s schema in Fig. 3.4, we can see that the type Accommodations represents information that is comparable to the information represented in E’s Places in Northeastern U.S. Both types have associated with them information regarding the name and owner(s). However, schema E does not contain information about ratings (missing information), and the location of an accommodation in schema E is only implicitly contained in the type name, whereas it is explicitly stored in an attribute in schema D. Finally, a place in schema E can be co-owned by several people but only one owner per accommodation is allowed in schema D (mapping constraints).

4. Low-level data format: While the components agree at the model, schema, and object comparability levels, they may utilize different low-level representation techniques for atomic data values (e.g., units of measure). In terms of FOTA this refers to the problem that arises when values such as fare prices are represented in different denominations (U.S. dollars versus English pounds) or different precisions (16-bit float versus 8-bit integer) (see Fig. 3.5).

5. Tool (database management system): The components may utilize different tools to manage and provide an interface to their data. This kind of heterogeneity may exist with or without the varieties described immediately above. In the context of FOTA, this kind of heterogeneity is due to the fact that components may use a different DBMS.

We assume that the components utilize a common language (see Section 5.1), thus ruling out the occurrence of heterogeneities of type (1), conceptual database model. Furthermore, tool heterogeneity, which is type (5) in the above spectrum, is treated as somewhat orthogonal to our concern in this dissertation.
As a result, for the purpose of this research we term types (1) through (4) in the heterogeneity spectrum semantic heterogeneity.

[Figure 3.4: Semantic heterogeneity in FOTA: Example 1. The figure contrasts the type Accommodations in travel agency D’s schema with the type Places in Northeastern U.S. in travel agency E’s schema. Semantic heterogeneities noted: naming, missing information/overloading, mapping constraints.]

[Figure 3.5: Semantic heterogeneity in FOTA: Example 2. The figure contrasts the flight and fare information in the schemas of travel agencies B and C. Semantic heterogeneities noted: naming, information unit packaging, missing information, type conflict (type-property relativism).]

3.2.2 Causes for Schema Diversity

According to Batini et al. [5] there are three major causes for semantic heterogeneity:

1. Different perspectives: This is a modeling problem that finds its roots mostly during the design phase of a database schema. Different user groups or designers adopt their own viewpoints when modeling the same information. For instance, in example 1 in Fig. 3.4, different names were attached to the same concept (Accommodations versus Places in Northeastern U.S.) in the two schemas of travel agencies D and E, respectively.

2. Equivalent constructs: The rich set of constructs in data models allows for a large number of modeling possibilities, which results in variations in the conceptual database structure [33]. Typically, in conceptual models, several combinations of constructs can model the same real-world domain equivalently. In example 2 in Fig.
3.5, the association between flights and fare information was modeled as a function has_price in schema A as opposed to a separate type Flight-Fare-Combinations connecting Flights and Fare-Types in schema B.

3. Incompatible design specifications: Different design specifications result in different schemas. In example 3 in Fig. 3.6, the relationship between Travelers and Reservations in schema B indicates that a customer can only have one booking at a time, since the cardinality constraint 1:n has been specified. The more realistic situation (that a customer may have several reservations at once) appears in schema C.

Thus far, we have discussed the nature of the interoperability problem and identified the causes and implications of semantic heterogeneity. In the following sections, we present the details of our mechanism for accomplishing interoperability between two components. In order to illustrate our solution we use the framework of FOTA.

[Figure 3.6: Semantic heterogeneity in FOTA: Example 3. Travel agency B: Travelers (1) to Reservations (n), only one reservation record per customer. Travel agency C: (n) to (m), multiple reservation records per customer.]

Chapter 4
A Perspective on Database Interoperability

Database interoperability refers to the ability to allow partial and controlled sharing of data among autonomous, heterogeneous database components [64]. Due to the complexity and difficulty in achieving interoperability in such systems, we have divided this task into three subtasks which can be performed (iteratively) during different operational phases in the lifetime of a federation. The approach that we take in implementing each phase is interactive to the extent that the user (e.g., the user of a database component that is engaged in a sharing procedure) or the administrator of the federation has the ability to intervene at any time during each phase.
In addition, human input is essential for resolving semantic heterogeneity.

4.1 Operational Phases

When a federation is initially formed or when a new component joins an existing federation, an intelligent sharing advisor must first locate all sharable information in each component and store the (partially) identified concepts in a separate repository called the semantic dictionary. In our approach, sharable information corresponds to all those data objects that a component is willing to export to the rest of the federation. Each component has a so-called export schema [27] that contains only those objects that are non-private and thus sharable by other components. All other objects are kept in the component’s private schema. This is a relatively simple way for each component to exercise control over its database but it suffices for our purpose since the focus here is not on security and access control mechanisms.

[Figure 4.1: Interoperability in a federation of databases. The figure shows component databases DBS 1 through DBS n above three phases: discovery and identification (the sharing advisor uses the semantic dictionary and sharing heuristics), semantic heterogeneity resolution (reconciliation and unification), and interconnection (sharing and transmission, CODM).]

The sharing advisor tool also responds to requests from database components to locate information that is in structure and content “relevant” to certain information in its own local schema. This first phase of locating and identifying non-local, relevant information is termed resource discovery and identification and is currently under investigation in a related research project at USC (see [15]). During the second phase, the exact relationship between the requested non-local information and the local information in the schema of the integrating component is determined. This phase is termed resolution of semantic heterogeneity and is the focus of this dissertation.
After the exact relationship between a foreign object and one or more local objects has been established in phase two, the third phase, called sharing and transmission, supports efficient access to shared object(s). A picture of the operational phases is given in Fig. 4.1. In the next three sections, we will describe in more detail each of these three phases using the familiar example of FOTA.

4.2 Resource Discovery and Identification

A travel agency, for example, travel agency A located in Los Angeles, needs information on bed & breakfast places in the New England area. A only has bed & breakfast information for California and wants to incorporate this additional information into its own schema so that it can be accessed transparently like any other local information in A’s database. Assuming that A has no prior knowledge of where to find bed & breakfast places in New England within the federation, it consults with the intelligent sharing advisor that assists components in discovering relevant, sharable information throughout the federation. The sharing advisor uses the information stored in the semantic dictionary and a set of heuristic rules (see Fig. 4.1); it returns to the component initiating the inquiry those concepts that it considers relevant to the requested information. In our case it locates two possible sources of bed & breakfast places in the New England area: the types Lodgings (see Footnote 1), Inexpensive, and Expensive of travel agency E in Washington, and Private Accommodations of travel agency D in New York. Fig. 4.2 displays the three (partial) schemas of the travel agencies involved in this identification process.

The goal of identifying relevant information is to retrieve information in other components that is identical, similar, or related to the requested information. Depending upon the amount and kinds of information in which a user is interested, s/he may wish for a large intersection or a small one.
Large intersections correspond to concepts that are identical or similar to the concepts in his/her database (e.g., all components of FOTA are in the travel business, hence their databases contain many similar concepts such as Accommodations and Hotels, Flights and Airtravel, etc.). Small intersections correspond to related types of information (e.g., a travel agency and an automobile club share a few commonalities like Travel and Trips, hence their databases have a smaller overlap as far as data objects are concerned), or completely disjoint information (e.g., a travel agency and a telephone company will probably have no common concepts in their database schemas).

[Footnote 1: Throughout the rest of this dissertation, the boldface type will be used for type objects, the italicized type will indicate function objects.]

[Figure 4.2: Three partial conceptual schemas for travel agencies A, E, and D. Travel agency A needs bed & breakfast information for the New England area; the schemas of travel agencies E (Places in the Northeastern US) and D (Private Accommodations) are marked as possible sources.]

4.3 Resolution of Semantic Heterogeneity

Having identified two possible sources of information, component A has to find the exact relationship(s) between its own type Bed & Breakfast, the types Lodgings, Inexpensive, and Expensive of component E, and Private Accommodations of component D in order to determine if and how they can be folded into its conceptual framework. This is the problem of resolving semantic heterogeneity as depicted in Fig. 4.1; it must be addressed before any sharing can take place.
Using a sharing language, a local dictionary or lexicon as well as the semantic dictionary mentioned earlier, the meanings of concepts unknown to another component are “derived”. Sharing language commands return structural information about an object (supertype, subtype(s), properties, etc.). The local lexicon, which is created and updated by each component separately, contains a semantic description of every sharable type object in the database. In order to make the local lexica usable throughout the entire federation, a common knowledge representation is used. Terms in question can be located and compared with each other. The semantic dictionary describes the relationships between terms in the local lexica.

4.4 Sharing and Transmission

In general, there is a spectrum of the kinds of inter-component sharing that may be desired. At one extreme, a copy of the foreign object can be created in the importing database. At the other, a surrogate (i.e., handle or place-holder) for the foreign object can exist in the local database. In the latter case, objects such as instances, types, and functions are added to the importing schema at specially created places using local surrogates. Surrogates are essentially references that are used to refer to the original object in the exporting database whenever it is used by the importing database. Thus, local surrogates enable individual components to exchange and use each other’s objects in a transparent way without making physical copies of the shared objects. The goal is to place the shared object into the best possible location in the existing schema of the importing component.

In addition to sharing type objects with or without instances, it is also possible to share methods and individual instances by themselves. In order to share methods and instances, the importing component must already have the underlying meta-data (i.e., the type(s) that the method or instances belong to) in its schema.
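The two extremes of the sharing spectrum described above, a physical copy versus a local surrogate, can be illustrated with a small sketch. This is not the mechanism implemented in this work; the class names `ExportingComponent` and `Surrogate`, the helper `import_as_copy`, and the sample object "inn-1" with its values are all hypothetical, and the remote access is simulated by an in-process lookup.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ExportingComponent:
    """Hypothetical stand-in for a component that exports objects."""
    name: str
    objects: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def fetch(self, object_id: str) -> Dict[str, Any]:
        # In a real federation this would be a remote call (assumption).
        return self.objects[object_id]


class Surrogate:
    """Local place-holder that forwards every access to the exporter."""

    def __init__(self, exporter: ExportingComponent, object_id: str) -> None:
        self._exporter = exporter
        self._object_id = object_id

    def get(self, prop: str) -> Any:
        # The original object is consulted on each use, so no physical
        # copy of the shared object is kept in the importing database.
        return self._exporter.fetch(self._object_id)[prop]


def import_as_copy(exporter: ExportingComponent, object_id: str) -> Dict[str, Any]:
    """Other extreme: a physical copy, detached from the exporter."""
    return dict(exporter.fetch(object_id))


# Illustrative data only: an accommodation exported by travel agency D.
agency_d = ExportingComponent(
    "D", {"inn-1": {"has_name": "Harbor Inn", "has_rating": 3}}
)
copy_in_a = import_as_copy(agency_d, "inn-1")
surrogate_in_a = Surrogate(agency_d, "inn-1")

agency_d.objects["inn-1"]["has_rating"] = 4   # the exporter updates its object

print(copy_in_a["has_rating"])            # the copy keeps the value it was made with
print(surrogate_in_a.get("has_rating"))   # the surrogate reflects the update
```

The sketch makes the trade-off visible: a copy is cheap to read but can become stale, whereas a surrogate always reflects the exporter's current state at the cost of an access to the exporting component on every use.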
We will have more to say about the sharing of methods in Chapter 8.

Based upon this sharing mechanism the component that wishes to import the foreign concept will be able to add the (meta-)data representing the concept to its own local schema. Adding (meta-)data to an already existing schema is a two step process. First, conflicts (e.g., naming, modeling, scaling) between the objects of the importing database and the external objects from the other components must be resolved (conflict resolution). Second, these objects must be imported into the new schema as gracefully as possible and unified with already existing local types (unification). Fig. 4.3 shows the final schema of travel agency A after conflict resolution and unification are completed. Finally, our sharing mechanism has to guarantee the transparent access to instances and functions that belong to the imported meta-data object and were imported at the same time.

[Figure 4.3: Component A’s conceptual schema augmented with remote information. The imported “Bed & Breakfast”-like meta-data, including the types Inexpensive and Expensive, appears under Places-to-Stay alongside Bed & Breakfast, Resorts & Spas, and Hotels.]

Chapter 5
The Object Database Model

In order for any collaboration to take place among the heterogeneous components of a federation, a common model for describing the sharable data must be established. This model must be semantically expressive enough to capture the intended meanings of conceptual schemas which may reflect several of the kinds of heterogeneity enumerated above. Further, this model must be simple enough so that it can be readily understood and implemented. The advantage of a simple model is that it can be implemented using a variety of already existing object-oriented database management systems, saving both time and effort.
To this end, we have chosen to use a Core Object Data Model (CODM) as the common data model for describing the structure, constraints, and operations for sharable data.

5.1 Core Object Data Model

CODM is a generic functional object data model, which supports the usual object-based constructs. In particular, it draws upon the essentials of functional database models, such as those proposed in Daplex [66], Iris [19], and Omega [21]. CODM contains the basic features common to most semantic [1, 29] and object-oriented models [3], such as GemStone [49], O2 [41], and Orion [37]. The model supports complex objects (aggregation), type membership (classification), subtype to supertype relationships (generalization), inheritance of stored functions (attributes) from supertype to subtypes, and user-definable functions (methods). Not supported at this point are run-time binding of functions (method override), overloading of operations, constraints (semantic integrity rules) on types and functions, and remote transparency. We expect that CODM will eventually incorporate some of the concepts from [42, 47] for additional support in the unification process.

Some of the advantages of using an object-based common data model include the ability to encapsulate the functionality of shared objects [6], its extensible nature [43], and object uniformity [11]. This last item is especially important for the unification phase where one can ask about the equivalence of actual data-values, types, and operations.

5.2 Relationships Among Objects

Before we present the details of our approach to resolving semantic heterogeneity, we first present an overview of the various relationships that can exist among objects that model the same or similar real-world concepts in different components of a federation [5].
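As a point of reference for the discussion that follows, the constructs that Section 5.1 lists as supported by CODM (classification, generalization, and inheritance of stored functions, with no method override) can be sketched in a few lines. This is only an illustrative toy, not the CODM implementation; the class `TypeObject`, its methods, the function names `has_name` and `has_host`, and the sample instance are assumptions made for the example.

```python
from typing import Dict, List, Optional


class TypeObject:
    """A type in a toy functional object model (illustrative only)."""

    def __init__(self, name: str, supertype: Optional["TypeObject"] = None,
                 functions: Optional[Dict[str, str]] = None) -> None:
        self.name = name
        self.supertype = supertype
        # Stored functions (attributes): function name -> value type name.
        self.own_functions: Dict[str, str] = dict(functions or {})
        self.instances: List[Dict[str, object]] = []
        # Override is listed as unsupported, so redefinitions are rejected.
        inherited = supertype.all_functions() if supertype else {}
        clash = set(inherited) & set(self.own_functions)
        if clash:
            raise ValueError(f"{name} redefines inherited functions: {sorted(clash)}")

    def all_functions(self) -> Dict[str, str]:
        """Own functions plus everything inherited from supertypes."""
        inherited = self.supertype.all_functions() if self.supertype else {}
        return {**inherited, **self.own_functions}

    def is_subtype_of(self, other: "TypeObject") -> bool:
        """Generalization: walk up the supertype chain."""
        current: Optional[TypeObject] = self
        while current is not None:
            if current is other:
                return True
            current = current.supertype
        return False

    def create(self, **values: object) -> Dict[str, object]:
        """Classification: make an instance that is a member of this type."""
        unknown = set(values) - set(self.all_functions())
        if unknown:
            raise ValueError(f"unknown functions for {self.name}: {sorted(unknown)}")
        instance = dict(values)
        self.instances.append(instance)
        return instance


# Hypothetical fragment of travel agency A's schema.
places = TypeObject("Places-to-Stay", functions={"has_name": "Name"})
bnb = TypeObject("Bed & Breakfast", supertype=places,
                 functions={"has_host": "Person"})
inn = bnb.create(has_name="Harbor Inn", has_host="Pat")

print(sorted(bnb.all_functions()))   # own and inherited stored functions
print(bnb.is_subtype_of(places))
```

Representing types, functions, and instances uniformly as objects is what later allows questions about the equivalence of data values, types, and operations to be posed in the same way.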
5.2.1 Common Concepts

As a direct result of the different causes for schema diversity described above, it may happen that the same concept of an application domain is modeled by different representations R1 and R2 in different schemas. Returning to the example in Fig. 4.2 of Sec. 4, we can see that the concept of “bed & breakfast” as it can be found in travel agency A’s schema is also represented by the type Lodgings in travel agency E and by the type Private Accommodations in travel agency D. In addition to the obvious naming differences, both abstract objects mirror closely related real-world information but use different modeling constructs in their representations. Several types of semantic relationships can exist between two representations R1 and R2: they may be identical, equivalent, compatible, or incompatible:

1. Identical: R1 and R2 are exactly the same. This happens when the same modeling constructs are used, the same perceptions are applied, and no extraneous information enters into the specification. For example, Reservations in schema B and Bookings in schema D are equal representations for the same real world concept, namely records of services rendered by the travel agency to its customers.

2. Equivalent: R1 and R2 are not exactly the same because different modeling constructs have been applied. For example, in Fig. 3.4, the types Accommodations in travel agency D, and Places in Northeastern U.S. in travel agency E can be used to model the same information, namely hotels in the Northeastern U.S., but the representation is different. In D’s schema the location is explicitly modeled through a separate function called is_located_in. In E’s schema, the location information is implicit since it is part of the type name.

3. Compatible: R1 and R2 are neither identical nor equivalent. However, their representation is not contradictory.
For example, Private Accommodations and Resorts both model the same basic hotel information (i.e., rooms to rent) but differ in the type of services offered.

4. Incompatible: R1 and R2 are contradictory because of inconsistent design specifications or fundamental differences in the underlying information. For example, in Fig. 3.6, the two partial schemas displaying Travelers and Reservations and Customers and Bookings are incompatible because of the cardinalities assigned to their respective relationships.

5.2.2 Related Concepts

In addition to common concepts, related concepts arise frequently; we can enumerate the following most commonly used types of interschema (binary) relationships of this kind:

1. Generalization/Specialization: Generalization is the result of taking the union of two or more types to produce a higher-level type. In terms of the travel agency example, Places-to-Stay of schema A is the generalization of Resorts & Spas, Hotels, and Bed & Breakfast. Specialization is the opposite of generalization.

2. Positive Association: It is impossible to accurately classify all kinds of relationships that can exist between objects. This category includes concepts that are “synonyms” in some context (e.g., Bed & Breakfast and Private Accommodations), and those that are typically used in the same context (e.g., Hotel and Reservations).

This list is by no means exhaustive but rather indicative of useful inter-relationships vis-a-vis semantic heterogeneity resolution.

Chapter 6
A Mechanism for Semantic Heterogeneity Resolution

The fundamental goal of this research is to automate the resolution of semantic heterogeneity when sharing information among components in a federation. In general, sharing is possible at many different levels of abstraction and granularity, ranging from specific information units (data objects), to meta-data (see Footnote 1), to behavior.
For this dissertation, we term the sharing of type objects type-level sharing. Sharing of individual instances is termed instance-level sharing, and sharing of behavior or functions is termed function-level sharing. Type-level sharing forms the foundation for instance- and function-level sharing, since when sharing instances or functions the underlying type(s) must exist in the remote component. Furthermore, instance- and function-level sharing require more work than type-level sharing in the sense that there may be problems with side-effects (see footnote 2). Information sharing is investigated in detail in Chapters 7 and 8.

Footnote 1: This includes structural schema specifications and semantic integrity constraints.
Footnote 2: By side-effects we mean (1) any kind of implicit input other than the input argument, and (2) any modifications to the state of the database where the function executes.

6.1 Earlier Approaches

As observed in Sec. 2, most of the previous work on resolving semantic heterogeneity has concentrated on determining structural equivalence or behavioral equivalence at the schema level. In what follows, we first present a brief overview of these two approaches (Secs. 6.1.1 and 6.1.2). In Section 6.2 we describe our mechanism; it is based on three parts, each addressing a different aspect of the problem of establishing the relationship between type objects in different components.

6.1.1 Basic Structural Equivalence

A widely used approach for disambiguating two objects (see footnote 3) is to determine whether they are structurally equivalent (at some level of abstraction). As an example, consider a (local) object, say :joachim, of type Traveler in travel agency B’s database and a foreign object, say :dennis, of type Customer in travel agency C’s database, that are being compared. In this case, where both objects are atomic, the comparison is relatively easy. One can simply apply some sort of “eq” or “equal” semantics. Otherwise, further comparisons are needed.
When comparing type objects, this includes comparing the type names, the instances of that type, the subtypes of the type, and so forth. In addition, function information such as value types, function names, missing functions, and mapping constraints can prove useful as evidence of structural similarity. Consider the two abstract objects Accommodations of travel agency D and Places in Northeastern U.S. of travel agency E as shown in Fig. 3.4. In this case, the functions has_name with value type Name and is_managed_by with value type Person in D’s schema, and has_name with value type Business-Name and owned_by with value type Owner-Name in E’s schema, suggest a certain similarity between the two concepts. In order to determine a more exact connection, one has to make further comparisons. The more commonalities there are with respect to the above criteria, the higher the correlation between Accommodations and Places in Northeastern U.S.

Footnote 3: In this section the term object can refer to a type object as well as an instance object.

6.1.2 Basic Behavioral Equivalence

Another approach to determining object equivalence is to look at the operations (i.e., methods) that are defined on the objects in order to establish a behavioral equivalence between them. The general idea behind this approach is to apply all operations that are associated with a given local object, say Cities, and compare the results with those obtained when running the same methods on a foreign object, say Destinations. Examples of such methods are sights(), which takes a city instance as input and returns its sights, or location(), which returns the country in which a particular city is located. Behavioral equivalence requires the ability to compare methods and their results when executed in different environments. Thus, its success largely depends on how well a particular federation environment supports the remote execution of procedures.
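The behavioral comparison described above can be illustrated with a minimal sketch. It is not taken from the dissertation: the function behavioral_agreement, the dictionaries cities and destinations, and the sample data are all hypothetical, methods are modeled as plain Python callables rather than remotely executed procedures, and result equality is decided naively with ==.

```python
from typing import Any, Callable, Dict, Iterable

Method = Callable[[Any], Any]


def behavioral_agreement(
    local_methods: Dict[str, Method],
    foreign_methods: Dict[str, Method],
    sample_inputs: Iterable[Any],
) -> float:
    """Fraction of (method, input) trials on which both objects agree.

    Only methods with the same name on both sides are compared (assumption).
    """
    inputs = list(sample_inputs)
    shared = sorted(set(local_methods) & set(foreign_methods))
    trials = 0
    agreements = 0
    for name in shared:
        for value in inputs:
            trials += 1
            # Naive equality; deciding when results are "equal" is itself hard.
            if local_methods[name](value) == foreign_methods[name](value):
                agreements += 1
    return agreements / trials if trials else 0.0


# Hypothetical stand-ins for the local Cities and foreign Destinations objects.
cities = {
    "location": lambda c: {"Paris": "France", "Rome": "Italy"}.get(c),
    "sights": lambda c: {"Paris": {"Louvre"}, "Rome": {"Colosseum"}}.get(c),
}
destinations = {
    "location": lambda c: {"Paris": "France", "Rome": "Italy"}.get(c),
    "sights": lambda c: {"Paris": {"Louvre", "Eiffel Tower"},
                         "Rome": {"Colosseum"}}.get(c),
}

score = behavioral_agreement(cities, destinations, ["Paris", "Rome"])
print(score)  # three of the four trials agree
```

In this toy setting the score is a rough indicator only; it says nothing about methods that exist on one side but not the other.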
Another problem that remains in this approach is that of deciding when the results of the procedure applications are equal.

6.2 A Three-Pronged Approach to Relative Object Equivalence

In order to determine the relationship between objects within a broader context, we believe that no single method (such as the structural approach) suffices; rather, a combination of several different approaches taken together is highly promising. While it is nearly impossible to completely automate such a procedure, the following mechanism provides substantially useful functionality. We now describe our three-pronged approach for resolving semantic heterogeneity, viz. determining relative object equivalence.

6.2.1 Remote Sharing Language

The remote sharing language (RSL) is part of CODM and provides a standardized interface to the conceptual schemas of the participating components. RSL provides the capabilities to (1) query the meta-data of selected databases in order to obtain structural information about objects (resolution), and (2) augment selected database schemas with remote objects (unification).

RSL Command          Description
ShowMeta             Returns a list of all sharable types
HasProperties        Returns a list of stored functions
HasValueType         Returns the value type for a stored function
HasMethods           Returns a list of computed functions
HasInstances         Returns a list of instance OIDs
HasValue             Returns the value of a stored function
HasDirectSubtypes    Returns a list of all direct subtypes of a type
HasDirectSuperType   Returns the direct supertype of a type
ImportMeta           Imports meta-data
ImportInstance       Imports an instance object
ImportMethod         Imports a computed function

Table 6.1: Remote sharing language commands.

Table 6.1 shows a list of RSL commands. See App. A for a more detailed description of how these commands are used.
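To make the query half of Table 6.1 more concrete, the following is a minimal in-memory sketch. It is an illustration only, not the RSL syntax of App. A: the classes TypeObject and Component, the snake_case method names, and the private type Payroll are hypothetical, while the Accommodations functions are the ones mentioned in Sec. 6.1.1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TypeObject:
    name: str
    supertype: Optional[str] = None
    # stored function name -> value type name
    properties: Dict[str, str] = field(default_factory=dict)
    methods: List[str] = field(default_factory=list)  # computed functions
    sharable: bool = False


class Component:
    """Hypothetical component answering a few RSL-style meta-data queries."""

    def __init__(self, types: List[TypeObject]) -> None:
        self._types = {t.name: t for t in types}

    def _exported(self, name: str) -> TypeObject:
        t = self._types[name]
        if not t.sharable:
            raise PermissionError(f"{name} is not sharable")
        return t

    def show_meta(self) -> List[str]:  # cf. ShowMeta
        return sorted(n for n, t in self._types.items() if t.sharable)

    def has_properties(self, type_name: str) -> List[str]:  # cf. HasProperties
        return sorted(self._exported(type_name).properties)

    def has_value_type(self, type_name: str, function: str) -> str:  # cf. HasValueType
        return self._exported(type_name).properties[function]

    def has_methods(self, type_name: str) -> List[str]:  # cf. HasMethods
        return sorted(self._exported(type_name).methods)

    def has_direct_subtypes(self, type_name: str) -> List[str]:  # cf. HasDirectSubtypes
        self._exported(type_name)
        return sorted(n for n, t in self._types.items()
                      if t.supertype == type_name)


agency_d = Component([
    TypeObject("Accommodations",
               properties={"has_name": "Name", "is_managed_by": "Person"},
               sharable=True),
    TypeObject("Private Accommodations", supertype="Accommodations",
               sharable=True),
    TypeObject("Payroll"),  # hypothetical private type, never exported
])

print(agency_d.show_meta())
print(agency_d.has_value_type("Accommodations", "is_managed_by"))
```

The import commands (ImportMeta, ImportInstance, ImportMethod) are deliberately left out here, since they modify the importing schema and are the subject of the unification discussion.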
Note that we cannot depend on examining supertypes of an object, since the supertype might not be available for sharing; however, we always assume that subtypes are. Since a subtype is a specialization of a type, it makes sense to allow all subtypes of a sharable type to be sharable as well. Also remember that the term stored function in our data model refers to an attribute denoting an inter-type relationship. Components that wish to share and exchange information must agree on a common interface (CODM) that can provide the functionality described above.

6.2.2 Local Lexicon

One of the foundations of our approach is the ability to maintain semantic information about all the sharable objects in each component beyond the information that is already provided by the underlying schema. For this purpose, each component database system is augmented by a local lexicon in which it defines all type objects it is willing to share with the other components. The common vocabulary in which shared knowledge is represented in a lexicon draws some ideas from declarative knowledge representation forms such as the Knowledge Representation Language (KRL) [8], semantic networks [59], and the Cyc knowledge base [28]. In our approach, knowledge is represented in a local lexicon as a static collection of facts of the simple form:

<term> relationship-descriptor <term>

A term on the left side of a relationship descriptor represents the unknown concept, which is described by the term on the right side of the relationship descriptor. The set of descriptors is extensible and specifies the relationships that exist between the two terms. Table 6.2 shows a list of conceptual relationship descriptors utilized in our mechanism.

R-Descriptor    Meaning
Identical       Two types are the same
Equal           Two types are equivalent
Compatible      Two types are transformable
KindOf          Specialization of a type
Assoc           Positive association between two types
CollectionOf    Collection of related types
InstanceOf      Instance of a type
Common          Common property of a collection
Feature         Descriptive elaboration
Has             Property belonging to all instances of a type

Table 6.2: Relationship descriptors.

The terms that are used to describe the unknown concept are taken from a dynamic list that characterizes commonalities in a federation. This dynamic list of commonly understood terms is called an ontology. According to Gruber [22], “[An] ontology is a description of the concepts and relationships that can exist for a [component] or a [federation of components] for the purpose of enabling knowledge sharing and reuse.” Since interoperability only makes sense among components that model similar or related information, it is reasonable to expect a common understanding of a minimal set of concepts taken from the universe of discourse. Thus, in this research, an ontology defines the vocabulary with which information and assertions are exchanged among components. For example, a possible ontology for a federation consisting of collaborating travel agencies is different from the ontology used in a federation of cooperating biologists. Ontologies are organized in “packages” depending on the application domain they describe. Providing a federation with the appropriate ontology package beforehand will speed up the initial set-up time. Part of each federation is therefore an initial ontology package that consists of a general-purpose ontology (GPO) and one or more special-purpose ontologies (SPOs). The GPO is application independent and constitutes the minimal vocabulary that forms the basis for any kind of inter-component communication.
SPOs, on the other hand, are application dependent and contain terms taken from a specific topic. Because ontologies are organized by content, they can easily be reused and shared by other federations. Both the GPO and the SPOs are highly dynamic, meaning that the number of terms in them grows during the life-time of the federation depending on the sharing patterns of the participating components. Together, the GPO and the SPOs provide the components with a vocabulary that describes the application areas of the database systems involved. As a result, our resolution mechanism works best if all the components have a similar background, thus reducing the need for many additional SPOs or SPOs with a lot of terms. In the case of FOTA, a subset of the ontology package could be:

GPO = {Person, Name, Number, Owner, Thing, ...}
SPO_Travel = {Arrival, Booking, Customer, Departure, Trip, ...}
SPO_Travel-Air = {Airline, Fare, Flight, ...}
SPO_Travel-Hotel = {Category, One-Star, Two-Star, Three-Star, Hotel, Location, ...}
SPO_Travel-Cruise = {Boat, Entertainment, Sightseeing, ...}

Specifically, SPO_Travel is a special-purpose ontology containing terms used in the travel agency business. SPO_Travel-Air, SPO_Travel-Hotel, and SPO_Travel-Cruise contain terms that are used mostly in the context of air travel, hotel reservations, and pleasure cruises, respectively. Using this package in FOTA, a local lexicon may contain entries such as those shown in Table 6.3.

The underlying idea of the lexicon is to represent the real-world meaning of all shared terms in order to complement the results of the RSL commands. In some cases, it is not possible to “derive” the meaning of a term only by looking at its structure (for example, Pleasure Cruises and Flights both have functions containing departure and arrival information but the types themselves are not related).
By the same token, the real-world meaning by itself might not be enough to correctly integrate a term into another type hierarchy (for example, two types Fare and Airfare that represent the cost of tickets but store their prices in different denominations cannot simply be merged into a new type). Thus, by using the RSL together with local lexica, we are able to achieve a higher degree of confidence in the correctness of our mechanism when integrating objects. An example of the partial contents of three local lexica is given in Fig. 6.1.

Local Term        Relationship Descriptor   Ontology Term    Textual Description (not part of lexicon)
Bed & Breakfast   KindOf                    Hotel            A bed & breakfast place is a specialization of hotel and one of its features is its low price.
  "               Feature                   economical
Sightseeing       KindOf                    Entertainment    Sightseeing is a special form of entertainment that is a collection of trips to places and is meant to be enjoyable.
  "               CollectionOf              Trips
  "               Has                       Place-to-visit
  "               Feature                   Pleasure
MexicoFlights     CollectionOf              Flight           MexicoFlights is a collection of flights with the common destination of Mexico.
  "               Common                    Destination

Table 6.3: Contents of a local lexicon.

6.2.3 Semantic Dictionary

Whereas local lexica contain semantic descriptions about local, sharable objects, they do not contain any knowledge about relationships between the entries in different lexica. This information is collected in a global repository, called the semantic dictionary. Like the local lexica, the semantic dictionary is dynamic, meaning that its content is updated whenever new or additional information becomes available (e.g., after a relationship between two similar remote types has been established).
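As a concrete aside, lexicon facts of the form <term> relationship-descriptor <term> can be written down as plain data. The sketch below is only one possible encoding: the names Descriptor, Fact, LEXICON, describe, and terms_described_by are hypothetical, while the descriptor set follows Table 6.2 and the sample facts are taken from Table 6.3.

```python
from enum import Enum
from typing import List, NamedTuple, Tuple


class Descriptor(Enum):
    """Relationship descriptors as listed in Table 6.2."""
    IDENTICAL = "Identical"
    EQUAL = "Equal"
    COMPATIBLE = "Compatible"
    KIND_OF = "KindOf"
    ASSOC = "Assoc"
    COLLECTION_OF = "CollectionOf"
    INSTANCE_OF = "InstanceOf"
    COMMON = "Common"
    FEATURE = "Feature"
    HAS = "Has"


class Fact(NamedTuple):
    local_term: str        # the unknown concept (left-hand side)
    descriptor: Descriptor
    ontology_term: str     # the describing term (right-hand side)


# Sample facts copied from Table 6.3.
LEXICON: List[Fact] = [
    Fact("Bed & Breakfast", Descriptor.KIND_OF, "Hotel"),
    Fact("Bed & Breakfast", Descriptor.FEATURE, "economical"),
    Fact("MexicoFlights", Descriptor.COLLECTION_OF, "Flight"),
    Fact("MexicoFlights", Descriptor.COMMON, "Destination"),
]


def describe(lexicon: List[Fact], local_term: str) -> List[Tuple[str, str]]:
    """All (descriptor, ontology term) pairs recorded for a local term."""
    return [(f.descriptor.value, f.ontology_term)
            for f in lexicon if f.local_term == local_term]


def terms_described_by(lexicon: List[Fact], descriptor: Descriptor,
                       ontology_term: str) -> List[str]:
    """Local terms that stand in the given relationship to an ontology term."""
    return [f.local_term for f in lexicon
            if f.descriptor is descriptor and f.ontology_term == ontology_term]


print(describe(LEXICON, "MexicoFlights"))
print(terms_described_by(LEXICON, Descriptor.KIND_OF, "Hotel"))
```

A flat list keeps the lexicon a static collection of facts, as the text describes; an index by term or by ontology term would be a straightforward optimization but is not implied by the source.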
Local Lexicon, Travel Agency A:
  Places-to-Stay              KindOf Hotel; Common Location = “California”
  Bed & Breakfast             KindOf Hotel; Feature Category = low
  Resorts & Spas              KindOf Hotel; Feature Category = high
  ...

Local Lexicon, Travel Agency E:
  Places in Northeastern US   KindOf Hotel; Common Location = “Northeastern Region”
  Hotels                      Equal Hotel
  Lodgings                    KindOf Hotel
  Expensive                   KindOf Hotel; Feature Category = high
  Inexpensive                 KindOf Hotel; Feature Category = low
  Rental Car Companies        KindOf Hotel
  ...

Local Lexicon, Travel Agency D:
  Accommodation               KindOf Hotel; Feature Category = medium
  Private Accommodations      KindOf Hotel; Feature Owner = private
  Hotels                      Equal Hotel
  ...

Figure 6.1: A partial view of three local lexica.

Specifically, related types from different components that have been identified are grouped into a collection called a concept, within which subcollections called subconcepts can be further identified. This generates a concept hierarchy much like the one shown at the bottom of Fig. 6.4. A concept, shown as a rectangular box (note that ovals are reserved for type objects), represents a global view of its members (which are type objects from the component schemas) and the members of all its subconcepts. Initially, the concept hierarchy consists of key concepts from the ontology package only. During the life-time of the federation this hierarchy will grow as sharable objects and their relationships to these key concepts are identified. In a sense, the semantic dictionary with its concept hierarchy represents a dynamic federated knowledge base about the different sharing patterns in the federation. The larger the concept hierarchy, the greater the probability that a relationship between two objects has been previously resolved.
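The concept hierarchy and the idea of remembering resolved relationships can be sketched as follows. This is a simplified illustration under several assumptions: the class SemanticDictionary and its method names are hypothetical, concepts form a simple tree, a type object is identified by a (component, type name) pair, and a resolved relationship is stored as a descriptor name keyed by an ordered pair, since descriptors such as KindOf are directional.

```python
from typing import Dict, List, Optional, Set, Tuple

TypeRef = Tuple[str, str]  # (component, type name)


class SemanticDictionary:
    """Hypothetical sketch: concept hierarchy plus a store of resolved pairs."""

    def __init__(self, key_concepts: List[str]) -> None:
        # Initially the hierarchy holds only key concepts from the ontology.
        self.parent: Dict[str, Optional[str]] = {c: None for c in key_concepts}
        self.members: Dict[str, Set[TypeRef]] = {c: set() for c in key_concepts}
        self.resolved: Dict[Tuple[TypeRef, TypeRef], str] = {}

    def add_subconcept(self, concept: str, parent: str) -> None:
        if parent not in self.parent:
            raise KeyError(parent)
        self.parent[concept] = parent
        self.members.setdefault(concept, set())

    def add_member(self, concept: str, type_ref: TypeRef) -> None:
        self.members[concept].add(type_ref)

    def global_view(self, concept: str) -> Set[TypeRef]:
        """Members of a concept together with members of all subconcepts."""
        view = set(self.members[concept])
        for child, parent in self.parent.items():
            if parent == concept:
                view |= self.global_view(child)
        return view

    def record(self, a: TypeRef, b: TypeRef, descriptor: str) -> None:
        self.resolved[(a, b)] = descriptor

    def lookup(self, a: TypeRef, b: TypeRef) -> Optional[str]:
        """Return a previously established relationship, if any."""
        return self.resolved.get((a, b))


dictionary = SemanticDictionary(["Hotel"])
dictionary.add_subconcept("Lodging", "Hotel")
dictionary.add_member("Hotel", ("E", "Hotels"))
dictionary.add_member("Hotel", ("D", "Hotels"))
dictionary.add_member("Lodging", ("E", "Lodgings"))

# Both agencies declare Hotels as Equal to the ontology term Hotel (Fig. 6.1).
dictionary.record(("E", "Hotels"), ("D", "Hotels"), "Equal")

print(sorted(dictionary.global_view("Hotel")))
print(dictionary.lookup(("E", "Hotels"), ("D", "Hotels")))
```

A lookup that returns None simply means the pair has not been resolved before, in which case the full resolution process would have to run.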
In cases where a relationship has been previously resolved, the resolution process is reduced to a look-up of previously established object relationships, similar to the problem-solving approach applied in Case-Based Reasoning (CBR) techniques (see [12]).

[Figure 6.2: The semantic dictionary. The figure shows three parts. Ontology package: GPO = {Person, Name, Number, Thing}; SPO_Travel = {Arrival, Booking, Customer, Departure, Price, Origin, Ticket, Destination, Package, Schedule, Seat}; SPO_Travel-Air = {Airline, Class, Connection, Fare, Flight, First, Business, Coach, Service}; SPO_Travel-Cruise = {Boat, Cabin, Pleasure Cruise, Port, Sightseeing, Entertainment}; SPO_Travel-Hotel = {Hotel, Room, Category, One-Star, Two-Star, Three-Star}; SPO_Travel-Train = {Train, Car, Compartment}; SPO_Travel-CarRental = {Rental Car, Insurance, Compact, Large, Van}. Relationship descriptors: Identical, Equal, Compatible, KindOf, Assoc, CollectionOf, InstanceOf, Common, Feature, Has. Concept hierarchies: partially legible; labels include Hotel, Lodging, Resort, Airline, Flight, and Destination.]

In addition to the concept hierarchy of sharable objects, the semantic dictionary also contains the ontology package described earlier as well as a list of the relationship descriptors that are used in the local lexica. A schematic overview of a semantic dictionary is given in Fig. 6.2.

6.3 Resolving Object Relationships

One reason for the relatively slow progress in the area of semantic heterogeneity resolution is that heterogeneous databases provide a wide spectrum of vocabulary and name usage which is inherently difficult for computers to “understand”. The narrower the domain, i.e., the higher the degree of redundancy in the vocabularies and the closer the relationships among objects, the higher the chances of successful resolution.
Currently, the process of disambiguation (viz., solving semantic diversity) is still only partially automated and requires human interaction along the way. In order to determine the relationship between objects within a broader context, we believe that no single method (e.g., schema resolution based on structural knowledge, see [13]) suffices; rather, a combination of several different approaches taken together is highly promising. This also becomes evident when studying the different types of semantic heterogeneities displayed in Figs. 3.4 through 3.6. Therefore, we decided to augment the conceptual schemas of participating components with additional semantic information describing the usage of sharable objects in their application environment. While it is nearly impossible to completely automate the resolution of inter-object relationships, the following mechanism provides substantially useful functionality. We now describe our approach for resolving semantic heterogeneity, viz., determining relative object equivalence.

[Figure 6.3: Sharing architecture and the various interactions among its components. Annotations in the figure: the semantic dictionary describes the relationships between terms in local lexica; the local lexicon contains a semantic description of all objects in the export schema; the export schema contains type objects that can be shared with other components; RSL returns representational information about objects and supports remote schema evolution through the Core Object Data Model (CODM) interface; Component B’s augmented schema is shown after integration of type objects from A.]

Using the RSL, the local lexica, and the semantic dictionary mentioned earlier, the meanings of concepts unknown to another component are “derived”.
RSL commands return structural information about an object (supertype, subtype(s), properties, etc.) and perform the unification of remote objects with local schemas. The local lexicon, which is created and updated by each component separately, contains a semantic description of every sharable type object in the database. In order to make the local lexica usable throughout the entire federation, a common knowledge representation is used. Terms in question can be located and compared with each other. The semantic dictionary describes the relationships between terms in the local lexica. See Fig. 6.3 for a pictorial description of the interactions among these components. In order to provide components with a way to protect their private data from the rest of the federation, sharable objects must be placed in a special section of the conceptual schema termed the export schema. Everything that is not part of the export schema is invisible to the rest of the federation.

The basic problem addressed by the semantic heterogeneity resolution mechanism may be expressed, without loss of generality, as: given two objects, a local and a foreign one, return the relationship that exists between the two. Specifically, our strategy is based on structural knowledge (conceptual schema) and the (known) relationships that exist between keywords and the two objects in question (local lexicon, semantic dictionary). One characteristic of our approach is that the majority of user input occurs before the resolution step is performed (i.e., when selecting the set of keywords and creating the local lexicon) rather than during it. Fig. 6.4 gives a pictorial description of parts of two local lexica belonging to schema A and schema B, respectively. The same figure also shows a subpart of the concept hierarchy in the semantic dictionary. Looking at A’s lexicon, we can see that the property discovered_by is equivalent to the concept Author from the ontology package.
We also see that schema B’s Researcher and Name combination is equivalent to the concept Author. Therefore, discovered_by and the type-property combination of Researcher and Name must be equivalent.

We can make the following observation on the use of RSL commands with respect to the resolution of semantic heterogeneity. All information about the structure of a type object is provided through selected RSL commands, an approach that can be viewed as a variation on the usual paradigm of behavior encapsulation in object-oriented programming languages. Rather than encapsulating behavior, these commands “encapsulate” the structure of a type object. The advantage of this approach is that RSL commands, which are essentially computed or foreign functions in our data model, can be part of each component’s schema without modifications to the underlying architecture.

[Figure 6.4: Resolution of object relationships. The figure relates type objects of Travel Agency D in New York and Travel Agency E in Washington to the concept hierarchies of the semantic dictionary. Legend: stored function, relationship, key concept, relationship descriptor.]

Chapter 7
Unification of Remote and Local Information

Thus far we have discussed the nature of the schema integration problem and identified the various causes of and a solution to the heterogeneity problem. In this chapter we describe the individual activities that a component must go through when importing a type object. As mentioned before, there are two steps that must be performed when adding imported (meta-)data to an already existing schema. First, once conflicts (e.g., naming, modeling, scaling) between a source and a target schema are detected, these conflicts must be resolved so that the unification of the foreign type(s) with the corresponding local objects is possible (conflict resolution).
Second, the foreign object(s) must be imported into the local schema as gracefully and naturally as possible (unification). In the context of this research, “schema” refers to a collection of type objects that are connected through the usual object-based constructs. As noted above, we consider here only the importation of type objects or collections thereof.

7.1 Eliminating Inconsistencies

The goal of this activity is to resolve inconsistencies between the imported type(s) and the target schema before the unification step. However, automatic conflict resolution is in general infeasible. Sometimes conflicts cannot be resolved because they arose as a result of some basic inconsistencies. In cases where automatic resolution is not possible, the conflicts are reported to the users, who must guide the unification mechanism in the process. The following specific activities are performed during conflict resolution:

• Operations on atomic data values: This is an attempt to resolve type (iv) heterogeneities (low-level data format). An example is the conversion of fare prices that are represented in British pounds into U.S. dollars (this might also affect the respective value types of functions associated with fare prices).

• Renaming: This activity addresses the problem of homonyms (where the same name is used for two different concepts) and synonyms (in which the same concept is described by two or more names). As an example of homonyms, consider the following scenario. Two components A and B of FOTA want to share fare information. Both schemas include an object named Fare-Prices. However, Fare-Prices in schema A includes airport and sales taxes, whereas in schema B it represents the “true” fare price without any taxes added. It is obvious that merging the two types would result in a problem. Therefore, one of the two types must be renamed before unification can take place in order to reflect the differences in their representations.
As an example of synonyms, consider two schemas representing customer information, Clients and Customers, where both types contain the same data. In this case, keeping two distinct types in the integrated schema would result in modeling a single object by means of two different types.

Additional conflicts are resolved during the following unification phase.

7.2 Unification

At this point, the foreign object(s) can be unified with the corresponding local object(s). If necessary, the target schema must be restructured to achieve a result that is (1) complete, (2) minimal, and (3) understandable: complete, since the new, integrated schema must contain all concepts that were present before the unification process took place; minimal, since concepts should only be represented once; and understandable, since the integrated schema should be easy to understand for the end user.

Upon importing the (meta-)data, structural conflicts with existing types in the component’s local type hierarchy may arise. Several possibilities exist, and we demonstrate our approach with the help of several sample scenarios from FOTA. These scenarios differ in the complexity of the schemas to be integrated, starting with the simplest one: the importation of a single foreign type object. The second scenario examines the cases in which a component wishes to import one or more types that are inter-related. The types to be unified are usually part of a more complex type hierarchy, hence we also need to be concerned with inheritance issues. The last scenario is a special case of the previous one and focuses on the relationship(s) between the objects to be unified. In this scenario, it is possible to construct the situation in which a component wishes to import objects that are connected in a complex manner involving one or more types not present in the local schema. In assembling this framework, we want to point out the relevance of previous work in this area described in [5, 30, 32, 53].
Below, we assume the following naming convention. Let Cimp denote an importing component of FOTA, Cexp denote an exporting component of FOTA, L denote a local object belonging to Cimp, and F denote a foreign object belonging to Cexp.

1. In the first scenario, we assume that a component Cimp wishes to import a single type object F from component Cexp. In increasing order of generality, this scenario can be decomposed into the following three cases.

(a) F does not exist in Cimp’s schema. For example, travel agency D wishes to import the object type Museums from travel agency E. Since Museums does not exist in D’s schema, it can be added without further modifications. The designated place for the new type is either as a subtype of “root” or “system” (i.e., the highest place in the type hierarchy), or as a subtype of some other user-specified local type. It is important to note that “adding” a type requires some additional work in the sense that the value types for the associated stored and/or computed functions do not always exist in the target schema and may have to be imported as well. The exact details on how to import functions are described in Chapter 8.

(b) F is (semantically) equivalent to some local object L in Cimp’s schema. In this case we make F a subtype of L and add the necessary functions to both L and F. For example, travel agency A wishes to import the object type Private Accommodations from travel agency D (this scenario is depicted in Fig. 4.2). An equivalent type, namely Bed & Breakfast, already exists in A’s schema. Thus, the imported type Private Accommodations is added as a subtype of Bed & Breakfast and all its functions are created for both the supertype and the subtype. Note that “subsetting” is a unification practice that is used by most methodologies [5]. In fact, it is considered to be the basis for accommodating multiple user perspectives on comparable types.
The case where F is exactly identical to L (i.e., structurally as well as semantically) is merely a simplification and only requires the importation of the type instances in which Cimp is interested (this is essentially a type merge as mentioned earlier).

(c) F is related to an object L in Cimp’s schema. In the case when L and F are similar in their semantic meaning but not equivalent, i.e., they are identical with respect to some functions and different with respect to others, a new supertype is created that contains only the identical functions of L and F. The functions in which L and F differ are associated with two new subtypes which inherit the functions common to both L and F. Together, the new supertype and its two subtypes contain the same information as L and F before the unification. This method was proposed in [13]. In terms of FOTA, the following example, shown in Fig. 7.1, best illustrates this method. Assume that travel agency D is about to integrate Hotels in Northeastern U.S. into its own schema. It has determined that Hotels in Northeastern U.S. is related to its own Accommodations in New England. Although there is considerable overlap between the two types (e.g., both contain hotel information, both cover a similar geographic area), there are still enough differences to prevent D from simply “merging” the two types. Using the above method, a new supertype Accommodations is created that has two subtypes, Northeastern U.S. and New-England. Together, all three types contain the same information as the two original types.

[Figure 7.1: Unification of two related objects. (a) D’s schema before unification, with Hotels in Northeastern US imported from Travel Agency E; (b) D’s schema after unification, with the new supertype Accommodations.]

2. In the second scenario Cimp wishes to import (meta-)data consisting of several, inter-related objects from Cexp.
For simplicity, we use only two different foreign type objects, namely F1 and F2, but the following discussion can be adapted to situations where more than two foreign type objects are involved in the unification process. As before, we can distinguish the following cases based on their degree of generality:

(a) F1 and F2 do not exist in Cimp’s schema. Here the type objects and all instances as required by Cimp are added. For example, travel agency B, which has no train information in its schema so far, wishes to import two related types called Trains and Train-Fares from travel agency C.

(b) Either F1 or F2 is (semantically) equivalent to some object L in Cimp’s schema. If we assume that foreign type F1 is equivalent to a local type, say L1, several things need to be done. First, a new subtype of L1 is created in order to hold the imported instances from F1. Then F2 is added as L2 to Cimp’s local schema, including all of its instances. Finally, new functions relating F1’s subtype and L2 are created. For example, travel agency D is importing the type Airtravel and the value type Price of function has_price from travel agency C (see Fig. 3.5). Since Airtravel is related to C’s Flights, it is added as a subtype of it. Then the type Price is added to the newly imported Airtravel just as in the original schema of travel agency C. In the case where both types F1 and F2 are present in Cimp’s schema, all foreign instances of F1 and F2 can be imported into the already existing local types L1 and L2 (type merge).

(c) Either F1 or F2 is related to some object L in Cimp’s schema. Assume that F1 is related to L. In this case, which is similar to case (c) of the first scenario, the same method applies. The functions that are common to both F1 and L are associated with a new supertype, which has two subtypes containing the functions that distinguish F1 and L.
Since F2 was related to the original type F1, it will also be related to the new supertype, to which it is added as a stored function. For example, travel agency D is importing the two types Sightseeing and Cities (containing the cities where the sights are) from travel agency C. C’s type Sightseeing is related to D’s local type Entertainment. Following the method outlined above, travel agency D creates a new supertype, say Things-to-do, which contains all the information common to D’s Entertainment and C’s Sightseeing. The new supertype Things-to-do has two subtypes, namely Entertainment and Sightseeing, that reflect the differences between the original types. Cities, which is a stored function of both subtypes, is connected to Things-to-do via a stored function called has_location (see Fig. 7.2).

Figure 7.2: Unification of several inter-related objects: Example 1. (a) D’s schema before unification; (b) D’s schema after unification.

(d) Both F1 and F2 are related to objects L1 and L2 in Cimp’s schema. For example, travel agency E with types Places in Northeastern U.S. and Owner-Name wants to import the types Accommodations and Person from travel agency D (see Fig. 3.4). In this case, both types Accommodations and Person are imported separately. If the relationship between the two is important for the importing component, the appropriate function(s), in this case is-managed-by, can be added afterwards.

3. In the third scenario, consider the case where component Cimp wishes to import types from Cexp whose relationships are modeled differently than the corresponding type relationships in Cimp’s local schema.
For example, L1 and L2 are related through a ternary relationship that includes an additional type L3, whereas the corresponding foreign types, F1 and F2, are related directly through functions and their inverses. Applying the method of “subsetting”, two new subtypes for F1 and F2 are necessary. Then, for each instance pair that Cimp imports, a new instance for L3 is created, which relates the imported objects.

As an example in FOTA, consider the situation depicted in Fig. 7.3. Travel agency C models its “flight-fare” information through a ternary relationship consisting of the types Flights, Flight-Fare-Combinations, and Fare-Types. In order to obtain fare information on charter flights from Miami to the Florida Keys, it decided to import this information from travel agency B. B uses two types, Airtravel and Price, and a function has-price and its inverse to model the relationship between flights and fares. Using the procedure outlined above, two new subtypes for Flights and Fare-Types are created, namely Southern-Flights and Southern-Fare-Types respectively (to represent the fact that they contain information about the southern parts of the U.S.). Further, for every related pair of objects in the new subtypes, a new object is created in C’s Flight-Fare-Combinations relating each flight from Southern-Flights to its corresponding price object in Southern-Fare-Types.

Figure 7.3: Unification of several inter-related objects: Example 2. (a) C’s schema before unification; (b) C’s schema after unification.

These three scenarios are an attempt to examine the most important sharing situations that arise during the lifetime of FOTA. We have only examined the importation and unification of single type objects or pairs of related type objects.
However, the mechanisms presented in this section can be extended to accommodate the unification of three or more inter-related type objects. Some of these cases will be studied during subsequent research in this area. Although it is difficult to propose a useful measure of the “completeness” of this enumeration, we shall attempt to use the scenarios presented here as a foundation for our work, and argue that they are sufficient to prove the feasibility of this approach.

So far, we have not explicitly addressed the sharing of instances. It is important to note that this so-called instance level sharing requires the ability to compare individual instances in different components in order to prevent the unification of duplicates. This problem is addressed in more detail in [35] and will be left for further extensions to this work. Conceptually, it only makes sense to import instances that belong to a type which has been imported before. In this context, instance-level sharing can be compared to assigning the desired values to the function(s) defined on the imported type(s) in the local schema. For example, in Fig. 7.2, component D can import instances defined on C’s types Sightseeing and Cities by creating new objects for both types in D’s schema (we assume that D has imported the types Sightseeing and Cities as well as the function has-price, as can be seen in Fig. 7.2 (b)) and then assigning the desired values to the newly created objects under the has_location function.

We will complete our discussion on schema unification in the next chapter when we present a mechanism for the sharing of behavior.

Chapter 8

Inter-Component Behavior Sharing

In the previous chapter we described our theory for placing remote type objects into the best position of an existing local schema. Until this point, we have limited our discussion of partial schema unification to the sharing of meta-data objects.
We have observed that, in addition to the sharing of type objects, the sharing of instances [16] and the sharing of functions, also termed behavior sharing, constitute the second and third aspects of partial schema unification, respectively. In this chapter, we provide an overview of our underlying theory of behavior sharing in federated database systems [14]. We address the essential problems that are associated with each sharing situation and describe our mechanism to support sharing as implemented in our experimental prototype system.

8.1 Behavioral Objects in CODM

In CODM, behavior is represented by three types of functions:

• Stored functions: A stored function records data as primitive facts in the database. Stored functions correspond to inter-object relationships (attributes).

• Derived functions: A derived function is defined by a data manipulation language (DML) expression. Derived functions correspond to queries (derived data).

• Computed functions: A computed function (sometimes termed a foreign function) is defined by a procedure written in some programming language. Computed functions correspond to methods (operations).

For the remainder of this dissertation, derived and computed functions are treated uniformly, and will be termed “computed functions” herein.

8.2 The Sharing of Behavior

Let us assume the existence of a function F, which can be shared among components of a federation; without loss of generality, assume F takes as input the argument a [footnote 1]. The argument type can be a literal (i.e., Integer, String, ...) or a user-defined type such as Accommodations, for example. Sharing takes place on a component-pairwise basis, meaning that F is exported by a component C1 and imported by a component C2. The importing component is called the local database, while the exporting component is called the remote database.
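The function kinds of Section 8.1 can be sketched in a few lines of Python. This is only an illustrative model, not CODM or the prototype: the class names, the object identifiers, and the IsBudget function are assumptions made for the example, and derived functions are folded into computed functions as the text does.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class StoredFunction:
    """Records values as primitive facts, one per object (an attribute)."""
    name: str
    facts: Dict[str, Any] = field(default_factory=dict)

    def __call__(self, obj: str) -> Any:
        return self.facts[obj]


@dataclass
class ComputedFunction:
    """Defined by a procedure; also stands in for derived functions here."""
    name: str
    body: Callable[[str], Any]

    def __call__(self, obj: str) -> Any:
        return self.body(obj)


# Hypothetical data, for illustration only.
price = StoredFunction("Price", {"inn-1": 80, "inn-2": 120})

# A derived-style function: its value is an expression over a stored function.
is_budget = ComputedFunction("IsBudget", lambda obj: price(obj) < 100)
```

Both kinds are invoked the same way, for example price("inn-1") and is_budget("inn-1"), which is the uniformity the sharing scenarios rely on when a function is later replaced by a remote counterpart.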
There are several ways components C1 and C2 can share the service provided by F, depending upon the location where F executes and upon where its input argument a resides (i.e., there are two degrees of freedom). At this level of abstraction there are four distinct function-argument combinations:

• local function - local argument
• local function - remote argument
• remote function - local argument
• remote function - remote argument

Upon closer analysis we can note that it is also necessary to differentiate between stored functions and computed functions. At this (finer) level of granularity, we can now distinguish between a total of eight different sharing scenarios [footnote 2], as presented in Fig. 8.1. In the table of Fig. 8.1, “Local” refers to the domain of the local database while “Remote” refers to the domain of the remote database. Local objects are those that belong to the local database, while remote objects belong to the remote database.

Footnote 1: Since the argument can be a complex unit of information, this is not a limitation; multiple arguments can be handled by an obvious extension of our approach.

Footnote 2: We now have three degrees of freedom.

Stored functions:
  Local function, local argument: base case
  Local function, remote argument: useful (local attribute value for remote object)
  Remote function, local argument: undefined
  Remote function, remote argument: related to instance level sharing

Computed functions:
  Local function, local argument: base case (processing done locally)
  Local function, remote argument: useful (processing done locally; local behavior on remote objects)
  Remote function, local argument: basis for behavior sharing (processing done remotely; remote behavior on local objects)
  Remote function, remote argument: related to instance level sharing (processing done remotely)

Figure 8.1: Eight different situations for sharing of behavior.

It is important to note that a principal goal of our approach is to provide a mechanism for behavior sharing that makes the location of a function and its argument transparent to the user.
The details of how this transparency can be achieved are more fully described in Sec. 9.2, but we will briefly highlight the process here. In particular, we note that the state of a remote object (i.e., its functional values) always resides in the remote database, but when the object is imported to a local database, a surrogate is created for it in the local component. The creation of such surrogates is necessary in order to refer to remote objects using local database system tools without modification [15]. Since these surrogates are created locally, the local system is able to interpret and manipulate remote objects as usual, for example, when using them as arguments in local function calls. However, when retrieving the actual state of a remote object, the use of surrogates alone is not sufficient. Our approach exploits the extensible nature of our object-based database model by rewriting the functions encapsulating surrogate objects as computed functions. These higher-level computed functions serve as placeholders, and retrieve the results of applying the function on the remote component where the object is actually stored.

Figure 8.2: Stored functions in two component databases (travel agency D’s conceptual schema, and travel agency A’s conceptual schema with remote objects).

Given the above observations, it is now possible to consider each of the behavior sharing situations summarized in Fig. 8.1. We first focus on stored functions, and then turn our attention to computed functions.
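The surrogate mechanism just described can be illustrated with a small Python sketch. It is written under stated assumptions and is not the prototype's code: RemoteComponent, Surrogate, placeholder, the object identifier "d-17", and the sample values are all hypothetical, and an in-process method call stands in for the remote procedure call.

```python
from typing import Any, Callable, Dict


class RemoteComponent:
    """Stand-in for the exporting database; it holds the real object state."""

    def __init__(self, stored: Dict[str, Dict[str, Any]]):
        self._stored = stored  # function name -> {remote object id -> value}

    def call(self, function: str, oid: str) -> Any:
        # In a real federation this would be a remote procedure call.
        return self._stored[function][oid]


class Surrogate:
    """Local handle for a remote object; it carries no remote state itself."""

    def __init__(self, remote: RemoteComponent, remote_oid: str):
        self.remote = remote
        self.remote_oid = remote_oid


def placeholder(function: str) -> Callable[[Surrogate], Any]:
    """Build a computed function that forwards the call to the remote site."""
    def forward(obj: Surrogate) -> Any:
        return obj.remote.call(function, obj.remote_oid)
    return forward


# Hypothetical remote component D with one private accommodation.
d = RemoteComponent({"Fax-Number": {"d-17": "555-0100"}})
p = Surrogate(d, "d-17")

# Remote stored function, rewritten locally as a forwarding computed function.
fax_number = placeholder("Fax-Number")

# Local stored function: its values live only in the local database.
address: Dict[Surrogate, str] = {}
address[p] = "inn@example.org"
```

Calling fax_number(p) fetches the value from d, while address[p] never leaves the local side, so the local component can attach its own state to the surrogate without touching the originating object.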
8.2.1 Stored Functions

As a framework for analysis, consider again the familiar example of FOTA: travel agency A and travel agency D contain related hotel information. Fig. 8.2 specifies parts of the meta-data (conceptual schemas) for these two example components [footnote 3]. Let travel agency A denote the local component and travel agency D denote the remote component. The four situations for the sharing of stored functions among components can be analyzed as follows:

Footnote 3: In order to keep the signature of each function as short as possible, we have omitted the input argument whenever it is obvious; the result type, on the other hand, is always explicitly shown.

1. Local function - Local object
This is what we term the base case. Both objects, the stored function F and its argument a, reside in the local component, and F can be executed as usual without any additional mechanisms.

2. Local function - Remote object
In this case, F is a local stored function that is applied to a remote argument a. For example, component A’s Address() function can be applied to the Private Accommodations that have been imported from component D. As previously mentioned, surrogates are created in component A’s database for each remote object. These surrogates are collected in a newly created subtype of Bed & Breakfast called Private Accommodations. All local functions, for example the functions defined on Places-to-Stay and Bed & Breakfast in Fig. 8.2, operate normally on the surrogates (i.e., the instances of Private Accommodations). In the case of local functions that do not have counterparts in the remote component (e.g., Address()), new function values can be created locally for each imported instance. As a result, each imported object also has local state in component A’s database and can be altered by A without affecting the original object in component D’s database.
The four functions (Name(), Price(), Fax-Number() and Location()) that are shown next to component A’s Private Accommodations in Fig. 8.2 are actually remote functions defined in component D’s schema; these have been included in A’s database for completeness. Exact details of how these functions are created and invoked are presented in Sec. 9.2. Provided that component A has created new values for Address() for each of the imported Private Accommodations, s/he might pose the following OSQL query in this context, which returns the name and address of each private accommodation:

    select Name(p), Address(p) for each PRIVATE ACCOMMODATIONS p

In this query, the variable p is defined on all instances of type Private Accommodations.

3. Remote function - Local object
This situation is somewhat meaningless, since stored functions only have a meaning in the local context of the component in which they were initially created. For example, suppose that component A, who does not have the Fax-Number() function defined in his/her schema, wants to see the fax numbers for all his/her Places-to-Stay. Component A would first need to create a new function and then populate it with the appropriate values, rather than being able to use component D’s Fax-Number() function. However, when looking at this situation from component D’s point of view, one can argue that this case is merely the mirror image of the second case (Local function - Remote object). Rather than executing component D’s function remotely in component A’s database, one can integrate component A’s Places-to-Stay into component D’s schema (create a new subtype, Places-to-Stay, of Accommodations and populate it with component A’s Places-to-Stay) and execute the Fax-Number() function locally.

4. Remote function - Remote object
This case serves as the basis for instance level sharing.
All remote objects of interest to a component, say A, must already be imported into the local database using surrogates. Each time a remote object is referenced, the surrogate “points” to the remote object and the desired functional value is fetched using a remote procedure call to the other component (see Sec. 9.2). In this situation (see Fig. 8.2), component A might pose the following OSQL query against his/her schema:

    select Fax-Number(p) for each PRIVATE ACCOMMODATIONS p

As before, p is a variable defined on type Private Accommodations. In order to find the Fax-Number() of all remote Private Accommodations, surrogates for each of these accommodation instances are used to locate the remote accommodations. For each remote accommodation, the fax number is determined using the remote Fax-Number() function and the result of this function is returned to the local database.

Figure 8.3: Stored and computed functions in two component databases (travel agency D’s conceptual schema, and travel agency A’s conceptual schema with remote objects).

8.2.2 Computed Functions

In order to consider sharing for computed functions, consider again the example of two collaborating travel agents in FOTA. Fig.
8.3 shows the two example component databases as before, but with additional (computed) functions: SendMessage() in component D’s schema takes a fax number as input parameter and then opens a text editor which lets the user create a message text. At the end of the editing session, the composed text is faxed to the specified number. Mail() in component A’s schema uses an address (throughout this chapter we assume that address refers to an e-mail address) and sends a precomposed text to that e-mail address. The four situations for the sharing of computed functions among components can be analyzed as follows:

1. Local function - Local object
As in the case of stored functions, this is the base case. The computed function F as well as its argument a reside in the local component and the execution is local (e.g., in component D: SendMessage(Fax-Number(a))).

2. Local function - Remote object
This situation can be reduced to the base case described in case 1. For example, if component A wants to mail a note to one of component D’s hotels, s/he will run Mail() on the Address() of the surrogate for that hotel, say inst (e.g., Mail(Address(inst))).

3. Remote function - Local object
This is the reverse of the previous case: the function executes remotely and the input argument is supplied from the local database. For example, component A may desire to send a fax message but does not have a fax program in his/her own local database [footnote 4]. In this case, s/he will invoke component D’s SendMessage() function remotely through a previously created handle in his/her own database and supply it with a local argument. In effect, the remote database is providing a non-local “service”. Intuitively, from component A’s perspective, this is what “sharing of behavior” corresponds to.

4. Remote function - Remote object
This situation is similar to the first case (Local function - Local object) in that both the state of the object and the execution of the function are in the same component. For example, component A sends a fax message to one of D’s Private Accommodations using D’s original SendMessage() function. Compared to the first case (Local function - Local object), where no sharing takes place and execution occurs locally, in this case all the processing is done on the remote site. In order to invoke a remote computed function using remote arguments from within the local component, surrogates for the remote objects must be created locally. These surrogates enable the local component to access the actual state of desired objects which reside with the remote component. This procedure is similar to instance level sharing (see Sec. 9.2.2.1), where surrogates for shared instances are created locally in order to provide access to the actual state of each instance in the remote component.

Footnote 4: For simplicity, we assume that A’s hardware includes a fax modem.

In the examples above, the functions being shared have returned a literal type (e.g., the Address() function returns a String). However, functions with signatures involving abstract (user-defined) types can also be shared. In this case, both the input and output argument types must be defined locally; if they are not, their meta-data must be imported beforehand. The location of the result argument is determined by the location where the function executes.

8.3 Observations on the Practical Use of Behavior Sharing

With the above analysis and framework in place, it is now possible to make some observations on the practical utility of the behavior sharing capabilities supported by our mechanism. In the above analysis of behavior sharing, we stressed the separation of the location where the function executes from the location where the data resides.
However, from a user’s perspective, this separation of function execution and argument location is completely transparent. Our analysis of eight different sharing patterns can be reduced to two “most interesting” cases: (1) executing an imported function on a local argument, and (2) executing a local function on an imported (shared) argument. The first case involves the reuse of a previously defined function in a different environment. This “reuse of function” is a principal reason why components would want to share behavior. The second case can be described as extending the “characteristics” of a remote object while at the same time respecting the autonomy of the originating site. That is, the importer can customize the remote object according to his local conceptual schema. These local attributes are managed entirely by the local component, avoiding any unnecessary modification to the originating (remote) component.

In both cases above, the user is not aware of the environment in which a shared function executes, and need not worry about where the state of an imported object actually resides. Instead, components are able to freely browse the meta-data and behavior that have been made available (i.e., exported) by others in the federation in order to select the services that they would like to share [footnote 5] (i.e., import).

Footnote 5: Our discussion has in a sense assumed that there are no access restrictions in place that would further complicate the sharing process. An investigation of behavior-based authorization in object-based databases is the subject of a related research project at USC.

Chapter 9

Experimental Prototype Implementations

We have implemented two separate prototypes in order to verify the underlying theories of this research. The first prototype is intended to prove the feasibility of our mechanism to resolve representational discrepancies among related objects in different federation components.
Specifically, we show the use of ontologies and relationship descriptors as a means of adding semantic information to conceptual database schemas. By implementing this prototype, we show that this additional semantic information is sufficient to resolve semantic heterogeneity in a loosely coupled federation environment such as FOTA. The details of this prototype are described immediately below. In our second prototype we focus on unifying remote objects with existing schemas. Besides the sharing of type objects, we paid particular attention to our implementation of instance level sharing as well as the implementation of the behavioral sharing patterns described in Chapter 8.

9.1 Semantic Heterogeneity Resolution in the SHELTER Testbed

We have investigated the resolution of semantic heterogeneity in the SHELTER knowledge base environment [56]. In essence, SHELTER offers an environment for collaborative, team development of knowledge based systems. Knowledge bases created with SHELTER are translated into LOOM [48] objects. LOOM is a knowledge representation system that supports a description language for modeling objects and relationships. Furthermore, its production-based and classification-based inference capabilities support deductive reasoning. LOOM knowledge bases can be permanently stored using the ITASCA storage manager [36]. By using the SHELTER testbed, we were able to make use of LOOM’s knowledge representation and inference mechanism without having to implement our own from scratch. For these reasons, SHELTER became the ideal test vehicle for our approach to resolving object relationships. Note that since we are dealing with knowledge bases instead of databases, we are using the term concept in place of object for the remainder of this section.

The first step in building this prototype was to create several knowledge bases that resembled components of FOTA.
In order to correctly simulate our resolution mechanism, we also implemented a semantic dictionary (in a separate knowledge base) and local lexica (part of the component knowledge bases). Recall that the semantic dictionary contains, among other things, the ontology for the application domain, in our case the collaborating travel agencies. The ontology is used to describe unknown terms which appear in the export schema of each component. The semantic dictionary is the first knowledge base that is created and defined in SHELTER. In this way, components can use the already defined ontology concepts in their local lexica. Once the semantic dictionary is defined, the order in which the other knowledge bases are created and loaded into SHELTER has no effect on the resolution mechanism. Fig. 9.1 depicts a sample session with SHELTER during the creation of the travel agency knowledge bases. Refer to Appendix B for a complete listing of the LOOM script files that are used to create three FOTA components and the semantic dictionary as LOOM knowledge bases.

Figure 9.1: The SHELTER development environment.
I — .r” _ ISER:B0TT0M-KB> (1oomtdefproduct1on p twhen <:and <:detects (is_kind_of ?x ?y>) (is_equal_to ?z ?y>) :do ((format t Do It for “a ~a" ?z ?y) (loom;tell (is_kind_of ?x ?z)>)) I PRODUCTIOH IP USER:BOTTOM-KE> (loomjdefproduction p- :when (jdetects (is_kind_of ?x ?y>) :do ((format t ' ‘ "ZSuccess"))) I PRODUCTION IP- USER:BOTTOM-KB> (loomttellm) 26 USER;BOTTOM-KB> (loomtask (is_kind_of bedS-breakfast lodgings)) IJ^hPjBCTTrM-KB> <loon;ask (is_kind_of bed&breakfast hotel)) T U ' .E P: Bf H CM-KB> (loomtask (is_equal_to lodgings hotel)) T USER:BOTTOM-KB>0 ‘ CAR -T RA V E L: CATEGORY ‘ TRAVEL: OtlltPACT “TRAVEL : COMPARTMENT '■TRAVEL: HOTEL -TRAVEL : INSURANCE -TRA V EL: ONE-STAR ‘ TRAVEL : RENTAL . CAR R e la tio n G r a p h e r (SEM A N TIC-D ICTIO N A R V ) * v ‘ TRAVEL :UEBAIJREAKEAST ‘ TRAVEL:COST ‘ TOAVTT. rRATE ‘ IRAVEI :liW> ‘ TRAVEI :RES •TRAVEL : RESORT G raph V iew C o n t e x t (KB) : SEMftNTIC-DTCTIOIIAP.Y 1 V ie w N ode A »g L im it p p s L ftte x t; <K B): TRAVEL-aeEWCY-KB-2 "j" O ) 0 3 t|FT ‘ TRAVEL:ASSOC .W IT H ‘ TRAVEL: HAS ‘ TRAVEI. : IS _C nL L E C T IO N „O F ‘ T R A V tl. - . I S_CUKMDN _ AMONG ‘ TRAVEL: IS_GUMI?AT.CHLE_WITH •TR A V E L: IS_EQ TIA L_T0 C "- ‘ TRAVEL : IS JFE A T U R E _U F ‘ TRAVEL; TS_UIENTICAL._TO V \N -TRAVEL : IS _ rN S T A N C E _ O r ‘ TRAVEL : IS _K IN D _O F ‘ TRAVEL ; I S REQUAL _ T 0 ‘ TRAVEL :RTXPE_OT B r a n d l i n g L i m i t | ‘ TRAVEL : Cl. I ENTS ‘ TRAVEL: DATE ‘ TRAVEL : IIATE ‘ TRAVEI : TTXNEKAH / S_TO_DO<- * TRAVEL: A IR L IN E S •TRAVEL iLDDGINGS -T RA V E L: RENTAL .CABS '( W L n U tK m C III1.NU I £ t docoTnetttftd The second part of this prototype consists of the implementation of so-called res olution rules that LOOM’s inference engine can use in order to disambiguate related objects in different knowledge bases. Resolution rules are formulated to capture ob ject relationships which are described by relationship descriptors. 
Resolution rules are of the form:

    If {list of facts connected by “and”} then {newly inferred fact}

Based on the set of relationship descriptors introduced earlier, a sample resolution rule using LOOM syntax is as follows:

    (defproduction p
      :when (:and (:detects (is-kind-of ?x ?y))
                  (is-equal-to ?z ?y))
      :do ((format t "~%Do it for ~A ~A" ?x ?z)
           (tell (is-kind-of ?x ?z))))

    (defproduction p-
      :when (:detects (is-kind-of ?x ?y))
      :do ((format t "~%Success")))

    (tellm)

In plain English, this resolution rule can be described as follows:

    If X KindOf Y and if Z Equal Y then X KindOf Z

In this rule, Y is a concept from the ontology, and X and Z are concepts whose relationship is being investigated. Using this resolution rule, we could, for example, “inquire” about the relationship between two concepts from different schemas, such as Bed&Breakfast from component A and Lodgings from component E:

    > (ask (is-kind-of Bed&Breakfast Lodgings))

Based on the knowledge already defined in A’s and E’s knowledge bases, namely the fact that Bed&Breakfast is KindOf a Hotel (A’s local lexicon) and the fact that Lodgings is Equal to Hotel (E’s local lexicon, see also App. B), LOOM’s inference engine will return:

    T

We have implemented resolution rules that cover most of the relationships that can occur in object-based data models. For example, we have transitive rules such as

    If X Rel Y and if Z Equal Y then X Rel Z    (1)

where X, Y, and Z are as above and Rel is a relationship descriptor in the set {Identical, Equal, KindOf, Assoc, CollectionOf, InstanceOf}. Furthermore, we have rules comparing characteristic properties of concepts. The general structure of so-called property rules is:

    If X Rel Y and if Z Rel Y then X Assoc Z    (2)

Again, X, Y, and Z are as above and Rel stands for either Feature or Common.
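To make rules (1) and (2) concrete, the following Python sketch applies them by simple forward chaining until no new facts appear. It is not LOOM and not the prototype's rule base: the triple representation, the infer function, and the two Feature facts about a One-Star category are assumptions chosen for illustration, and symmetry of Equal is deliberately not modeled.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (descriptor, subject, object)

# Descriptors admitted by rule (1) and rule (2), as listed in the text.
TRANSITIVE = {"Identical", "Equal", "KindOf", "Assoc", "CollectionOf", "InstanceOf"}
PROPERTY = {"Feature", "Common"}


def infer(facts: Set[Fact]) -> Set[Fact]:
    """Apply rules (1) and (2) repeatedly until a fixed point is reached."""
    known = set(facts)
    while True:
        new: Set[Fact] = set()
        for rel, x, y in known:
            for rel2, z, y2 in known:
                if y2 != y or z == x:
                    continue
                # Rule (1): X Rel Y and Z Equal Y  =>  X Rel Z
                if rel in TRANSITIVE and rel2 == "Equal":
                    new.add((rel, x, z))
                # Rule (2): X Rel Y and Z Rel Y  =>  X Assoc Z
                if rel in PROPERTY and rel2 == rel:
                    new.add(("Assoc", x, z))
        if new <= known:
            return known
        known |= new


# Lexicon facts from the example; the two Feature facts are hypothetical.
facts = {
    ("KindOf", "Bed&Breakfast", "Hotel"),
    ("Equal", "Lodgings", "Hotel"),
    ("Feature", "Bed&Breakfast", "One-Star"),
    ("Feature", "Lodgings", "One-Star"),
}
result = infer(facts)
```

With these inputs, result contains both a KindOf and an Assoc fact relating Bed&Breakfast to Lodgings, so more than one descriptor can hold for the same pair of concepts.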
All in all, we estimate the total number of resolution rules to be around 50; so far, in this prototype we have implemented 15 of these rules, enough to demonstrate the correctness of our resolution mechanism.

For each pair of related objects, there may be more than one resolution rule that is applicable, resulting in more than one possible relationship. In order to arrive at a solution that describes the relationship in the most accurate way, we have assigned a value to each relationship descriptor indicating its expressive power. In this prototype, Identical has the highest expressive value, followed by Equal, KindOf, Compatible, and Assoc, which has the lowest. CollectionOf and InstanceOf are on the same level as KindOf. Common, Feature, and Has are not rated since they will not be used to describe results of resolution rules.

Going back to our example involving A’s Bed&Breakfast and E’s Lodgings, we end up with two relationship descriptors describing their relationship, namely KindOf and Assoc. The positive association is due to the fact that both Bed&Breakfast and Lodgings are of the “One-Star” category (rule (2) above). Since KindOf has a higher descriptive value than Assoc, it will be used as the final result of the resolution process. In those cases where no exact relationship between two concepts can be established (except for Assoc), the resolution process depends on the input from the user. So, for example, whenever the resolution rules return positive associations (Assoc), the user is asked to intervene and decide on the final outcome. Note that the more positive associations there are, the more likely it is that the objects are at least compatible or even equal.

We have defined a Lisp function relationship? which applies the algorithm described above to a pair of concepts which are under investigation. Thus, rather than manually applying every resolution rule that could apply, we can simply ask

    > (ask (relationship? Bed&Breakfast Accommodation))

and LOOM will return

    ('Is-Kind-Of)

Based on the assumption that our set of relationship descriptors is capable of describing most of the relationships that can be modeled in CODM-like data models, the above resolution rules proved sufficient for resolving all the semantic heterogeneities occurring in FOTA. After completing this prototype, we are satisfied with the performance of our mechanism and feel comfortable that more complicated relationships can be resolved by extending the set of relationship descriptors. For example, the current set lacks the ability to describe exactly how two objects are related if they are not equal, identical, or subtype/supertype of one another. In this case, Assoc is the only other alternative and simply indicates that there is some kind of positive relationship between these two objects. Further prototype implementations will test the use of additional relationship descriptors and corresponding resolution rules, moving us closer to a design that can be implemented and used in real-world applications.

9.2 Schema Unification in the Remote-Exchange Testbed

An experimental implementation of our sharing and transmission mechanism has been designed and built using our Remote-Exchange testbed, consisting of a federation of Omega database components [footnote 1].

Footnote 1: The current Remote-Exchange environment also supports IRIS database components to some degree, and we plan to incorporate other database systems in the future.

9.2.1 The Prototype

In the current Remote-Exchange testbed, we have implemented the seamless (transparent) unification of objects from remote databases with local schemas. Each federation component consists of an Omega DBMS for maintaining local information
Communication among individual Omega components is supported using the RPC message passing paradigm. In what follows, we describe the essential aspects of this testbed and examine some of the critical implementation issues we faced in our experiments.

As noted, our unification mechanism requires the ability to extract meta-data information from a database component. Since the original functional database model underlying the Omega DBMS does not support this functionality², we implemented our remote sharing language RSL as part of each exporter/importer³. In effect, RSL together with the exporter/importer serves as our inter-component communication protocol.

Sharing in Remote-Exchange is made possible through the implementation of RSL commands in the form of an X11-based graphical user interface. This user interface, called the discovery and unification tool, hides the syntax of the pure RSL commands from the user and provides a more flexible and intuitive basis for communicating with component schemas. Automatic conflict resolution has not yet been implemented in our Remote-Exchange prototype, so the user must perform this task manually before unification can occur.

Fig. 9.2 shows a sample session with the unification tool. In this session, travel-agency A is the local component, and parts of its schema are displayed in the large window titled ***| Omega DBMS |***. Travel-agency A has opened a connection to a remote component, travel-agency D, on host aludra. Information about this remote component is displayed in the smaller window to the right, titled ***| Discovery Tool |***. At any point, A can select among the operations displayed in the bottom row of each window. These choices correspond roughly to the RSL commands that are implemented in our prototype. At this point, A is browsing meta-data in D's export schema. Since A has not selected a type in D's schema, neither the Show Values nor the Import... button is activated.
Locally in A's schema, the Places-to-Stay type is selected and its functions are displayed in the Functions On Type window. If A selects one of the three functions (i.e., Name, Address, or Owner), the Show Values button at the bottom of the large window becomes active and allows the user to display the value of the selected function for all instance objects belonging to the underlying type. The ability of our graphical interface to allow the selection of types and functions on the screen eliminates the need for set operations involving OID values that would otherwise be necessary in a character-based interface (e.g., store the set of OIDs returned by HasInstances(), select each OID individually from the set, call HasValue()).

² Besides the lack of an RSL equivalent, there is a close resemblance between Omega's functional data model and CODM.

³ Another alternative would have been to implement this functionality inside each Omega DBMS; however, this would require modification of the kernel of the database system, which did not seem acceptable. Furthermore, additional modifications would be necessary every time RSL is extended with new functionality.

In the next sections, we describe our approach to implementing the sharing of instance and function objects in the Remote-Exchange testbed, i.e., the RSL commands ImportInstance() and ImportMethod().

9.2.2 Sharing and Transmission

Conceptually, in instance level sharing, a remote instance object is imported directly into a local type. From the user's perspective, this remote instance behaves in the same manner as a local instance object. However, the actual state of the remote instance exists in the remote component database; retrieval of any state of the remote object is done by accessing the remote database transparently.
Hence, access to remote instance objects corresponds to the Remote function - Remote object situations described above⁴.

⁴ Note that this does not depend upon whether the function is stored or computed.

The importation of a function object can be seen as follows. Intuitively, when an instance object is imported, only data is being shared. On the other hand, importing a function object gives the importer access to services not provided by his/her local system. This corresponds to the Remote function - Local object situations described in Chapter 8.

Figure 9.2: The Remote-Exchange sharing interface.

9.2.2.1 Instance Level Sharing Implementation

Our mechanism for importing instance level objects follows three steps:

1. Create local surrogates for remote objects.
2. Create computed functions for retrieving data from remote components.
3. Overwrite functions defined on surrogates to use (or refer to) the computed functions created in step 2.

Figure 9.3: Sharing instance objects.

The local surrogate serves as a local handle for accessing a remote instance. By using the surrogate, differences between remote representations of objects, e.g., object identifiers (OIDs), can be masked out (made transparent). Since the state of the remote object exists externally, computed functions for accessing that state must be created. These computed functions use the Remote Procedure Call (RPC) paradigm [7] for accessing the remote component. Finally, in order to have the surrogates use the remote functions, any existing functions defined on the surrogate must be overridden to use the RPC-defined functions. This is implemented by dynamically binding functions to objects.

In terms of the collaborating travel agents, we can envision a scenario where travel agent A wants to import instances of a remote type Private Accommodations into his/her local schema using the above approach. This situation may occur after A has already imported the meta-data, in this case Private Accommodations, and is interested in remote instances belonging to this type. This situation is depicted in Fig. 9.3. Surrogates are created as instances of both a local type (i.e., component A's Private Accommodations) and a remote type, called R-Private Accommodations in Fig. 9.3. The purpose of the new surrogate type is to override the local functions that surrogates inherit from the local type to which they belong (e.g., A's Address() function defined on Places-to-Stay). By additionally creating the surrogate as a member of this surrogate type, the functions that the surrogate instance originally inherited are overridden.
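The three steps above can be sketched as follows. This is a toy illustration in Python, not the Omega implementation: the dictionary REMOTE_DB, the helper rpc_get, the Surrogate class, the OID "oid-17", the database name, and the sample values are all hypothetical, and a plain dictionary lookup stands in for the actual RPC request.

```python
# Hypothetical stand-in for the state held by remote component D.
# In the testbed this state would be reached through RPC.
REMOTE_DB = {
    "oid-17": {"Name": "Casa Verde", "Price": "80", "Fax-Number": "555-0100"},
}


def rpc_get(host, db, remote_oid, function_name):
    """Placeholder for an RPC request to the exporter on `host`."""
    return REMOTE_DB[remote_oid][function_name]


class Surrogate:
    """Step 1: a local handle that hides the remote OID and location."""

    def __init__(self, host, db, remote_oid, local_functions, remote_names):
        self.host, self.db, self.remote_oid = host, db, remote_oid
        # Functions inherited from the local type (e.g., Places-to-Stay).
        self.functions = dict(local_functions)
        # Steps 2 and 3: create computed functions and let them override
        # identically named functions inherited from the local type.
        for name in remote_names:
            self.functions[name] = self._computed(name)

    def _computed(self, name):
        return lambda: rpc_get(self.host, self.db, self.remote_oid, name)

    def call(self, function_name):
        return self.functions[function_name]()


local_type_functions = {
    "Name": lambda: "<local default>",
    "Owner": lambda: "unknown",
}
s = Surrogate("aludra", "Travel-Agency-D", "oid-17",
              local_type_functions, ["Name", "Price", "Fax-Number"])
print(s.call("Name"))   # overridden: answered by the remote component
print(s.call("Owner"))  # not overridden: answered by the local function
```

In this sketch the override is a dictionary update performed when the surrogate is created, which loosely mirrors the dynamic binding of functions to objects; the prototype achieves the same effect by making the surrogate a member of the additional surrogate type.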
The two thin dotted arrows from R-Private Accommodations ("R" for remote) and Private Accommodations to the surrogate instance indicate that surrogate instances (i.e., remote instances) of Private Accommodations are created as members of both Private Accommodations and R-Private Accommodations. Thus these (remote) instances inherit functions from both R-Private Accommodations and Private Accommodations; any duplicately named function from the two types is overridden by the function defined for R-Private Accommodations. The functions defined on R-Private Accommodations are computed functions, which make RPC requests to the remote component database and retrieve the values of functions on remote instances.

9.2.2.2 Function Level Sharing Implementation

As in instance level sharing, meta-information containing the location (e.g., remote OID and remote component name) of the remote function object being imported must be stored locally. However, in contrast to instance level sharing, this meta-information is associated directly with the function being imported. In instance level sharing, the meta-information is kept indirectly for remote functions such as Name() and Location() via the surrogate instance object (see Fig. 9.3). Thus, in our implementation we distinguish between two kinds of "remote" functions: those implicitly defined through instance level importation and those directly imported through function level importation.

Fig. 9.4 shows our mechanism for incorporating meta-information for function level importation. We exploit the fact that meta-information is also represented using our functional object-based model.
Imported functions are created as instances of the type Remote-Functions and can thus store and access the additional location meta-information required to execute the imported function.

Figure 9.4: Sharing function objects.

Fig. 9.4 depicts a slightly different sharing pattern from Fig. 9.3. In this scenario, the SendMessage() function⁵ is imported from some remote component (i.e., component D). Note that both the remote and local instances of Private Accommodations use the same imported SendMessage() function for retrieving fax numbers. This is evident in Fig. 9.4 from the absence of the SendMessage() function from the type R-Private Accommodations and the addition of a new (italicized) SendMessage() function defined on Places-to-Stay⁶.

⁵ Assume for the remainder of this example that component A does not have an equivalent function such as the Mail() function described earlier.

⁶ The italicized font is used to indicate that the SendMessage() function is imported.

In order to explain how the SendMessage() function works, we must first explain how our implementation addresses the issue of side-effects. By side-effect we mean two things: (1) any kind of implicit input, other than the input argument, that is necessary to compute the result, and (2) any modifications to the state of the database. For functions whose arguments are literals, this simply requires that the function being imported compute its result value solely from its input argument without modifying any database state.

Functions whose input argument is non-literal pose additional difficulties. In this case, the input argument is the OID of an instance. The problem then lies in determining what information a computed function accesses in order to compute its results. Strictly applying our definition of side-effects would restrict computed functions on non-literals to accessing and manipulating only the input OID. Realistically, however, a computed function must be able to access some state of the instance corresponding to the OID when computing its result. In our implementation, we take the position that the only state a computed function can access consists of those functions that serve to encapsulate that object. In other words, the only state the computed function can access is the set of functions defined on the types of which the instance is a member. In the case of the SendMessage() computed function, these functions are Name(), Price(), Fax-Number(), and Location().

Having determined what information a computed function on a non-literal type can access, a problem arises when trying to execute such a function remotely on a local object. The problem occurs when supplying local arguments to a remote computed function. Although we know the computed function is limited to accessing only the functions that encapsulate the instance, we do not know exactly which ones it needs. Even if we pass all the possible values the computed function can access, the computed function must be written in such a way as to retrieve these arguments from the remote component and not the local one. This would be undesirable and contrary to our goal of relieving the computed function writer of needing to know where the data on which it operates is located.
The best approach to this problem in our autonomous environment is to allow computed function writers to define functions without concern as to whether the function is to be exported. Thus, in our implementation, whenever a remotely executing computed function needs state from the local component, it performs a callback to the local component to retrieve that state. For the instance level sharing described in Sec. 9.2.2.1, the local component simply runs as a client and makes RPC requests to the exporter running as a server. For function level sharing, however, when the callback mechanism is used, the local component must in addition run as a server to accept the callback requests.

Computed functions can be written in any programming language that can be compiled to re-entrant object code. This object code is then dynamically linked into the database management system kernel when the computed function is accessed. In our prototype using the Omega database management system, a computed function accesses the local component through Omega-eval(), which has two parameters: an argument and the function that is to be applied to the argument.

We can now consider how the callback mechanism works transparently and how it allows a computed function to be written uniformly without regard as to whether that function is to be exported. Consider again the example using the SendMessage() computed function. Suppose that the imported SendMessage() function retrieves the Fax-Number() of an instance and faxes a precomposed message to the retrieved number. When the user of the local component invokes the SendMessage() function on a local object, the SendMessage() function is passed the local OID of a Places-to-Stay instance.
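A minimal sketch of this callback arrangement is given below. It is a single-process Python simulation, not the Omega code: the Component class, the method names omega_eval and invoke_remote, the extra caller parameter (Omega-eval() itself takes only an argument and a function), the OID "a-1", and the fax number are hypothetical, and direct method calls stand in for RPC in both directions. The sketch also assumes that the evaluation routine can tell whether an OID is local.

```python
class Component:
    """Toy federation component; a dictionary stands in for the DBMS."""

    def __init__(self, name):
        self.name = name
        self.state = {}      # oid -> {function name: value}
        self.computed = {}   # computed functions offered for export
        self.callbacks = 0   # callback requests served by this component

    def owns(self, oid):
        return oid in self.state

    def omega_eval(self, oid, function_name, caller=None):
        """Apply `function_name` to `oid`.

        If the OID is not local, call back to the component that invoked
        the computed function (assumed to be passed as `caller`).
        """
        if self.owns(oid):
            return self.state[oid][function_name]
        if caller is None:
            raise KeyError(oid)
        caller.callbacks += 1
        return caller.omega_eval(oid, function_name)

    def invoke_remote(self, exporter, function_name, oid):
        """Client side of function level sharing: only the OID is sent."""
        return exporter.computed[function_name](exporter, oid, self)


def send_message(component, oid, caller):
    # Written without knowing where the instance's state is stored.
    fax = component.omega_eval(oid, "Fax-Number", caller)
    return "faxed precomposed message to " + fax


a, d = Component("A"), Component("D")
a.state["a-1"] = {"Fax-Number": "555-0199"}   # local Places-to-Stay instance
d.computed["SendMessage"] = send_message      # function exported by D
print(a.invoke_remote(d, "SendMessage", "a-1"))
```

The point of the design is visible in send_message: it contains no location logic at all, so the same function body works whether the instance lives in the exporting component or in the importing one.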
The SendMessage() function on the remote server makes a call to Omega-eval() to retrieve the Fax-Number(). Omega-eval() recognizes that the OID passed in as its argument is not local and performs a callback to the server of the local component that invoked the SendMessage() function. Since the local server recognizes the OID as a local OID, it performs the request and passes the Fax-Number() back to the remote server, which can then complete its task and display an acknowledgment on the local database monitor.

Chapter 10

Conclusion

We have presented an approach and mechanism for resolving semantic heterogeneity in the context of a federation of autonomous database system components. While our mechanism is general enough to apply to schemas in most data models, we claim successful resolution only within the context of object-based multidatabase systems. For this mechanism to operate effectively, each participating component must agree to meet two principal conditions. First, the CODM data model must be supported at the federation interface, including the RSL functionality, as presented in Sections 5.1 and 6.2.1. Second, a local lexicon must be provided, wherein a component describes the meaning of the (type) objects it is willing to share with other components in the federation (see Sec. 6.2.2). Sharable objects must be described using the conceptual relationship descriptors supported by our mechanism.

10.1 Summary

The approach to semantic heterogeneity presented in this dissertation is couched in the framework of Remote-Exchange, a research project currently underway at USC [24]. The Remote-Exchange architecture provides three "services" to its federation components: an intelligent sharing advisor, semantic heterogeneity resolution, and unification [23]. When a new component initially joins the federation, it must first register with the sharing advisor.
The sharing advisor enters the data that the component is willing to share into the semantic dictionary so that it can be used by other components in the federation. Assuming that a component has met the two basic conditions above, it is able to participate in the exchange of information. Sharing takes place on a component-pairwise basis. The importing component selects relevant foreign objects and invokes the sharing advisor to resolve semantic heterogeneities that might exist between the foreign objects and local objects in its own schema [52]. Given a foreign object, a related local object, and the relationship between the two, the unification tool places the foreign object into the appropriate place in the local schema. At this point, the unification is complete and the newly imported object can be used by the remote component.

We note that at present it is not possible to completely automate the process of resolving semantic heterogeneity. Consequently, humans will likely be required to assist. In order to facilitate the establishment of new federations, we employ so-called ontologies, which contain an initial set of terms that can be used to describe unknown concepts in the local lexicon of each component. A complete ontology package describes general as well as specific information from the application domain, and will evolve and grow with time to accommodate additional, more complex concepts within a given federation. Such packages can be provided for particular domains, e.g., the travel industry or genetic information.

In this dissertation, we have also presented an approach and mechanism to support the sharing of behavior among the component database systems in a federation. We considered the various situations for supporting the sharing of behavior. In our approach, we have considered the importance of decoupling the location of (persistent) data and the location of the execution of methods that operate on it.
Traditional approaches inextricably link the location of the data and the execution of the operation. Other research efforts specifically address the issues in this area [9]. However, our approach allows the local component to create local (e.g., stored) functions on remote objects. This has the overall benefit of allowing the local component to create local state for remote objects that completely conform to the local system's mechanism for updates.

Experimental prototypes of our resolution and sharing mechanisms have been developed. We have simulated a federation similar to FOTA using the SHELTER development environment in order to test our resolution mechanism. Furthermore, we have built upon the Remote-Exchange prototype to demonstrate, evaluate, and refine our unification mechanism. The Remote-Exchange prototype is based upon a testbed consisting of Omega DBMS components. Omega was chosen for two reasons. First, by using an existing DBMS we were able to focus our attention on implementing the resolution and sharing mechanism. Second, the Omega database model contains most of the modeling constructs specified in CODM. The focus of our experimental prototype implementation has been mainly on instance- and function-level sharing. We have found these sharing patterns to be very natural and easy to use in our environment. In light of these achievements, we are currently investigating type-level sharing (involving complex abstract types) and are proceeding to quantify the performance of our mechanism.

10.2 Results

The results of this research have both direct and practical impact on information sharing among heterogeneous databases, specifically in the following areas:

• Framework: We have established a framework for accommodating semantic heterogeneity in interoperable object-based database systems.
We specifically use a functional object-based data model (extended with a sharing language called RSL) for describing the sharable data as well as their relationships to the real-world concepts they represent.

• Architecture: We have introduced an architecture and experimental system for resolving semantic heterogeneity. Our system is based on the interaction between RSL, which provides structural information, local lexica, which provide semantic information, and the semantic dictionary, which provides partial or incomplete knowledge about the relationships between the concepts in the local lexica. Relationships are described using an extensible set of fundamental descriptors.

• Existing Components and Autonomy: Throughout this research we have paid careful attention to limiting required modifications to existing DBMS software and conceptual schemas. As a result, our approach requires no modification to the query processor or any other aspect of the local architecture. Other than the basic requirements of a CODM-like interface and support of a local lexicon, each component retains autonomy over its database. Furthermore, through the export schema it can specify at any given time which objects are sharable and which objects should remain private.

Our approach to resolving semantic heterogeneity can be utilized by a large variety of intelligent and cooperative information systems, such as Services and Information Management (SIMS) [2], GTE's Distributed Object Management project [50], and OMG's Object Request Broker [57], to name a few.

10.3 Future Work

We are planning to extend our current discovery and unification tool to include the semantic heterogeneity resolution mechanism, thus getting closer to our final goal of creating an intelligent sharing advisor to oversee all of the operations in the federation.
Specifically, we plan on extending our set of relationship descriptors to include some variations to the relatively coarse relationship types currently described. Part of the sharing advisor's functionality will be an inference mechanism much like the one provided by LOOM. In addition to resolving object relationships, the advisor will maintain the semantic dictionary (i.e., the ontology package and the concept hierarchy), two activities that have not been implemented in our prototypes here.

Another area of research that has not been covered in this dissertation involves local updates of shared objects. In our approach, all imported objects are copies of their remote counterparts and no physical connection between the copy(ies) and the original(s) exists. Although this approach is useful in the sense that it protects the owner of shared information from accidental or unwanted updates to his/her data (e.g., updates to customer reservation records), we can envision scenarios where the propagation of updates is useful (e.g., in FOTA, travel agencies might want to be informed whenever hotel prices have changed).

Finally, we would like to investigate access control mechanisms that go beyond the protection provided by the export schemas (in our current approach, each federation component shares its complete export schema with every other component).

Reference List

[1] H. Afsarmanesh and D. McLeod. The 3DIS: An Extensible, Object-Oriented Information Management Environment. ACM Transactions on Office Information Systems, 7:339-377, October 1989.
[2] Y. Arens, C. Y. Chee, C. Hsu, and C. Knoblock. Retrieving and Integrating Data from Multiple Information Sources. Technical Report ISI-RR-93-308, USC/Information Sciences Institute, March 1993.
[3] M. Atkinson, et al. The Object-Oriented Database System Manifesto. In Proceedings of the 1st Intl. Conf. on Deductive and Object-Oriented Databases. Kyoto, Japan, December 1989.
[4] C. Batini and M. Lenzerini. A Methodology for Data Schema Integration in the Entity Relationship Model. IEEE Transactions on Software Engineering, 10(6):650-664, 1984.
[5] C. Batini, M. Lenzerini, and S. Navathe. A Comparative Analysis of Methodologies of Database Schema Integration. ACM Computing Surveys, 18(4):323-364, 1986.
[6] E. Bertino, G. Pelagatti, and L. Sbattella. An Object-Oriented Approach to the Interconnection of Heterogenous Databases. In Proceedings of the Workshop on Heterogenous Databases. NSF, December 1989.
[7] A. Birrell and B. Nelson. Implementing Remote Procedure Calls. ACM Transactions on Computer Systems, 2(1):39-59, February 1984.
[8] D. G. Bobrow and T. Winograd. An Overview of KRL, a Knowledge Representation Language. Cognitive Science, 1(1):10-29, 1977.
[9] Y. Breibart, A. Silberschatz, and G. Thompson. Reliable Transaction Management in a Multidatabase System. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 215-224. ACM SIGMOD, May 1990.
[10] S. Ceri and G. Pelagatti. Distributed Databases: Principles and Systems. McGraw Hill, 1984.
[11] T. Connors and P. Lyngbaek. Providing Uniform Access to Heterogeneous Information Bases. In Proceedings of 2nd International Conference on Object Oriented Database Systems. Bad Munster am Stein Ebernburg, Germany, September 1988.
[12] DARPA. Case-based Reasoning from DARPA: Machine Learning Program Plan. In Proceedings of the 1989 DARPA Case-Based Reasoning Workshop. DARPA, Washington, D.C., 1989.
[13] U. Dayal and H. Hwang. View Definition and Generalization for Database Integration in Multibase: A System for Heterogeneous Distributed Databases. IEEE Transactions on Software Engineering, 10(6):628-644, 1984.
[14] D. Fang, J. Hammer, and D. McLeod. An Approach to Behavior Sharing in Federated Database Systems. In M.T. Ozsu, U. Dayal, and P. Valduriez, editors, Distributed Object Management, pages 334-346. Morgan Kaufman, 1993.
[15] D. Fang, J. Hammer, D. McLeod, and A. Si. Remote-Exchange: An Approach to Controlled Sharing among Autonomous, Heterogenous Database Systems. In Proceedings of the IEEE Spring Compcon. IEEE, San Francisco, February 1991.
[16] D. Fang and D. McLeod. A Testbed and Mechanism for Object-Based Sharing in Federated Database Systems. Technical Report USC-CS-92-507, Computer Science Department, University of Southern California, Los Angeles CA 90089-0781, February 1992.
[17] P. Fankhauser and E. Neuhold. Knowledge Based Integration of Heterogeneous Databases. Technical report, Technische Hochschule Darmstadt, 1992.
[18] A. Ferrier and C. Stangret. Heterogeneity in the Distributed Database Management System SIRIUS-DELTA. In Proceedings of the International Conference on Very Large Databases. VLDB Endowment, 1983.
[19] D. Fishman, D. Beech, H. Cate, E. Chow, T. Connors, T. Davis, N. Derrett, C. Hoch, W. Kent, P. Lyngbaek, B. Mahbod, M. Neimat, T. Ryan, and M. Shan. Iris: An Object-Oriented Database Management System. ACM Transactions on Office Information Systems, 5(1):48-69, January 1987.
[20] K. Frenkel. The Human Genome Project and Informatics. Communications of the ACM, 34(11):41-51, 1991.
[21] S. Ghandeharizadeh, V. Choi, C. Ker, and K. Lin. Design and Implementation of the Omega Object-Based System. In Proceedings of the Fourth Australian Database Conference. Australian Computer Society, February 1993.
[22] T. R. Gruber. A Translation Approach to Portable Ontologies. Knowledge Acquisition, 5(2):199-220, 1993.
[23] J. Hammer and D. McLeod. An Approach to Resolving Semantic Heterogeneity in a Federation of Autonomous, Heterogeneous Database Systems. International Journal of Intelligent & Cooperative Information Systems, 2(1):51-83, March 1993.
[24] J. Hammer, D. McLeod, and A. Si. An Intelligent System for Identifying and Integrating Non-Local Objects in Federated Database Systems. In Proceedings of the 27th Hawaii International Conference on System Sciences, pages 398-407. Computer Society of the IEEE, Hawaii, USA, January 1994.
[25] M. Hammer and D. McLeod. On Database Management System Architecture. In Infotech State of the Art Report: Data Design, volume 8 of Infotech State of the Art Reports, pages 177-202. Pergamon Infotech Limited, Maidenhead, United Kingdom, 1980.
[26] S. Hayne and S. Ram. Multi-User View Integration System (MUVIS): An Expert System for View Integration. In Proceedings of the 6th International Conference on Data Engineering. IEEE, February 1990.
[27] D. Heimbigner and D. McLeod. A Federated Architecture for Information Systems. ACM Transactions on Office Information Systems, 3(3):253-278, July 1985.
[28] M. Huhns, N. Jacobs, T. Ksiezyk, W. Shen, M. Singh, and P. Cannata. Enterprise Information Modeling and Model Integration in Carnot. Technical Report Carnot-128-92, MCC, 1992.
[29] R. Hull and R. King. Semantic Database Modeling: Survey, Applications, and Research Issues. ACM Computing Surveys, 19(3):201-260, September 1987.
[30] R. Hull and S. Widijojo. ILOG: Specificational Language for Generating OIDs. Technical Report RJ3325, ISI, February 1990.
[31] G. Jacobsen, G. Piatetsky-Shapiro, C. Lafond, M. Rajinikanth, and J. Hernandez. CALIDA: A Knowledge-Based System for Integrating Multiple Heterogeneous Databases. In Proceedings of the 3rd International Conference on Data and Knowledge Bases, pages 3-18, June 1988.
[32] R. Katz and N. Goodman. View Processing in Multibase - A Heterogeneous Database System. In An Entity-Relationship Approach to Information Modelling and Analysis, pages 259-280. ER Institute, 1981.
[33] W. Kent. The Many Forms of a Single Fact. In Proceedings of the IEEE Spring Compcon. IEEE, February 1989.
[34] W. Kent. Solving Domain Mismatch Problems with an Object-Oriented Database Programming Language. In Proceedings of the International Conference on Very Large Databases, pages 147-160. IEEE, September 1991.
[35] W. Kent, R. Ahmed, J. Albert, M. Ketabchi, and M. Shan. Object Identification in Multidatabase Systems. Technical report, Hewlett-Packard Laboratories, 1992.
[36] W. Kim, N. Ballou, H.-T. Chou, J.F. Garza, and D. Woelk. Integrating an Object-Oriented Programming System with a Database System. In Proceedings of ACM Conference on Object-Oriented Programming Systems, Languages, and Applications, pages 142-152. ACM, September 1988.
[37] W. Kim, J. Banerjee, H. T. Chou, J. F. Garza, and D. Woelk. Composite Object Support in an Object-Oriented Database System. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications, pages 118-125, 1987.
[38] W. Kim, I. Choi, S. Gala, and M. Scheevel. On Resolving Schematic Heterogeneity in Multidatabase Systems. Distributed and Parallel Databases, 1(3):251-279, July 1993.
[39] W. Kim, J. Garza, N. Ballou, and D. Woelk. Architecture of the ORION Next-Generation Database System. IEEE Transactions on Knowledge Data Engineering, 2(1):109-117, March 1990.
[40] J. Larson, S.B. Navathe, and R. Elmasri. A Theory of Attribute Equivalence and its Applications to Schema Integration. IEEE Transactions on Software Engineering, 15(4):449-463, April 1989.
[41] C. Lecluse, P. Richard, and F. Velez. O2, an Object-Oriented Data Model. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, Ill., June 1988. ACM SIGMOD.
[42] Q. Li and D. McLeod. Object Flavor Evolution in an Object-Oriented Database System. In Proceedings of the Conference on Office Information System. ACM, March 1988.
[43] V. Linnemann, et al. Design and Implementation of an Extensible Database Management System Supporting User Defined Data Types and Functions. In Proceedings of the International Conference on Very Large Databases, pages 294-305, Los Angeles, Ca., 1988.
[44] B. Liskov. Distributed Programming in Argus. Communications of the ACM, 31(3):300-312, 1988.
[45] W. Litwin. An Overview of the Multidatabase System MRSDM. In Proc. of the ACM National Conference, pages 495-504. ACM, October 1985.
[46] W. Litwin and A. Abdellatif. Multidatabase Interoperability. IEEE Computer, 19(12):10-18, December 1986.
[47] P. Lyngbaek and D. McLeod. Object Management in Distributed Information Systems. ACM Transactions on Office Information Systems, 2(2):96-122, April 1984.
[48] R. MacGregor. A Deductive Pattern Matcher. In Proceedings of the AAAI-89 National Conference on Artificial Intelligence. AAAI, St. Paul, Minn., August 1988.
[49] D. Maier, J. Stein, A. Otis, and A. Purdy. Development of an Object-Oriented DBMS. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications, pages 472-482. ACM, 1986.
[50] F. Manola, S. Heiler, D. Georgakopoulos, M. Hornick, and M. Brodie. Distributed Object Management. International Journal of Intelligent & Cooperative Information Systems, 1(1):5-42, 1992.
[51] D. McLeod. Beyond Object Databases. In W. Stucky and A. Oberweis, editors, Datenbanksysteme in Büro, Technik, und Wissenschaft. Springer Verlag, 1993.
[52] D. McLeod, D. Fang, and J. Hammer. The Identification and Resolution of Semantic Heterogeneity. In Proceedings of International Workshop on Interoperability in Multidatabase Systems. Kyoto, Japan, April 1991.
[53] A. Motro. Superviews: Virtual Integration of Multiple Databases. IEEE Transactions on Software Engineering, 13(7), July 1987.
[54] A. Motro and P. Buneman. Constructing Superviews. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Ann Arbor, Mich., April 1981. ACM SIGMOD.
[55] S. B. Navathe, R. ElMasri, and J. Larson. Integrating User Views in Database Design. IEEE Computer, 19(1):50-62, 1986.
[56] R. Neches. Approaches to Cognitive Issues in Knowledge Acquisition within the SHELTER Knowledge Base Development Environment. In Proceedings of the AAAI Spring Symposium on Cognitive Issues in Knowledge Acquisition. AAAI, Stanford, California, March 1992.
[57] OMG. The Common Object Request Broker: Architecture and Specification. Technical Report Number 91.12.1, Draft 10, The Object Management Group, 1992.
[58] M. Papazoglou, S. Laufmann, and T. Sellis. An Organizational Framework for Cooperating Intelligent Information Systems. International Journal of Intelligent & Cooperative Information Systems, 1(1):169-202, 1992.
[59] B. Raphael. A Computer Program for Semantic Information Retrieval. In M. Minsky, editor, Semantic Information Processing. MIT Press, Cambridge, Mass., 1968.
[60] M. Rusinkiewitz, R. Elmasri, B. Czejdo, D. Georakopoulous, G. Karabatis, A. Jamoussi, K. Loa, and Y. Li. OMNIBASE: Design and Implementation of a Multidatabase System. In Proceedings of the 1st Annual Symposium in Parallel and Distributed Processing, pages 162-169. IEEE, May 1989.
[61] A. Savasere, A. Sheth, S. Gala, S. Navathe, and H. Marcus. On Applying Classification to Schema Integration. In Proceedings of IEEE 1st International Workshop on Interoperability in Multidatabase Systems, pages 258-261. Kyoto, Japan, April 1991.
[62] M.H. Scholl, C. Laasch, and M. Tresch. Update Views in Object-Oriented Databases. In Proceedings of the 2nd International Conference on Distributed Object-Oriented Databases, December 1991.
[63] A. Sheth and V. Kashyap. So Far (Schematically) yet So Near (Semantically). In D.K. Hsiao, E.J. Neuhold, and R. Sacks-Davis, editors, Interoperable Database Systems (DS-5) (A-25), pages 283-312. Elsevier Science Publisher B.V. (North-Holland), 1993.
[64] A. Sheth and J. Larson. Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases. ACM Computing Surveys, 22(3):183-236, September 1990.
[65] A. Sheth, J. Larson, A. Cornelio, and S. B. Navathe. A Tool for Integrating Conceptual Schemata and User Views. In Proceedings of the 4th International Conference on Data Engineering, pages 176-183. IEEE, February 1988.
[66] D. Shipman.
The Functional D ata Model and the Data Language DAPLEX. A C M Transactions on Database System s, 2(3):140-173, March 1981. [67] M. Siegel and S. E. Madnick. A M etadata Approach to Resolving Semantic Conflicts. In Proc. of the 17th International Conference on Very Large Data bases, pages 133-145. IEEE, September 1991. Barcelona. [68] A. Silberschatz, M. Stonebraker, and J. Ullman. Database Systems: Achieve ments and Opportunities. A C M Sigmod Record, 19(4):6-23, December 1990. 85 [69] J, Smith, P. Bernstein, U. Dayal, N. Goodman, T. Landers, K. Lin, and E. Wong. Multibase: Integrating Heterogeneous Distributed Database Sys tems. In Proceedings of the National Computer Conference, pages 487— 499. AFIPS, June 1981. [70] R. Strom and S. Yemini. The NIL Distributed Systems Programming Language: A Status Report. A C M Sigplan Notices, 20(5):36-43, May 1985. [71] T. Templeton, et al. Mermaid: A Front-End to Distributed Heterogenous Databases. In Proceedings IntP Conf. on Data Engineering, pages 695-708. IEEE, 1987. [72] V. Ventrone and S. Heiler. A Practical Approach for Dealing with Semantic Heterogeneity in Federated Database Systems. Technical report, The M ITRE Corporation, October 1993. [73] G. Wiederhold. Mediators in the Architecture of Future Information Systems. IE E E Computer, 25(3):38-49, March 1992. 86 Appendix A Remote-Sharing Language R S L The following is a detailed description of the functionality that is provided by the rem ote sharing language commands used in CODM. Components may use their own data models as long as they adhere to the specifications set forth by CODM and provide a sharing functionality that is compatible with that of R SL described below. We realize that some of the RSL commands are not very “user-friendly.” Haslnstances, for example, returns a set of OID’s and thus additional set operators are needed before one can access and m anipulate individual instances. 
Therefore, we encourage most federation components to hide the RSL communication protocol layer from users by adding their own interface, similar to the graphical interface provided by our unification tool (see Fig. 9.2).

ShowMeta (type-level command)
Syntax: ShowMeta(d:DB)
Description: Lists all the types that belong to the export schema of database d.
Example: ShowMeta(Travel-Agency-D) -->
    Accommodations
    Airline
    Entertainment
    Flight
    Rental-Car

HasProperties (function-level command)
Syntax: HasProperties(d:DB, t:Type)
Description: Lists all stored functions that are defined on type t in database d.
Example: HasProperties(Travel-Agency-D, Accommodations) -->
    Name
    Fax-Number
    Price

HasValueType (type-level command)
Syntax: HasValueType(d:DB, f:Func)
Description: Returns the value type for stored function f in database d.
Example: HasValueType(Travel-Agency-D, Name) --> STRING

HasMethods (function-level command)
Syntax: HasMethods(d:DB, t:Type)
Description: Lists all computed functions that are defined on type t in database d.
Example: HasMethods(Travel-Agency-D, Accommodations) -->
    SendMessage
    MakeReservation

HasInstances (instance-level command)
Syntax: HasInstances(d:DB, t:Type)
Description: Returns a list of all instance OID’s of user-defined type t in database d.
Example: HasInstances(Travel-Agency-D, Accommodations) -->
    10001
    10002
    10003
    10004

HasValue (instance-level command)
Syntax: HasValue(d:DB, i:InstOID, f:Func)
Description: Returns the value of the stored function f on instance i in database d.
Example: HasValue(Travel-Agency-A, 10003, Name) --> Holiday Inn

HasDirectSubtypes (type-level command)
Syntax: HasDirectSubTypes(d:DB, t:Type)
Description: Returns a list of all DIRECT subtypes of type t in database d.
Example: HasDirectSubtypes(Travel-Agency-D, Accommodations) -->
    Hotels
    Private Accommodations

HasDirectSupertype (type-level command)
Syntax: HasDirectSuperType(d:DB, t:Type)
Description: Returns the DIRECT supertype of type t in database d.
Example: HasDirectSupertype(Travel-Agency-D, Accommodations) --> Travel-Info

ImportMeta (type-level command)
Syntax: ImportMeta(d1:DB, t1:Type, d2:DB, t2:Type, r:Rel)
Description: Imports type t1 from database d1 into d2 and relates remote type t1 to local type t2 using relationship r.
Example: ImportMeta(Travel-Agency-D, Private Accommodations, Travel-Agency-A, Bed&Breakfast, SUBTYPE)

ImportInstance (instance-level command)
Syntax: ImportInstance(d1:DB, t1:Type, d2:DB, t2:Type, i:InstOID)
Description: Imports instance i of type t1 from remote database d1 into type t2 of local database d2.
Example: ImportInstance(Travel-Agency-D, Private Accommodations, Travel-Agency-A, Bed&Breakfast, 10003)

ImportMethod (function-level command)
Syntax: ImportMethod(d1:DB, f:Func, d2:DB)
Description: Imports computed function f from remote database d1 into local database d2.
Example: ImportMethod(Travel-Agency-D, SendMessage, Travel-Agency-A)

Appendix B
SHELTER Prototype

The following is a listing of the LOOM commands used to create a sample federation of three travel agencies in the SHELTER development environment. The three travel agencies are named A, B, and E to express the similarities with the example used in earlier chapters of this dissertation. Each component knowledge base contains the schema, the data, as well as the local lexicon for the underlying travel agency. The first part of this appendix contains a listing of the semantic dictionary with the ontology and relationship descriptors that were used in this prototype implementation.
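The organization described above (one shared semantic dictionary plus, for each agency, a component knowledge base with its own schema and local lexicon) can be pictured with a small data-structure sketch. The Python below is only an illustration and is not part of the SHELTER prototype: the class names `SemanticDictionary` and `Component`, the method `relate`, and the fixed argument order (local term, then shared concept) are assumptions made here for readability. The concept and descriptor names are a small subset of those that appear in the listings that follow.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticDictionary:
    """Shared vocabulary: ontology concepts plus relationship descriptors."""
    concepts: set
    descriptors: set


@dataclass
class Component:
    """One travel agency: local schema terms and a local lexicon."""
    name: str
    schema: set
    lexicon: list = field(default_factory=list)

    def relate(self, dictionary, descriptor, local_term, shared_concept):
        # Hypothetical helper: record one lexicon entry, but only after
        # checking that every name it mentions is actually defined.
        if descriptor not in dictionary.descriptors:
            raise ValueError(f"unknown descriptor: {descriptor}")
        if local_term not in self.schema:
            raise ValueError(f"unknown local term: {local_term}")
        if shared_concept not in dictionary.concepts:
            raise ValueError(f"unknown shared concept: {shared_concept}")
        self.lexicon.append((descriptor, local_term, shared_concept))


# A tiny subset of the dictionary and of agency A, for illustration only.
dictionary = SemanticDictionary(
    concepts={"Booking", "Price", "Hotel", "Person"},
    descriptors={"Is_Equal_To", "Is_Kind_Of", "Is_Compatible_With"},
)
agency_a = Component("A", schema={"Reservation", "Cost", "Resort", "Owner"})
agency_a.relate(dictionary, "Is_Equal_To", "Reservation", "Booking")
agency_a.relate(dictionary, "Is_Kind_Of", "Resort", "Hotel")
```

Keeping the lexicon as plain tuples mirrors the flat `loom:tell` assertions in the listings; the validation step is an addition of this sketch and reflects only the general idea that a lexicon entry should refer to names known to the component or to the dictionary.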
92 B .l Semantic Dictionary The following is a listing of the SHELTER script file that was used to create the semantic dictionary. ;;;-*- Mode: LISP; Syntax: Common-Lisp; Package: TRAVEL; Base: 10.-*- (in-package "TRAVEL") ;;; Knowledge Base: SEMANTIC DICTIONARY ;;; Last Saved On: 02/02/94 10:50:42 (defparameter savedKnowledgeBase loom:*knowledge-base*) (defparameter savedWorld loom:*world*) (1o om:change-kb »TRAVEL:SEMANTIC-DICTI0NARY) (eval-when #+:CLTL2(:execute :load-toplevel :compile-toplevel) #-:CLTL2(load eval compile) ;;; Relationship Descriptors ; ;; Relationships Between Concepts (loom:defrelation Is_Identical_To :is (:and Loom:Binary-Tuple :primitive) :range Loom:Concept :domain Loom:Concept :characteristics (:symmetric)) (loom:defrelation Is_Equal_To :is (:and Loom:Binary-Tuple :primitive) :range Loom:Concept :domain Loom:Concept :characteristics (:symmetric)) (loom:defrelation Is_Compatible_With :is (:and Loom:Binary-Tuple :primitive) :range Loom:Concept :domain Loom:Concept :characteristics (:symmetric)) (loom:defrelation Is_Kind_0f 93 : is (rand Loom:Binary-Tuple :primitive) :range Loom:Concept :domain Loom:Concept) (loom:defrelation Assoc_With :is (rand Loom:Binary-Tuple :primitive) :range Loom:Concept :domain Loom:Concept) (loom:defrelation Is_Collection_Of :is (rand Loom:Binary-Tuple :primitive) :range Loom:Concept :domain Loom:Concept) (loom:defrelation Is_Instance_Of :is (rand Loom:Binary-Tuple :primitive) :range Loom:Concept :domain Loom:Concept) (loom:defrelation Is_Common_Among :is (rand Loom:Binary-Tuple :primitive) :range Loom:Concept :domain Loom:Concept) (loom:defrelation Is_Feature_Of :is (rand Loom:Binary-Tuple :primitive) :range Loom:Concept :domain Loom:Concept) (loom:defrelation Has :is (rand Loom:Binary-Tuple :primitive) :range Loom:Concept :domain Loom:Concept) ;;; Relationships Between a Relation and a Concept (loom:defrelation Is_REqual_To :is (rand Loom:Binary-Tuple :primitive) :range Loom:Concept :domain Loom:Relation) 94 
(loom:defrelation RType_Of
  :is (:and Loom:Binary-Tuple :primitive)
  :range Loom:Relation
  :domain Loom:Concept)

;;; GPO

(loom:defconcept Person :is (:and :primitive))
(loom:defconcept Thing :is (:and :primitive))
(loom:defconcept Name :is (:and :primitive))
(loom:defconcept Number :is (:and :primitive))

;;; SPO-Travel

(loom:defconcept Customer :is (:and :primitive))
(loom:defconcept Package :is (:and :primitive))
(loom:defconcept Schedule :is (:and :primitive))
(loom:defconcept Booking :is (:and :primitive))
(loom:defconcept Arrival :is (:and :primitive))
(loom:defconcept Departure :is (:and :primitive))
(loom:defconcept Price :is (:and :primitive))
(loom:defconcept Ticket :is (:and :primitive))
(loom:defconcept Destination :is (:and :primitive))
(loom:defconcept Seat :is (:and :primitive))
(loom:defconcept Activity :is (:and :primitive))
(loom:defconcept Restriction :is (:and :primitive))

;;; SPO-Travel-Air

(loom:defconcept Flight :is (:and :primitive))
(loom:defconcept Airline :is (:and :primitive))
(loom:defconcept Fare :is (:and :primitive))
(loom:defconcept First :is (:and :primitive))
(loom:defconcept Business :is (:and :primitive))
(loom:defconcept Coach :is (:and :primitive))
(loom:defconcept Service :is (:and :primitive))
(loom:defconcept Connection :is (:and :primitive))
(loom:defconcept Class :is (:and :primitive))

;;; SPO-Travel-Hotel

(loom:defconcept Hotel :is (:and :primitive))
(loom:defconcept Room :is (:and :primitive))
(loom:defconcept Category :is (:and :primitive))
(loom:defconcept One-Star :is (:and :primitive))
(loom:defconcept Two-Star :is (:and :primitive))
(loom:defconcept Three-Star :is (:and :primitive))

;;; SPO-Travel-Train

(loom:defconcept Train :is (:and :primitive))
(loom:defconcept Car :is (:and :primitive))
(loom:defconcept Compartment :is (:and :primitive))

;;; SPO-Travel-CarRental

(loom:defconcept Rental_Car :is (:and :primitive))
(loom:defconcept Insurance :is (:and :primitive))
(loom:defconcept Compact :is (:and :primitive))
(loom:defconcept Large :is (:and :primitive))
(loom:defconcept Van :is (:and :primitive))

;;; SPO-Travel-Cruise

(loom:defconcept Pleasure-Cruise :is (:and :primitive))
(loom:defconcept Boat :is (:and :primitive))
(loom:defconcept Port :is (:and :primitive))
(loom:defconcept Cabin :is (:and :primitive))
(loom:defconcept Sightseeing :is (:and :primitive))
(loom:defconcept Entertainment :is (:and :primitive))

(loom:finalize-definitions)
) ; END EVAL-WHEN

(loom:change-kb savedKnowledgeBase)
(when savedWorld (loom:change-world savedWorld))

B.2 Travel Agency A

The following is a listing of the SHELTER script file that was used to create a conceptual schema for travel agency A.

;;;-*- Mode: LISP; Syntax: Common-Lisp; Package: TRAVEL; Base: 10.-*-

(in-package "TRAVEL")

;;; Knowledge Base: TRAVEL:TRAVEL-AGENCY-KB-1
;;; Last Saved On: 09/01/93 10:03:48

(defparameter savedKnowledgeBase loom:*knowledge-base*)
(defparameter savedWorld loom:*world*)

(loom:change-kb 'TRAVEL:TRAVEL-AGENCY-KB-1)

(eval-when (LOAD EVAL COMPILE)

;;; Local Schema for Travel Agency A

(loom:defrelation Airline :is :Primitive :kb Travel-Agency-Kb-1)
(loom:defrelation End :is :Primitive :kb Travel-Agency-Kb-1)
(loom:defrelation Has-Cost :is :Primitive :kb Travel-Agency-Kb-1)
(loom:defrelation Has-Total-Cost :is :Primitive :kb Travel-Agency-Kb-1)
(loom:defrelation Has-Train-Ride :is :Primitive :kb Travel-Agency-Kb-1)
(loom:defrelation Is-Owned-By :is :Primitive :kb Travel-Agency-Kb-1)
(loom:defrelation Lodging :is :Primitive :kb Travel-Agency-Kb-1)
(loom:defrelation Makes :is :Primitive :kb Travel-Agency-Kb-1)
(loom:defrelation Owned-By :is :Primitive :kb Travel-Agency-Kb-1)
(loom:defrelation Start :is :Primitive :kb Travel-Agency-Kb-1)
(loom:defrelation Train-Ride :is :Primitive :kb Travel-Agency-Kb-1)

(loom:defconcept Hotel
  :is (:and :primitive
            (:and (:at-least 1 (:and Owned-By (:range Owner)))
                  (:at-most 999 (:and Owned-By (:range Owner))))
            (:and (:at-least 1 (:and Has-Cost (:range Cost)))
                  (:at-most 999 (:and Has-Cost (:range Cost))))))

(loom:defconcept Bed&Breakfast :is (:and :primitive Hotel))
(loom:defconcept Cost :is (:and :primitive Integer))
(loom:defconcept Customer)
(loom:defconcept Date :is (:and :primitive String))
(loom:defconcept Flight :is :Primitive :kb Travel-Agency-Kb-1)
(loom:defconcept Owner :is :Primitive :kb Travel-Agency-Kb-1)

(loom:defconcept Reservation
  :is (:and :primitive
            (:all Is-Owned-By Customer)
            (:exactly 1 Is-Owned-By)
            (:at-most 999 (:and Train-Ride (:range Train)))
            (:at-most 999 (:and Airline (:range Flight)))
            (:at-most 999 (:and Lodging (:range Hotel)))
            (:all Has-Total-Cost Cost)
            (:exactly 1 Has-Total-Cost)
            (:the Start Date)
            (:the End Date)))

(loom:defconcept Resort :is (:and :primitive Hotel))
(loom:defconcept Train :is :Primitive :kb Travel-Agency-Kb-1)

;;; Local Lexicon

(loom:tell (semantic-dictionary^Is_Equal_To semantic-dictionary^Booking Reservation))
(loom:tell (semantic-dictionary^Is_Equal_To semantic-dictionary^Price Cost))
(loom:tell (semantic-dictionary^Is_Kind_Of Resort semantic-dictionary^Hotel))
(loom:tell (semantic-dictionary^Is_Kind_Of Bed&Breakfast semantic-dictionary^Hotel))
(loom:tell (semantic-dictionary^Is_Kind_Of Owner semantic-dictionary^Person))
(loom:tell (semantic-dictionary^CollectionOf Lodgings semantic-dictionary^Room))
(loom:tell (semantic-dictionary^Feature Bed&Breakfast semantic-dictionary^One-Star))
(loom:tell (semantic-dictionary^Has Hotel Owner))

(loom:finalize-definitions)
) ; END EVAL-WHEN

(loom:change-kb savedKnowledgeBase)
(when savedWorld (loom:change-world savedWorld))

B.3 Travel Agency B

The following is a listing of the SHELTER script file that was used to create a conceptual schema for travel agency B.
;;;-*- Mode: LISP; Syntax: Common-Lisp; Package: TRAVEL; Base: 10.-*-

(in-package "TRAVEL")

;;; Knowledge Base: TRAVEL:TRAVEL-AGENCY-KB-3
;;; Last Saved On: 09/08/93 10:40:46

(defparameter savedKnowledgeBase loom:*knowledge-base*)
(defparameter savedWorld loom:*world*)

(loom:change-kb 'TRAVEL:TRAVEL-AGENCY-KB-3)

(eval-when (LOAD EVAL COMPILE)

;;; Local Schema for Travel Agency B

(loom:defrelation Arrival_Date :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation Arrives_In :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation Carrier :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation Departs_From :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation Departure_Date :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation Destination :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation Duration :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation Ends_On :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation End_Date :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation Has-Last_Name :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation Has_Address :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation Has_Contact :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation Has_First_Name :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation In_Charge_Of :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation Located_In :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation Package-Id :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation Ss# :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation Starts_On :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defrelation Start_Date :is :Primitive :kb Travel-Agency-Kb-3)

(loom:defconcept Date :is (:and :primitive String))

(loom:defconcept Air-Trips
  :is (:and :primitive
            (:the Arrival_Date Date)
            (:the Departure_Date Date)))

(loom:defconcept Contact
  :is (:and :primitive Person
            (:and (:at-least 0 (:and In_Charge_Of (:range Package)))
                  (:at-most 999 (:and In_Charge_Of (:range Package))))))

(loom:defconcept Cruises
  :is (:and :primitive Package
            (:and (:at-least 1 (:and Arrives_In (:range Port)))
                  (:at-most 999 (:and Arrives_In (:range Port))))
            (:and (:at-least 1 (:and Departs_From (:range Port)))
                  (:at-most 9999 (:and Departs_From (:range Port))))))

(loom:defconcept Exotic
  :is (:and :primitive Package
            (:the Start_Date Date)
            (:the End_Date Date)))

(loom:defconcept Hotel-Air
  :is (:and :primitive Package
            (:and (:at-least 1 (:and Destination (:range Hotels)))
                  (:at-most 999 (:and Destination (:range Hotels))))
            (:the Carrier Air-Trips)))

(loom:defconcept Hotels
  :is (:and :primitive
            (:the Has_Address String)))

(loom:defconcept Package
  :is (:and :primitive
            (:all Duration Time)
            (:exactly 1 Duration)
            (:all Package-Id Integer)
            (:exactly 1 Package-Id)))

(loom:defconcept Person
  :is (:and :primitive
            (:the Has_Address String)
            (:the Ss# String)
            (:the Has_First_Name String)
            (:the Has-Last_Name String)))

(loom:defconcept Port
  :is (:and :primitive
            (:the Located_In String)))

(loom:defconcept Schedule :is :Primitive :kb Travel-Agency-Kb-3)
(loom:defconcept Time :is (:and :primitive String))

;;; Local Lexicon

(loom:tell (semantic-dictionary^Is_Equal_To semantic-dictionary^Customer Contact))
(loom:tell (semantic-dictionary^Is_Compatible_With semantic-dictionary^Booking Package))
(loom:tell (semantic-dictionary^Has Package semantic-dictionary^Restriction))
(loom:tell (semantic-dictionary^Is_Compatible_With semantic-dictionary^Booking Hotel-Air-Package))
(loom:tell (semantic-dictionary^Is_Compatible_With semantic-dictionary^Pleasure-Cruise Cruises))
(loom:tell (semantic-dictionary^Is_Compatible_With semantic-dictionary^Hotel Exotic))
(loom:tell (semantic-dictionary^Is_Feature_Of semantic-dictionary^Three-Star Exotic))

(loom:finalize-definitions)
) ; END EVAL-WHEN

(loom:change-kb savedKnowledgeBase)
(when savedWorld (loom:change-world savedWorld))

B.4 Travel Agency E

The following is a listing of the SHELTER script file that was used to create a conceptual schema for travel agency E.

;;;-*- Mode: LISP; Syntax: Common-Lisp; Package: TRAVEL; Base: 10.-*-

(in-package "TRAVEL")

;;; Knowledge Base: TRAVEL:TRAVEL-AGENCY-KB-2
;;; Last Saved On: 09/03/93 12:15:57

(defparameter savedKnowledgeBase loom:*knowledge-base*)
(defparameter savedWorld loom:*world*)

(loom:change-kb 'TRAVEL:TRAVEL-AGENCY-KB-2)

(eval-when (LOAD EVAL COMPILE)

;;; Local Schema for Travel Agency E

(loom:defrelation Belongs_To :is :Primitive :kb Travel-Agency-Kb-2)
(loom:defrelation Client_Id :is :Primitive :kb Travel-Agency-Kb-2)
(loom:defrelation End_Date :is :Primitive :kb Travel-Agency-Kb-2)
(loom:defrelation Flight :is :Primitive :kb Travel-Agency-Kb-2)
(loom:defrelation Has_Itinerary :is :Primitive :kb Travel-Agency-Kb-2)
(loom:defrelation Has_Name :is :Primitive :kb Travel-Agency-Kb-2)
(loom:defrelation Itinerary_Id :is :Primitive :kb Travel-Agency-Kb-2)
(loom:defrelation Rental_Car :is :Primitive :kb Travel-Agency-Kb-2)
(loom:defrelation Start_Date :is :Primitive :kb Travel-Agency-Kb-2)
(loom:defrelation Stays_In :is :Primitive :kb Travel-Agency-Kb-2)

(loom:defconcept Airlines :is (:and :primitive Things_To_Do))

(loom:defconcept Clients
  :is (:and :primitive
            (:the Has_Name String)
            (:the Client_Id Integer)))

(loom:defconcept Date :is (:and :primitive String))

(loom:defconcept Itinerary
  :is (:and :primitive
            (:all Itinerary_Id Integer)
            (:exactly 1 Itinerary_Id)
            (:all Belongs_To Clients)
            (:exactly 1 Belongs_To)
            (:and (:at-least 0 (:and Flight (:range Airlines)))
                  (:at-most 999 (:and Flight (:range Airlines))))
            (:and (:at-least 0 (:and Rental_Car (:range Rental_Cars)))
                  (:at-most 999 (:and Rental_Car (:range Rental_Cars))))
            (:and (:at-least 0 (:and Stays_In (:range Lodgings)))
                  (:at-most 999 (:and Stays_In (:range Lodgings))))))

(loom:defconcept Lodgings :is (:and :primitive Things_To_Do))
(loom:defconcept Rental_Cars :is (:and :primitive Things_To_Do))

(loom:defconcept Things_To_Do
  :is (:and :primitive
            (:the Start_Date Date)
            (:the End_Date Date)))

;;; Local Lexicon

(loom:tell (semantic-dictionary^Is_Equal_To semantic-dictionary^Flight Airlines))
(loom:tell (semantic-dictionary^Is_Equal_To semantic-dictionary^Customer Clients))
(loom:tell (semantic-dictionary^Is_Equal_To semantic-dictionary^Activity Things_To_Do))
(loom:tell (semantic-dictionary^Is_Equal_To semantic-dictionary^Rental_Car Rental_Cars))
(loom:tell (semantic-dictionary^Is_Equal_To semantic-dictionary^Hotel Lodgings))
(loom:tell (semantic-dictionary^Is_Compatible_With semantic-dictionary^Schedule Itinerary))
(loom:tell (semantic-dictionary^CollectionOf Lodgings semantic-dictionary^Room))
(loom:tell (semantic-dictionary^Feature Lodgings semantic-dictionary^One-Star))
(loom:tell (semantic-dictionary^Has Itinerary Airlines))
(loom:tell (semantic-dictionary^Has Itinerary Lodgings))
(loom:tell (semantic-dictionary^Has Itinerary Rental_Cars))

(loom:finalize-definitions)
) ; END EVAL-WHEN

(loom:change-kb savedKnowledgeBase)
(when savedWorld (loom:change-world savedWorld))
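Taken together, the three local lexicons tie differently named local terms to the same dictionary concepts, which is what makes candidate correspondences between agencies discoverable. The Python sketch below illustrates that idea on a few entries transcribed from the listings above. It is not the mechanism used by the prototype: the names `LEXICONS` and `correspondences` are assumptions introduced for this example, and so is the tuple layout, which always places the dictionary concept second regardless of the argument order used in the corresponding `loom:tell` form.

```python
# A few lexicon entries from B.2 to B.4, written as
# (descriptor, dictionary concept, local term).
LEXICONS = {
    "A": [
        ("Is_Equal_To", "Booking", "Reservation"),
        ("Is_Kind_Of", "Hotel", "Resort"),
        ("Is_Kind_Of", "Hotel", "Bed&Breakfast"),
    ],
    "B": [
        ("Is_Equal_To", "Customer", "Contact"),
        ("Is_Compatible_With", "Booking", "Package"),
        ("Is_Compatible_With", "Hotel", "Exotic"),
    ],
    "E": [
        ("Is_Equal_To", "Customer", "Clients"),
        ("Is_Equal_To", "Hotel", "Lodgings"),
    ],
}


def correspondences(concept):
    """Collect (agency, descriptor, local term) for every lexicon entry
    that mentions the given dictionary concept."""
    return [
        (agency, descriptor, local)
        for agency, entries in sorted(LEXICONS.items())
        for descriptor, shared, local in entries
        if shared == concept
    ]


for row in correspondences("Customer"):
    print(row)
```

For `Customer`, the lookup returns `Contact` in agency B and `Clients` in agency E, both related through `Is_Equal_To`. For `Hotel`, it returns terms related through three different descriptors, which shows why the descriptor has to be kept alongside each match rather than treating every hit as an equivalence.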
Asset Metadata
Core Title
00001.tif
Tag
OAI-PMH Harvest
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC11255813
Unique identifier
UC11255813
Legacy Identifier
DP22882