Sharing persistent object-bases in a workstation environment
(USC Thesis Other) 

SHARING PERSISTENT OBJECT-BASES IN A WORKSTATION ENVIRONMENT

by
Surjatini Widjojo

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Computer Science)

August 1990

Copyright 1990 Surjatini Widjojo

UMI Number: DP22809. All rights reserved.

INFORMATION TO ALL USERS: The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.

Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346

UNIVERSITY OF SOUTHERN CALIFORNIA, THE GRADUATE SCHOOL, UNIVERSITY PARK, LOS ANGELES, CALIFORNIA 90089-4015

This dissertation, written by Surjatini Widjojo under the direction of her Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Acknowledgments

I wish to express my thanks to my advisor, Dave Wile, for his guidance, support, and encouragement. I am deeply grateful to Rick Hull for his invaluable technical help on my thesis and for the opportunities to share ideas with him. I would like to thank both Dave and Rick for their help in formulating my thesis.
The help and suggestions I received from Professor Alice Parker are also highly appreciated. My thanks to Drs. Jeanette Wing and Dennis McLeod, who encouraged me at the start of my graduate career; Balaji Narasimhan, Dennis Allard, Don Cohen, K. Narayanaswamy, Yingsha Liao, and Jianwen Su for their willingness to lend technical and moral support when I needed it. Also thanks to the research and support staff at USC/Information Sciences Institute for providing a pleasant working environment. Finally, I wish to thank my parents with all my heart for their support and confidence in me; and my husband, Ed Ipser, Jr., for his patience and unending moral support.

This research was supported, in part, by Defense Advanced Research Projects Agency, Information Science and Technology office, ARPA Order No. 6096, issued by Defense Supply Service (Washington) under contracts no. MDA903-87-C-0641 and MDA903-81-C-0335.

CONTENTS

Acknowledgments
Abstract
1 Introduction
1.1 Thesis Goals
1.2 Overview of WorldBase
1.2.1 WorldBase Environment
1.2.2 Sharing Support in WorldBase
1.2.3 Data Access using WorldBase: Selection
1.2.4 Data Access using WorldBase: Transformation
1.2.5 Data Access using WorldBase: Merging
1.3 Scope of Thesis
1.4 Organization of Thesis
2 Background and Related Research
2.1 Caching and Databases
2.1.1 Related Research in Object-Oriented Persistence
2.1.2 Related Research in Object-Oriented Databases
2.1.3 WorldBase Approach
2.2 Database Transformation
2.2.1 Aspects of the Problem
2.2.2 Impact of Object-Orientation
2.2.3 Related Research in Transformation
2.2.4 WorldBase Approach
2.3 Merging Databases
2.3.1 Object Identification and Equivalence
2.3.2 Related Research in Merging
2.3.3 WorldBase Approach
2.4 Multiple Database Environments
2.4.1 Database Integration
2.4.2 Federated Databases
2.4.3 DODM/DPDM
2.4.4 MRDSM
2.4.5 Comparisons with WorldBase
2.5 AP5 - Platform for WorldBase
3 WorldBase Environment
3.1 WorldBase Data Model
3.1.1 States of Objects
3.2 WorldBase Components
3.2.1 World Database
3.3 Persistent Store
3.3.1 Registry
3.4 Virtual Store
3.4.1 Workspaces
3.4.2 World Schema Specification
3.4.3 World Closure Specification
3.4.4 World Equivalence Specification
3.4.5 World Constraint Specification
3.4.6 World Instance
3.4.7 Database Definition and Manipulation Languages
3.5 Summary
4 Population by Closure Specification
4.1 Closure Specification Overview
4.1.1 Consistency and Completeness
4.2 Closure Specification Semantics
4.3 Closure Specification in WorldBase
4.4 Summary
5 WorldBase Transformation
5.1 Transformation Specification with ILOG*
5.1.1 Examples
5.1.2 ILOG* Syntax
5.1.3 Typing Issues in ILOG*
5.1.4 Formal Semantics
5.1.5 Invalid ILOG* Specifications
5.2 Transformations in WorldBase
5.2.1 Bulk Transformation
5.2.2 Transformation with Closure
5.2.3 Summary
6 On Merging Databases
6.1 Aspects of Merging Databases
6.1.1 Schemas and Merging
6.1.2 Determining Object Equivalence
6.1.3 Merging the Remaining Data
6.2 Merging in WorldBase - in Theory
6.2.1 Keys and Constraints
6.2.2 Schema Compatibility and Equivalence
6.2.3 Computing the Weak Merge of Two Object-Bases
6.2.4 Enforcing the Constraints
6.3 Merging in WorldBase - in Practice
6.3.1 Merging Specifications
6.3.2 Merging Instances
6.3.3 Enforcing Constraints
6.4 Summary
7 Experimental Prototype
7.1 Features of the Prototype
7.2 WorldBase Modules
7.2.1 Registry
7.2.2 World Environment Manager
8 Conclusions and Future Directions
8.1 WorldBase Paradigm
8.1.1 Focusing, caching, and communication
8.1.2 Autonomy
8.1.3 Living with Inconsistency
8.1.4 Sharing in WorldBase
8.1.5 Distributed Information Sharing System
8.2 Closure Specification
8.3 ILOG*
8.4 Database Merging
8.5 The Prototype
Reference List
A AP5
A.1 AP5 Data Model
A.2 Data Definition Language
A.3 Data Manipulation Language
B Grammars
B.1 Syntax for Schema Specification
B.2 Syntax for Closure Specification
B.3 Correspondence Language Syntax
B.4 Preference Specification Syntax
C WorldBase Prototype Implementation
C.1 Internal Data Structures
C.1.1 Registry Data Structures
C.1.2 World Database Support Structures
C.2 Supported Functions
C.2.1 The Registry
C.2.2 WorldBase Workspace Support
C.2.3 Miscellaneous Library Functions
D The Prototype in Operation
D.1 Create and Populate World Databases
D.1.1 Create Entertainment Schema and Worlds
D.1.2 Create Restaurant Schema and World
D.1.3 Create Recommendation Schema and Worlds
D.2 Transforming Worlds
D.2.1 Bulk Transform
D.2.2 Closure Transform
D.3 Merging Worlds
D.3.1 Merging: Identical Schema
D.3.2 Merging: Overlapping Schemas
D.4 Verbose Mode Operations
D.4.1 Restore Schema
D.4.2 Clearing Domains
D.4.3 Bulk Transform
D.4.4 Closure Transform
D.5 Persistent Store
D.5.1 World Schemas
D.5.2 World Instances

LIST OF FIGURES

1.1 Gaining Unified Access to Multiple Databases using WorldBase
1.2 Joe's Entertainment Schema
1.3 Paul's Recommendation Schema
1.4 Restaurant Guide Schema
1.5 An Example of Accessing Multiple Databases using WorldBase
1.6 Transformation Scenario
4.1 Person Information Schema
4.2 Closure Traversal Algorithm
4.3 Find Closure of Object
4.4 Find Closure of Tuple
5.1 Source Schema (A)
5.2 Target Schema (B)
5.3 Bulk Transform Algorithm
5.4 Closure Transform Top Level Function
5.5 Transform Object's Closure Function
5.6 Transform Tuple's Closure Function
5.7 Generate Target Tuples
5.8 Instantiate Target Variables
6.1 Algorithm to compute the Weak Merge
D.1 An Extended Example using WorldBase

Abstract

This dissertation provides a framework to study the problem of sharing information in a network of highly autonomous workstations. It focuses on providing the necessary basic support for accessing shared information, and provides a framework to study other forms of synchronized sharing. This thesis introduces WorldBase, a system for storage and access of distributed, possibly inconsistent, possibly overlapping information. WorldBase provides flexible structure manipulations, allowing users to conceptually group related objects and relations together, and share them among people on the network.

The WorldBase environment supports storage of clustered, structured information. The unit of information cluster is called a world database (or simply a world). A world consists of a collection of objects and relationships. It is a unit for conceptual grouping, focusing, communicating, and sharing of information. Each world has a world schema specification, and a collection of other specifications that constrain the world. A world schema specification specifies the schema of objects and relationships in the world. The world schema specification and other specifications constrain a world's population and express some of its semantics.

WorldBase provides a "file-like" paradigm to deal with worlds. It uses the virtual memory of a workstation to provide a WorldBase workspace with which the user interacts.
The user has access to collections of worlds and their specifications in persistent store but cannot access their contents directly. Loading a world from persistent memory to a workspace activates the world so its contents can be accessed by the user. The user may load multiple worlds into a workspace, effectively merging them. Each workspace contains a working database consisting of a merge of all loaded worlds. Different combinations of worlds (loaded and merged in the workspace) may result in different, possibly conflicting, working databases.

Support for manipulating worlds in workspaces is provided. The user may create worlds, create and assert information into them, and save them to persistent store. WorldBase provides a selection mechanism to extract a database subset of interest when creating or adding to a world. It supports transformation of information in worlds. It supports merging of worlds in the workspace such that two objects in distinct worlds representing the same real world object may be merged into a single object in the workspace. These operations effectively support user access to multiple worlds.

A prototype of WorldBase is implemented in AP5, a database programming language extension of Common Lisp developed at USC/Information Sciences Institute. The prototype provides an effective platform for accessing multiple object-bases and for studying various forms of sharing synchronizations and update propagation among autonomous object-bases.

The focus of the WorldBase system is to provide the basic primitives for dealing with object-base collections; a sophisticated user interface is not provided. This is appropriate because each environment in which it is embedded will present a different facade to the user, and a different organization of primitives into common, yet complex operations.
World specifications are also separated into their primitive components: schema specifications, closure specifications, equivalence specifications, and constraint specifications. These define the semantics of an object-base. By avoiding overcommittal to complex functionality and specifications, WorldBase provides a sound basis on which to build sharable workstation environments. The support WorldBase provides forms the basic primitives with which to access object-bases. The set of operations WorldBase provides can be used to implement a more complex set of operations on collections of object-bases. The primitives help isolate the problems with each specific step, and allow users to focus on each particular step before going on to the next step.

Chapter 1
Introduction

With the ever increasing popularity of databases and electronically stored information, the need to provide access to a multitude of possibly overlapping, possibly inconsistent databases is growing. The proliferation of powerful personal computers and workstations has given individuals access to more information from various different sources. Furthermore, the wide acceptance of the object paradigm for modeling information establishes a need for mechanisms that allow individual users to access, manipulate and possibly share possibly overlapping, possibly inconsistent persistent object-bases. We call the support the mechanisms provide distributed information sharing support (also called "multidatabase interoperability" [47]), and a system that provides this support a distributed information sharing system. This thesis introduces and investigates an approach to distributed information sharing.

One of the primary requirements of a distributed information sharing system is that it must incorporate, at a fundamental level, the fact that data found in different databases may be inconsistent.
In addition, the system must support a focusing mechanism that allows a user to focus on selected subsets of different databases; a transformation mechanism to restructure data from different databases; and a merging mechanism to merge data from different databases. Another requirement is the ability to cache the database subsets (views) obtained through selection, transformation and merging. For the system to support full sharing, instead of mere access, the system must support update of the underlying databases through manipulations of the cached views.

This thesis introduces a system called WorldBase that provides a novel approach to supporting most of these capabilities in a rich and highly flexible framework, and provides a foundation for investigating the others. It supports a user environment with richly structured access to data from more than one database. Each information repository, called a world database (or simply a world), contains a cluster of related information and is a basic unit for sharing information across a network. In addition to presenting the overall architecture of WorldBase, this thesis also describes mechanisms provided for selecting information from worlds and underlying databases; for transforming data from one schema to another; and for merging the data in two or more worlds to create a new world.

The WorldBase system is based on the WorldBase paradigm, which is an extension and generalization of the Worlds paradigm of [73, 76, 74, 75]. The basic construct there is the world, an aggregation unit used to cluster database objects and relationships that are strongly, conceptually related. Worlds are like database views with two unique properties: (1) world specifications describe the schema and properties of information permitted in worlds, and (2) a recursive population algorithm derived from the world specification is used to form the aggregates.
Traditional views are generally formed from collections of monolithic types and relations; in contrast, Worlds permit an orthogonal, concern-based selection of data which retrieves only those aspects of objects and relations relevant to a given context.

Another essential aspect of WorldBase originally proposed and developed in the Worlds system is the overall manner in which they are used. In particular, there is a strong analogy between worlds as used under this paradigm, and files as used in most word-processing environments. Worlds and WorldBase provide an ability to share worlds which is similar to the sharing of files. A WorldBase workspace is the WorldBase analog of a file editor buffer: loading a world into a WorldBase workspace is like bringing a file from persistent store into an editor buffer. Bringing a file into a buffer which already has an existing file (if supported) may cause the new file to be appended to the existing one, i.e. like merging the two files. Likewise, loading a world into a workspace containing an existing world will cause merging of the information, although unlike merging files in buffers, world merging can cause structural merging. Modifications can be made to the world(s) in the WorldBase workspace, just as files may be edited in a buffer. Changes to a file are not permanent until the file is saved. Likewise, modifications to worlds in the WorldBase workspace do not persist until the worlds are committed (via the save-world operation).

The user may create new world databases, populate them with data, and save them to persistent store. Thus, WorldBase can be used as an interface through which users interact with multiple databases by caching focused subsets of the databases in his/her workspace and saving intermediate results.
While each world must be internally consistent, users can simultaneously maintain and access numerous worlds; in this way the WorldBase paradigm accommodates inconsistency between different databases. Furthermore, the world population algorithm provides natural filtering and selection of clusters of data which can be performed before committing a world.

Analogous to files, worlds can serve as a natural unit of communication between users for transmitting structured data sets. Also, because worlds provide the ability to extract data with a given focus, they provide a natural unit for downloading to personal and laptop computers for short-term data access and manipulation (e.g., while on business trips).

WorldBase extends the original Worlds paradigm to the larger arena of multiple, diverse, general-purpose databases in the effort to provide autonomous sharing of diverse information. In addition to the focusing support provided in Worlds, WorldBase provides mechanisms for selecting information from worlds and underlying databases; transforming data from one schema to another; and merging the data in two or more worlds to create a merged database. These features can be used in their current form to provide access (but not update) to multiple databases, and provide most of the basic capabilities required for a distributed information sharing system. The issue of propagating updates from worlds to underlying databases is not addressed in this thesis; however, extensions of relational view update methodologies such as [21, 20, 17, 39, 65] will probably be used in future research.

The current form of WorldBase can be used as a platform to study the issues involved in providing support for autonomous sharing of information in a network. Different forms of sharing synchronization mechanisms may be studied with this platform.
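The file-and-buffer analogy can be made concrete with a small sketch. It is written in Python rather than AP5, and every name in it (PersistentStore, Workspace, load_world, assert_fact, save_world) is hypothetical and not taken from the WorldBase prototype. It also assumes, for brevity, that a world is just a set of fact tuples and that merging is plain set union, which ignores the equivalence and constraint specifications a real merge would apply.

```python
# Hypothetical sketch; none of these names come from the WorldBase prototype.
from dataclasses import dataclass, field


@dataclass
class PersistentStore:
    """Stands in for persistent storage: world name -> saved facts."""
    worlds: dict = field(default_factory=dict)


@dataclass
class Workspace:
    """Stands in for a WorldBase workspace (the 'editor buffer')."""
    store: PersistentStore
    working: set = field(default_factory=set)

    def load_world(self, name):
        # Loading merges the world's facts into the working database.
        # Assumption: merge is set union; real merging is structural.
        self.working |= self.store.worlds[name]

    def assert_fact(self, fact):
        # Changes stay in the workspace until a world is saved.
        self.working.add(fact)

    def save_world(self, name, keep=lambda fact: True):
        # Commit the facts selected by `keep` as the world's population.
        self.store.worlds[name] = frozenset(f for f in self.working if keep(f))


store = PersistentStore({
    "dining": frozenset({("serves", "Chez Joe", "French")}),
    "music": frozenset({("plays", "Blue Room", "jazz")}),
})
ws = Workspace(store)
ws.load_world("dining")
ws.load_world("music")  # the second load merges with the first
ws.assert_fact(("serves", "Blue Room", "tapas"))
# Before saving, the persistent copy of "dining" is still unchanged.
unsaved = ("serves", "Blue Room", "tapas") not in store.worlds["dining"]
ws.save_world("dining", keep=lambda f: f[0] == "serves")
```

The `keep` predicate is only a stand-in for the selection step; the thesis describes population by closure specification, which this sketch does not attempt to model.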
1.1 Thesis Goals

WorldBase is based on the philosophy that the universe consists of inherently inconsistent, differently structured collections of data with different, possibly conflicting semantics. The goal of this research is to provide the underlying support for an environment from which users may interact with this universe of information, i.e. to study and provide the various tools required for access and sharing of possibly overlapping, possibly inconsistent databases. We introduce a paradigm that supports this philosophy and introduce a system based on this paradigm. The system, called WorldBase, provides basic support needed for accessing distributed information. It provides a group of users the capability to define, classify, interrelate, retrieve and share a universe of diverse information, and supports autonomous access to shared databases.

WorldBase is intended for an autonomous network with no central control. Because of the simplicity of the data model used in the prototype, the class of intended applications revolves around personal databases and office information systems. However, the paradigm and support tools could be reimplemented within any environment to support shared access to diverse information without centralized control.

One emphasis of this thesis is on the autonomy of users of the system, i.e. their ability to function and operate independent of a global or remote authority. The user is provided with mechanisms to retrieve, manipulate and access diverse information with no central intervention.

Another emphasis of this work is on a specificational approach to sharing information as opposed to an operational one (see footnote 1). A world has associated with it a set of specifications that constrain the information in the world.
This specificational approach aids the sharing process by describing the semantic underpinnings of the data, thereby helping users to make sharing decisions intelligently. It also facilitates the formal analysis of the various aspects of the sharing process.

Footnote 1: The database community generally uses the terms "declarative" and "procedural."

1.2 Overview of WorldBase

Database sharing is a complex problem, and WorldBase is a complex system that isolates several of the functionalities needed. In this section, we give an overview of WorldBase. First, we describe the components of the WorldBase environment. Next, we provide an overview of sharing support provided by WorldBase and illustrate the various support using an extended example. Transcripts of actual executions using the prototype with the same example are provided in Appendix D.

A prototype of WorldBase is implemented in AP5 [15, 14, 16], a database programming language which extends Common Lisp. It is described in detail in Chapter 7. In the remainder of this section, we describe the tools WorldBase supports to facilitate sharing of diverse information and present examples to illustrate their function and use.

1.2.1 WorldBase Environment

In the framework proposed here, a distributed information environment is modeled as a network of workstations. Each workstation has the WorldBase system installed, and is operated by a single user at a time. A user may own many databases (or worlds), which are uniquely identified in WorldBase by their (owner name, world name) pair. A component of WorldBase called the WorldBase registry (or simply registry) keeps track of worlds, their schemas, and their location in the persistent store(s). A user cannot access data in the persistent worlds directly. He (see footnote 2) must load specific worlds into the virtual store to access the information contained in them.
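As a rough illustration of the registry's role, the sketch below keys each world by its (owner name, world name) pair and records a schema name and a storage location. It is in Python, not AP5, and the class, field, and method names as well as the example paths are all invented for illustration; the actual registry data structures are described in Appendix C and may look quite different.

```python
# Hypothetical sketch; field and method names are illustrative only.
from typing import NamedTuple


class WorldKey(NamedTuple):
    owner: str
    world: str


class RegistryEntry(NamedTuple):
    schema: str    # name of the world's schema
    location: str  # made-up path standing in for a persistent-store location


class Registry:
    def __init__(self):
        self._entries = {}

    def register(self, owner, world, schema, location):
        key = WorldKey(owner, world)
        if key in self._entries:
            # The (owner, world) pair must identify a world uniquely.
            raise KeyError(f"world already registered: {key}")
        self._entries[key] = RegistryEntry(schema, location)

    def lookup(self, owner, world):
        return self._entries[WorldKey(owner, world)]

    def worlds_of(self, owner):
        return sorted(k.world for k in self._entries if k.owner == owner)


registry = Registry()
registry.register("joe", "westside", "entertainment", "/worlds/joe/westside")
registry.register("joe", "downtown", "entertainment", "/worlds/joe/downtown")
# Two owners may reuse a world name because the owner is part of the key.
registry.register("paul", "westside", "recommendation", "/worlds/paul/westside")
```

Because the owner is part of the key, Joe and Paul can each keep a world called "westside" without any central naming authority, which fits the autonomy goal stated in Section 1.1.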
A WorldBase workspace (or simply workspace) refers to the virtual store of the workstation. It contains a virtual memory database that consists of a workspace schema that is formed from compatible world schemas that are loaded into the workspace, and a workspace instance composed of compatible worlds that are loaded into the workspace. Each workspace is disjoint from other workspaces in the network, regardless of its contents. Workspaces do not persist.

[Footnote 2: "He" is used in non-gender form. It refers to a male or female user.]

A world database is a repository of information. It has a set of specifications which consists of the world's schema specification, object equivalence specification, an optional closure specification, and constraints specification. It has a world instance consisting of an interrelated family of objects and relationships that make up the world's population. The closure specification is used to extract information from the workspace database, or an underlying database, to be the population of the world. In the absence of a closure specification, a default mechanism may be used to make a world from the workspace instance, restricted to the schema of the world.

Saving a world causes the structure formed by the objects and relationships (i.e. world instance) to be preserved in persistent store. Loading a world database causes the world's schema, specifications and its population to be (re)created in the workspace (subject to merging). Operations on the world instance in the workspace may modify the structure in the workspace by deleting some objects or relationships or adding other objects and relationships. The changes to the structure in virtual memory are not persistent until the world containing the changed structure is saved.
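The save and load behaviour described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the AP5 prototype: the names `PersistentStore`, `Workspace`, `save_world` and `load_world` are hypothetical, the sample facts are made up, and schema handling and merging on load are left out.

```python
class PersistentStore:
    """Hypothetical persistent store: saved worlds keyed by (owner, world name)."""
    def __init__(self):
        self.worlds = {}


class Workspace:
    """Hypothetical volatile workspace: a set of (relation, object, value) facts."""
    def __init__(self):
        self.facts = set()


def save_world(store, owner, name, workspace):
    # Saving takes a snapshot of the current instance into the persistent store.
    store.worlds[(owner, name)] = frozenset(workspace.facts)


def load_world(store, owner, name, workspace):
    # Loading (re)creates the saved population in the workspace.
    workspace.facts |= store.worlds[(owner, name)]


store = PersistentStore()
ws = Workspace()
ws.facts.add(("r-name", "rest1", "Sushi Ko"))       # made-up data
save_world(store, "joe", "entertainment-world", ws)

# This change lives only in the workspace until a world is saved again.
ws.facts.add(("r-phone", "rest1", "555-0100"))

# A fresh workspace sees only what was saved.
fresh = Workspace()
load_world(store, "joe", "entertainment-world", fresh)
```

In this sketch the unsaved `r-phone` fact is lost when the first workspace goes away, which mirrors the statement that workspaces do not persist.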
The user may create another world to contain the changed world instance, and save the new world, leaving the old world containing the original world instance unchanged (different versions of the same information). The user can change the world's specifications as easily; however, changes to the world schema must be propagated to all worlds using the schema.

The population of a world is defined by the user. For instance, given information about entertainment spots (such as those whose schema is shown in Figure 1.2), the user can store all of his entertainment information in a single world or he may distribute his information into several world instances to be stored in separate worlds. He may partition the entertainment information into disjoint worlds (e.g. group them into worlds based on their location) or overlapping worlds (e.g. group entertainment spots based on different people's recommendations).

Information in WorldBase is modeled in a semantic database model called WorldBase Data Model (or WDM) described in Chapter 3. Briefly, WDM is a modified Entity Relationship model [11] that includes subtype relationships (more specifically, specialization relationships in the sense of [1, 35]) but does not permit attributes (these are simulated by relationships).

Entity types include abstract types, subtypes and printable/value types. Relations range over subsets of n-ary Cartesian product of entity types. The specialization relationship relates subtypes to abstract types or subtypes. Directed cycles of subtype relationships are not permitted in a WDM schema. Also, all maximal directed paths of subtype relationships from a given subtype end at the same abstract type. Cardinality constraints may be specified on binary relations. Members of abstract types are called abstract objects; these correspond to physical or conceptual objects in the world.
In some implementations these are represented conceptually using object identifiers (surrogates); in other implementations, including that of the prototype, these are represented by internal addresses over which the system programmer has little control.

In diagrams depicting WDM schemas (see Figures 1.2, 1.3 and 1.4), printable nodes are depicted using rectangles, abstract nodes using diamonds, subtype nodes using (empty) circles, and relation nodes using circles with x's in them. Subtype relationships are depicted using bold arrows, and component edges of relations are depicted using lines (the direction of component edges is easily inferred since they originate at relation nodes and terminate at printable, abstract or subtype nodes). We sometimes have duplicate versions of printable types in the schema for clarity of representation.

Objects in WorldBase have two different representations. Objects in the workspace (i.e. virtual memory) are represented by internal addresses. Conceptually, we call these internal addresses OIDs (Object Identifiers). Objects in secondary memory (by virtue of being in a saved world) are represented by local identifiers that uniquely identify the objects within the world, but are not meaningful across worlds. We call these identifiers PIDs (Persistent Identifiers). Loading a world instance causes each PID in the instance to have a corresponding OID in the workspace it is being loaded into, either by creating a new OID or by discovering its equivalence to an object with an existing OID. In the second case, an object merge has occurred.

1.2.2 Sharing Support in WorldBase

[Figure 1.1: Gaining Unified Access to Multiple Databases using WorldBase. Diagram labels: Databases (DB1, DB2); Select; database subsets; Transform; compatible database subsets; Merge; merged database subsets.]

The overall framework of sharing support WorldBase provides for distributed information access is shown in Figure 1.1.
Three main functionalities are provided. The first is a selection mechanism to extract a database subset of interest. In particular, users can specify a template (called a closure specification) and a set of seeds from which a world is to be recursively populated. The selection mechanism provides a focusing mechanism allowing the user to extract relevant related information as specified by the template.

WorldBase supports transformation of a world (source) of one world schema into another world of a different world schema using ILOG* [37], a Datalog-like language which supports object creation. Unlike the transformation mechanisms of [33] and most previous schema integration methodologies [6], which are generally based on iterating local structural manipulations, ILOG* is based on a least fixpoint semantics stemming from logic programming, and is amenable to theoretical investigation [38]. Worlds extracted from multiple databases are transformed to compatible worlds which are then merged into a single database in the workspace.

A novel feature of WorldBase is the prominence that it gives to the issue of automatic merging of the data stored in different worlds. Under the approach of WorldBase, data to be merged is first transformed using ILOG* to have compatible structuring (schemas). A fundamental issue that arises in database merging is that of object equivalence [footnote 3]. Two objects from distinct databases (possibly over the same schema or compatible schemas) are considered equivalent if they correspond to the same conceptual or real-world object. If two objects are equivalent, they should be merged when combining the two databases. WorldBase includes a specificational mechanism based on a generalized notion of keys so that object merging can be performed automatically (see also [71]).
It uses object equivalence specifications to specify when two objects from different worlds refer to the same conceptual object in the workspace (i.e. when they are equivalent). Equivalent objects from different worlds are merged in the workspace to correspond to a single object when the worlds are loaded. We call objects whose types have an object equivalence specification associated with them mergeable objects. Where the newly merged database violates its constraints, preferences may be specified to automatically remove the conflicting data. Hence, a preference specification is explicitly attached to the merging process for this purpose.

WorldBase also provides some of the capabilities needed to serve as the sole interface to multiple databases. Specifically, users can cache and re-use worlds they have constructed. Worlds serve as a natural unit of communication for two reasons. First, worlds include their specifications, which include in turn the schema by which the world is structured. Second, the data selection mechanism of worlds permits the inclusion of data relationships to provide the context relevant to objects in the world.

[Footnote 3: A variant of this issue arises if one attempts to describe, using an external query language, an object stored in a database.]

1.2.3 Data Access Using WorldBase: Selection

[Figure 1.2: Joe's Entertainment Schema. Node labels: entertainment, theatre, restaurant, name, phone, rating, specialty, current-showing, r-type, avg-price, string.]

We now introduce an extended example which will be used to illustrate the various aspects of the WorldBase system. The same example is given in detail in Appendix D. The example shows how a user may access and merge database fragments from different, inconsistent, databases.
In this example, the user is interested in Japanese restaurants from Joe's entertainment-world database whose schema (entertainment) is shown in Figure 1.2, and highly recommended restaurants of Paul's recommendation-world database whose schema (recommendation) is shown in Figure 1.3. The user is also interested in a public restaurant guide database called restaurant-info whose schema (restaurant-guide) is shown in Figure 1.4. To view the fragments from Joe's and Paul's databases, the user first extracts Japanese restaurants from Joe's entertainment-world database, highly recommended restaurants from Paul's recommendation-world database, and transforms them to be of compatible schemas so they can be merged with the restaurant-info database into a single working database in the workspace. To simplify the example, we do not deal with constraints. Constraints are dealt with in Chapter 6.

[Figure 1.3: Paul's Recommendation Schema. Node labels: restaurant, person, recommendation, r-name, r-phone, r-address, r-type, specialties, cost, p-name, string, integer.]

Figure 1.5 illustrates the steps taken in this example. Each box in the figure represents a world; the first string in the box is the name of the world, and the second is the name of its schema. Each box is connected to another by a line and a description of the steps taken by the user to create and populate one world from another. A more extensive figure is given in Appendix D.

To extract information from Joe's entertainment-world, the user first loads that database into a new (empty) workspace. This is performed by first loading the world schema of that database, entertainment. Types and relations corresponding to those defined in the world schema specification are created in the workspace. This causes the types and relations to become the workspace schema.
Next, the user loads the world instance into the workspace. Loading the instance causes all objects and relationships in Joe's entertainment-world to be created (or duplicated) in the workspace. Notice that loading a world into an existing workspace may involve merging of the world with the existing schema (see Section 1.2.5).

[Figure 1.4: Restaurant Guide Schema. Node labels: restaurant, fast-food, foreign, american, address, location, specialties, hours, classification, facilities, avg-price, string, integer.]

[Figure 1.5: An Example of Accessing Multiple Databases using WorldBase. Worlds (with schemas): entertainment-world (entertainment), joe-entertainment-world (entertainment), recommendation-world (recommendation), paul-recommendation-world (recommendation), joe-rec-1 (recommendation), my-rec (recommendation), restaurant-info (restaurant-guide). Step labels: extract japanese restaurants; extract highly recommended restaurants; transform to recommendation instance; merge same schema; merge compatible schema.]

Saving a world creates a persistent image of all the information (i.e. population) in the world. There are two ways a world may be saved. The first (schema-based) approach saves all parts of the instance associated with the workspace subschema that is part of the world schema. This is essentially a projection of the contents of the workspace. The second (closure-based) approach is a stylized view definition mechanism based on closure specifications. To motivate the second approach, imagine that in the example, the user is interested in selecting highly recommended restaurants from Paul's recommendation database. However, a restaurant object is simply an OID, i.e. an internal pointer in the workspace. Putting this isolated object in a world only puts a "point" into the world.
The restaurant object is more meaningful in a world if its relevant relationships are also put in the world, such as r-name, r-phone and specialties, etc. Some of its relationships (such as recommendation) may relate it to another object (e.g. person), in which case, some of this other object's relationships (e.g. p-name) may also be included.

A closure specification specifies a template which directs, given the set of seeds of a world, what information should be extracted from the workspace database to form its closure. This specification consists of a (possibly empty) closure specification for each type or relation of the world specification schema. The closure mechanism of WorldBase automatically includes all objects and relationships related to the seed (as dependents) by the closure specification in the world it is creating. Both objects and tuples are permitted as seeds as long as there are closure specifications specified for their types and relations in the world's specification. The grammar and semantics for the WorldBase closure specification language are described in Chapter 4.

To create a new world to contain Japanese restaurants to be saved with a closure specification, the user first creates a world database (call it joe-entertainment-world) with Joe's entertainment schema as its world schema specification. In the example, the closure includes all objects and relationships reachable from entertainment and restaurant. Next, he specifies the set of Japanese restaurant objects as the seed set for the world. The actual population of the world will consist of all the information relevant to Japanese restaurants. This world is saved into persistent store. The information is extracted from the workspace database which currently holds only Joe's entertainment-world database and the newly created world.
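The idea of populating a world recursively from seeds under a template can be sketched as a small graph traversal. The following Python is a minimal illustration, not the WorldBase closure specification language of Chapter 4: the template format (a dictionary from type name to the relations to follow), the binary `(relation, source, target)` facts, the function name `close_world` and all sample data are assumptions made for this sketch.

```python
from collections import deque

# Workspace facts as (relation, source, target); all data is made up.
FACTS = [
    ("r-name", "rest1", "Sushi Ko"),
    ("r-phone", "rest1", "555-0100"),
    ("cost", "rest1", "25"),
    ("recommendation", "rest1", "person1"),
    ("p-name", "person1", "Paul"),
    ("r-name", "rest2", "Burger Hut"),
]
TYPE_OF = {"rest1": "restaurant", "rest2": "restaurant", "person1": "person"}

# Hypothetical closure template: for each type, the relations to include.
CLOSURE_SPEC = {
    "restaurant": ["r-name", "r-phone", "recommendation"],
    "person": ["p-name"],
}


def close_world(seeds, facts, type_of, spec):
    """Return the facts reachable from the seeds under the closure template."""
    selected = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        obj = queue.popleft()
        wanted = spec.get(type_of.get(obj), [])
        for rel, src, dst in facts:
            if src == obj and rel in wanted:
                selected.append((rel, src, dst))
                # Recurse only into targets that are themselves typed objects.
                if dst in type_of and dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
    return selected


world = close_world({"rest1"}, FACTS, TYPE_OF, CLOSURE_SPEC)
```

With `rest1` as the only seed, the resulting world holds the restaurant's name, phone and recommendation together with the recommending person's name. The `cost` fact and everything about `rest2` stay out, because the template does not name `cost` and `rest2` is not reachable from the seed.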
To conclude the selection portion of the extended example, we assume that the user, in an analogous fashion, creates a second world containing highly recommended restaurants (call it paul-recommendation-world) from Paul's recommendation-world database. WorldBase supports renaming of schema structures while loading; this is described in Section 1.2.5.

1.2.4 Data Access using WorldBase: Transformation

Ultimately, the user would like to view simultaneously the restaurant recommendations and restaurant information for the restaurants he extracted. In this example, Paul's recommendation schema is compatible with the restaurant-guide schema, but neither of these is compatible with Joe's entertainment schema. For example, restaurant is a subtype in entertainment, but is an abstract type in recommendation and restaurant-guide (see Section 1.2.5 for more details). The user transforms Japanese restaurants from Joe's entertainment-world (under Joe's entertainment schema) to be an instance of Paul's recommendation schema.

There are two ways WorldBase supports transformation of instances. The first is by transforming the saved instance joe-entertainment-world to be an instance of recommendation schema by a process known as bulk-transform. All the objects and relationships from joe-entertainment-world are transformed into an instance of recommendation. The second permits simultaneous selection and transformation. The user may extract and transform Japanese restaurants from entertainment-world directly. This process is known as closure-transform. The transformation is based on the closure specification of the target world. We briefly describe the two approaches using the example below.

In both transformations, the user specifies the correspondences between the source and target schemas in the ILOG* specification language.
The semantics and examples of transformation specifications in ILOG* are presented in Chapter 5.

[Figure 1.6: Transformation Scenario. Diagram labels: workspace instance; source schema; source instance (input); transformation specification; target schema; target instance (generated, output).]

The transformation process is achieved in a series of steps (illustrated in Figure 1.6). We assume the user has a source world schema (entertainment schema of Figure 1.2), source world instance (joe-entertainment-world in bulk-transform, and entertainment-world in closure-transform), a target world schema (recommendation schema of Figure 1.3) and a transformation specification in ILOG* that transforms instances of entertainment to instances of recommendation. First, the source world schema is loaded into a workspace [footnote 4]. Then the source instance(s) to be transformed are loaded into the same workspace. Next, the target world schema is loaded into the same workspace, distinct and disjoint from the source schema in the workspace.

[Footnote 4: To ensure that no other instances of the source world schema exist in the workspace that could affect the resulting transformation, we assume that the workspace is cleared of all instances.]

To bulk-transform, the user initiates the bulk-transform operation with the transformation specification. The transformation specification is compiled and new objects and tuples are created as instances of the target schema to correspond to those instances of the source. If the transformation process finishes without aborting and no consistency constraints are violated, the user may create a new target world (call it rec-from-joe-1) to contain the newly created target instance.
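The bulk-transform step, in which new target objects and tuples are created to correspond to source instances, can be illustrated with a very small sketch. This is plain Python, not ILOG*, and it does not reproduce the fixpoint semantics of Chapter 5. The relation correspondences in `RULES`, the function name `bulk_transform`, the generated object names and the sample data are all assumptions chosen only to show object creation with a stable source-to-target mapping.

```python
import itertools

# Hypothetical correspondences from source relation names (entertainment)
# to target relation names (recommendation).
RULES = {"name": "r-name", "phone": "r-phone", "specialty": "specialties"}


def bulk_transform(source_facts, source_objects, rules):
    """Create one fresh target object per source object and copy mapped relations."""
    counter = itertools.count(1)
    created = {}  # source object -> newly created target object

    def target_of(obj):
        # Object creation: a given source object always maps to the same new object.
        if obj not in created:
            created[obj] = f"new-{next(counter)}"
        return created[obj]

    target_facts = []
    for rel, src, value in source_facts:
        if src in source_objects and rel in rules:
            target_facts.append((rules[rel], target_of(src), value))
    return target_facts, created


# Made-up source instance: one restaurant with an unmapped "rating" relation.
SOURCE = [
    ("name", "e1", "Sushi Ko"),
    ("phone", "e1", "555-0100"),
    ("rating", "e1", "good"),
]
target, mapping = bulk_transform(SOURCE, {"e1"}, RULES)
```

Here both mapped facts about `e1` end up attached to the same created object, while the `rating` fact is dropped because no correspondence was given for it.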
To do closure-transform, the user must first create a target world database (call it rec-from-joe-2) with its world closure specification defined, and provide a correspondence specification (call it seed-correspondences) for the initial set of seeds for the target world. In this example, the seed correspondences specify the transformation of Japanese restaurants using ILOG*.

The user initiates the closure-transform operation that evaluates and transforms only relevant subsets of the source instance into instances of the target schema and target world. The process first transforms the initial set of seeds into objects of recommendation schema that correspond to Japanese restaurants and tries to do a closure on these seeds based on the target's closure specification. The closure process essentially tries to look up the transformation specification and transform only the relevant information. If the transformation process finishes without aborting and no consistency constraints are violated, rec-from-joe-2 contains the target instance. In fact, rec-from-joe-1 and rec-from-joe-2 may contain equivalent information, depending on the closure specifications used.

1.2.5 Data Access using WorldBase: Merging

To view the resulting transformed world rec-from-joe-1, highly recommended restaurants in Paul's recommendation-world, and restaurant guide information in restaurant-info simultaneously, the user must load the worlds and world schemas into the same workspace, merging the relevant parts of the schemas and instances.

We first describe what happens when the user loads two world instances of the same schema into the same workspace, and then explain what happens when two worlds with distinct world schemas are loaded and merged. For simplification, we do not discuss merging of equivalence specifications and constraint specifications in this section.
These issues are discussed in detail in Chapter 6.

Merging: Identical Schemas

The user starts with a new workspace and loads recommendation schema and rec-from-joe-1. Loading paul-recommendation-world into the same workspace causes highly recommended restaurants and their associated relationships to exist in the workspace. Since rec-from-joe-1 and paul-recommendation-world may contain information for the same real world restaurant, a user may specify an equivalence specification for restaurant to merge them. Intuitively, an object equivalence specification specifies when two objects refer to the same real world object. The collection of object equivalence specifications for a world is called the world equivalence specification.

WorldBase supports a two phase approach to merging databases during loading. First, objects are merged or created based on the world equivalence specification. The object's equivalence specification is used to check if an OID already exists in the workspace for a given PID that is to be loaded by looking for an OID with equivalent properties; otherwise a new object (OID) is created as the virtual equivalent of the PID (this semantics is called exist-or-create). If no equivalence specification is specified for an object (PID) type, a new OID is created to represent the PID. Next, other relationships are asserted (and possibly merged). In the second phase, constraints are checked. Mechanisms are supported to deal with constraint violations.

In the example, the user specifies (in the world equivalence specification of both worlds) that restaurant objects are identified by the equivalence of their r-name and r-phone relationships (i.e. if two restaurant objects have equivalent r-name and r-phone values/objects, they refer to the same real world object).
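The exist-or-create semantics of the first loading phase can be sketched as follows. This is a simplified Python model, not the prototype: it assumes a single object type (restaurant), treats the equivalence specification as a plain tuple of key relations, interleaves object creation with the assertion of relationships, and omits the constraint-checking phase. The names `Workspace`, `find_equivalent`, `load_world` and the sample worlds are hypothetical.

```python
import itertools


class Workspace:
    """Hypothetical workspace: OID -> {relation: set of values}."""
    def __init__(self):
        self._counter = itertools.count(1)
        self.props = {}

    def create(self):
        oid = f"oid-{next(self._counter)}"
        self.props[oid] = {}
        return oid

    def find_equivalent(self, pid_props, key):
        # Without an equivalence specification (or key values) nothing can match.
        if not key or any(rel not in pid_props for rel in key):
            return None
        for oid, props in self.props.items():
            if all(pid_props[rel] in props.get(rel, set()) for rel in key):
                return oid
        return None


def load_world(ws, world, key):
    """Load {PID: {relation: value}}; return the PID -> OID mapping."""
    mapping = {}
    for pid, pid_props in world.items():
        oid = ws.find_equivalent(pid_props, key)   # exist ...
        if oid is None:
            oid = ws.create()                      # ... or create
        mapping[pid] = oid
        for rel, value in pid_props.items():       # assert the other relationships
            ws.props[oid].setdefault(rel, set()).add(value)
    return mapping


KEY = ("r-name", "r-phone")
# Made-up worlds; note that the PID "p1" is local to each world.
rec_from_joe = {"p1": {"r-name": "Sushi Ko", "r-phone": "555-0100", "specialties": "sushi"}}
paul_world = {
    "p1": {"r-name": "Sushi Ko", "r-phone": "555-0100", "specialties": "tempura"},
    "p2": {"r-name": "Burger Hut", "r-phone": "555-0199", "specialties": "burgers"},
}

ws = Workspace()
first = load_world(ws, rec_from_joe, KEY)
second = load_world(ws, paul_world, KEY)
```

After both loads the two `p1` objects share one OID, which now carries both `specialties` values, while `p2` receives a new OID. Passing an empty key would instead create a new OID for every PID, matching the case where no equivalence specification is given.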
Restaurant objects in paul-recommendation-world are loaded into the workspace with an "exist-or-create" semantics. Thus, restaurants with equivalent r-name and r-phone values are merged.

There is a potential conflict if two distinct objects are merged, but the values of their other relationships differ. For instance, the same restaurant object may have several specialties because the relationship may have different values from the two worlds. There are several possibilities when faced with such potential conflicts. One may keep all relationships (unless cardinality constraints are violated); provide conflict resolution; abort the merge and notify the user, etc. A preference mechanism described in Chapter 6 allows a user some control over which relationships are to be removed to avoid constraint violations.

Merging: Compatible Schemas

Since the user is also interested in the restaurant-info which has detailed information about restaurants, he loads the restaurant-guide schema into the same workspace that holds recommendation schema. WorldBase provides an automatic renaming mechanism for schema structures during loading. Types and relations of a schema may be loaded with their original names, or they may be renamed in the workspace. The current implementation of WorldBase supports simple schema structure merging using name equivalence. Speaking broadly, two types or relations from different schemas denote a single type or relation in the workspace if they have the same name in the workspace.

The user can load restaurant-guide into the workspace and merge selected structures with existing schema in the workspace.
Assuming that recommendation was loaded without renaming, the user must rename name to r-name, phone to r-phone, classification to r-type, and avg-price to cost, to merge restaurant objects and the renamed relationships. However, he should not rename location to r-address, since that will lead to structural conflicts.

Two schemas conflict (are incompatible) if they have the same name for structurally different types or relations in the workspace (also called structural conflict). WorldBase considers relations with the same name and different type constraints to have structural conflicts. In this example, with the above renaming, the types and relations with the same name (in the workspace) from recommendation schema and restaurant-guide schema do not conflict structurally. On the other hand, entertainment schema and recommendation schema have potential structural conflicts since restaurant is an abstract type in recommendation and is a subtype in entertainment. The two schemas could be loaded together if those structures are renamed to be distinct. WorldBase does not prevent the renaming of restaurant in recommendation to entertainment, which may violate the semantics of entertainment schema.

If the schema structures are merged, there is usually a merging of the instances as well. The same object equivalence semantics are used to merge instances of the overlapping schema. However, although the schemas may be compatible, their instances may not be. Different users may interpret the same schema differently. In general, it is impossible to prevent mistakes. WorldBase catches errors only when constraints are violated. It is the responsibility of the user to ensure that data from different users are compatible and interpreted in the same way.
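Schema merging by name equivalence, with renaming on load and detection of structural conflicts, can be sketched as below. This is a hypothetical Python model rather than the prototype's mechanism: schemas are reduced to dictionaries from a name to a `(kind, components)` signature, the schema fragments are simplified, and the shape given to `location` (a ternary relation) is an assumption made only so that renaming it to r-address produces a conflict.

```python
def load_schema(workspace_schema, schema, renames=None):
    """Merge `schema` into `workspace_schema` by name equivalence after renaming."""
    renames = renames or {}

    def new_name(name):
        return renames.get(name, name)

    renamed = {
        new_name(name): (kind, tuple(new_name(c) for c in components))
        for name, (kind, components) in schema.items()
    }
    # Same workspace name but a different structure is a structural conflict.
    conflicts = sorted(
        name for name, signature in renamed.items()
        if name in workspace_schema and workspace_schema[name] != signature
    )
    if conflicts:
        raise ValueError("structural conflict on: " + ", ".join(conflicts))
    workspace_schema.update(renamed)
    return workspace_schema


# Simplified, partly hypothetical schema fragments.
RECOMMENDATION = {
    "restaurant": ("abstract", ()),
    "string": ("printable", ()),
    "r-name": ("relation", ("restaurant", "string")),
    "r-phone": ("relation", ("restaurant", "string")),
    "r-address": ("relation", ("restaurant", "string")),
}
RESTAURANT_GUIDE = {
    "restaurant": ("abstract", ()),
    "string": ("printable", ()),
    "name": ("relation", ("restaurant", "string")),
    "phone": ("relation", ("restaurant", "string")),
    "location": ("relation", ("restaurant", "string", "string")),
}
ENTERTAINMENT = {
    "entertainment": ("abstract", ()),
    "restaurant": ("subtype", ("entertainment",)),
}

workspace = load_schema({}, RECOMMENDATION)
merged = load_schema(dict(workspace), RESTAURANT_GUIDE,
                     {"name": "r-name", "phone": "r-phone"})
```

In this model, loading `RESTAURANT_GUIDE` with `location` renamed to `r-address`, or loading `ENTERTAINMENT` without renaming `restaurant`, raises `ValueError`, while the renaming shown above merges cleanly and keeps `location` as a separate relation.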
1.3 Scope of Thesis

The main focus of the research is to provide a framework and primitive support required for accessing and manipulating sharable distributed information. The primary contributions of the thesis are the design and implementation of the WorldBase framework and support for selection, transformation and merging. We focus on the underlying mechanisms required by the WorldBase environment; specifically, the implementation of world databases, the interactions with the workspace, and selection, transformation and merging tools.

WorldBase provides the bare necessities required to support autonomous sharing of distributed information. Additional facilities that will be needed await further research for their resolution. One such issue is update propagation. Another important unresolved issue concerns the choice and organization of metadata information, i.e. information on properties and inter-relationships of worlds and world schemas for more effective sharing. The current prototype uses a simple directory structure; this places a burden on users to remember their naming conventions.

Many critical issues related to physical design and implementation of distributed database systems are not addressed here. Such issues include:

• Efficient storage;
• Concurrency control and locking mechanisms;
• Synchronization issues in sharing;
• User interface;
• Access control;
• Data transmission;
• Network issues.

In defense of their omission, many of these issues involve policy issues in a software development environment. Here, we merely try to provide the primitives upon which such an environment can be built.

1.4 Organization of Thesis

Chapter 2 provides an overview of background and related research. Chapter 3 describes the overall WorldBase framework. Chapters 4, 5, and 6 describe the selection, transformation and merging mechanisms, respectively.
Chapter 7 describes the WorldBase prototype developed as part of the thesis research. Chapter 8 provides a summary of the thesis and describes contributions and directions for future research.

The appendices provide some of the background and various details omitted in the actual thesis. Appendix A describes AP5, a database programming language that provides virtual memory database support used by the WorldBase prototype. Appendix B provides grammars of the different specification languages described in the thesis. Appendix C describes implementation details of the prototype. Finally, Appendix D provides transcripts of the prototype in operation.

Chapter 2

Background and Related Research

Personal computers have been the fastest growing sector of the computer industry in the last decade. They are widely used in personal as well as corporate sectors. Many of the machines have access to a network. Providing the right environment for dealing with multiple databases is vital. Further, the wide acceptance of the object paradigm requires the environment to support the object paradigm and provide access to persistent object-bases (POBs). The environment must support sharing of information in a world of autonomous users, each having his own collection of POBs, loosely connected in a network.

Unlike traditional databases, where a global schema is imposed on the users and sharing support means supporting concurrent access to the central database, in personal computer and workstation environments, the sharing is more diverse. In fact, a spectrum of sharing is possible. At one extreme, several users may fully share a specific POB, where they agree on all its structure and semantics (logically centralized, no user autonomy). At the other extreme, users may each have their own distinct POBs with varying structure and semantics (logically decentralized, full user autonomy).
Within the spectrum, different forms of sharing policies for different collections of POBs may be agreed upon by a group of users, such as sharing only on a specific set of POBs with one user group, and a different set of POBs with other groups. Also, the agreements may be on several aspects: access (such as read or modify) and structural semantics (units of measure, interpretation agreements), etc. The environment that supports the spectrum of sharing must necessarily support the fully autonomous one. Sharing policies may be imposed on different groups, simulating real world solutions to how people agree to share their information.

WorldBase is an environment for distributed information access. It supports diverse multiple databases and provides tools for database selection, database transformation, and database merging in a distributed object-based environment. As a result, it is a combination of several research areas, namely, conceptual schema design; database transformation, merging and integration; distributed databases; and object-oriented databases. The focus of this thesis is on structural object-orientation, not the behavioral aspects. In this thesis, a semantic model is used to model objects in POBs.

In this chapter, we discuss the various areas WorldBase touches and provide an overview of related research in the area. Specifically, we discuss the issues of providing a database environment, database transformation, and database merging. In each of the issues, we discuss various aspects of the problem, the impact of structural object-orientation, and related research in the area. There is considerable overlap in related research in database transformation and database merging. Primarily, we concern ourselves with specific aspects of transformation and merging, and describe related research specific to these issues.
Next, we describe prominent systems that try to provide multi-database support. Finally, we describe AP5, a virtual memory database management system used in conjunction with WorldBase.

2.1 Caching and Databases

Traditionally, a database environment provides a user with a database manager that supports user interactions with information in the persistent database. Some caching may be implemented for faster access, but the cache is invisible to the user. Traditional programming environments, on the other hand, provide users with support for complex interactions with information in the virtual store. Some persistence is allowed through files. Information in the virtual store is typed and structured. In some cases, the support provided for dealing with structured data in the virtual store rivals that of an object-oriented database (e.g. Smalltalk [29]). Current research in persistent programming languages and database programming languages [36] signifies a move toward integration of programming languages and database technologies to provide an integrated system to deal with development of modern data-intensive applications. The integrated system must provide a pleasing combination of both virtual memory and persistent memory interactions.

We define a programming environment as providing virtual memory database support if the user is provided with data definition, data manipulation and query language capabilities similar to those of a database (conceptual information modeling support). Virtual memory database systems may not support persistence, and thus may not be amenable to sharing of information, except in the crudest form (e.g. by snapshots). In addition, they do not support persistence and crash recovery. Thus, by themselves, they are not considered full-fledged databases.
We define a database as presenting a uniformly structured interface (or a single-level store) to a user if he is allowed to interact with only one level of information: the persistent database itself. In a database supporting a uniformly structured interface, the user has no explicit control over his virtual memory and no explicit view of the information in his virtual store. We define a database that allows explicit caching as providing a two-level store interface. The user may interact and operate with a two-level view of information: the data in his virtual memory (cache) and the persistent data in his database. Both types of databases support some form of caching for efficiency, since accessing the persistent memory usually takes longer than accessing the virtual store.¹ Some database management systems are described only as "back-ends," to be used in conjunction with a programming language or an interactive browser to provide the user interface and deal with caching of results of queries.

¹ This may not be true with optical laser discs.

There are pros and cons to providing single- or two-level stores. An advantage of a single-level store is that it presents a consistent and synchronized view of the data, whereas cached data may not be kept up to date. However, explicit caching has its advantages in that it allows the user to explicitly control which data to cache, also explicitly narrowing the data domain in which to do searches. Caching may also be desirable in the context of long transactions.

In most cases, caching is provided for efficiency. However, caching may be used in a non-uniformly structured interface for a different reason, such as to provide an interface to possibly inconsistent information in the persistent store.

As mentioned above, our focus is to provide sharing support for persistent object-bases. Object-based systems adopt the object paradigm as their modeling base.
The object paradigm [69] defines objects as unique entities in their own right that encapsulate local state and behavior. In this thesis, we focus on the structural aspects of objects and ignore the behavioral aspects. A discussion of object-oriented models for databases can be found in [4, 7].

Objects have unique object identifiers. It is generally considered a hard problem to generate sharable unique identifiers for component databases of a logically decentralized system in which there is no centralized control, unless each of the component databases generates its own unique names. This leads to the difficulty in identifying equivalent objects across components as discussed in Section 2.3.

Below, we briefly review related work in object-oriented systems; specifically, those that provide persistence. There is a spectrum of related work in providing object persistence. At one extreme is a class of research focused on providing persistence to object-oriented programming languages. Approaches to implementation of persistent objects can be found in [9, 63]. The other extreme represents a class of research focused on grafting the object paradigm onto database technology. Object-oriented database systems try to combine the object paradigm with general database support such as associative queries, persistence, concurrency control and sharing.

2.1.1 Related Research in Object-Oriented Persistence

Most of the approaches in this area focus on trying to provide persistence as an orthogonal property to objects in programming languages (possibly object-oriented, or supporting abstract data structures). Persistent programming is concerned with long-lived data: data that outlive the programs that produce them. There are several different issues that need to be considered in providing persistence to objects.
A major issue is the format of the persistent object representations and the translation to and from transient to persistent representations. The format chosen affects functionalities that could be provided. It may put ceilings on the size of the structures, limit sharing of structures, or may hinder/aid in providing clustering and indexing mechanisms.

Another issue is how one refers to objects in persistent store. One may present a "single-level" view of the object store, where the user treats the virtual store and the persistent store as one; or one may present a "two-level" store, in which the user is aware of two different stores, and persistent objects exist in the persistent store, and a copy is brought into the virtual store. The method of accessing/addressing objects also depends on the internal representation of the objects.

Other important issues to decide on include: management of recovery after a processor crash, and deallocation of non-accessible objects in the persistent store (i.e. garbage collection). If one views a persistent object as inherently persistent, then one need not provide garbage collection of persistent objects. However, if object persistence depends on the persistence of other objects, one has to garbage collect non-reachable objects. Below, we review some of the prominent work in the area.

Most implementations of storage managers for objects rely on some kind of heap structure. In PS-Algol [30], a program can operate on data in a "database" (actually a persistent heap). Functions are provided to open, commit (save changes) and close the database. A root is provided with a database; objects found through following pointers from the root are preserved in the database. The root is the only way through which a program can get access to the data. Similarly, in Napier [3], a persistence root called "PS" is associated with each program.
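As a rough illustration of persistence by reachability from a root, the following Python sketch computes which objects a root-based commit would preserve. The `Obj` class, its `refs` field, and the `commit` function are hypothetical names chosen for this example; they are not part of PS-Algol or Napier, and the sketch only lists what would be saved instead of writing anything to a persistent heap.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so objects can go in a set
class Obj:
    name: str
    refs: list = field(default_factory=list)  # outgoing pointers to other objects


def reachable_from(root):
    """Return every object reachable from root by following pointers."""
    seen = set()
    stack = [root]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(obj.refs)
    return seen


def commit(root):
    """Hypothetical commit: list the names of the objects that would be preserved."""
    return sorted(obj.name for obj in reachable_from(root))


# A small heap with a cycle and one object that the root cannot reach.
root, a, b, orphan = Obj("root"), Obj("a"), Obj("b"), Obj("orphan")
root.refs.append(a)
a.refs.append(b)
b.refs.append(root)    # cycles are handled by the `seen` set
orphan.refs.append(a)  # points into the graph, but nothing points to it

preserved = commit(root)
```

In this sketch `orphan` is not preserved, because persistence is decided only by reachability from the root, which is also why such a store needs garbage collection of non-reachable objects.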
When a program terminates, all objects reachable from the persistent root, as computed by the transitive closure of references from that root, are persistent. "PS" is of type env (for "environment"), which is a collection of type name and value pairs.

A combination of heap structures and files may be used. Exodus [62] supports a programming language E, which is an extension of C++. A storage manager provides storage objects for storing uninterpreted bytes of data and files for grouping storage objects together. The object identifier of a storage object is a disk address.

In [70], sharing of objects in a distributed system is supported with a level of indirection. Tables are used to translate imported OIDs into local references, and relate imported OIDs with their remote references.

2.1.2 Related Research in Object-Oriented Databases

Most of the approaches to object-oriented databases [60, 34, 27, 5] use unique surrogates as object identifiers. Most implementations of object-oriented databases use a multi-level approach with a repository manager (or storage manager) that manages globs of information of indeterminate size, a type and object manager that manages classes, methods and objects (in essence, providing a data model), and some interactive and programmatic interface components. Unlike persistent programming languages, all objects in the object-oriented databases are persistent, although the interface components may provide for transient objects.

Object-oriented databases are mainly focused on efficient storage and access of objects. In most cases, they assume that persistent store will be organized around objects stemming from the conceptual schema. We briefly describe some of the prominent research in this area below.
Gemstone

Gemstone [60] consists of Stone, which is the object memory manager that provides secondary storage management, concurrency control, authorization, transactions, recovery and support for associative access for both types and indices; and Gem, which corresponds to the virtual store support. OPAL is a unified object-oriented language (computationally complete with assignment, conditional and iteration) used with Gemstone. Several programmatic and interactive interfaces are available. The Gemstone object server may be used with a procedural language (C or Pascal) with the Procedural Interface Module (PIM) as its interface, or with the Opal Programming Environment (OPE), or it may be integrated with a virtual memory database (Smalltalk) using an agent interface. In all cases, objects are cached implicitly by the PIM, OPE or the agent interface.

Integrating Gemstone with Smalltalk provides a rich model to the user in the virtual memory while allowing persistence to those Smalltalk objects that are referenced by a Gemstone object. Gemstone objects reside in GemSession, which is a Smalltalk class. It contains agents (which are Smalltalk objects) that represent Gemstone objects in the virtual memory. When a Smalltalk object is referenced by a Gemstone object (or rather, an agent), the Smalltalk object is automatically converted to a Gemstone object. ImportEquivalents is a Smalltalk object that maps Gemstone classes and objects to their equivalents in the Smalltalk environment; ExportEquivalents maps Smalltalk classes and their objects to their equivalents in the Gemstone environment.
Shared, Segmented Memory System

The shared, segmented memory system of [34] consists of a typeless backend, called Observer (Object Server), responsible for managing the use of the persistent object store, and a subsystem for enforcement of the type system, called Encore (Extensible and Natural Common Object Resource). The Object Server reads and writes chunks of memory from secondary storage using a unique identification for each chunk. Encore deals with object semantics through type definition. All interfacing to the server is handled by a module called the client, a copy of the object server that becomes part of the image of the interfacing process. Caching is explicitly handled by the client.

Segments are used to cluster a logically related set of objects for more efficient access to the persistent store. Segments contain objects that the database expects a client to access during a transaction. To keep segments small and allow clients to retrieve different sets of related segments, segment groups are introduced. A segment group (SG) is a uniquely named collection of one or more segment groups or segments. Each database maintains its own SG forest, and an SG may only contain the segments within a database.

A segment is the unit of transfer of objects from persistent store (server) to main memory (client); i.e. a segment is the unit that is cached in the user's main memory. When a user requests an object, a segment containing the object is returned. The user may also specify a segment group that indicates the context in which he is working. Once a user receives a segment, the objects are placed in an object hash table and the segment is discarded. Thus, the segment is used only as a unidirectional unit of transfer from the database server to the client.

Objects have two unique identifiers (UIDs) associated with them: external and internal.
The external UID acts as an identifier to a database object, while an internal UID is used to locate the object physically. A user may access an object by referencing its external UID, which is dereferenced (through the Object Location Table) to an internal UID or a collection of UIDs (duplicate objects in different segments), from which its segment(s) are found.

Iris

Likewise, Iris [27] consists of a storage manager, which is a relational storage subsystem similar to System R [26], an Iris object manager which implements the Iris data model, and a collection of interactive and programmatic interfaces. A notable interface is the Object SQL (OSQL) interface, an object-oriented extension to SQL, which provides a uniformly structured interface. Another interactive interface called the Inspector is an extension of a Lisp structure browser. A number of Common Lisp programmatic interfaces have also been implemented, but the details of the implementation are unavailable at this moment.

2.1.3 WorldBase Approach

WorldBase addresses a different problem than persistent programming languages and object-oriented database systems. The purpose of WorldBase is to support possibly inconsistent, overlapping multiple databases. In this case, the main concern is in providing persistence to collections of objects, instead of individual objects. WorldBase uses caching for a different purpose: to provide support for inconsistency between databases at a fundamental level. In the approach taken by WorldBase, information in persistent stores may be inconsistent. A database supporting a uniformly structured interface does not allow one to deal with multiple, possibly overlapping, possibly inconsistent, databases at the same time, unless the different databases are integrated (if such integration is possible) into a single global database view. Thus, it is not suitable for WorldBase.
WorldBase supports a two-level store, allowing a user to cache information in the virtual store. Monitoring and enforcement of consistency is moved to the cache. The user interacts with a virtual memory database (provided by AP5, described below), and persistence and sharing support is provided by WorldBase in the form of persistent worlds in containers whose internal information may not be accessed directly.

WorldBase supports a simple storage manager that stores collections of objects and relationships in repositories in persistent store. Each repository has a persistent set of specifications that define the structure and semantics of the repository. Mapping of persistent identifiers to corresponding virtual object identifiers is provided by a world equivalence specification when one is specified; otherwise it is a one-to-one mapping to a new virtual identifier. Repositories and their specifications in persistent store(s) are potentially sharable by users in WorldBase.

2.2 Database Transformation

Because of flexibility in structuring information and diversity in data models, there can be many representations of the same information. Some applications or users require information to be of a certain structure, while others may want the same information in a different structure. Transformation support of the structure and data is mandatory. One of the goals of this research is to support transformation of database subsets and merging of objects in object-based databases. In this section, we focus on the issue of database transformation; the next section discusses database merging.

A transformation is a mapping of instances of one schema to instances of another schema. Typically, a transformation specifies both the target schema and how instances of the source determine instances of the target.
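The definition above can be made concrete with a minimal Python sketch: a transformation is given by a target schema together with a rule that says how each source record determines a target record. The function `transform`, the `employees` data, and the attribute names are all assumptions made for illustration; they are not taken from WorldBase or its specification languages.

```python
def transform(source_instance, target_schema, rule):
    """Apply rule to each source record and keep only target-schema attributes."""
    target_instance = []
    for record in source_instance:
        derived = rule(record)
        if derived is None:  # the rule may drop a record
            continue
        missing = [attr for attr in target_schema if attr not in derived]
        if missing:
            raise ValueError(f"rule did not produce attributes: {missing}")
        target_instance.append({attr: derived[attr] for attr in target_schema})
    return target_instance


# Hypothetical source instance: employees with monthly salaries.
employees = [
    {"name": "Ann", "dept": "R&D", "monthly_salary": 1000},
    {"name": "Bob", "dept": "Sales", "monthly_salary": 2000},
]

# The target schema, and the rule deriving a target record from a source record.
payroll_schema = ("name", "yearly_salary")


def to_payroll(record):
    return {"name": record["name"], "yearly_salary": 12 * record["monthly_salary"]}


payroll = transform(employees, payroll_schema, to_payroll)
```

Here the target schema is stated explicitly and is separate from the source schema; the rule is written procedurally only because that is the shortest way to express it in Python.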
Database queries, views and schema integration methods provide some form of database transformations. Database queries are natural transformations of database instances. Each database query results in a subset of the database instance in a particular structure specified by the query. Views are a particular kind of transformation that have been widely studied for relational and other database models. Transformations also occur in schema integration. The integrated schema provides a single interface to instances of two or more source schemas. The instances in the source(s) are transformed to conform to the integrated schema.

In this section, we identify several issues relevant to database transformations and clarify the differences between the various kinds of transformations. We briefly survey other transformation frameworks present in the literature and categorize our approach.

2.2.1 Aspects of the Problem

There are two major issues involved in the transformation process. The first is the transformation language used. This language might be based on an algebra, calculus, imperative restructuring language, Datalog variant, or iterative restructuring language. The second issue is how actual transformation of the data is supported. There are various ways in which databases are transformed and these can be differentiated according to a number of dimensions. We begin by describing those axes which are largely independent of the value-based/object-based dimension.

The source of the transformation may be a single database instance or multiple databases. In the case of multiple sources, they may be of a single schema or of multiple schemas. In the case of multiple source databases, a real world object (a single object conceptually) may be represented differently in different source databases.
For example, in object-based databases, distinct object surrogates may be used; in value-based databases, the keys from different sources may have different structure.

The target instance may be materialized or virtual. A materialized database instance exists as an actual database. A virtual database instance is not physically stored. Instead, mechanisms are provided so that it (essentially) appears to be stored.

The target instance may be maintained or independent. A maintained target instance is one whose objects are derived from some source(s), and whose linkage to the source(s) is kept, so changes to the source(s) are propagated to the target (similar to the traditional notion of views, where changes to the database (source) are reflected in the view (target)). An independent target is one whose initial extent is derived from some source(s), but whose existence, thereafter, is independent of the source(s). (This is called an "independent view" in [19].) Virtual target instances are necessarily maintained, because queries to the target are translated to queries to the source; thus changes to the source are always reflected in the virtual target instance.

The target schema may be stated explicitly or it may be derived implicitly from the transformation specification. It may be an augmentation to the source schema, or a separate schema independent of the source. In the semantic data modeling literature, augmentations are usually called "derived data".

The transformation specification may be declarative or procedural. Several procedural transformation mechanisms are based on local restructurings of schemas which have associated semantics concerning transformation of instances.

2.2.2 Impact of Object-Orientation

The object-oriented paradigm assumes a conceptual framework based on object identifiers (OIDs) that act as surrogates for real world and conceptual objects.
The OIDs are not directly accessible to users, although they can be manipulated through the use of variables. It is conventional to view two databases to be equivalent if they differ only in their choice of OIDs [2, 38]. Thus, transformations in the object-based realm are, technically speaking, mappings from equivalence classes of instances to equivalence classes of instances. In practice, this subtlety can largely be ignored.

Implementations of transformations in object-based systems depend on the treatment of OIDs in the transformation. There are two approaches: creative vs. derivative. The creative approach allows new OIDs to be created and used in the target instance. The derivative approach uses the existing OIDs from the source(s). Technically speaking, the latter approach can be used only for schema augmentations and for virtual instances; in other contexts, the use of source schema OIDs would violate the usual assumption that OIDs, in and of themselves, have no semantic content.

2.2.3 Related Research in Transformation

In this section, we briefly describe various forms of transformations supported in database literature. We first survey different forms of transformations, then describe prominent transformation languages in the area, and provide an overview of languages with object identifiers.

Transforming Hierarchically Structured Databases

A large number of proposals [68, 67, 56] exist for performing data restructuring of hierarchically structured data. The restructuring is performed by applying a series of primitive operations to tree structures until the desired structure is obtained. Most operators of these systems correspond to operations of Motro's [54] proposal described below.

Derived Schema Components

Derived schema components refer to the schema components that are derived from other schema components.
A derived schema component consists of two parts: structural specifications and derivation rules. Most semantic models support derived data in the form of derived subtypes and derived attributes. In most models that support derived data, a first-order predicate calculus variant is provided as the language for specifying a derivation rule [32, 43, 66]. This permits data relativism, i.e., multiple perspectives on the same underlying data set.

In the Functional Data Model, FDM [66], derived data is modeled by derived function definitions. Essentially, a derived function defines new properties of objects based on the values of other properties. It behaves as if its values were recomputed on each access, although it may be implemented differently, e.g. by caching and updating the cache when the values on which it is based change. The language to specify the derivation rule is a first-order predicate calculus extended to include aggregation and ordering.

The Semantic Data Model, SDM [32], allows derived information in structural specification. It provides a rich set of primitives for specifying derived subclasses and derived attributes. For instance, a subclass in SDM may be: (i) attribute-defined; (ii) a user-controllable (user-defined) subclass constrained by a derivation rule; (iii) defined by set operations on existing classes; or (iv) an existence subclass (one which serves as the range of some attribute). The language to specify the derivation rule is first-order predicate calculus extended to include set operations.

Database Views

Database views usually refer to views with respect to a relational database management system. However, the concept of views applies to other database systems as well. Views in databases are used to present data to different users in a form that reflects their individual needs.
Views provide a focusing mechanism as well as a limited restructuring mechanism for the information in the database. The representation of data structure as seen by a user is referred to as an external schema. The view mechanism is a means with which a database management system can support various external schemas. Most database management systems support virtual views. They construct and store an internal representation for each view they support in the form of a parse tree. When a view is referenced in a query, a view composition operation combines the view's parse tree with the query parse tree and returns a composite parse tree which contains only references to stored database relations.

In [10], the views concept is used to provide for data independence, isolation and protection. Research into updates to views includes [21, 20, 17, 39, 65]. The concept of views is also used in software engineering for tool integration [28] and in program development environments such as Pecan [61].

Transformation in Database Integration Methodologies

As mentioned earlier, this section focuses primarily on the transformation mechanisms supported, while the next section covers merging. A brief overview of database integration methodologies is presented in Section 2.4.1. A range of transformation frameworks is represented in [6]. Some, for example [22], support declarative transformations, while others, e.g. [54], support procedural transformations. The target schema is generally explicitly specified and is a separate, independent schema, although it is derived using some prescribed schema integration methodology. In most cases, the instance associated with the target schema is a virtual view. Queries to the view are modified and mapped to the actual local databases. Thus, the target schema is maintained, i.e. links to the source are kept.
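The difference between a virtual (hence maintained) target and a materialized, independent one can be sketched in a few lines of Python. This is a simplification under stated assumptions: the `VirtualView` class and the `materialize` function are hypothetical, and the view re-derives its rows on every query, whereas the systems described above rewrite the query itself by composing parse trees.

```python
class VirtualView:
    """A target instance that is never stored; each query is answered from the source."""

    def __init__(self, source, derive):
        self._source = source  # reference to the live source, not a copy
        self._derive = derive

    def query(self, predicate=lambda row: True):
        return [row for row in map(self._derive, self._source) if predicate(row)]


def materialize(source, derive):
    """Build an independent, materialized target: a one-time copy with no link kept."""
    return [derive(row) for row in source]


def name_only(row):
    return {"name": row["name"]}


source = [{"name": "Ann", "age": 30}]
view = VirtualView(source, name_only)
snapshot = materialize(source, name_only)

source.append({"name": "Bob", "age": 41})  # update the source afterwards

view_rows = view.query()
```

After the update, `view_rows` includes the new record while `snapshot` does not, which mirrors the observation that virtual targets are necessarily maintained.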
We briefly describe Motro's approach [55, 54] to present a set of transformation operations that is offered as an alternative to the transformation language used by WorldBase. Motro provides a comprehensive set of operators for the functional data model. The operators allow a set of base schemas to be merged into a superview, a new schema that contains all the information of the base schemas. He provides seven schema manipulation operators: meet creates a new type as a union of two types; join creates a new type that is the intersection of two types; fold removes a subtype and transfers its attributes to its parent type; aggregate defines a new type from a given type, taking some of its attributes; telescope does the inverse of aggregate; add adds a new attribute; and delete removes types and attributes.

As an alternative to Motro's approach, we briefly describe the approach of Dayal and Hwang [22], where database integration is supported in a declarative manner through views and generalizations. Dayal and Hwang provide an extensive view definition mechanism that allows definition of virtual entities and functions. A user may also define supertypes and superfunctions in a view such that the supertypes are generalizations of a set of types in the local database(s), and the superfunctions generalize their relationships (modeled as functions). Keys (single-valued functions) are used to identify entities of a generalized type. Conditional functions may be defined to deal with overlapping relationships.

Transformation in the Federated Approach

Transformation in Federated Databases [33] is provided by specifying an import schema with derived types or relations. The derived type or relation has an associated derivation expression, which is a collection of procedural transformation operators.
Operators for types consist of: rename, to rename a type; concatenate, to get a disjoint union of instances of two types (no merging of instances); subtraction, which is associated with an identity function, to remove instances of one type from another; cross product, to create a type of paired objects of two types; and subtype, which defines a predicated subtype of another type. Operators for derived relations (also called maps) consist of: rename, to rename a map; equality, to provide an equality function for two types; composition, to provide composition of two maps; inversion, to provide an inverse map; extension, to extend a mapping to a supertype; restriction, to restrict a mapping to a subtype; discrimination, to discriminate maps from different sources; cross-product, to create a cross product map; projection, to project some of the types of a cross product; and selection, to select different maps based on some discrimination (usually used with the discrimination operator).

The transformation language allows types and relations from multiple sources. The target schema is defined explicitly using the above operators. The target instance is virtual.

Languages with OIDs

This section discusses practical and theoretical investigations which support OID creation. Explicit object creation is supported in most object-oriented languages and databases, typically via a new operator. In this context, it is generally used to create new objects in an existing database, not to transform the database to another schema.

The OODAPLEX and OOAlgebra languages of [19] are procedural languages which can be used for schema transformation. The focus is on mapping a single source schema to a single target schema. OODAPLEX is an imperative language styled after DAPLEX, and OOAlgebra is an algebraic language closely related to the algebra of the PROBE [23] system, and loosely related to the relational algebra.
Unlike their precursors, OODAPLEX and OOAlgebra support explicit object creation. OODAPLEX does this through a new operator; in contrast, the OOAlgebra creates a set of OIDs in a single operation, which is essentially equivalent to Motro's aggregate operator.

Some languages from the theoretical literature which support object creation are worth mentioning. The Logical Data Model (LDM) [44] provides what amounts to explicit object identifiers, and the algebra and calculus for LDM automatically generate new OIDs as needed. Both of these languages can be used for queries and database transformation through schema augmentation.

IQL [2] supports schema transformation for separate source and target schemas, for a model stemming from the O2 object-oriented database system [45]. IQL is based on a variant of Datalog using the inflationary semantics. The object creation mechanism used in our transformation specification language, ILOG*, is closely related to that of IQL. ILOG* can largely be viewed as a sublanguage of IQL [2], but uses a subtly different semantics for OID creation, stemming from [51, 42, 12]. The underlying semantics of ILOG* is based on such Skolem functions, but the user is insulated from these details. Also, the GOOD (Graph-Oriented Object Data model) language of [31] is a visual database manipulation language which supports limited recursion and OID creation using a fundamentally similar mechanism.

2.2.4 WorldBase Approach

WorldBase supports transformations of a single source to a single target database. Using the terminology defined earlier, the target schema is explicit and separate from the source. Transformation is specified in a transformation language called ILOG*. ILOG* provides an object creation mechanism to materialize the target database. The target instance is independent.
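To give a feel for OID creation based on Skolem functions, the following Python sketch treats an OID as the value of a constructor function applied to arguments, so the same arguments always yield the same new object. The `OIDFactory` class, the constructor name `f_dept`, and the `works_in` data are hypothetical; this is not ILOG* syntax, and it is only an assumption-laden illustration of the general idea.

```python
class OIDFactory:
    """Creates OIDs through Skolem-style constructor functions."""

    def __init__(self):
        self._by_term = {}  # (constructor name, arguments) -> OID
        self._origin = {}   # OID -> (constructor name, arguments)

    def skolem(self, constructor, *args):
        """Return the OID for constructor(args), creating it on first use."""
        term = (constructor, args)
        if term not in self._by_term:
            oid = f"oid{len(self._by_term)}"
            self._by_term[term] = oid
            self._origin[oid] = term
        return self._by_term[term]

    def origin(self, oid):
        """Trace an OID back to the constructor application that produced it."""
        return self._origin[oid]


factory = OIDFactory()

# Hypothetical value-based source: (employee, department name) pairs.
works_in = [("Ann", "R&D"), ("Bob", "R&D"), ("Cy", "Sales")]

# Create one department object per distinct department name in the target.
dept_oid = {emp: factory.skolem("f_dept", dept) for emp, dept in works_in}
```

Because the OID is determined by the constructor and its arguments, Ann and Bob end up referring to the same department object, and `factory.origin` recovers which source values gave rise to each created OID.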
An important influence on ILOG*'s object creation mechanism is found in [42, 51], which present logics for object-based models. In these logics, new OIDs are obtained by the application of certain "object constructor" functions, which can be viewed as Skolem functions of a certain kind. This approach is closely related to Motro's aggregate operator and to the OID creation of OOAlgebra, but as detailed in [38], it provides a mechanism for "tracing" the origin of OIDs.

2.3 Merging Databases

In this section, we study the general problem of merging databases in the context of POBs (object-oriented databases). Although persistent programming language and object-oriented database research has not focused on these problems in the past, any attempt to model (portions of) the real world in an object-base butts up against them.

One of the features touted by POB researchers is the lack of need for object identification in object-bases [27, 60, 30, 24]. In essence, objects in the object-base have unique identifiers (OIDs) which make them distinct from all other objects simply by the way they are created. In fact, this feature is shared with the programming language freedom to create arrays, vectors, records and other structures arbitrarily, without regard to or fear of their possible equivalence to other such structures that just happen to be around in the address space. The 'seamless' integration of persistence into such frameworks lends considerable support to the notion of persistent OIDs as being a 'good idea' [24].

But object-bases are in many cases used to model some part of the real world. In making that connection one does not have the freedom to create objects indiscriminately, but often must rely on key attributes to identify objects.
The modeled world would be inconsistent if duplicate objects with the same key attributes existed, for fidelity to the real world would be lost, and predictions based on the model would be inaccurate. We emphasize that much, maybe most, persistent data is private, scaffold-like data whose structure and identity are understood by programs alone; but for data which is shared with the real world and/or other programs and persons, an object identification scheme is necessary. Furthermore, as object-bases take on more database functionality, e.g. in the context of database programming languages, the quantity of real-world data in object-bases will increase.

2.3.1 Object Identification and Equivalence

The simplest of identification schemes involves matching printable attributes of objects; e.g. one identifies a person by his social-security-number or (first-name, last-name) pair. A more complex scheme might allow other, previously identified objects to participate in the identification; e.g. to let home-owner (a person) partially identify property. The spectrum of possibilities is quite broad, going through chains of composed attributes and ending with subgraph isomorphism; e.g., "his wife is the CEO of the corporation of which his brother is the personnel manager."

Identifying a real-world object with an implemented object is only half of the problem; merging the information from two different sources may also be difficult. For example, in the real world the age attribute may be 15, but in the modeled world, 14. Naturally, we tend to prefer the information from the real world in such a situation. However, if the data to be merged comes from different object-bases, the situation is not so clear-cut. Suppose we want to merge telephone number attributes from object-bases describing employment data and personal data.
We may want to prefer data from one source over another, or we may want to apply some complex function that computes a value to resolve conflicts between the source object-bases. For example, we may wish to use the maximum of a person's salary attribute in the source POBs as the value in the target object-base.

Unfortunately, the problems of object identification and merging become very complex when we consider sharing data between object-bases, for objects may be represented differently in different object-bases. In fact, all manner of inconsistencies can arise: objects in one object-base may be represented as attributes in another, attributes may map to printables in one and objects in another, and multi-slot relationships with the same name may disagree in number, type, count restrictions, etc.

Despite these problems, we believe the problems entailed by the use of a single, uniform object-base to hold all objects in the real world are far more difficult than dealing with the sharing of idiosyncratic object-bases.

2.3.2 Related Research in Merging

In this section, we describe three streams of research which involve the issues of object equivalence and object-oriented database merging.

One stream is focused on issues of object equivalence, in the context of a single object-oriented database. Investigations such as [41, 18, 64] have introduced notions of equivalence between complex database objects (which often involve the set construct, and don't differentiate between key attributes and other attributes) based on isomorphism to different depths of the trees representing these objects.
The investigation reported in [53] takes a different tack, observing that some other forms of equivalence are needed to differentiate real-world (or conceptual) object equivalence from object-identifier (OID) equivalence, since the same real-world (or conceptual) object may be represented by different OIDs. The notions of object equivalence developed in these investigations are used primarily in the context of query processing.

A second, recently introduced direction [58] is focused on the boundary of an object-base, specifically in the context of inserting data into the object-base. The problem here is to devise an appropriate mechanism whereby users can identify the objects already in the database. Like ours, the approach of [58] is based on acyclic sets of keys, similar to those used by WorldBase. Multiple keys for a given type are not permitted in [58].

The third stream, which has a longer history, is that of schema integration, which has generally been conducted in the context of semantic models. One aspect shared by some of the schema integration investigations and the investigation reported here is the issue of selecting the constraints to be associated with the superview (respectively, target schema). One of the foci in the schema integration research is on translating queries against the superview into queries against the underlying database schemas. Such translation requires an explicit specification of how the underlying databases are merged when populating (by materialization or virtually) the superview. One of the most detailed and rich mechanisms for performing this population is given in [22]. The MultiBase project [47] also addresses this issue in the context of the relational model.
There is considerable overlap between the approaches of [22] and [47] and the one taken here, including the use of keys for object identification and the use of preference and computation mechanisms. In both [22] and [47] a relatively rich language is provided for specifying how the underlying databases are to be merged.

2.3.3 WorldBase Approach

A key difference between the schema integration research and the research described here is that in schema integration, the superview is considered to be a virtual database, whereas the merged databases formed here are materialized. Such materialization does not have prohibitive space needs in our context, because the world mechanism permits users to focus on reasonably small, self-contained data sets. In the schema integration literature, if a merging mechanism is introduced, then it must be incorporated into the query translation algorithm; this has had the ultimate effect of restricting the richness of the merging mechanisms investigated.

Our context can support a much richer set of merging mechanisms. For example, our key sets may include multiple keys for the same type, and we offer considerable flexibility in the selection of the key set for the target schema. We also do not require all objects to have their keys defined, only those to be merged. Also, our context allows us to support a much more intricate interaction of constraints in the source schemas with constraints in the target schema, and richer mechanisms for massaging the merged data.
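The ingredients discussed in this section (several alternative keys per type, objects without keys left unmerged, preference for one source, and computed values such as a maximum salary) can be illustrated with a small Python sketch. It is not the WorldBase merging mechanism: the function `merge_by_keys`, the record layout, and the attribute names are hypothetical, and the sketch deliberately ignores the case where one record matches two previously distinct objects through different keys.

```python
def merge_by_keys(sources, key_sets, resolvers=None):
    """Merge records from several sources into one list of target objects.

    sources   -- list of record lists; earlier sources are preferred by default
    key_sets  -- list of attribute-name tuples, each an alternative key
    resolvers -- optional dict: attribute -> function(list of values) -> value
    """
    resolvers = resolvers or {}
    merged = []  # one dict per target object: attribute -> list of source values
    index = {}   # (key set, key values) -> position in `merged`
    for records in sources:
        for rec in records:
            # Only keys whose attributes are all present can identify the record.
            keys = [(ks, tuple(rec[a] for a in ks))
                    for ks in key_sets if all(a in rec for a in ks)]
            pos = next((index[k] for k in keys if k in index), None)
            if pos is None:  # no equivalent object yet: create a new one
                merged.append({})
                pos = len(merged) - 1
            for attr, value in rec.items():
                merged[pos].setdefault(attr, []).append(value)
            for k in keys:
                index.setdefault(k, pos)

    def first(values):
        return values[0]  # default policy: prefer the earliest source

    return [{a: resolvers.get(a, first)(vs) for a, vs in obj.items()}
            for obj in merged]


employment = [{"ssn": "123", "name": "Ann Lee", "salary": 50000, "phone": "555-0100"}]
personal = [{"ssn": "123", "salary": 52000, "phone": "555-0199"},
            {"name": "Bob Ray", "phone": "555-0142"}]
result = merge_by_keys([employment, personal],
                       key_sets=[("ssn",), ("name",)],
                       resolvers={"salary": max})
```

Here the two records sharing an `ssn` collapse into one object whose salary is the maximum of the source values and whose phone number comes from the preferred (first) source, while the record for Bob Ray stays a separate object.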
2.4 Multiple Database Environments

A spectrum of ways to deal with multiple databases exists: from database integration efforts [6] that limit local autonomy to provide a global uniform view, to the Federated approach [33] that provides local autonomy with limited inconsistency, to MRDSM [46, 47] that provides full local autonomy with inconsistency among the multiple databases. In the remainder of this section, we describe prominent work on systems that support sharing of distributed information.

2.4.1 Database Integration

Schema integration is the activity of integrating the schemas of existing or proposed databases into a global, unified schema. Traditional schema integration occurs in two contexts:

• View integration (in database design) produces a global conceptual schema.
• Database integration (in distributed database management) produces the global schema of a collection of databases.

A comprehensive survey of current approaches to schema integration is available in [6]. This class of work focuses on restructuring and merging the conceptual schemas to derive a global view of the databases to be integrated. Although the focus is on providing an integrated schema, some of the approaches provide for transformation and merging of the instances.

2.4.2 Federated Databases

The Federated architecture [33] is an approach to coordinated sharing of information among autonomous databases. The federation consists of a collection of independent database systems loosely bound to exchange information. No centralized organization or control is required.

A federation consists of components and a federal dictionary. Each component of the federation supports a semantic data model and controls its interactions with other components via import and export schemas. The export schema specifies information that the component is willing to share with others.
The import schema specifies the information (in the federation) that it wishes to access, transformed to a desired structure. The various components communicate by passing structured messages (transactions) or by negotiation, to coordinate agreements between components. The federal dictionary is a central database that contains information on all the components of a federation.

Once the Federation is established, each component includes a private database schema and zero or more separate import schemas, each of which is the result of a transformation applied to an export schema supported by a different component of the Federation. A long-term communication channel between the exporter and importer is established through negotiation. Queries to the imported schema are directed to the exporter. Thus, the component imports (and transforms) data from other components; the target instance is virtual and maintained. Once the channel is established, changes to the schemas of the components must be renegotiated.

The Federated approach supports explicit object identification. Object equivalence is achieved via a translator function that translates an object of one type into an equivalent object of another type (or an equivalent type of another schema). The private and import schemas are separate structures, thus no automatic merging of objects is provided. Inconsistency at local sites is permitted to the degree that local and import schemas may hold conflicting views of the same data.

2.4.3 DODM/DPDM

Lyngbaek's thesis [49, 48] uses two data models (DODM for Distributed Object-Oriented Data Model and DPDM for Distributed Personal Data Manager) to model the basic kernel of objects and a high-level model for sharing personal information in a network. The system is geared towards novice users sharing objects across databases.
DODM supports a model based on objects, which may be behavioral, text, image, audio, or structured. Structured objects are in the form of triplets of domain, mapping and range objects. Each DODM object has a unique object key consisting of a local key and a database identifier. Relationships are viewed as objects and thus also have keys. Keys cannot change during the lifetime of an object. Each DODM database is a collection of objects and relationships (structured objects) associated with a specific user. A context is a collection of databases, a subset of the databases in the environment. The user may group a collection of databases that share certain common properties (such as belonging to a certain group of people) to form a context. A primitive data definition and data manipulation language is provided. The operations supported allow users to relate objects across context boundaries, copy or move objects, and specify access control and sharing of objects.

DPDM is a personal data manager which provides an interface for management of objects by a novice end-user. It includes the notion of objects, types (also called object kinds), attributes and frames. Each object type has an associated set of object frames which define views of an object and allow users to view related objects and their attributes at the same time. End users interact with the database via working kinds, which act like a cache containing the objects being accessed by the user. DPDM supports various operations on the working kind, operations on objects in the working kind, and operations for sharing and access control. The working kinds provide an interface for browsing and object modification. DPDM does not differentiate between objects and their meta information.

Although DPDM is a system for sharing personal information in a network, the main focus is on the sharing of data.
Meta information may be shared as data; however, no transformation of instances is supported when the meta data is modified. DPDM also supports sharing at the object level; one may access a remote object by its unique name and user. There is no support for dealing with structural inconsistencies among different databases except for renaming of objects. No merging of distinct objects from different databases is supported.

2.4.4 MRDSM

MRDSM (Multics Relational Data Store Multidatabase) [46, 47] is a relational multidatabase system. It provides support for management of collections of relational databases. In particular, it provides support for simultaneously accessing multiple databases in a relational model as well as copying or moving source data into a target database. A multidatabase manipulation language supports the manipulation of data in different databases. The databases are described by a set of schemas, and a number of schemas that define inter-database dependencies.

Transformations are achieved by queries in an enhanced relational calculus (the multidatabase manipulation language). A separate target schema may be specified explicitly to materialize the target instance. Target instances are not maintained.

Object identification is achieved via keys; key attributes of relations derived by the query must correspond to the key attributes of the target relations. Some mechanism is provided for combining entity sets from different sources if they have keys with the same structure (i.e. having the same arity, with corresponding attributes having identical or equivalent domains).

2.4.5 Comparisons with WorldBase

In most information systems, references to databases or objects are resolved from system catalogs or dictionaries. The structure of the dictionary is important in multiple databases.
A centralized dictionary maintained by a server is a potential bottleneck in distributed systems. Distributed dictionaries, in which each node maintains its own dictionary, cause consistency problems. The trade-offs between centralized and replicated dictionaries are well known. One cannot eliminate entirely the need for a dictionary in a distributed environment, since network naming mechanisms rely on dictionaries. The goal is to make these dictionaries as small as possible.

The schema integration research described in the previous sections tries to present a global database view of multiple databases. The methodologies described are very useful in providing a global schema of a set of local databases, if all the semantic conflicts are resolvable. The overhead involved in the process makes it feasible only when the user always wants an integrated view of the multiple databases. However, this may not always be the case in the real world. The user may not need to integrate all the data in the databases, merely different subsets, at different times, under different schemas.

The support WorldBase tries to provide for POBs is similar to that of MRDSM. WorldBase users are aware of the multiple POBs they are accessing. Each WorldBase database has a set of specifications (consisting of its schema and other specifications) that specify the meta-data of the information to be shared. Sharing of differently structured data is allowed by explicitly transforming the data into compatible schemas and merging them.

Unlike the federated approach, there is no coupling of schemas across worlds. A world database is the unit of sharing. WorldBase users have access to the schema and population of the whole world. Also, since the schemas are not shared partially, the dictionary need not contain them. Thus, the WorldBase approach is more autonomous than the federated approach.
Unlike DPDM, there is no sharing of objects across databases. Each database is an autonomous unit. WorldBase does not rely on a single key for each object in the system. Equivalence specifications are used to specify when two objects from different databases are equivalent. WorldBase also supports a more sophisticated data model than DPDM (which supports the binary model), and a more powerful interface in the form of a virtual memory database (using AP5). Thus, it is a more powerful sharing environment than DPDM.

2.5 AP5 - Platform for WorldBase

The WorldBase prototype is built on AP5 [15, 14, 16]; it relies on AP5 to provide virtual database support. AP5 is a database programming language extension to Common Lisp. It provides database definition, creation and manipulation, and a query language to a virtual memory database. This section gives a brief introduction to AP5 on a conceptual level. More details are provided in Appendix A.

AP5 supports a data model similar to that of the Entity-Relationship [11] model. Relation columns are typed. Entities are represented as database-objects which are classified into types (entity classes). AP5 provides functions to create relations and types, and allows assertion and maintenance of subtype relationships. It also provides update primitives to update relations, and a transaction mechanism to group updates to the database.

AP5 provides a relational calculus style query language. Conjunctions, disjunctions, negations and quantifications of formulas may be used to form a query to the database. The basic atomic formula is in the form of a relation specification: a relation name and a list of variables for its attributes. Relations that are actually computed functions (e.g.
Lisp functions) may be used as basic atomic formulas as long as the elements needed by the relation are generable from other atomic formulas in the query.

AP5 also provides a first-order-logic-style language for expressing constraints. An automatic constraint checker checks for constraint violations at the end of each transaction. If violations are detected, AP5 allows rules to intervene (suggest repairs) or to abort the transaction; in the latter case it restores the database to its pre-transaction state. AP5 also provides object creation primitives to create database-objects.

The WorldBase prototype uses a subset of AP5 features, namely the relation declaration/definition, query language and data manipulation language (update primitives). These are used to implement the WorldBase Data Model and the virtual database support. WorldBase also relies on AP5's transaction mechanism, type checker and error recovery in the event of failures.

Chapter 3

WorldBase Environment

WorldBase supports two levels of information. The first is the data level. Data in WorldBase is contained in a world database. The data is modeled in the WorldBase Data Model (WDM). Data in a world, also called a world's population, is constrained by the world specifications. The second is the aggregate level, which consists of the collection of worlds and their corresponding specifications. This second-level view of aggregates can be used with any data model; however, in this thesis and prototype, WDM is used.

WorldBase supports a two-level store: the persistent store and the virtual store (workspace). A user of WorldBase does not access objects and relationships in the persistent store directly. He has access to a collection of worlds and their specifications. He must load specific worlds into the virtual store (workspace) to access the information contained in them.
We first describe the WorldBase Data Model. Next, we describe the WorldBase environment, consisting of: worlds and their specifications, the persistent store and its interface, and the virtual store and its interface.

3.1 WorldBase Data Model

The WorldBase Data Model (WDM) is a modified Entity-Relationship model [11] that includes specialization (ISA) relationships (in the sense of [1, 35]) and restricted derived relations. In the prototype, WDM is implemented using AP5, so it is a slight variation of AP5's data model. Some of the more sophisticated features of AP5 (such as derived relations) are adopted by WDM, but they are ancillary to the model. The focus of the model is on a simple and bland semantic model, since its use is only to demonstrate the WorldBase environment and its sharing tools. The notational and diagrammatic conventions used in this paper are borrowed heavily from IFO [1].

As mentioned in the introduction, the modeling primitives of WDM are entity types and relations. Entity types include abstract types, subtypes and printable/value types. An entity set consists of a set of entities which are instances of the entity type. Entities are used to represent simple, as well as composite, objects. We distinguish objects (entities) from values. A value is something whose meaning is universally understood; an object is something whose meaning is understood only by its relationships to other objects and values in the database.

Members of abstract types are called abstract objects; these correspond to physical or conceptual objects in the world.
In some implementations these are represented internally using "object identifiers" (OIDs); in other implementations, including that of the prototype described in this thesis, these are represented by internal addresses (called OIDs in the environment and PIDs in persistent store) over which the system programmer has little control.

Relations range over subsets of n-ary Cartesian products of entity types (sets of associations among entities). Each instance of a relation is called a relationship. Relationships relate entities or values to other entities or values. With each relation we associate the number and types of its parameters and, possibly, its definition. A relation can be user-defined (also called stored) or derived. A stored relation consists of relationships which the user explicitly asserts. A derived relation is not assertable; it derives its relationships from its definition.

A derived relation consists of two parts: the template of the relation (arity, name, etc.), and the derivation rule for deriving the relationship instances. The derivation rule is in the form of predicate calculus, restricted to generability of the relations used in the derivation rule. A relation cannot be a subrelation of another relation, except indirectly via a derivation rule. Derived relations are adopted from AP5 and are not the focus of the model. They are largely ignored in descriptions of tools, unless their presence requires special handling.

Entity types are classified into a type lattice. At the "root" of the lattice is a collection of abstract types. The ISA (or specialization) relationship relates subtypes to abstract types or subtypes. Specialization is done by creating stored subtypes, such as student ISA (subtype of) person. A subtype can have more than one supertype through specialization. However, it must have only one root supertype, which must be an abstract type.
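The single-root requirement on the type lattice can be checked mechanically. The Python sketch below assumes a hypothetical encoding of a schema as a dictionary from each type name to its list of direct supertypes, with an empty list marking an abstract type; only `student` and `person` come from the text, and the other type names are invented for illustration. The sketch also rejects cyclic ISA chains, which would leave the root of a type ill-defined.

```python
def root_types(schema, type_name, _path=()):
    """Return the set of abstract types reachable from type_name via ISA edges."""
    if type_name in _path:
        raise ValueError(f"ISA cycle through {type_name!r}")
    supertypes = schema[type_name]
    if not supertypes:  # no supertype declared: an abstract (root) type
        return {type_name}
    roots = set()
    for parent in supertypes:
        roots |= root_types(schema, parent, _path + (type_name,))
    return roots


def check_lattice(schema):
    """Raise ValueError unless every type has exactly one abstract root."""
    for type_name in schema:
        roots = root_types(schema, type_name)
        if len(roots) != 1:
            raise ValueError(
                f"{type_name!r} has roots {sorted(roots)}; exactly one is required")


good = {
    "person": [],
    "student": ["person"],
    "employee": ["person"],
    "teaching_assistant": ["student", "employee"],  # two supertypes, one root
}
check_lattice(good)  # passes silently

bad = dict(good, course=[], odd=["student", "course"])  # roots: person and course
```

Calling `check_lattice(bad)` raises `ValueError`, because `odd` would specialize two unrelated abstract types.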
Derived types are treated as derived unary relations. Only stored types and their specializations are stored in the type lattice imposed by the ISA relationship. Directed cycles of ISA relationships are not allowed in a WDM schema. Also, all maximal directed paths of ISA relationships from a given subtype end at the same abstract type. We do not differentiate the notion of attributes from binary relations, in that we allow both entity types and printable types to participate in relationships. The WDM is capable of simulating the structural portion of essentially all object-oriented database systems in the literature.

In the formal model, we assume for each abstract type T there is an infinite set potdom(T) of potential abstract objects of type T (these correspond to OIDs); and for each printable type V there is a (finite or infinite) set dom(V) of printable objects of type V. It is assumed that all of these sets are pairwise disjoint. A weak instance is a mapping from each abstract type T to a finite set of OIDs in potdom(T); from each subtype p, where there is a path of ISA edges from p to an abstract type node q, to a finite subset of potdom(q); from n-ary relation nodes to sets of n-tuples; and from printable types to values. An instance is a weak instance which satisfies (i) all set inclusions implied by the schema (via ISA relationships and tuple-range restrictions), and (ii) all explicitly specified constraints. Two weak instances are OID equivalent if there exists a permutation of OIDs that maps one to the other. In this thesis, we follow the usual convention of blurring the distinction between instances and their associated equivalence classes (explicated in [38]).

Note that there is a correspondence between WDM schemas and relational schemas. (See [38, 50] for a formal mapping of a semantic database model to the relational model.)
WorldBase exploits the correspondence heavily in providing its tools.

3.1.1 States of Objects

Objects in WorldBase have two different states: persistent and virtual. Objects, in their virtual states, have object identifiers which are represented as internal addresses over which the system programmer has no control. Objects in secondary memory (by virtue of being in a saved world) are represented by local identifiers that uniquely identify the objects within the world, but are not meaningful across worlds. We call these identifiers PIDs (Persistent Identifiers). Loading a world instance causes each PID in the instance to have a corresponding OID in the workspace it is being loaded into, either by creating a new OID or by finding an existing equivalent OID. In the second case, a merge has occurred. Unlike in many persistent object systems, the user may not access PIDs directly, but must access them via the OIDs.

3.2 WorldBase Components

A user of WorldBase views the environment as consisting of a collection of persistent, possibly inconsistent, potentially sharable world databases. The user must activate the persistent databases by loading them into his workspace(s) in order to access the information in the databases. He may access more than one database simultaneously, either by loading them into different workspaces, or by loading (potentially merging) them in the same workspace.

The WorldBase aggregate environment comprises a set of world databases, their properties, specifications, and inter-relationships. The set of properties for a world database includes its name, owner, storage location, timestamp information, access control information, etc. The set of specifications for a world database (collectively called the world specification) consists of the world schema specification, world closure specification, object equivalence specification and world constraint specification.
Relations in the aggregate model relate worlds to their properties and specifications, as well as to other worlds (for example, through a dependence relationship). Below, we describe a world database and its relationships to its properties and specifications.

3.2.1 World Database

A world database is a repository of information. It is a clustering unit, as well as a unit for persistence and sharing. It is uniquely identified by its name and owner and has a physical storage unit attached to it containing its population. Each world database has a unique world schema specification which specifies the world's schema, an optional world closure specification which helps in populating the world database in a concern-based manner, a world equivalence specification which specifies object equivalence information for merging, and a world constraint specification which specifies constraints on the world database (in the current implementation, only cardinality and disjointness constraints are specified). It has a world instance consisting of an interrelated family of objects and relationships that make up the world's population.

Formally, a world database is a named tuple: $\langle \text{schema}: S_i,\ \text{closure}: cl_i,\ \text{equivalence}: K_i,\ \text{constraint}: C_i,\ \text{instance}: I_i \rangle$. In the remainder of this thesis, formal notation of a world database uses the symbols $S_i$ to denote a world schema specification, $cl_i$ to denote a world closure specification, $K_i$ to denote a world equivalence specification, $C_i$ to denote a world constraint specification, and $I_i$ to denote a world instance (or population). Since $cl_i$ is optional, it may be omitted where its omission does not affect the descriptions. Where the context of the tuple is well understood, the names of the slots in the tuple may be omitted.
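As a reading aid, the named tuple above can be transcribed into a small Python record. This is only a sketch: the class name `WorldDatabase`, the choice of dictionaries and lists for the slots, and the example values are assumptions made for illustration, not the prototype's representation. The optional closure slot defaults to `None`, and the (name, owner) pair serves as the identity of the world.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorldDatabase:
    """A world database as a named tuple of its specifications and instance."""
    name: str
    owner: str
    schema: dict        # S_i: the world schema specification
    equivalence: dict   # K_i: mergeable type -> key relations
    constraint: list    # C_i: cardinality and disjointness constraints
    instance: dict = field(default_factory=dict)  # I_i: the world's population
    closure: Optional[dict] = None                # cl_i: optional closure spec

    @property
    def identity(self):
        """A world is uniquely identified by its name and owner."""
        return (self.name, self.owner)


# Hypothetical example values, chosen only to exercise the record.
world = WorldDatabase(
    name="faculty",
    owner="alice",
    schema={"person": [], "student": ["person"]},
    equivalence={"person": ["social-security-number"]},
    constraint=[],
)
```

Two worlds built from the same `schema` value but with different `equivalence` or `constraint` slots are distinct records.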
The world instance may be populated in several ways: the user may explicitly assert that certain objects and/or tuples are in the world; or he may use the closure specification with a specific set of objects and/or tuples to populate the world. The user may also save all the objects and tuples in the types and relations in the world's schema specification (i.e. save the instance of the schema). Because of the differences in closure, equivalence or constraint specifications, two worlds with the same schema specification may have entirely different semantics.

A world database has persistent and virtual states. A world database, in its persistent state, consists of a collection of inactive objects and relationships that make up its population. The user may not query the individual objects in these worlds directly. A world database is in a virtual state when its specifications and its population are loaded into a workspace (i.e. are in their virtual states). In its virtual state, the user may query the schema, data and other properties of the world directly.

World Schema

A world schema specification (or simply a world schema) specifies the schema (structure) of a world database. It is uniquely identified by its name and creator, and consists of a set of relation and type specifications that specify the schema structure. The syntax for schema specifications is described in Appendix B. An example of a world schema specification can be seen in Appendix D.

A relation is declared as a list consisting of: its status, implementation type, name and other specifications (type constraints or definition). A type is declared as a list consisting of: its status, implementation type, name and an optional list of supertypes. When no supertype is specified, the type being declared is an abstract type. Status is either internal or shared.
Internal refers to the fact that the relation or type is, by default, unshared by other specifications (unless explicitly renamed by the user during loading). Shared means the relation is potentially sharable. Implementation is either stored or derived. Stored refers to relations and types whose instances are user-asserted. Derived refers to relations and types whose instances are derived from other relations in the form of derivation rules. Thus, a stored relation is specified by its name and a sequence of types that constrain the slots of tuples allowed in the relation. The arity of the relation is deduced from the length of this sequence. A stored type is specified by its name and optional supertype(s).

Of course, all world meta-data (schema, closure specification, equivalence specification and constraint specification) has persistent and virtual states as well. A world schema, in its persistent state, is simply a text description of a schema specification for a world database. It consists of a name of the schema and a set of relation and type specifications (the owner is deduced from the user name). The text description is stored in a repository (a file in the prototype).

When a schema is loaded into the virtual store, labeled structures corresponding to types and relations in the schema are created in the workspace. Types and relations, in their virtual state, are implemented as (meta) objects in the workspace. The schema structure may be populated by loading a world instance or by user assertions.

The current prototype does not allow major modifications to the world schema. Only renaming and adding relations to existing world schemas are allowed. However, the user is free to create new world schemas as he sees fit.
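To make the shape of these declarations concrete, here is a minimal Python sketch of stored type and relation declarations. It is not the WorldBase syntax (that is given in appendix B); the class names `TypeDecl` and `RelationDecl`, the field names, and the sample declarations are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

STATUSES = {"internal", "shared"}
IMPLEMENTATIONS = {"stored", "derived"}


def _check(status: str, implementation: str) -> None:
    # Status and implementation each take one of two values.
    if status not in STATUSES or implementation not in IMPLEMENTATIONS:
        raise ValueError("bad status or implementation")


@dataclass(frozen=True)
class TypeDecl:
    status: str
    implementation: str
    name: str
    supertypes: Tuple[str, ...] = ()

    def __post_init__(self):
        _check(self.status, self.implementation)

    @property
    def is_abstract(self) -> bool:
        # A type declared without supertypes is an abstract type.
        return not self.supertypes


@dataclass(frozen=True)
class RelationDecl:
    status: str
    implementation: str
    name: str
    slot_types: Tuple[str, ...] = ()

    def __post_init__(self):
        _check(self.status, self.implementation)

    @property
    def arity(self) -> int:
        # The arity is deduced from the length of the type-constraint sequence.
        return len(self.slot_types)


person = TypeDecl("shared", "stored", "person")
lives_at = RelationDecl("shared", "stored", "lives-at", ("person", "address"))
```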
World Closure Specification

A world closure specification specifies the closure of objects and tuples in the world by providing annotated templates for types and relations that specify objects and tuples of concern to the world, to be a part of the object or tuple. It can be used as a specificational selection mechanism. The syntax and semantics of the closure specification are described in Chapter 4.

The world closure specification, in its persistent state, is a text description of a list of object and relation closure specifications for a world database. In its virtual state, the world closure specification resides in meta-relations in the workspace and is used in populating certain worlds and in some transformations.

World Equivalence Specification

A world equivalence specification specifies a set of equivalence specifications for object types of a world database. Not all object types in a world database have equivalence specifications specified. The types for which equivalence specifications are specified are called mergeable types. An equivalence specification specifies when two objects of the same type are equivalent. In this thesis and the prototype, equivalence specifications refer to specifications of key relationships. They are specified as a list of binary relations that form a key for a given object type. The syntax and semantics for object equivalence are described further in Chapter 6.

A world equivalence specification, in its persistent state, is a text description of a list of relations that forms a key for a particular type. In its virtual state, the world equivalence specification resides in meta-relations in the workspace and is used during loading of world instances to determine the existence of objects (in the world) in the workspace for merging.

World Constraint Specification

A world constraint specification specifies a set of constraints attached to a world database.
In this thesis and the prototype, we deal only with a restricted form of cardinality constraints and disjointness constraints; key constraints are dealt with separately in world equivalence specifications. Cardinality constraints are specified as a 4-tuple: a relation name, slot, lower bound and upper bound, where lower bound ∈ {0, 1} and upper bound ∈ {1, ω}. Disjointness constraints are specified as a pair of names of subtypes whose instances must be disjoint. They are used to constrain the population of a world.

A world constraint specification, in its persistent state, is a text description of a list of constraints for a world database. In its virtual state, the world constraint specification resides in meta-relations in the workspace and is used to determine the constraints of the workspace as a sum of the constraints of the loaded worlds. The constraints are used to determine the mergeability of worlds with different constraints. In its virtual state, the constraints are not enforced in the workspace unless the user chooses to enforce them. Enforcing the constraints activates rules that trigger on violations and allow interactions with the user.

World Population

A world instance refers to the population of a world database at some fixed point in time. There are two basic ways to populate a world database in the current system. A world's population may be decided according to its schema definition, i.e. all the objects and tuples in the schema definition of the world are in the world database. A world's population may also be decided by its closure specification. The user specifies the set of seeds in the world, and all objects and relationships related to the seeds (as specified by the closure) are in the world's population. This is described in detail in Chapter 4.

In its virtual state, the world instance is a collection of OIDs and tuples in the workspace that form the population of the world.
Saving a world causes the population to be preserved in persistent store. A world instance, in its persistent state, contains a collection of PIDs and tuples (of PIDs) that form the population of the world. Loading a world causes the world instance to be (re)created in the workspace (subject to merging). A world instance can only be loaded if its schema is already loaded. This phase may involve object merging; this is explained in Chapter 6.

3.3 Persistent Store

The persistent store consists of a collection of repositories which represent the persistent states of world databases and their specifications. The registry contains properties of worlds, world specifications, and their interrelationships. The registry provides the user with a view of the components in persistent store.

3.3.1 Registry

The registry is implemented as a data dictionary that is accessible to all the workstations in the network. It contains information on available worlds in the network. It may be implemented as a single central registry or a network of registries that communicate to provide each other with up-to-date information.

Since the WorldBase environment consists of a collection of worlds, each owned by a user, names are introduced to reflect this structure. Each world in WorldBase is uniquely identified by its name and owner-name. Thus, a user of WorldBase may assign names to his worlds without fear of conflicts with other users' names. The registry keeps track of the actual location of the world in the network based on its name and owner-name. The underlying implementation and actual physical distribution of the worlds are not reflected in the conceptual sharing layer seen by the user.

A copy of the registry resides in virtual store. Registries in the virtual store are periodically saved to persistent store via the checkpoint-registry operation.
The update-registry operation updates the registry in the virtual store from the most up-to-date persistent copy.

Data Definition

Several operations are provided to register a new world schema, a new world database, or to unregister a world or a world schema. A world schema is registered separately because it may be shared by other world instances. The operation register-worldschema takes a world schema specification name and owner with a list of properties and includes its property information in the registry. The operation register-world takes a world database name and owner, its world schema name and owner, and an optional world closure specification, and includes the information in the registry.

Data Manipulation

Data in the persistent store is manipulated indirectly through operations in the virtual store. Such operations include saving a world, and meta-data operations to modify a world's schema, or its closure, key or constraint specifications. This is described in Section 3.4.

Query Language

There is no query language for the persistent store directly (except for operations accessing the repositories) in the prototype. A virtual state of the registry exists in the workspace, and a user may query the registry by querying the structures that represent it. For instance, the user may request a list of various properties of each world and world schema having a known property (such as being owned by a specific user); or list various worlds and world schemas in the registry. The full power of the virtual database query language is allowed. Registry queries return values only, since all information in the registry is value-based.

3.4 Virtual Store

The virtual store refers to a workstation's workspace, the virtual memory available for use in the workstation. Each workspace contains a working database.
The workspace manager manages the loading and unloading of worlds and their specifications to and from the workspace.

3.4.1 Workspaces

A WorldBase workspace (or simply workspace) is a virtual memory database. It contains a workspace schema that is formed from compatible schemas defined by world schema specifications that are loaded into the workspace (and other meta-schemas used by WorldBase), and a workspace instance composed from compatible world instances that are loaded into the workspace (and other meta-data). Each workspace is disjoint from other workspaces in the network. Workspaces do not persist.

Operations on Workspaces

Several workspace-specific operations are supported. One may discard the contents of a workspace. One may list all worlds in a workspace as well as all world schemas in a workspace. One may also update or modify the workspace database using the database manipulation language.

3.4.2 World Schema Specification

We describe some of the top level operations supported for a world schema specification.

Create a World Schema Specification

A world schema specification is created (and registered) by specifying the world schema specification name and creator along with a list of schema specifications. The schema specifications are declarations of types and relations with their respective type constraints and statuses. Declarations of types and relations are ordered such that types are defined before being used in relations (as type constraints) and there are no circular dependencies in the declarations.

The user may also create a new world schema by providing a list of relations and types existing in the workspace as the world schema. WorldBase automatically translates the relations and types into its declarations and saves the persistent state of the world schema.
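The ordering requirement on declarations can be checked in a single pass. The sketch below is an assumption-laden illustration, not WorldBase code: the function name `check_declaration_order`, the `(kind, name, referenced types)` triple format, and the treatment of `string` as a predefined type are all invented for this example.

```python
def check_declaration_order(declarations, predefined=("string",)):
    """Verify that every type is declared before it is used.

    declarations: list of (kind, name, referenced_type_names) triples, where
    kind is "type" (references are supertypes) or "relation" (references are
    the type constraints on its slots). This format is hypothetical.
    """
    defined = set(predefined)
    for kind, name, refs in declarations:
        missing = [r for r in refs if r not in defined]
        if missing:
            raise ValueError(f"{kind} {name!r} uses undeclared type(s): {missing}")
        if kind == "type":
            defined.add(name)
    return True


ordered = [
    ("type", "person", []),
    ("type", "address", []),
    ("relation", "lives-at", ["person", "address"]),
    ("relation", "p-name", ["person", "string"]),
]
unordered = [
    ("relation", "lives-at", ["person", "address"]),
    ("type", "person", []),
    ("type", "address", []),
]
```

Because a declaration may only refer to types that appear earlier in the list, any list that passes this check is also free of circular dependencies.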
Load a World Schema Specification

A world schema specification may be loaded to form its virtual state in the workspace. Loading a world schema specification causes the workspace manager to create or look up all the relations and types declared in the world schema specification with the following naming conventions. The user may provide a list of renamed types and relations (in the form of old-name, new-name pairs). Relations and types which are being renamed are looked up or created using the new name, i.e. the new name is the global name of the relation or type. Internal relations and types which are not being renamed are looked up or created with a name that contains the world schema name and owner with the relation or type name, i.e. its global name is a concatenation of the world schema name, owner, and the relation or type name. Shared relations and types which are not being renamed are looked up or created with the name as it appears in the schema specification, i.e. the global name of the relation equals the name of the relation. Mappings from the local names to the global (workspace) names are kept in the workspace.

Two world schema specifications may share the same relation if they are loaded together into the same workspace with the same global name. The environment manager keeps track of reference counts on the number of world schemas sharing the same relation. However, it may be the case that two world schemas have the same name for shared relations which are actually different. WorldBase does a check on the equivalence of relations and types based on type constraints and/or subtype/supertype relationships. In the prototype, WorldBase requires derived types and relations to have the same derivation rule for the types and relations to be equivalent (this is stricter than necessary).
It cannot detect any conflicts if the relations are meant to be used differently but no structural conflicts are detectable, since the semantics of the relations are subject to user interpretations.

Unload a World Schema Specification

A world schema specification can only be unloaded if all the worlds that depend on it have been unloaded from the workspace. Unloading a world schema involves checking if any other world schemas are sharing the relations and types of the schema it specifies and removing the unshared relations and types. This can be achieved by decrementing the reference count of relations and types in the world schema in the workspace. Types and relations that are not referenced by other schemas are removed as follows: first, all the tuples from the relations or types are removed; next, the relations and types corresponding to the schema structures are removed from the workspace.

Clear World Schema Domain

This operation removes all objects and relationships (instances) of a given schema if they are not shared by other world schema specifications. The schema structure itself is not removed.

Save World Schema Specification

A persistent copy of a world schema specification may be saved from the workspace.

Kill World Schema Specification

A world schema specification may be removed from the workspace and the registry only if there are no world databases (in the workspace and the registry) using it or dependent on it. This may be achieved by the kill-worldschema operation.

Modifications to World Schema Specification

A user may add a relation or type to a world schema or remove a relation or type from a world schema. No other modifications are supported.
Furthermore, when removing a relation or type, the resulting consistency of the world schema is not guaranteed, since there may be relations depending on the relation or type being removed, as well as other persistent worlds depending on the schema.

3.4.3 World Closure Specification

Some of the top level operations supported for world closure specifications include:

• Load World Closure Specification
  A virtual state of the closure specification is restored into a workspace.

• Unload World Closure Specification
  The closure specification for a given world is removed from the workspace. This operation does not affect the persistent state of the world closure specification.

• Save World Closure Specification
  A persistent copy of the closure specification for a given world is saved from the workspace.

• Modifications to World Closure Specification
  The user may change the closure specification of a world by asserting or retracting the closure specification for a specific type or relation of a world in the workspace (using the virtual database manipulation language). The changes are not persistent until the world closure specification is saved.

3.4.4 World Equivalence Specification

Some of the top level operations supported for world equivalence specifications include:

• Load World Equivalence Specification
  A virtual state of the equivalence specification is recreated in a workspace. This operation also changes the equivalence specifications of types and relations in the workspace database. The most natural merge of the object equivalence specifications is used; i.e. the intersection of equivalence specifications for types is used as the key for the workspace database. This “merging” of keys is described in Chapter 6.

• Unload World Equivalence Specification
  The equivalence specification for a given world is removed from the workspace.
  This operation does not affect the persistent state of the world equivalence specification nor the equivalence specifications for the workspace database.

• Save World Equivalence Specification
  A persistent copy of a world equivalence specification for a given world is saved from the workspace.

• Modifications to World Equivalence Specification
  The user may change the world equivalence specification by asserting or retracting the object equivalence specification for a specific type in a world in the workspace (using the virtual database data manipulation language). The changes are not persistent until the world equivalence specification is saved.
  The user may also change the equivalence specification of the workspace database directly. This change may affect the world equivalence specification, since some world databases inherit the equivalence specifications of the workspace when they are saved.

3.4.5 World Constraint Specification

Some of the top level operations supported for world constraint specifications include:

• Load World Constraint Specification
  A virtual state of the world constraint specification is recreated in a workspace. This operation also changes the constraint specifications of types and relations in the workspace database. The most natural merge of the constraint specifications is used (see Chapter 6).

• Unload World Constraint Specification
  The constraint specification for a given world is removed from the workspace. This operation does not affect the persistent state of the world constraint specification nor the constraint specifications for the workspace database.

• Save World Constraint Specification
  A persistent copy of a world constraint specification for a given world is saved from the workspace.
• Modifications to World Constraint Specification
  The user may change the world constraint specification by asserting or retracting the constraint specifications for a specific type or relation in a world in the workspace (using the virtual database data manipulation language). The changes are not persistent until the world constraint specification is saved.
  The user may also change the constraint specification of the workspace database directly. This change may affect the world constraint specification, since some world databases inherit the constraint specifications of the workspace when they are saved.

3.4.6 World Instance

Various operations on the world instance (i.e. the objects and relationships in a world) are allowed by using the virtual database data manipulation language. The operations modify the objects and relationships in the workspace, but the changes are not persistent until the world instance containing the changed structure is saved. In addition, the following high level operations are supported:

• Create a World
  A world database is created by providing a name and creator and its world schema specification. If a world closure specification is provided, the world is closure based, i.e. its population is determined by the closure of a given set of seeds for the world. Otherwise, the world is schema based, i.e. its population is the instance of its world schema.

• Load a World
  A world's population is loaded according to its world specifications. The world's population, in its persistent state, consists of PIDs (objects) and relationships that are restored to the workspace through loading. Restoring a PID (object) consists of looking up an equivalent OID based on the equivalence specifications (if any); if an equivalent OID is found, the OID is used as the virtual version of the PID being restored. If none is found, a new OID is created to correspond to the PID being restored.
  Once all the objects in the instance are restored, relationships for those objects are asserted. This operation is described in detail in Chapter 6.

• Unload a World
  Unloading a world does not remove any objects from the workspace. It merely removes the association of objects with the world and removes the world from the workspace.

• Save a World
  A persistent copy of a world instance is saved from the workspace. This involves assigning unique persistent identifiers to objects in the world to save an isomorphic structure in persistent store.

• Kill a World
  A world is removed from the workspace and the registry, only if there are no world databases (in the workspace and the registry) dependent on it.

3.4.7 Database Definition and Manipulation Languages

WorldBase works with a virtual memory database manager (AP5) to manage information in the workspaces. The database manager is required to provide virtual memory database support. In particular, it must support a schema definition language, a data manipulation language, a query language and provide some basic support for constraints. The schema definition language must allow one to define relation(s), define type(s) and subtype(s). The data manipulation language must provide primitives for data manipulation, such as create, assert, or remove objects or tuples to and from a relation or class. The query language must provide the necessary support to access or generate objects and tuples from the database. The constraint language must support cardinality and disjointness constraints, and allow a user to specify his own repair rules for those constraint conditions that are violated. In the prototype, AP5 (see Chapter 2 and appendix A for a description) is used to provide the data definition and manipulation languages.

3.5 Summary

WorldBase provides two models of information: the data model, and the aggregate model.
The data model is used to model information in the real world. The aggregate model is used to model aggregates of information of the real world.

WorldBase supports two states of information: the virtual state, and the persistent state. The virtual state is the state the user manipulates. He has complete control over all the information in the virtual store. The persistent state is the state used to store sharable aggregates of information.

Chapter 4

Population by Closure Specification

This chapter describes a shorthand method for populating a world by specification. We first describe an overview of the closure specification approach and then describe its syntax and semantics.

4.1 Closure Specification Overview

WorldBase supports a selection mechanism that extracts information from the underlying database (workspace) based on a specified set of templates. A closure specification specifies a template which directs, given a set of seeds of a virtual world, what information should be extracted from the workspace database to form its closure. Both objects and tuples are permitted as seeds as long as there are closure specifications for their types and relations in the corresponding world specification. The world population mechanism of WorldBase automatically includes all objects and relationships related to the seed (as dependents) by the closure specification in the world it is creating.

A world closure specification consists of a collection of object closure specifications and relation closure specifications. An object closure specification is specified as a type name and a list of annotated relations (also called relation patterns) to indicate the tuples and objects that should be part of the object's closure. Relation patterns are relation names followed by one or more annotations: @, $, ! and *.
A relation closure specification is similar to a relation pattern except that it does not have @ in its annotations. The grammar for the closure specification is presented in appendix B.

[Figure 4.1: Person Information Schema. Types: person, address, string. Relations: p-name, skills, lives-at, street, city, has-child, has-spouse, recommendation.]

An example of an object closure specification for type person in the person information schema of figure 4.1 is:

  person (p-name(@ $), lives-at(@ !), skills(@ !), has-child(@ $), recommended-by(@ $ *))

Relations p-name, lives-at, skills and recommended-by must relate type person in the position where the @ occurs (in this case, in their first position). We illustrate what happens to the closure of a person object p1 with the above closure specification:

• If there exists an x such that p-name(p1 x) is true in the workspace database, both x and the tuple are included in the closure, but this does not cause the closure process to continue with x. ($ means include the object in the closure, but do not compute the closure of the object included (i.e. include a “stub”).)

• If there exists a1 such that lives-at(p1 a1) is true in the workspace, where a1 is an address object, a1 and its closure (computed based on the closure specification of type address) and the tuple are included in the closure. (! means to include the object in the closure and also include the closure of the new object included.)

• If there exists p2 such that has-child(p1 p2) is true in the workspace, where p2 is of type person, p2 is included in the closure as a stub. If p2 is not in the closure through other means, it is simply an OID with no relationships defined.
• If there exist x and p such that recommended-by(p1 x p) is true in the workspace, where x is a string and p is of type person, x is included in the closure as a stub. Also if p is already in the closure (by being a seed or through the closure of other objects), then the tuple is included in the closure. On the other hand, if p is not in the closure for some other reason, then neither p nor the tuple (p1 x p) are included in the closure. (* means include the tuple in the closure only if the object in the place of the * is already in the closure. In the absence of *, the tuple is included in the closure.)

• The above mechanism also applies for the rest of the relations (specialty and avg-price) in the object closure specification.

When an annotation refers to a value, $ and ! are equivalent, i.e. include the value in the closure. * is not permitted as an annotation to a value column.

An example of a closure specification for the relation recommendation in figure 4.1 with person, string (job) and person in the first, second and third positions, respectively, is given by:

  recommendation (! $ !)

This means that given a seed tuple recommendation(p1 "plumber" p2) (relating person object p1 to the job of plumber of a person object p2), the tuple is in the closure by default (by being a seed), along with the closure of p1 and p2 (because of the !).

Because the closure does a “mark and traverse,” closure specifications may be used to save cyclic structures regardless of directionality of relations. An example would be:

  person (p-name(@, $), lives-at(@, !), has-spouse(@, !), has-spouse(!, @))

Assuming has-spouse is a binary relation relating a person object to another person object, given a person object p1 whose spouse is p2, all information about (i.e. closure of) p1 and p2 are in the closure of p1. In particular, since has-spouse(@, !
) relates p1 to p2, the closure mechanism will include p2's closure, which contains p1 through has-spouse(!, @), i.e. the spouse of a person is always in his/her closure.

4.1.1 Consistency and Completeness

Closure specifications must have the following properties:

• Each of the type and relation names in the specification must be a valid type or relation in the world schema specification.

• The number of annotations of a relation in the closure specification must equal its arity.

• There must be one and only one @ per relation pattern of an object closure specification. The @ must indicate a position in the relation where objects of the type in the specification occur.

• Multiple relation patterns with the same relation name may be specified for the same object type as long as their annotations are different, and the relation patterns are valid with respect to type constraints. The same relation pattern may be multiply specified; this does not affect the result of the closure.

• No multiple relation closure specification for the same relation is allowed since it may lead to ambiguities, e.g. if we allow recommendation($ $ !) and recommendation(! $ $) in a world closure specification, given a seed tuple recommendation(p1 "plumber" p2), do we include p1 and p2's closure in the closure of the tuple? The two specifications may be replaced by a single, unambiguous specification recommendation(! $ !).

Several object closure specifications may be specified for the same object type. The union of the relation patterns for the object type is used. Each subtype inherits its supertype's object closure specification.

A subtype may be specified in several ways. It may be specified as a relation closure specification, with a single annotation (unary relation). In this case, a closure specification for its type (or supertypes) must be defined.
It may be specified as a relation pattern (with a single @) under (one of) its supertype's object closure specification. It may be specified as an object closure specification with its own relation patterns.

A world closure specification satisfies the closure specification completeness condition if object closure specifications are specified for all objects reachable through a relation closure specification or relation pattern. For instance, types at the ! and * positions of relations must have their own closure specification defined. This condition ensures there are no stubs (objects which the system or user know nothing about except their participation in some relationship) in the closure of a given set of seeds, thus, no stubs in the resulting population of the world database. However, this condition is not necessary to populate a world, since stubs may be desirable under certain conditions.

4.2 Closure Specification Semantics

The closure mechanism computes the transitive closure of data relationships specified by the closure specification until all objects reachable through relationships in the closure specification are traversed. The algorithm is a specialized version of “mark and sweep.” Below we describe the algorithm used to compute the closure.

Figure 4.2 is a pseudo-code of the main body of the traversal algorithm. The algorithm described is generic; specialized functions with side effects (pre-ob-fn, post-ob-fn and post-tup-fn) are provided with each invocation to deal with the computed closure.

  INPUT: World, seed-objects, seed-tuples, pre-ob-fn, post-ob-fn, post-tup-fn
  METHOD:
  (a) loop for x in seed-objects do
        find-object-closure (World, x, pre-ob-fn, post-ob-fn, post-tup-fn);
  (b) loop for x in seed-tuples do
        find-tuple-closure (World, nil, x, pre-ob-fn, post-ob-fn, post-tup-fn);

  Figure 4.2: Closure Traversal Algorithm
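The annotation semantics of Section 4.1 and the seed loop of Figure 4.2 can be tried out with a small Python sketch. This is not the WorldBase implementation (which runs on AP5): the function `compute_closure`, the representation of the workspace as a set of `(relation, arg, ...)` tuples, and the `type_of` map are assumptions for illustration. The pre- and post-visit hooks and the seed-tuple consistency check are omitted, and tuples carrying a `*` are resolved at the end of the traversal, which is a simplification chosen here so that the result does not depend on visiting order.

```python
def compute_closure(workspace, type_of, object_specs, relation_specs,
                    seed_objects=(), seed_tuples=()):
    """Return (members, tuples) of the closure of the given seeds.

    workspace:      set of tuples (relation_name, arg1, arg2, ...)
    type_of:        object id -> type name; anything absent is a plain value
    object_specs:   type name -> list of (relation_name, annotations)
    relation_specs: relation_name -> annotations (no '@'), for seed tuples
    """
    members, tuples = set(), set()
    visited_objects, visited_tuples = set(), set()
    pending = []  # (tuple, objects that '*' requires to be in the closure)

    def visit_object(obj):
        if obj in visited_objects:
            return
        visited_objects.add(obj)
        members.add(obj)
        for rel, annots in object_specs.get(type_of.get(obj), ()):
            anchor = annots.index("@")
            for tup in sorted(workspace):
                if (tup[0] == rel and len(tup) - 1 == len(annots)
                        and tup[1 + anchor] == obj):
                    visit_tuple(annots, tup)

    def visit_tuple(annots, tup):
        if tup in visited_tuples:
            return
        visited_tuples.add(tup)
        if annots is None:              # seed tuple: use the relation's own spec
            annots = relation_specs[tup[0]]
        required = []
        for mark, item in zip(annots, tup[1:]):
            if mark in "@!" and item in type_of:
                visit_object(item)      # include the object and its closure
            elif mark == "*":
                required.append(item)   # tuple kept only if item is in the closure
            else:
                members.add(item)       # '$' stub, or a plain value
        pending.append((tup, required))

    for obj in seed_objects:
        visit_object(obj)
    for tup in seed_tuples:
        visit_tuple(None, tup)
    for tup, required in pending:
        if all(item in members for item in required):
            tuples.add(tup)
    return members, tuples


# Hypothetical data modelled on the person example of Section 4.1.
type_of = {"p1": "person", "p2": "person", "p3": "person", "a1": "address"}
object_specs = {
    "person": [("p-name", "@$"), ("lives-at", "@!"),
               ("has-child", "@$"), ("recommended-by", "@$*")],
    "address": [("street", "@$"), ("city", "@$")],
}
workspace = {
    ("p-name", "p1", "Ann"), ("lives-at", "p1", "a1"), ("city", "a1", "LA"),
    ("has-child", "p1", "p2"), ("p-name", "p2", "Bob"),
    ("recommended-by", "p1", "plumber", "p3"),
}
members, tuples = compute_closure(workspace, type_of, object_specs, {},
                                  seed_objects=["p1"])
```

With these inputs, p2 enters the closure only as a stub (its p-name tuple is not collected), while the recommended-by tuple is dropped because p3 never enters the closure.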
The closure is collected by side effects of the generic traversal algorithm. Pre-visit and post-visit functions are specified with the traversal algorithm and are evaluated for each object and tuple visited in the traversal. The algorithm also takes as input the world, whose closure specification forms the basis of the traversal, and a set of seed-objects and seed-tuples. First, all the seed objects are traversed and their closure computed (the closure is collected indirectly through the pre- and post-visit functions). Then all the seed tuples are traversed and their closure computed (again, the closure is collected indirectly). The functions that traverse objects and tuples are mutually recursive; they call each other.

INPUT: World, object, pre-ob-fn, post-ob-fn, post-tup-fn
METHOD:
(a) If object has already been visited, then return; otherwise mark visited;
(b) Apply pre-ob-fn to object;
(c) Find closure specification for object; loop for relpats in the closure specification of object do
    (c1) find all tuples with object as specified in relpats (evaluate query to workspace to generate the tuples, and save the results);
    (c2) loop for tup in all tuples found do find-tuple-closure (World, relpats, tup, pre-ob-fn, post-ob-fn, post-tup-fn);
(d) Apply post-ob-fn to object;
Figure 4.3: Find Closure of Object

Figure 4.3 gives pseudo-code for the object traversal algorithm. If the object has already been visited, the traversal returns. Otherwise, the object is marked visited and the pre-visit function for the object is evaluated. During populate,¹ the pre-visit function puts the object in the world as a dependent if it is not a seed. During saving, the pre-visit function assigns a unique PID for each OID encountered. Next, the closure specification for the object is found, and the relation patterns indicate which relations must be traversed and included in the closure of the given object.
The relevant tuples are found (generated from a query to the workspace database) and the closure of the tuples is found recursively. Finally, the post-visit function for objects is evaluated. The post-visit function for objects is not used in populate, but is used in save to write the objects out to a repository.

¹ Populate and save are specific cases of the generic closure algorithm that provide explicit pre- and post-visit functions, whose effects are described in the text.

INPUT: World, relpattern, tuple, pre-ob-fn, post-ob-fn, post-tup-fn
METHOD:
(a) If tuple has already been visited, then return; otherwise, mark visited;
(b) If no relpattern is provided with the function call, the closure specification for the tuple is looked up;
(c) Iterate over each annotation of relpattern:
    @ = find-object-closure of object at the @ position of tuple;
    ! = find-object-closure of object at the ! position of tuple;
    $ = return t if object at $ position of tuple is a value or is already in the closure; nil otherwise;
    * = put object at * position in the world/closure and mark the object as visited (do not find its closure);
(d) If tuple is a seed, then all objects must be in the closure; otherwise it is an error (inconsistent specifications);
(e) If all the objects are in the closure, then the tuple is put in the closure, and post-tup-fn is applied to tuple;
Figure 4.4: Find Closure of Tuple

Figure 4.4 gives pseudo-code for the tuple traversal algorithm. If the tuple has already been visited, the traversal returns. If a relpattern is provided, the traversal is part of an object closure specification; otherwise, a closure specification for a relation is found. Given a relpattern or relation closure specification, the algorithm uses the annotations in the relpattern or closure specification to decide whether to recursively find the closure of the objects at the annotated slots (@ or !)
or to return ($) or to simply mark the object as visited (*). Note that the semantics of * is such that, because of the order of evaluation, a collection of seeds may cause certain tuples not to be included in the closure even though all their objects are in the world. The user must ensure that all the objects in such tuples appear as seeds if the tuple is to be included in the closure. If the given tuple is a seed, then all objects in the tuple must be in the closure. If all the objects in the tuple are in the closure, then the tuple is put in the closure and post-tup-fn is applied to the tuple. During populate, this puts the tuple in the world.

4.3 Closure Specification in WorldBase

The closure traversal algorithm is used by WorldBase to populate a world by closure, as well as to update a closure-based world's population so that it conforms to the workspace database. The description in this section refers only to closure-based worlds.

A world's closure specification is a collection of object and relation closure specifications for objects and types in the world's schema. The closure specification for an object may be changed; however, a closure-based world cannot become a schema-based world by removing all the closure specifications for the world. A closure-based world with an empty closure specification simply cannot be populated by any automatic mechanism. The user may still populate the world by explicitly asserting objects and tuples to be seeds and dependents of a world.

A world may have a closure specification that does not satisfy the closure specification completeness condition. WorldBase does not check for the closure specification completeness condition, allowing the user the flexibility to define, manipulate and save incomplete information.
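As a recap of Section 4.2, the mutually recursive traversal of Figures 4.2 to 4.4 can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the WorldBase implementation: the workspace is modeled as a list of tuples of the form (relation, arg1, ..., argN), any argument missing from `type_of` is treated as a plain value, the pre- and post-visit functions are folded into updates of two result sets, and all names (`RelPat`, `compute_closure`, and so on) are hypothetical.

```python
from collections import namedtuple

# A relation pattern: relation name plus one annotation per argument.
RelPat = namedtuple("RelPat", "relation annotations")


def compute_closure(tuples, type_of, object_specs, relation_specs,
                    seed_objects=(), seed_tuples=()):
    """Return (objects, tuples) in the closure of the seeds.

    tuples         -- iterable of (relation, arg1, ..., argN) facts
    type_of        -- dict: object id -> type name; anything absent is a value
    object_specs   -- dict: type name -> list of RelPat (exactly one "@" each)
    relation_specs -- dict: relation name -> annotation tuple (no "@")
    """
    tuples = list(tuples)
    in_closure = set()       # visited objects; doubles as closure membership
    seen_tuples = set()      # visited tuples
    closure_tuples = set()   # tuples accepted into the closure

    def find_object_closure(obj):
        if obj in in_closure:
            return
        in_closure.add(obj)  # mark visited (stands in for pre-ob-fn)
        for pat in object_specs.get(type_of[obj], []):
            slot = pat.annotations.index("@") + 1  # +1 skips the relation name
            for tup in tuples:
                if tup[0] == pat.relation and tup[slot] == obj:
                    find_tuple_closure(pat, tup, is_seed=False)

    def find_tuple_closure(pat, tup, is_seed):
        if tup in seen_tuples:
            return
        seen_tuples.add(tup)
        if pat is None:  # no pattern supplied: look up the relation spec
            pat = RelPat(tup[0], relation_specs[tup[0]])
        complete = True
        for mark, arg in zip(pat.annotations, tup[1:]):
            is_value = arg not in type_of
            if mark in "@!" and not is_value:
                find_object_closure(arg)
            elif mark == "*" and not is_value:
                in_closure.add(arg)  # include the object, do not traverse it
            elif mark == "$" and not is_value and arg not in in_closure:
                complete = False
        if is_seed and not complete:
            raise ValueError(f"seed tuple {tup} has objects outside the closure")
        if complete:
            closure_tuples.add(tup)  # stands in for post-tup-fn

    for obj in seed_objects:
        find_object_closure(obj)
    for tup in seed_tuples:
        find_tuple_closure(None, tup, is_seed=True)
    return in_closure, closure_tuples


# Hypothetical workspace: two married couples, two of the people named.
facts = [
    ("has-spouse", "p1", "p2"),
    ("has-name", "p1", "Ann"),
    ("has-name", "p2", "Bob"),
    ("has-spouse", "p3", "p4"),
]
types = {p: "person" for p in ("p1", "p2", "p3", "p4")}
person_spec = [
    RelPat("has-spouse", ("@", "!")),
    RelPat("has-spouse", ("!", "@")),
    RelPat("has-name", ("@", "$")),
]
objects, kept = compute_closure(facts, types, {"person": person_spec}, {},
                                seed_objects=["p1"])
```

With p1 as the only seed, the sketch collects p1 and the spouse p2 together with their has-spouse and has-name tuples, and leaves the unrelated couple p3 and p4 out. Because a tuple is marked visited before its annotations are examined, a tuple rejected once is never reconsidered, which mirrors the order-of-evaluation caveat noted in Section 4.2.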
A world closure specification is said to cover a world schema if there exists an object closure specification for each object type in the world schema and a relation closure specification for each relation in the schema. This property ensures that objects and tuples of any type in the schema may be provided as seeds to the world and that the closure of the seeds will be computable. World closure specifications need not have this property; however, assignment of objects or tuples whose closure specification is undefined will result in a failure to compute the closure.

There is one property that a world closure specification must have: object closure specifications must include their key relationships (specified in the object equivalence specification) in their relation patterns, since key relationships must be saved with the objects.

In the implementation described in this thesis, schema-based worlds are populated by closure as well. A closure specification that covers the world schema is generated and all the tuples (instances) of the schema are provided as the seeds. The world closure specification here is simply a collection of relation closure specifications for the relations in the schema, with all parts of each relation to be traversed (annotated !), and a collection of object closure specifications with no relation patterns.

4.4 Summary

We described a selection mechanism using closure specifications. The syntax of the closure specification is simple, with a limited number of annotations; the semantics are easy to understand. The closure mechanism provides an effective focusing mechanism. One shortcoming of the current implementation is that worlds must be loaded into the workspace for the traversal. In future implementations, when access to conventional databases is supported, this loading may not be necessary, since ultimately we are interested only in a portion of the data.
Also, the traversal algorithm recomputes the closure each time it is called. A more efficient algorithm that incrementally computes the closure is needed.

Extensions to the algorithm are easily imagined within the same general framework, with attendant complications. The closure specification may also be viewed and used as a specification of complex object boundaries. A more complex or a different set of semantics may be used with the same general idea, in which case the traversal algorithm may be more complex.

Chapter 5
WorldBase Transformation

This chapter focuses on the WorldBase transformation support. WorldBase supports a non-procedural transformation language, ILOG±, to transform database instances from one schema to another. This Datalog-based formalism specifies the extent of each target schema construct, either directly or indirectly, in terms of the source schema instances. An ILOG± specification consists of a set of correspondences, each of which describes a particular source-to-target schema transformation. We first describe the ILOG± specification language and present examples of transformations using the language. Next, we describe the two different ways transformations with ILOG± are supported in WorldBase.

5.1 Transformation Specification with ILOG±

ILOG± is a declarative language for specifying correspondences between source and target schemas. The primary intent of this language is to experiment with a new paradigm for schema translation with explicit OID creation; the syntax used is rather primitive but could easily be extended to make it more palatable. Unlike most previous approaches, our transformation language, ILOG±, is primarily declarative in nature. This is because of the close relationship between the formal semantics for ILOG± and the minimal model semantics of logic programming (see Section 5.1.4).
As a result, ILOG± can be viewed as expressing transformations essentially by specifying a correspondence between the source and target schemas.

We mentioned previously that there is a correspondence between WDM schemas and relational schemas. (See [38, 50] for a formal mapping of a semantic database model to the relational model.) In fact, the notation of ILOG± essentially operates on relational simulations of the source and target schemas.

The key to ILOG± specifications is the notion of intermediate relations, which are used to establish correspondences between objects in the source and target and provide an essential mechanism for object creation. We use intermediate relations as convenient intermediate storage in the transformation process. We restrict the language so that target relations can be explicitly derived entirely from the source and intermediate relations. This restriction is motivated primarily by practical considerations in connection with distributed databases: if the source and target databases are on separate machines, the database transformation can be computed primarily on the source machine, requiring message passing only from the source machine to the target machine. This restriction has no impact on the expressive capabilities of the language.

An ILOG± specification (also called a transformation specification) consists of a set of local correspondences for a particular source-to-target schema transformation. We assume that the target schema is a valid WDM schema defined by the user. A local correspondence (or simply correspondence) specifies the correspondence between an intermediate relation, a node, or an edge of the target schema structure and structures of the source. The union of local correspondences for a particular node specifies its extent in terms of intermediate and source relations.
More generally, the ILOG± specification specifies the extent of each target construct, either directly or indirectly, in terms of the source schema. The syntax and semantics of the language borrow heavily from logic programming, Datalog and IQL.

A local correspondence has the following form:¹

    Target-or-intermediate-relation-spec :- source-wff AND intermediate-wff ;

¹ In the spirit of Datalog and logic programming, we refer to the expression on the left side of the :- as the head of the equation, and the expression after the :- as the body of the equation.

A local correspondence consists of a relation specification (the relation name and a list of variables) as the head, and a source-wff and intermediate-wff in the body. The relation name in the head refers to a target relation or an intermediate relation being defined by the specification, while the body of the correspondence is a query to the source and/or intermediate relations.

Target relations are n-ary (n > 0) relations of the target schema. They relate target objects (i.e., objects in the target database) and/or values. Source objects cannot be used in a tuple in a target relation unless they have value types. (This restriction prevents the inclusion of source OIDs in the ultimate target instance.)

Intermediate relations are relations which are used to relate source OIDs, target OIDs and/or values. Target objects (OIDs) are created in intermediate relations using a special semantics described below. Each target object created in an intermediate relation participates in the relation and is accessible in other intermediate-wffs. The intermediate relations are not part of the target schema, but they may reside on the target machine after the transformation process to facilitate update propagation.
The source-wff is a query to the source database; it could be specified in any query language that can be used with our model. In particular, the source-wff can be specified in a relational calculus-like language which queries the source database. For the examples in this thesis and the prototype, we allow the full power of the AP5 [15, 14, 16] query language, an extension of relational calculus that allows both object identifiers and values in relations. It is described in more detail in Section 2.5 and Appendix A.

Intermediate-wffs are conjunctions of atoms ranging over intermediate relations, used to access target objects and values in the intermediate relations. The source-wff and intermediate-wff are optional, but at least one must be specified. As in Datalog, any variable occurring free in a rule body is implicitly assumed to be existentially quantified in that body.

5.1.1 Examples

To illustrate our language, we specify (parts of) a transformation of the source schema of Figure 5.1 to the target schema of Figure 5.2. To avoid ambiguity between the source and target schema names in this presentation, the relations and types from the source schema are preceded by "A." and relations and types from the target schema are preceded by "B." In our implementation, however, the compiler for ILOG± automatically distinguishes between the source, target and intermediate relations.

[Figure 5.1: Source Schema (A). Diagram with nodes course, teacher, student, string and char; relations c-name, enrollment (grade), taught-by, t-name, first-name and last-name.]

The source schema (A) of Figure 5.1 has A.student and A.teacher as separate abstract entities. A.enrollment is a ternary relation relating A.student and A.course to a grade (character).
A.teachers have an A.t-name relation relating them to strings, and A.students have A.last-name and A.first-name relations. The target schema (B) of Figure 5.2 is a richer schema that models A.student and A.teacher in a single abstract type B.person, with B.student and B.teacher as its subtypes. It also models B.enrollment as an entity, rather than a relation, and uses integers as grades.

[Figure 5.2: Target Schema (B). Diagram with nodes person, student, teacher, enrollment, course, string and integer; relations first-name, last-name, teaches, enrollment-student, enrollment-grade, enrollment-course and c-name.]

Transforming to Objects

The transformation of a source instance to a target must be specified explicitly, because there may be several different ways in which an object in the source may be mapped to the target. In fact, even some "obvious" correspondences must be specified. For example, we would like to have a B.course instance for each A.course instance. To do so, we specify:

    i-course[(* B.course) a] :- A.course(a) ;

The relation i-course is a binary intermediate relation² relating an object of type B.course (*³ indicates that a new object should be created) to objects currently in A.course. Although the types of intermediate relation arguments must be defined, we omit the type specifications to simplify the examples. For each a in A.course we create one object of type B.course as its target counterpart. Intuitively, for each (x, y) pair in i-course, y is the "witness" for including x in B.course. This relationship may be used in later correspondences.

² Intermediate relation arguments are in square brackets instead of parentheses.
³ The use of * in ILOG± is totally unrelated to the use of * in closure specifications.

Not all source-to-target instance correspondences are one-to-one, however. In transforming A.student and A.teacher to B.person, we may want to deal with the case of a B.person being both an A.student and an A.teacher. This is a merging of several objects from the source (A.student and A.teacher objects) into a single target object (B.person). To do so, we have to determine whether an A.student object and an A.teacher object actually correspond to a single B.person object. In this example, we assume that if an A.student object and an A.teacher object have the same first-name and last-name (and we can access an A.teacher's first-name and last-name through some functional manipulation of the t-name relation), they refer to the same B.person object. First-name and last-name values form a witness for B.person.

The actual transformation is done in several steps. First, we have to specify the correspondence of B.person from A.student, with first-name (fn) and last-name (ln) as the witness for the creation of B.person objects.

    i-person[(* B.person) ln fn] :-
        (A.student(x) AND A.last-name(x ln) AND A.first-name(x fn)) ;

Here, i-person is a ternary intermediate relation that relates a target object of type B.person to ln and fn. The target object is created such that there is only one B.person object for any ln and fn pair. Thus, if there were two distinct student objects in A with the same first-name and last-name, these would be merged by this correspondence. Conceptually, all (ln, fn) pairs are enumerated and tested in the source to decide whether to create persons in the target. A.teachers are transformed to B.persons analogously:

    i-person[(* B.person) ln fn] :-
        (A.teacher(x) AND A.t-name(x tn) AND first-name-of(tn fn) AND last-name-of(tn ln)) ;

Here, first-name-of and last-name-of are computed functions⁴ that return the first-name (fn) and last-name (ln) given an A.t-name of an A.teacher. With the above correspondences defining i-person, an A.teacher object and an A.student object sharing the same first and last names will correspond to the same B.person object in the target. Moreover, even if several A.teachers have the same first and last names, only one B.person will be created in the target.

⁴ AP5 permits the use of external functions as relations in the formula if certain conditions are satisfied.

Because the target schema has B.student as a subtype of B.person, we have to populate the subtypes with instances of B.person. For this, we have to decide what B.student instances are by examining the source instance. Since B.person is derived from A.student and A.teacher instances, we may use a subset of the union of A.student and A.teacher instances to populate B.student. This also has to be specified in the correspondence specification.

B.student can be all B.person objects that correspond to A.student (indirectly via the intermediate relation i-person). This can be described by the following correspondence:

    B.student(s) :-
        (A.student(x) AND A.last-name(x ln) AND A.first-name(x fn))
        AND i-person[s ln fn] ;

Note that B.student's argument is specified in parentheses. This indicates that B.student is a target relation, not an intermediate relation. The above correspondence will assert the target object denoted by s to be a B.student when there exist an object x ∈ A.student and values ln and fn such that x has fn and ln as its first- and last-names, respectively, and there is a tuple in intermediate relation i-person relating s to ln and fn.

Alternatively, a different semantics for B.student can be assigned, giving it those B.person objects that correspond to A.student objects that are enrolled in some courses (via relation A.enrollment). This can be specified by the following correspondence:

    B.student'(s) :-
        ((EXISTS (c g) | A.enrollment(x c g))
         AND A.student(x) AND A.last-name(x ln) AND A.first-name(x fn))
        AND i-person[s ln fn] ;

Note that the above source-wff could have been specified without the explicit EXISTS quantifier, since the interpreter automatically interprets all free variables in the source-wff as existentially quantified.

As a third alternative, we may want only those students corresponding to A.student objects which are enrolled in at least one course and have a grade of "A" in all the A.courses in which they are enrolled:

    B.student(s) :-
        ((FORALL (c g) | (NOT A.enrollment(x c g)) OR A.enrollment(x c "A"))
         AND (EXISTS (c g) | A.enrollment(x c g))
         AND A.student(x) AND A.last-name(x ln) AND A.first-name(x fn))
        AND i-person[s ln fn] ;

Using intermediate relations, we can create new objects which correspond to tuples in the source schema. For example, we can create a B.enrollment object for each A.enrollment tuple in the source instance. When we do so, we usually need to use several objects from the source tuple as witnesses in an intermediate relation describing the correspondence.
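The witness-based merging performed by the two i-person correspondences, and the derivation of B.student through i-person, can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the source instance is invented, OIDs are modeled as strings produced by a counter, and splitting the t-name on a space is only a stand-in for the computed functions first-name-of and last-name-of.

```python
import itertools

# Hypothetical source instance (schema A).
a_student = {"s1", "s2"}
a_first_name = {"s1": "Ann", "s2": "Ann"}
a_last_name = {"s1": "Lee", "s2": "Lee"}
a_teacher = {"t1", "t2"}
a_t_name = {"t1": "Ann Lee", "t2": "Bob Ray"}

_fresh = itertools.count(1)
# Models i-person[(* B.person) ln fn]: witness (ln, fn) -> target OID.
i_person = {}


def person_for(ln, fn):
    """Return the B.person OID for a witness, creating it on first use."""
    key = (ln, fn)
    if key not in i_person:
        i_person[key] = f"B.person#{next(_fresh)}"
    return i_person[key]


# First correspondence: every A.student contributes an (ln, fn) witness.
for x in sorted(a_student):
    person_for(a_last_name[x], a_first_name[x])

# Second correspondence: every A.teacher contributes a witness derived
# from its t-name (the split below stands in for first-name-of/last-name-of).
for x in sorted(a_teacher):
    fn, ln = a_t_name[x].split(" ", 1)
    person_for(ln, fn)

# B.student(s) :- A.student(x), A.last-name(x ln), A.first-name(x fn),
#                 i-person[s ln fn]
b_student = {i_person[(a_last_name[x], a_first_name[x])] for x in a_student}
```

In this sketch the two students and the teacher named Ann Lee all share one witness, so they map to a single B.person, and B.student ends up with exactly that one object. Keying creation on the witness rather than on the source object is what makes the merge happen.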
There are several different variations on the above correspondences because we allow distinct source objects to correspond to a single merged target object. The witnesses for target object B.enrollment may be source objects or target objects or a combination of both. For instance, notice the difference between the local correspondence:

    i-enrollment[(* B.enrollment) c p g] :-
        (A.enrollment(x p grade) AND A.student(p) AND char-to-int(grade g))
        AND i-course[c x] ;

and

    i-enrollment'[(* B.enrollment) c s g] :-
        (A.enrollment(x p grade) AND A.student(p) AND A.first-name(p fn)
         AND A.last-name(p ln) AND char-to-int(grade g))
        AND i-course[c x], i-person[s ln fn] ;

The first correspondence uses the source A.student object p as the second-position witness for creating B.enrollment objects, whereas the second correspondence uses B.person object s as the second-position witness. The distinction is relevant when n (> 1) A.student objects have the same first-name (fn) and last-name (ln) values. Then, n B.enrollment objects are created in the first correspondence whereas only one B.enrollment object is created in the second correspondence.

Of course, correspondences expressing more intricate semantics have a more complex form. Also, because we prohibit the use of target relations in the body of a local correspondence, correspondences are sometimes more cumbersome than if this restriction were lifted.

Transforming to Tuples

Of course, relationships without newly created OIDs can be defined using correspondences. In our example, we may want to assert the relationship B.teaches as the inverse of A.taught-by.
This is accomplished by:

    B.teaches(p c) :-
        (A.taught-by(a x) AND A.teacher(x) AND A.t-name(x tn)
         AND first-name-of(tn fn) AND last-name-of(tn ln))
        AND i-course[c a], i-person[p ln fn] ;

Notice that to establish the inverse relationship, B.teaches must have target objects (that correspond to the source objects via some previously defined correspondence) or values in its relationship. The intermediate-wff lists the intermediate relations that are used to look up the corresponding target objects. Variables tn, fn and ln are used to find the correspondence of an A.teacher object denoted by x to a B.person object. The correspondence of source object a to target object c is stored in intermediate relation i-course.

Some correspondences for target relations only need to look up intermediate relation(s). For instance, the correspondence below uses values and objects stored in the intermediate relation i-enrollment.

    B.enrollment-course(e c) :- i-enrollment[e c s g] ;

5.1.2 ILOG± Syntax

Each local correspondence in ILOG± defines either an intermediate relation or a target relation (they are also called defined relations). In the former case, the syntax is given by:

    IR[(* object-type) term(s)] :- <source-wff> AND <intermediate-wff> ;

or

    IR[term(s)] :- <source-wff> AND <intermediate-wff> ;

In either case, source-wff is a query to the source database in a database query language (in the implementation, AP5). Intermediate-wff is a list of atoms over intermediate relations (relation name and variables and/or values). Term(s) is a sequence of zero or more (one or more in the second case) variables and/or values. Each variable occurring in the rule head must also occur (free) in the body.
Target relations are specified by correspondences of the form:

    TR(term(s)) :- <source-wff> AND <intermediate-wff>

Note that object creation (*) is not permitted for target relations. Otherwise, the syntax for target relation correspondences is the same as for intermediate relation correspondences.

Certain global restrictions apply to ILOG± specifications. All columns of all relations are typed, either directly by the source and target schemas or, in the case of intermediate relations, implicitly by the correspondences defining them. The details of the type inference are described below. Finally, no intermediate or target relation can be defined, either directly or indirectly, in terms of itself; that is, recursion is not permitted. (This is explored from a theoretical perspective in [38].)

5.1.3 Typing Issues in ILOG±

The arities and types of source and target relations in the correspondence specification must correspond to their definitions in the actual source and target schemas. Intermediate relations are specified in the correspondence specification only; there is no schema where the intermediate relations are defined. Thus, we have to derive the arity and types of intermediate relations in the correspondence specification.

Each intermediate-wff must be a list of valid intermediate relations, i.e., the intermediate relations must be defined in a local correspondence and have the correct arities and types. Each source-wff must be a valid query to the source database; i.e., each relation in the atomic formulas must correspond to a source schema relation with respect to arity and type. (For instance, a variable used to denote abstract type T cannot be used in a slot position of another relation of abstract type U.)

Intermediate relations are specified either with a typed * and a list of typed variables, or simply a list of typed variables.
The * expression is typed to a target type. Variables corresponding to values are typed to a basic value type (such as integer or string). Variables corresponding to source objects are typed to a source type. The arities of intermediate relations are derived from the number of variables (including the typed *) in the intermediate relation specifications.

If there exist multiple local correspondences for a single intermediate relation (i.e., with the same intermediate relation name on the left-hand side), the current implementation requires all the types to be specified for all local correspondences. Two local correspondences with the same intermediate relation R on the left-hand side must have the same arity and types. If there is a *, then the type defined in all occurrences of the intermediate relation must correspond, and the types of objects to be asserted into the slot must correspond to the type declared with the *. Two local correspondences with the same relation name but different arities or types are considered an error in the current implementation (since relation names uniquely identify relations). One could provide syntactic sugar for intermediate relation type and arity specifications (similar to the relation specification of a world schema) as part of the correspondence specification, while still allowing untyped local correspondence specifications for those intermediate relations.

5.1.4 Formal Semantics

This subsection describes the formal semantics of ILOG±. Section 5.2 describes the operational semantics used in the prototype. The operational semantics can be understood without reference to the formal semantics.

The formal semantics associated with an ILOG± specification are based on the least fixpoint semantics of logic programming. The complete details for this semantics are presented in [38]; a brief description is included here for completeness.
There are two differences between ILOG± specifications and conventional Datalog and logic programs:

1. the use of the relational calculus in source-wffs, and
2. the use of * for OID creation.

Regarding the first difference, we view each relation defined by a source-wff as a source relation. The creation of OIDs is accomplished in the formal model by using certain Skolem functions; this approach was first intimated in [51] and refined in [42]. Briefly, suppose that a correspondence

    R[(* object-type), x1,..,xm] :- <source-wff> AND IR1,..,IRn ;

is included in the ILOG± specification. In the formal semantics we introduce a new m-ary function symbol f_R, and rewrite this correspondence as

    R[f_R(x1,..,xm), x1,..,xm] :- <source-wff>, IR1,..,IRn.

Intuitively, for each m-tuple <c1,..,cm> which satisfies <source-wff> ∧ IR1 ∧ ... ∧ IRn, a new OID, denoted f_R(c1,..,cm), will be "created" (more technically, will be included in R).

More generally, suppose that all object creation correspondences are rewritten in the above manner. Then the rewritten version of the ILOG± specification, along with the data from the source schema, can be viewed as a logic program. This logic program has a unique minimal model. We now obtain the intended meaning of the original ILOG± specification applied to the source instance by replacing, in the target relations, each non-atomic term (i.e., term with at least one function symbol) by a distinct OID. Thus the semantics of an ILOG± specification V is given by the minimal model of the logic program associated with V. In this sense, the semantics associated with ILOG± specifications is fundamentally declarative. As detailed in [38], this semantics can be computed using the least fixpoint ("bottom-up") construction.
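The Skolem-function reading of object creation can be made concrete with a short Python sketch. It is a simplified illustration under stated assumptions, not the construction of [38]: Skolem terms are modeled as nested tuples, the rule evaluated is the i-course correspondence of Section 5.1.1, the target correspondence B.course(c) :- i-course[c a] is a hypothetical addition, and a single bottom-up pass suffices only because the specification is non-recursive.

```python
from itertools import count


def skolem(rel, *witness):
    """Build the term f_rel(witness...) as a hashable value."""
    return ("f_" + rel, witness)


# Source relation A.course(a); the duplicate row is deliberate.
a_course = [("c1",), ("c2",), ("c1",)]

# i-course[(* B.course) a] :- A.course(a)
# rewritten as
# i-course[f(a), a] :- A.course(a)
i_course = {(skolem("i-course", a), a) for (a,) in a_course}

# Hypothetical target correspondence: B.course(c) :- i-course[c a]
b_course_terms = {(term,) for (term, _a) in i_course}

# Final step: replace each distinct non-atomic term by a distinct OID.
_fresh = count(1)
oid_of = {}


def to_oid(term):
    """Map a Skolem term to its OID, allocating one on first sight."""
    if term not in oid_of:
        oid_of[term] = f"oid{next(_fresh)}"
    return oid_of[term]


b_course = {(to_oid(term),) for (term,) in sorted(b_course_terms)}
```

Because the term f(c1) is the same value however many times the rule body is satisfied with a = c1, the duplicate source row yields one intermediate tuple and one OID. This is the sense in which the witness tuple determines the created object.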
Furthermore, the framework can be extended to include recursion and various kinds of negation (e.g. stratified) of intermediate relations in rule bodies (local correspondences). The use of least fixpoint semantics shows a close relationship between the approach here and that of IQL; subtle differences are discussed in [38].

5.1.5 Invalid ILOG* Specifications

Given an ILOG* specification V and source instance I, the ultimate result of the semantics defined above is a weak instance over the target schema. This weak instance may violate some set inclusion implied by the target schema, or it may violate a cardinality constraint. In that case we say that V is invalid on I, and view V on I as undefined. More generally, an ILOG* specification V from S to T is invalid if it is invalid on some instance I of S. Because the source-wffs in ILOG* specifications have the full power of the relational calculus, it is undecidable whether a given ILOG* specification V is invalid. (This can be demonstrated using variations on the techniques of [25].) This question is decidable in the case where source-wffs are restricted to conjunctions of atoms over the source schema; this and related issues are explored in [38].

5.2 Transformations in WorldBase

The transformation process supported by WorldBase is exhibited in Figure 1.6. As illustrated there, the transformation process requires a series of steps. WorldBase supports transformation in the active store (workspace). It assumes a separate target schema that is defined explicitly. Using the terminology of Chapter 2, the target instance is materialized by the transformation subsystem. Our language and prototype are based on the creative approach: new objects and relationships are created into an instance of the target. However, a subset of the language (without the object creation semantics) can be used to achieve derivative target worlds.
In this thesis, we do not address the issue of maintaining the target, but we believe reasonably efficient update propagation to the target is possible.

The transformation process requires a source schema (e.g. Joe's entertainment schema of Figure 1.2), a source world instance (e.g. joe-entertainment-world), a target schema (e.g. Paul's recommendation schema of Figure 1.3), and a transformation specification in ILOG* that transforms source instance(s) to target instance(s). First, the source schema is loaded into an empty workspace (to ensure no information exists in the workspace that could interfere with the transformation). Then the source instance(s) to be transformed are loaded into the same workspace. Next, the target schema is loaded (into the same workspace, but distinct and disjoint from the source schema in the workspace) to provide the schema for the target instance. The user then initiates the transformation operation and provides it with a transformation specification. The ILOG* compiler evaluates the specification and creates new objects and tuples as instances of the target schema to correspond to those of the source. If the transformation process finishes without aborting and no consistency constraints are violated, it is successful.

WorldBase supports transformation and materialization of the target in two different ways. The first is to transform all the source instance(s) to correspond to the target, then populate target world(s) with the resulting instance. This method is called bulk transformation. The second is to allow the user to specify the seeds of the target world (in terms of ILOG* specifications with the same head but a more restricted source-wff specified) and filter the transformation process to allow selection and materialization of only those related to the seeds in the target.
We call this method transformation with closure, or closure-transform.

5.2.1 Bulk Transformation

Bulk-transform refers to the operation that transforms all instances of a source schema into instances of the target schema. WorldBase does not require a target world database to hold the resulting instance, since the resulting target instance is asserted to the workspace. A new target world (or several target worlds) may be created to contain the newly created target database in the workspace (e.g. joe-rec-1 in Chapter 1). Saving the target world(s) causes the world's population to persist.

Operational Semantics

The operational semantics for bulk transformation is essentially a two-phase algorithm in which all target schema objects are created and identified, followed by target relationship establishment. It is based on the construction of the least fixpoint using a bottom-up approach. This is particularly simple for ILOG*, since recursion is not permitted. We first sort the local correspondences so that if an intermediate relation IR occurs in the body of a local correspondence LC, then all local correspondences with IR in the head occur before LC. Furthermore, all local correspondences for a given intermediate or target relation are clustered in this listing. The operational semantics is defined by computing the value of each defined relation in the order given by the listing.

For defined relations which do not involve object creation, each correspondence can be interpreted as follows: if the body (i.e. the source-wff and the conjunction of all the atoms over intermediate relations) is true in the source database extended with the intermediate relations, for some instantiation, then the tuple in the head should be true (if not already true) of its instantiated variables in the target database extended with the intermediate relations.
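The ordering of local correspondences described above (definitions of an intermediate relation before its uses, with correspondences for the same relation clustered) amounts to a topological sort. The following Python sketch shows one way to obtain such a listing. The triple representation (head, body intermediate relations, label) and the function name sort_correspondences are assumptions for illustration, not the representation used in the prototype.

```python
from graphlib import TopologicalSorter


def sort_correspondences(correspondences):
    """Order correspondences so that every intermediate relation is fully
    defined before any correspondence that uses it in its body.

    correspondences: list of (head_relation, body_intermediate_relations, label)
    """
    graph = {}
    for head, body, _label in correspondences:
        graph.setdefault(head, set()).update(body)
    # static_order() yields a relation only after everything it depends on;
    # it raises graphlib.CycleError on recursion, which ILOG* does not permit.
    order = list(TopologicalSorter(graph).static_order())
    rank = {relation: i for i, relation in enumerate(order)}
    # Sorting by the rank of the head also clusters correspondences that
    # share a head; the sort is stable, so their original order is kept.
    return sorted(correspondences, key=lambda c: rank[c[0]])


# Hypothetical specification with two correspondences for IR2.
spec = [
    ("Target", ["IR2"], "c1"),
    ("IR2", ["IR1"], "c2"),
    ("IR1", [], "c3"),
    ("IR2", [], "c4"),
]
ordered = sort_correspondences(spec)
```

Rejecting cycles at this stage is a design convenience of the sketch: since recursion is not permitted, a cyclic dependency can be reported before any tuple is asserted.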
More formally,

    R(x1,..,xm) :- <source-wff> AND IR1, ...,IRn ;

or

    R[x1,..,xm] <source-wff> AND IR1, ...,IRn ;

is translated into the following relational calculus-like query:

    R = { (x1,..,xm) | ∃ (free vars) (source-wff ∧ IR1 ∧ ... ∧ IRn) }

(Typing restrictions are included here implicitly.) More than one local correspondence can be used to specify a target type or relation. Multiple correspondences for the same relation or type (without object creation) effectively produce the union of the results.

We now turn to local correspondences which create OIDs. OIDs for a given target schema type T are created by correspondences of the form

    R[(* T), x1,..,xm] <source-wff> AND IR1, ...,IRn ;

It is permissible for more than one intermediate relation to be used in the head for specifying T. Suppose that C1,..,Cm are the clusters of correspondences for the different intermediate relations creating T. Conceptually, the operational semantics will create sets of OIDs for each cluster in turn. Suppose now that a given cluster creates OIDs for T using intermediate relation R. For each correspondence of the form given above, the query

    { (x1,..,xm) | ∃ (free vars) (source-wff ∧ IR1 ∧ ... ∧ IRn) }

is applied to the source and (previously defined) intermediate relations to obtain a set of m-tuples (of values and/or OIDs). Let S denote the union of all of these sets. Intuitively, for each tuple (a1,..,am) of S, a unique OID, o, of T is created, and the tuple (o,a1,..,am) is included in R. Note that the first column of R is a key, and that the second through last columns of R also form a key. In the actual implementation, the local correspondences are considered singly, but the key dependencies on R are nevertheless maintained.
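The two key dependencies on R can be maintained even when the local correspondences are processed one at a time, provided each tuple is looked up before an object is created. The Python sketch below illustrates this under stated assumptions: the class IntermediateRelation, its method names, and the OID naming scheme are hypothetical, and the result sets of the individual correspondences are assumed to have been computed by the queries above.

```python
from itertools import count


class IntermediateRelation:
    """Holds tuples (o, a1,..,am), where o is a created OID of a target type."""

    def __init__(self, type_name):
        self.type_name = type_name
        self.oid_by_values = {}      # (a1,..,am) -> o: columns 2..m+1 are a key
        self._fresh = count(1)       # fresh OIDs, so column 1 is a key as well

    def lookup_or_create(self, values):
        """Return the OID for (a1,..,am), creating one only if none exists."""
        values = tuple(values)
        if values not in self.oid_by_values:
            self.oid_by_values[values] = f"{self.type_name}-{next(self._fresh)}"
        return self.oid_by_values[values]

    def assert_all(self, result_tuples):
        """Process the result set of a single local correspondence."""
        for values in result_tuples:
            self.lookup_or_create(values)

    def rows(self):
        return {(oid,) + values for values, oid in self.oid_by_values.items()}


# Two correspondences for the same relation, evaluated singly; the
# overlapping tuple ("Vertigo",) does not produce a second object.
movies = IntermediateRelation("Movie")
movies.assert_all([("Casablanca",), ("Vertigo",)])
movies.assert_all([("Vertigo",), ("Alien",)])
```

The result is the same as if the union S of the result sets had been formed first, which is the conceptual semantics given above.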
Bulk Transform Algorithm

The algorithm accepts as input source and target world specifications, a correspondence specification, the source instance to be transformed, and the name of the target instance to be created. It assumes that the source and target schemas are defined in the AP5 virtual memory database.

The algorithm preprocesses ILOG* specifications by sorting them as outlined above (see Section 5.2.1). The sorted set of correspondences forms the set of rules with which to transform one database of one schema into another database of another schema.

For each local correspondence in the sorted list, the algorithm generates a query as specified in the semantics in Section 5.2.1, which is evaluated to generate a list of tuples to be asserted into relations at the head of the correspondences.

INPUT: source-world-specification, source-world-instance(s), transformation-specification (assume unsorted), target-world-specification, target-world-instance (optional)
METHOD:
(a) Sort transformation-specification based on dependence on intermediate relationships as specified in the operational semantics;
(b) Loop for each local-correspondence in the sorted transformation-specification do:
    (b1) set result-tuples to the results of evaluating the query from local-correspondence to source;
    (b2) loop for each tuple in result-tuples do: if target relation is an intermediate relation with object creation then look-up-or-create object for the given tuple; otherwise assert tuple to relation;
(c) Check for conflicts in resulting target database; if target-world-instance is specified, populate the world with the target database using all objects and tuples in the target as its seeds;
Figure 5.3: Bulk Transform Algorithm

The implementation is able to use the relational-like query shown as the semantics to generate the tuples because the source instance is in the same workspace as the intermediate relations,
i.e. they are in the same global database. (In a non-virtual memory implementation where the source instance is a separate database, generating the tuples can be done in two steps. First, the source-wff is used to generate the instantiations of free variables. This is achieved by creating a query:

    { free-vars | source-wff }

where free-vars refers to the free variables in source-wff. Then, the instantiated variables from the source database are used to generate instances from the intermediate relations. The resulting instantiations of the variables are used to form the defined relation tuple.)

If the correspondence is for an intermediate relation, the algorithm will check if the intermediate relation already exists. If none is found, an intermediate relation with the specified name, arity and types is created. The compiler then asserts tuples into defined relations without *. In a correspondence that entails object creation, the intermediate relation is first looked up with the instantiated tuple (without the first coordinate) for the object at the first coordinate. If no tuple exists (hence, no such object exists), a new object is created and the tuple is asserted to the intermediate relation with the new object at the first coordinate. This ensures that the instantiated tuple (without the first coordinate) is a key to the object at the first coordinate even if multiple correspondences are specified for the same intermediate relation. This is termed "exist-or-create" semantics. Because of the ordering, target objects are already created before they are used in the assertion of any local correspondence that requires them.

5.2.2 Transformation with Closure

Closure-transform refers to the operation that transforms only a relevant subset of instances of one source schema into instances of the target schema.
The semantics for closure-transform is similar to that of bulk-transform in effect, but very different in evaluation. In bulk-transform everything in the source instance is transformed into the target. The goal of closure-transform is to transform only the necessary tuples and objects from the source into the target given an initial set of seeds in the target (that are also transformed from the source), so the resulting world is indeed consistent with its closure specification (see footnote 5).

The initial set of seeds is derived from seed correspondences, also specified in terms of ILOG*, using the same intermediate and target relations on the head of the seed correspondence specification. The seeds created with the seed correspondence must be a subset of those created using bulk transform on the relations. The correspondence specification for the transformation must be valid with the seeds, i.e. the target seeds are derivable from the local correspondences and the source instance.

Closure-transform requires a closure-based target world to contain the target instance. The transformation process uses the closure specification of the target world and its initial set of seeds to decide which relationships are to be derived from the source. The current implementation of the closure-transform operation assumes that the target instance is empty when the operation is invoked. No object equivalence semantics are checked, except for those implicit in the semantics of intermediate relations and OID creation.

Footnote 5: Not necessarily the case with bulk-transform. Consistency must be checked separately.

Operational Semantics

Closure-transform mimics the generic traversal algorithm of Figure 4.2 described in Chapter 4 to traverse the closure from the seeds based on its specification.
While traversing, it creates or asserts those tuples and objects related to the seeds based on correspondence specifications (by a process similar to unification). The process of deciding if a target tuple exists relating a given target object to other objects is recursive; it must check for the existence of those objects in the target as well as the validity of the tuple based on the correspondence specification and the source instance.

For example, given an object o of type T (assume o is a seed in the target), and a relation R under T's closure such that R(@ !), the closure transform algorithm must first find or create all x such that R(o, x) holds in the target. To find all x such that R(o, x) holds, we have to look to the local correspondence for R. Assume the local correspondence for R has the following form:

    R(a,b) <source-wff> and IR1,..,IRn;

Thus, given the local correspondence for R, and a target object o in the first position of R restricting the variable a, the algorithm must find all x substitutable for the variable b that satisfy the local correspondence, and assert the tuple(s) to be true of R. If R has no intermediate relations, the local correspondence returns the set of values that correspond to b based on the query on the source-wff and restricting a to equal o (an object or value substitution for a). Otherwise, this process finds objects substitutable for b that satisfy the local correspondence in several steps: first, it finds substitutions for variables in intermediate relations that relate a to other variables; next, it uses the substitutions for variables to find other substitutions for variables in <source-wff>; finally, it uses these substitutions to find substitutions for the remaining variables in other intermediate relations.
The goal of the above steps is to find the most restrictive substitutions for b that satisfy R such that the local correspondence is valid on the source instance. We call this process instantiation or trying to instantiate target variables.

The main problem in closure transform is to find instances corresponding to target variables given some (initially, one) known objects or values substitutable for some of the target variables. This can be stated as follows: given a local correspondence for R (see footnote 6),

    R(x1,..,xm) <source-wff> AND IR1, ...,IRn ;

some of the target variables in R (e.g. {xi,..,xj}) can be substituted with known objects or values (initially from the seeds). The goal of the instantiation process is to find all objects or value substitutions for the remaining target variables xk,..,xl such that <source-wff> AND IR1,..,IRn is true in the source and intermediate database.

This goal may be achieved in several steps (a generalization of the steps mentioned above). First, the algorithm tries to instantiate target variables in the individual IRi's in the correspondence given the substitution of {xi,..,xj} for known objects. Note that this is a recursive call. This process should generate more known substitutions for variables in the correspondence.

After all the IRi's with the known variables are instantiated, the <source-wff> is instantiated, substituting known objects for variables. This may be specified as follows. Given that the known variables are {xi,..,xj} corresponding to {oi,..,oj}, and the instantiated IRs are IRx ... IRy, the correspondence is translated into the following relational calculus-like query:

    R = { (x1,..,xm) | ∃ (free vars) (source-wff ∧ IRx ∧ ... ∧ IRy ∧ xi=oi ∧ ... ∧ xj=oj) }

(Typing restrictions are included here implicitly.)
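The restricted query above can be illustrated with a small Python sketch that evaluates a conjunctive body under a set of known substitutions. It rests on simplifying assumptions that are not part of the thesis: the source-wff is limited to a conjunction of atoms (the full relational calculus is not modelled), relations are plain sets of tuples, and the function name instantiate and the example relation names are hypothetical.

```python
def instantiate(head_vars, body, db, known):
    """Return all head tuples satisfying the body, given known substitutions.

    head_vars: variable names in the head, e.g. ("a", "b")
    body:      list of (relation_name, variable_names) atoms
    db:        mapping relation_name -> set of tuples
    known:     mapping variable -> known object or value (xi = oi)
    """
    environments = [dict(known)]
    for relation_name, relation_vars in body:
        extended = []
        for env in environments:
            for row in db[relation_name]:
                candidate = dict(env)
                # A row is kept only if it agrees with every variable bound
                # so far; unbound variables are bound to the row's values.
                if all(candidate.setdefault(var, val) == val
                       for var, val in zip(relation_vars, row)):
                    extended.append(candidate)
        environments = extended
    # Every head variable is assumed to occur in the body or in `known`.
    return {tuple(env[var] for var in head_vars) for env in environments}


# Hypothetical database: one source relation and one intermediate relation.
db = {
    "source": {("o1", "x"), ("o2", "y")},
    "IR1": {("x", 1), ("y", 2)},
}
body = [("source", ("a", "t")), ("IR1", ("t", "b"))]
restricted = instantiate(("a", "b"), body, db, {"a": "o1"})
unrestricted = instantiate(("a", "b"), body, db, {})
```

Calling the function with an empty `known` mapping corresponds to the unrestricted query used by bulk transformation, so the two modes differ only in the bindings supplied.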
Note that only those intermediate relations already instantiated are used in the query. This should find more known substitutions for variables in the correspondence.

Next, we instantiate the remaining IRi's given the substitutions of their variables with instantiated (known) objects. Again this is a recursive call. This process should return all substitutions for variables in the target (i.e. on the head of the correspondence) that were not provided with known objects.

Footnote 6: In this example, we only show how to instantiate variables of one local correspondence for a given relation; however, all local correspondences for the same relation are evaluated in the actual implementation.

Finally, we present the conditions under which the recursive process returns. Suppose the correspondence below (for a target relation or an intermediate relation without *) is instantiated such that all substitutions for target variables x1,..,xm are known.

    R[x1,..,xm] <source-wff> AND IR1, ...,IRn ;

Tuples consisting of objects substituted for x1,..,xm are asserted to be true of R. If the correspondence is for an intermediate relation with *,

    R[*,x1,..,xm] <source-wff> AND IR1, ...,IRn ;

the recursion process returns if an object is substitutable for the *, or if all substitutions for target variables x1,..,xm are known. In the first case, there must be a (single) valid tuple of R, whose components are substitutions for x1,..,xm, for which <source-wff> AND IR1,..,IRn, substituted with the tuple values, is valid in the source and intermediate database. In the second case, exist-or-create semantics (also used in bulk transform) are used to find (or create) an object instantiation for *.

Below, we describe the algorithms used in the prototype; all the algorithms are presented in pseudo-code.

Closure-Transform Algorithm

Figure 5.4 shows the main body of the closure transform algorithm.
It is similar to the traversal algorithm described in Chapter 4, but find-object-closure and find-tuple-closure of Figures 4.3 and 4.4 are changed to recursively transform the required data (by trying to find substitutions for other target variables given a known set of objects) before finding the closure.

The closure transform algorithm first sorts and evaluates the seed correspondences to get a set of seeds for the target world. Intermediate relations that are needed are created at this point. Next, it traverses the set of seeds and calls transform-object's-closure for all seed objects and transform-tuple's-closure for all seed tuples. In this figure and the figures that describe other pseudo-codes, words in different fonts refer to functions (in pseudo-code) defined in this thesis.

INPUT: source-world-schema, source-world-instance(s), transformation-specification (assume unsorted), target-seed-specification, target-world-schema, target-world-instance
METHOD:
(a) Sort target-seed-specification;
(b) Loop for each seed-local-correspondence in target-seed-specification do
    (b1) set result-tuples to the results of evaluating the query from local-correspondence to source;
    (b2) loop for tuple in result-tuples do: if correspondence is for intermediate relation with object creation then look-up-or-create object for tuple and collect object found or created in seed-objects; otherwise assert tuple to relation and collect tuple in seed-tuples;
(c) Assert all seeds to be in target-world-instance;
(d) Loop for object in seed-objects do transform-object's-closure (target-world-instance, object);
(e) Loop for tuple in seed-tuples do transform-tuple's-closure (target-world-instance, nil, tuple);
Figure 5.4: Closure Transform Top Level Function

Figure 5.5 describes the function to transform an object's closure. The function is similar to find-object-closure with the exception of steps b1, b2, and b3, in which it tries to transform relations to generate target tuples before generating the tuples in the object's closure.

INPUT: World, object
METHOD:
(a) If object is already visited, then return; otherwise mark the object as visited and put it in world (as dependent if not a seed);
(b) Find closure specification for object; loop for relpattern in the closure specification of object do
    (b1) find all local correspondences for relation in relpattern in the transformation specification;
    (b2) sort it in order of dependence;
    (b3) loop for each correspondence found do generate-target-tuples (correspondence, relpattern, object);
    (b4) find all tuples with object as specified in relpattern (generate query to target, save the results);
    (b5) loop for tup in all tuples found in step b4 do transform-tuple's-closure (World, relpattern, tup);
Figure 5.5: Transform Object's Closure Function

Figure 5.6 describes the function to transform a tuple's closure. The function is similar to find-tuple-closure, except for the fact that find-object-closure calls are replaced by transform-object's-closure calls.

INPUT: World, relpattern, tuple
METHOD:
(a) If tuple is already visited, then return; otherwise, mark visited;
(b) If no relpattern is provided with the function call, the closure specification for the tuple is looked up;
(c) Iterate over each annotation of relpattern:
    @ = transform-object's-closure of object at the @ position of tuple;
    ! = transform-object's-closure of object at the ! position of tuple;
    $ = return t if object at $ position of tuple is a value or is already in the closure; nil otherwise;
    * = put object at * position in the closure; mark it visited (do not find its closure);
(d) If tuple is a seed, then all objects must be in the closure, otherwise it is an error (inconsistent specifications);
(e) If all the objects are in the closure, then the tuple is put in the closure;
Figure 5.6: Transform Tuple's Closure Function

The algorithms in Figures 5.7 and 5.8 are the main algorithms that try to find valid substitutions for variables in local correspondences. They materialize and assert relevant target tuples to the workspace, while transforming supporting intermediate and target information. The two algorithms are mutually recursive; Figure 5.7 describes the conditions for which recursion returns, and Figure 5.8 describes the main instantiation process. Comments in the pseudo-codes are enclosed in square brackets.

Intuitively, the algorithm in Figure 5.7 tries to generate all the target tuples in a given correspondence for a list of known instantiations (given by object-list and position-list). If an object at the * is found, the algorithm returns. Otherwise, if all the variables in the relation are found (i.e. objects are substituted for all variables in the relation), the tuple is asserted as an instance of the target (or intermediate) schema. If not all the variables are found, instantiate-target-vars of Figure 5.8 is called with the known object substitutions.

INPUT: correspondence, object-list, position-list
METHOD:
(a) If object is found in the * position of a relation then check validity of object at the * position and return object if valid;
(b) otherwise if all positions of a relation are given then assert tuple into relation else instantiate-target-vars (correspondence object-list position-list)
Figure 5.7: Generate Target Tuples

Figure 5.8 tries to instantiate all the variables in a correspondence necessary for the intermediate or target relation being defined. This algorithm is always called with a specific object substitution or list of object substitutions (given by list-of-objects and list-of-slots) and a local correspondence.
Its goal is to find all the objects that are substitutable for the uninstantiated variables from the source and intermediate database domain (those which have no corresponding object substitutions).

First, the algorithm tries to instantiate intermediate-wffs with the given variable substitutions (the initial list of objects). Next, more substitutions are found from the source database (from the source-wff). Finally, the remaining intermediate-wffs are instantiated, and the relation being defined (intermediate or target) is asserted with the resulting tuple(s).

5.2.3 Summary

WorldBase provides two different operations for database transformation, bulk-transform and closure-transform. This parallels the support provided for world databases, namely, schema-based and closure-based population. The two transformation operations are useful under different circumstances. Bulk-transform is useful when a large subset of the source instance is to be transformed. Closure-transform is useful when only a small subset of the source instance is to be transformed, and the user can define the set of seeds with which he wants the target database populated.

Neither operation checks for key constraints of existing information in the target, since both operations assume the target instance to be empty when they are invoked. To extend the operations to check for keys, the semantics of the * in intermediate relations (i.e. the semantics of OID creation) must be changed to include querying the workspace database (similar to exist-or-create semantics) with key relationships for the objects.
INPUT: correspondence, list-of-objects, list-of-slots
METHOD:
(a) Divide intermediate relations in correspondence into two: intermediate-relation-with-object for intermediate relations whose variables may be substituted with objects from list-of-objects, and intermediate-relation-without-object for intermediate relations whose variables are all unknown (not substitutable with objects from list-of-objects);
(b) [instantiate intermediate-wff with direct objects] Loop for intermediate-relation in intermediate-relation-with-object do
    (b1) find all correspondences with the intermediate relation on the head of the correspondence (correspondences for the same relation);
    (b2) loop for new-corr in correspondences found in b1 do: do a substitution on variables in new-corr for object(s); generate-target-tuples (new-corr, new-object-list, new-object-position);
(c) [instantiate intermediate wff with objects derived from source] If there are more vars to be instantiated then
    (c1) generate query to find instances of the variables in source-wff, substituting variables with known values for objects;
    (c2) loop for intermediate-relation in intermediate-relation-without-object do repeat steps b1 and b2 on intermediate-relation;
(d) [find tuple and assert] generate query to find instances of the variables in the correspondence (source-wff and intermediate relations), substituting variables with known values for objects; loop for tuple found from the query do assert tuple;
Figure 5.8: Instantiate Target Variables

Chapter 6

On Merging Databases

In this chapter, we first examine the general problem of merging object-based databases (POBs) and the attendant problem of determining when two objects are equivalent. Next, we examine the problem in terms of merging worlds.
We present our two-phase solution to the problem, in which an 'object identification' phase precedes a 'constraint resolution' phase. Families of keys are used to identify objects, after which various mechanisms are used to merge the remaining data and resolve merging conflicts. For example, preference specifications are used to resolve constraints violated in the target by preferring facts from one of the sources over the others. Finally, we present our implementation of merging in WorldBase and describe a preference mechanism that also provides hooks for computation to resolve conflicts.

6.1 Aspects of Merging Databases

In dealing with merging databases, there are three major aspects to consider: the schema, the instance, and the constraints. The following subsections discuss the issues of schema, instance and constraint merging, respectively.

6.1.1 Schemas and Merging

An object-based database (POB) can be viewed as a triple $(S_i, C_i, I_i)$, where $S_i$ refers to the structure or schema of the database, $C_i$ refers to the set of constraints of the database, and $I_i$ refers to the instance of the structure $S_i$ that is constrained by $C_i$, i.e., the database itself. We view merging (denoted by $\oplus$) of two POBs as a mapping onto a third POB: $(S_1, C_1, I_1) \oplus (S_2, C_2, I_2) \Rightarrow (S_3, C_3, I_3)$. We separate the schemas from the constraints because we want to allow the target POB to have the same structure but different constraints.

To merge two POBs, one first has to decide if the two POBs are structurally compatible. If they are, they may be merged without any transformation of the POBs. The notion of schema compatibility varies depending on the model and merging methodology supported. In a relational model, schema compatibility may be stated in terms of correspondence of relation names and column names of two schemas.
In a graph-based or semantic model, schema compatibility may be stated in terms of subgraph properties. For example, we could define $S_1$ and $S_2$ to be compatible if both are subschemas of some schema $S_3$.

In traditional database merging, if the constraints on the schemas to be merged are not equal, there is a conflict and the databases may not be merged. When they are equal, the merged database typically has the same constraints, i.e. if $C_1 = C_2$, then $C_3 = C_1 = C_2$, or the merging process will be aborted if the merged data violates $C_3$. An alternative pursued here is to relax the impact of constraints on merging POBs, i.e. to allow $C_3 \neq C_1$ or $C_2$. If some constraint in $C_3$ is violated, mechanisms must be provided to resolve it in the merged POB.

6.1.2 Determining Object Equivalence

As mentioned above, we take a two-phase approach to merging POBs: phase 1 focuses on determining object equivalence, and phase 2 focuses on merging the remaining data.

There are several ways to determine the equivalence of objects from different POBs. The most common method is through keys. Keys are single valued properties of objects (associated with the object type) that uniquely identify an object. An example of a key for objects of type person would be social-security-number, or (perhaps) the pair (last-name, first-name). Keys might map to other objects rather than values (or printables).

This example demonstrates that an object may have more than one valid key associated with it. We call the set of keys for a type a key family. Each element of the key family is either a single property or a set of properties which uniquely identifies objects of the type. Key families are useful when merging overlapping object-bases, where only a subset of information about an object overlaps. If the intersection of key families for a particular type is non-empty, the objects may be merged.
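As a minimal sketch of how key families might be used to test equivalence, the Python fragment below intersects the key families of two object-bases and compares two objects on the shared keys. The dictionary representation of objects, the property names, and the functions shared_keys and equivalent are assumptions made for illustration; they are not part of WorldBase.

```python
def shared_keys(family_a, family_b):
    """Keys usable for merging: the intersection of the two key families.

    Each key is a tuple of property names; a family is a set of such tuples.
    """
    return set(family_a) & set(family_b)


def equivalent(obj_a, obj_b, keys):
    """True if some key is fully defined on both objects and its values agree."""
    for key in keys:
        values_a = tuple(obj_a.get(prop) for prop in key)
        values_b = tuple(obj_b.get(prop) for prop in key)
        if None not in values_a and values_a == values_b:
            return True
    return False


# Hypothetical key families for type person in two object-bases.
family_a = {("ssn",), ("last_name", "first_name")}
family_b = {("last_name", "first_name"), ("employee_id",)}
keys = shared_keys(family_a, family_b)

person_a = {"ssn": "123", "last_name": "Doe", "first_name": "Jane"}
person_b = {"employee_id": 7, "last_name": "Doe", "first_name": "Jane"}
same = equivalent(person_a, person_b, keys)
```

If the intersection is empty, the sketch reports no equivalence at all, which corresponds to the case where the two object-bases cannot be merged on keys alone.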
Another method for merging different object-bases is to use a look-up table. The object-bases to be merged may be disjoint, but there is a look-up table that determines the equivalence of the objects in the two object-bases. For instance, we may have an object-base of Michelin tire types, and a similar one for Goodyear tire types. A look-up table to indicate equivalence of the various types of tires from the two object-bases could be provided to determine equivalence.

In some cases, object equivalence is not determined by an exact match of keys. For the purpose of determining if a car could be fitted with equivalent sized tires, the diameter of Michelin and Goodyear tires that differ by 1 cm might be considered equivalent. In merging two objects with such fuzzy keys, one also has to decide what the value of the merged key is.

When dealing with merging overlapping object-bases, one sometimes has to deal with relationship values that may be undefined. This is especially true if we merge using key families. Since we use only a subset of the key family to determine equivalence, an object from one object-base which has no corresponding match may not have the remaining keys in the key family defined. One way of dealing with this is to change the key constraints of the merged object-base to be that of the subset. Another way is to allow optional keys, i.e. key relationships that may be undefined (missing). Optional keys allow for a more flexible merging, but one may not be able to tell whether two objects are equivalent if none of their defined key relationships overlap.

A different way of identifying objects is to simply use global object identifiers as the key, i.e. two objects are distinct if they have no keys associated with their type. With respect to their relationships, the objects are indistinguishable except by their object identifiers (OIDs).
However, these objects may be identified when certain relationships are defined for them, e.g. if they are classified into certain subclasses. An example is an object-base for chips and circuit boards. Each class of chip contains an indistinguishable set of chip objects of the same type (with class attributes). However, chips which are used have board and position relationships defined. Used chips may be identified by their board and position properties. This problem is similar to optional keys, but here some of the objects have no defined key.

Another tool for determining equivalence of objects is negative keys. Intuitively, a negative key helps in deciding if two objects are not equivalent. Negative keys may be stated and implemented as constraints. For instance: "No employee has more than one office" or "No person has more than one city for a primary address." In the multiple keys model, key relations that are defined but not used in the merging process can act as negative keys. Negative keys may also be used for deciding non-equivalence of objects in object-bases with incomplete information. If there are no overlapping keys that may be used to decide if two objects are equivalent, but they have negative keys that are specified and are different, we may safely assume that the two objects are not equivalent.

Perhaps the most complex approach for determining equivalence is based on subgraph isomorphism. Here, two objects are equivalent if there is an isomorphism between selected subgraphs of the respective object-bases which maps the first object to the second.
An example would be the query: "a husband and wife team where the husband is the chairman of the company, and the wife is the vice chairman whose father is on the board of directors of the same company." The problem, in this case, is finding a feasible implementation to decide subgraph equivalence, since the general problem is NP-complete.

The mechanism for POB merging described in this thesis incorporates key families as the method for object identification.

6.1.3 Merging the Remaining Data

Once all the equivalent objects are identified, relationships and non-key attributes may be merged. Recall that schemas must be compatible. We assume the source POBs do not violate their own sets of constraints. If the target non-key relationships are unconstrained, the union of the relations is taken as the final result. If the target relationship is constrained, the resulting merged instance may violate the target constraints.

Although the user may include arbitrary constraints in $C_3$, if $C_3$ is not some combination of constraints in $C_1$ and $C_2$, a method must be provided to deal with converting instances $I_1$ and $I_2$ to be consistent with $C_3$. For example, if $C_1$ includes the constraint: relation has-phone is single-valued, and $C_2$ includes the same constraint, then $C_3$ can either constrain has-phone to be single-valued and provide methods to deal with cases when the relationship violates this constraint (such as when two phones exist for a single person, one from each source), or allow has-phone to be multi-valued.

For each $C_1$ and $C_2$ to be merged (for compatible schemas), there exists a $C_3$, the most natural constraints automatically satisfied by the merged POB. We denote this constraint set $\mathit{mnm}(C_1, C_2)$, for most natural merge.
The most natural merge is the most restrictive family of constraints such that no method needs to be provided to deal with any constraint violation when forming the merge of instances satisfying $C_1$ and $C_2$. There may be cases where the instances from source POBs match exactly or are disjoint (properties of the data), and thereby produce a merged POB that satisfies a more stringent set of constraints than the mnm. The target POB may always be constrained by a more relaxed set of constraints (denoted $C_3 > \mathit{mnm}(C_1, C_2)$). For the target to have a more restricted set of constraints (the normal case), mechanisms must be provided to deal with constraint violations. There is a spectrum of mechanisms possible, ranging from simple preferences, to filtering mechanisms, to allowing user-defined computations.

A preference mechanism allows the user to indicate which source POB to take certain relationships from when constraint violations arise. As well as allowing for a more restrictive set of constraints for the target, the preference mechanism can be used to indicate general preference. For instance, even if both the sources and the target constraints agree that a relation is multi-valued, one may still state a preference that only relationships from a preferred source be present in the target. For example, the relationship has-phone may be multi-valued in the source POBs and the target POB, but the user may prefer has-phone values from $I_1$, knowing that $I_2$ is an older version of the object-base and $I_1$ contains the newer information.

A filtering mechanism removes relationships and objects that violate constraints so that the resulting object-base conforms to a set of constraints. For example, a filter could be used to remove all objects in the merged object-base which have two phone numbers.
The result is a valid instance that correctly models its schema and constraints. There are two kinds of filters: one acts on the merged POB and has access to where the instances come from; the other simply acts on a single POB, and removes only those instances that violate certain conditions without caring where they come from. The second filtering mechanism can also be used to pre-process source POBs to conform to a more restricted set of constraints.

The third mechanism allows the user to specify computations or functions to deal with constraint violations. This may be specified in terms of (condition:action) pairs. There are two kinds of computations. The first involves computation of a target relationship based on the conflicting source relationships. An example is to set the salary of an employee to be the sum of all salaries associated with that employee. The second type of computation involves computing a new relationship based on some other relationships being merged. An example is the computation of average-salary, a weighted average that depends on the source POBs the relationships are from.

6.2 Merging in WorldBase - in Theory

This section describes the approach taken by WorldBase to support object-base merging. First, we use the WorldBase data model described in Section 3.1 for the object-bases and define the notion of keys and key families in this model. Next, we introduce the notion of schema compatibility and object-base mergeability in our framework, and outline our algorithm for merging object-bases. Finally, we describe our preference mechanism, one of the techniques we provide for dealing with constraints. The next section describes the WorldBase merging mechanism from the user's point of view and discusses a preference mechanism (with hooks for user-defined computations) we have implemented in the prototype.
6.2.1 Keys and Constraints

For the purpose of merging, we restrict ourselves to three kinds of constraints: key constraints, (restricted) cardinality constraints and disjointness constraints. In this thesis, we support keys only on abstract types. Other forms of object identification are deferred to future research.

Definition: For an abstract type $T$, a key dependency on $T$ is an expression $k = T : (R_1 \ldots R_n)$ where each $R_i$ relates $T$ to some other type. (To simplify the notation, we assume that the first coordinate of each $R_i$ has type $T$.) A database instance $I$ satisfies $k$ if
(a) for each $o \in I(T)$ and each $R_i$, there is exactly one $x$ such that $(o, x) \in I(R_i)$, which we denote as $R_i(o)$;
(b) for each pair $o_1, o_2 \in I(T)$, $(R_1(o_1), \ldots, R_n(o_1)) = (R_1(o_2), \ldots, R_n(o_2)) \Rightarrow o_1 = o_2$.

Assertions about keys cannot be deduced or discovered from more basic principles; they are decided when the schema creator designs the type after deliberation about the data and constraints.

Let $K$ be the set of keys defined by the user on a schema $S$. If $T$ is an abstract type in $S$, the key family of $T$, denoted $K(T)$, is the set of keys in $K$ for $T$. Let $C$ be the set of cardinality constraints and disjointness constraints for $S$. Cardinality constraints in $C$, denoted $C^{card}$, are of the form $R[p, q]$ or $R^{-1}[p, q]$, where $R$ refers to the binary relation name being constrained, $p \in \{0, 1\}$, $q \in \{1, \omega\}$, and $\omega$ denotes the first infinite ordinal. (For simplicity, we will deal with only cardinality constraints for binary relations in this thesis; the extension to multiple-slot relations should be straightforward, but messy.) The cardinality constraint is interpreted as follows: the number of tuples selected with a fixed domain element is at least $p$ and at most $q$.
For instance, last-name$[1, 1]$ states that every person must have exactly one last-name; last-name$^{-1}[0, 1]$ states that every last-name is assigned to at most one person; takes$[1, \omega]$ states that every student takes at least one course; takes$^{-1}[0, \omega]$ states that a course is taken by zero or more students (unrestricted). Disjointness constraints in $C$, denoted $C^{disj}$, are specified as disjoint$(T_1, .., T_n)$ where each $T_i$ is a subtype of the same abstract type.

A constrained schema is a three-tuple $(S, K, C)$, where $S$ is a WDM schema, $K$ is the set of key constraints defined on $S$, and $C$ is a set of cardinality and disjointness constraints for $S$. $(S, K)$ refers to a key-constrained schema. Using the notation introduced in Chapter 3 (since closure is not relevant to our discussions here), a world is referred to as a four-tuple $(S, K, C, I)$ where $I$ refers to the database (or world) instance. This notation is used throughout the remainder of this chapter.

6.2.2 Schema Compatibility and Equivalence

In this subsection, we describe when a pair of schemas are compatible using the notion of subschema defined below. Only when two schemas are compatible can the instances be merged in our automatic merging mechanism. We also define desirable key properties of the merged target.

Definition: A schema $S$ is a subschema of $S'$ if there exists an embedding of $S$ into $S'$ where
(a) each type $A$ in $S$ maps onto type $A$ in $S'$;
(b) each subtype $B$ in $S$ maps onto subtype $B$ in $S'$;
(c) if $B$ ISA $C$ is in $S$, then $B$ ISA* $C$ is in $S'$ (see footnote 1);
(d) all relations $R[T_1, .., T_n]$ in $S$ map onto $R[T_1, .., T_n]$ in $S'$.

Notice this allows $S'$ to contain extra types. The closure in (c) guarantees that they do not "get in the way." More relaxed notions of subschema could also be explored.
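The subschema conditions (a) to (d) can be checked mechanically. The following Python sketch is only an illustration under simplifying assumptions: a schema is represented as a plain dictionary with hypothetical fields `abstract`, `subtypes`, `isa` and `relations`, the embedding is by name, and ISA* is taken to be the transitive closure of the ISA edges.

```python
from typing import Dict, Set, Tuple

Schema = Dict[str, object]  # hypothetical representation, not the WDM format


def isa_closure(isa: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Transitive closure of (subtype, supertype) edges, i.e. ISA*."""
    closure = set(isa)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def is_subschema(s: Schema, s_prime: Schema) -> bool:
    """Check conditions (a)-(d) for a name-preserving embedding of s into s_prime."""
    if not s["abstract"] <= s_prime["abstract"]:          # (a) types map onto types
        return False
    if not s["subtypes"] <= s_prime["subtypes"]:          # (b) subtypes onto subtypes
        return False
    if not s["isa"] <= isa_closure(s_prime["isa"]):       # (c) ISA edges within ISA*
        return False
    return all(                                           # (d) same relation signatures
        s_prime["relations"].get(name) == signature
        for name, signature in s["relations"].items()
    )


# Illustrative schemas: s3 inserts an extra subtype "adult" between student and person.
s1 = {
    "abstract": {"person", "course"},
    "subtypes": {"student"},
    "isa": {("student", "person")},
    "relations": {"takes": ("student", "course")},
}
s3 = {
    "abstract": {"person", "course"},
    "subtypes": {"student", "adult"},
    "isa": {("student", "adult"), ("adult", "person")},
    "relations": {"takes": ("student", "course"), "teaches": ("adult", "course")},
}
print(is_subschema(s1, s3), is_subschema(s3, s1))
```

The example shows why the closure matters: the direct edge student ISA person is absent from `s3`, but it is implied through the extra type, so the extra type does not prevent the embedding.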
Definition: Two schemas $S_1$ and $S_2$ are compatible if there exists a valid WDM schema $S_3$ such that $S_1$ is a subschema of $S_3$ and $S_2$ is a subschema of $S_3$.

When forming the merge $(S_1, K_1, C_1, I_1) \oplus (S_2, K_2, C_2, I_2)$ into schema $S_3$, a combination $K_3$ of the keys in $K_1 \cup K_2$ is used. $K_3$ must satisfy certain properties described below. The first property concerns the relationship between $K_1$, $K_2$ and $K_3$.

Definition: Let $(S_1, K_1)$, $(S_2, K_2)$, and $(S_3, K_3)$ be key-constrained schemas where $S_1$ and $S_2$ are subschemas of $S_3$. $K_3$ is implied by $K_1$ and $K_2$ if
(a) if $T$ is an abstract type of $S_1$ and $S_2$ then $K_3(T) \subseteq K_1(T) \cap K_2(T)$;
(b) if $T$ is an abstract type of $S_1$ but not $S_2$ then $K_3(T) \subseteq K_1(T)$;
(c) if $T$ is an abstract type of $S_2$ but not $S_1$ then $K_3(T) \subseteq K_2(T)$.

Footnote 1: * denotes closure.

The next property of $K_3$ will make it possible to determine equivalence of objects from the two source worlds in an efficient manner. To define this, we need some preliminary notions. An abstract type $T$ directly depends on abstract type or value type $T'$ in $K$ if there is some $R$ in a key on $T$ relating $T$ to $T'$ or one of its subtypes. $K$ is acyclic if there is no cycle of direct dependence of types in $K$. $K$ is stub-free if for all types $T$, $T'$, if $T$ directly depends on $T'$, then $T'$ is a value type or $K(T') \neq \emptyset$ (indicating a dependence on another (non-value) type which eventually depends on a value type). Given $(S, K)$, $K$ is a world equivalence specification if $K$ is acyclic and stub-free. When we merge worlds, we insist that the set of keys for the target is a world equivalence specification.

Definition: $K_3$ is a valid world equivalence specification for $(S_1, K_1)$ and $(S_2, K_2)$ if $K_3$ is a world equivalence specification and is implied by $K_1$ and $K_2$.

Note that $K_3(T)$ might be empty even though $K_1(T) \cap K_2(T) \neq \emptyset$.
In this case, no objects in $I_1$ and $I_2$ of type $T$ will be equivalent, and a disjoint union of these objects will be formed in the merged world.

A merged database $(S_3, K_3, C_3, I_3)$ is formed from two source worlds $(S_1, K_1, C_1, I_1)$ and $(S_2, K_2, C_2, I_2)$ in two phases. In the first phase, a merge of objects and key relationships from the two instances is computed and its non-key relationships added to produce a weak instance, call it $I_{3\text{-}weak}$. The second phase uses various mechanisms provided to discriminate the sources of the relationships and to resolve conflicts in $I_{3\text{-}weak}$ to produce a valid $I_3$. Section 6.2.3 focuses on the first phase, and Section 6.2.4 describes the second phase.

6.2.3 Computing the Weak Merge of Two Object-Bases

We define the notion of object equivalence in terms of key families.

Definition: Two values are equivalent if they are equal in the primitive equality given for values, i.e. $v_1 \equiv v_2 \Leftrightarrow v_1 = v_2$.

Definition: Let $S$ be a WDM schema, and $k$ a key for $T$. Two objects $x$ from $I_1$ and $y$ from $I_2$ are equivalent (relative to $k$), denoted $x \equiv_k y$, if $x$ and $y$ are objects of type $T$, and $k(x)$ is equivalent to $k(y)$.

A weak merge is the result of the merge of compatible worlds without considering the non-key target constraints. The weak merge exists only if, for each $T$, for each $x \in I_1(T)$, $y \in I_2(T)$, if $x \equiv_k y$ for some $k$ in $K_3(T)$, then $x \equiv_{k'} y$ for each $k'$ in $K_3(T)$. A weak merge may violate other constraints in $C_3$. Two object-bases are strongly mergeable if there exists a weak merge and it does not violate constraints in $C_3$.

INPUT: $S_3$, $K_3$, initial $I_{3\text{-}weak}$ (equal to $I_1$), $I_2$
METHOD:
(a) Compute a topological sort of abstract types in $S_3$ based on the direct dependence relationship of keys in $K_3$; types that directly depend on value types come first.
(b) For each abstract type $T$ (according to the sorted order) do: Let $k \in K_3(T)$. For each object $y$ in $I_2$ of type $T$, if there is an $x \equiv_k y$ in $I_{3\text{-}weak}$, and if $x \not\equiv_{k'} y$ for some $k' \in K_3(T)$, then abort [in this case, $I_1$ and $I_2$ do not have a weak merge relative to $K_3$]; otherwise, set $y' = x$. If no equivalent object is found in $I_{3\text{-}weak}$, then let $y'$ be a newly created object and insert $y'$ into $I_{3\text{-}weak}$. Also, insert all key relationships for $y'$. [The ordering of the keys ensures that all objects being depended on are already merged, before being used as the key for deciding the equivalence of other objects.]
(c) Populate all abstract type nodes $T$ without keys in $K_3$ by creating a copy $y'$ in $I_{3\text{-}weak}$ of each corresponding object $y$ in $I_2$.
(d) If $y \in I_2(T)$ for some subtype $T$, then add $y'$ to $T$ in $I_{3\text{-}weak}$ (and to each supertype of $T$ in $I_{3\text{-}weak}$).
(e) For each relation $R$ not in $K_3$ and each tuple $(y_1, .., y_n)$ in $I_2(R)$, add $(y'_1, .., y'_n)$ to $I_{3\text{-}weak}(R)$.

Figure 6.1: Algorithm to compute the Weak Merge

In our approach, we first create a copy of $I_1$ as the initial $I_{3\text{-}weak}$; then $I_2$ is "added" on to it using the object equivalence semantics. The algorithm to merge a world instance ($I_2$) onto an existing one ($I_{3\text{-}weak}$) is given as a single long transaction described below.

Not all types in a WDM schema need to have their keys defined. Objects that do not have keys defined for their type are considered distinct and separate from other objects of the same type and are not merged with any other objects. We do not support the notion of indistinguishable sets of objects with keyed subtypes mentioned in Section 6.1.2. We also do not support fuzzy keys, optional keys, or negative keys in this thesis.

The algorithm to compute the weak merge of $I_1$ and $I_2$ is shown in Figure 6.1.

6.2.4 Enforcing the Constraints

A constrained merge (e.g.
$I_3$) is the result of a merge of compatible worlds that satisfies the target's constraints ($C_3$) relative to the mechanisms specified (preference, filtering, and computations). If a target (non-key) relation is unconstrained, the constrained merge is formed from the union of the source relations. If the target relation is constrained, the resulting merged relation may violate it. In this subsection, we study the relationship of cardinality and disjointness constraints between the target and the sources, and outline a preference mechanism to deal with the constraints while merging object-bases.

Cardinality Constraints

Table 6.1 describes some of the ways that cardinality constraints in $C_1$ and $C_2$ on a given binary relation can be combined to yield a meaningful constraint in $C_3$. The values in the boxes refer to constraints in $C_3$. The third value in each box in Table 6.1 is the most natural cardinality constraint implicitly and automatically satisfied by simply asserting all source relations into the merged object-base. The set of these resulting constraints forms the most natural merge (mnm). For instance, the box for $[1, 1]$ in $I_1$ and $[1, 1]$ in $I_2$ results in $[1, \omega]$ and not $[1, 1]$ because the relation values for a merged object from different object-bases may differ.

The preference mechanism allows a user to specify a slightly more constrained target (see Table 6.1). Stating a preferred source implies that when objects from different sources are merged, the values from the preferred source are taken as the resulting values of the target. Relationships from the other source are removed after the merge, unless the relationship from the preferred source is undefined. For instance, suppose relation has-phone is constrained $[0, 1]$ from $I_1$ and $[0, \omega]$ from $I_2$, and we constrain the relation to "$[0, \omega]$ prefer $I_1$" in the target.
If $o$ is a merged person object whose has-phone value is "397-1441" in $I_1$, then the target has the same value and nothing else, even if $o$ has a phone value of "822-1511" from $I_2$. All has-phone values from $I_2$ are removed from the target. However, if $p$ is a merged person object which has no has-phone value in $I_1$, then all the has-phone tuples from $I_2$ are put into the target.

Table 6.1: Merging of Cardinality Constraints on $R(T)$

$I_1$ | $I_2$ | prefer $I_1$ | prefer $I_2$ | most natural merge
$[0,1]$ | $[0,1]$ | $[0,1]$ | $[0,1]$ | $[0,\omega]$
$[0,1]$ | $[1,1]$ | $[0,1]$ | $[0,1]$ | $[1,\omega]$
$[0,1]$ | $[0,\omega]$ | $[0,\omega]$ | $[0,\omega]$ | $[0,\omega]$
$[0,1]$ | $[1,\omega]$ | $[0,\omega]$ | $[0,\omega]$ | $[0,\omega]$
$[1,1]$ | $[0,1]$ | $[0,1]$ | $[0,1]$ | $[0,\omega]$
$[1,1]$ | $[1,1]$ | $[1,1]$ | $[1,1]$ | $[1,\omega]$
$[1,1]$ | $[0,\omega]$ | $[0,\omega]$ | $[0,\omega]$ | $[0,\omega]$
$[1,1]$ | $[1,\omega]$ | $[1,\omega]$ | $[1,\omega]$ | $[1,\omega]$
$[0,\omega]$ | $[0,1]$ | $[0,\omega]$ | $[0,\omega]$ | $[0,\omega]$
$[0,\omega]$ | $[1,1]$ | $[0,\omega]$ | $[0,\omega]$ | $[0,\omega]$
$[0,\omega]$ | $[0,\omega]$ | $[0,\omega]$ | $[0,\omega]$ | $[0,\omega]$
$[0,\omega]$ | $[1,\omega]$ | $[0,\omega]$ | $[1,\omega]$ | $[0,\omega]$
$[1,\omega]$ | $[0,1]$ | $[0,\omega]$ | $[0,\omega]$ | $[0,\omega]$
$[1,\omega]$ | $[1,1]$ | $[1,\omega]$ | $[1,\omega]$ | $[1,\omega]$
$[1,\omega]$ | $[0,\omega]$ | $[0,\omega]$ | $[0,\omega]$ | $[0,\omega]$
$[1,\omega]$ | $[1,\omega]$ | $[1,\omega]$ | $[1,\omega]$ | $[1,\omega]$

A preference may be stated for each relation for possible constraint violation or to indicate choice of source(s) of relationships for the new target world. A set of preference statements may be provided with the merging operation. The preference mechanism decides which relationship(s) to remove from the merged world to resolve a constraint violation or discriminate source choice.

There is no interaction of cardinality constraints for the different directions of a given relation in the most natural merge case (e.g. if $R$ relates $T$ to $T'$, with $R[1, \omega]$ and $R^{-1}[0, 1]$ in $S_1$, and $R[1, 1]$ and $R^{-1}[1, \omega]$ in $S_2$, then the natural merge will include the constraints $R[1, \omega]$ and $R^{-1}[0, \omega]$).
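The has-phone preference rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the WorldBase code: the tuple layout `(object, value, source)` and the function name are assumptions, and the phone numbers for `p` are made up for the example.

```python
from typing import Iterable, Set, Tuple

Tagged = Tuple[str, str, int]  # hypothetical layout: (object, value, source 1 or 2)


def apply_preference(tuples: Iterable[Tagged], preferred: int) -> Set[Tuple[str, str]]:
    """Keep preferred-source values; keep other values only where the
    preferred source defines nothing for that object."""
    tagged = list(tuples)
    defined_in_preferred = {obj for obj, _, src in tagged if src == preferred}
    return {
        (obj, value)
        for obj, value, src in tagged
        if src == preferred or obj not in defined_in_preferred
    }


has_phone = [
    ("o", "397-1441", 1),   # o has a value in I1 ...
    ("o", "822-1511", 2),   # ... and a different one in I2
    ("p", "555-0100", 2),   # p has values only in I2 (illustrative numbers)
    ("p", "555-0101", 2),
]
print(sorted(apply_preference(has_phone, preferred=1)))
```

With source 1 preferred, `o` keeps only its value from the first source, while `p` keeps both values from the second source because the preferred source has none for it.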
If different preferences are used for the same relation, however, the preference mechanism may produce different outputs $I_3$, depending on the order in which the preferences are evaluated. For this reason, we do not allow a user to have more than one preferred source per relation.

A shortcoming of the preference mechanism for cardinality constraints is that the target constraint cannot be more restrictive than the source constraints. Also, it cannot ensure totality and onto-ness of the relations. This can be overcome by providing a filtering mechanism that supports computation to restrict the constraints of the source databases being merged.

Disjointness Constraints

The notion of most natural merge for disjointness constraints is more complicated because of interactions of subtype relationships between (possibly overlapping) source schemas. For instance, if disjoint$(T_1, T_2)$ is in $C_1^{disj}$, but $C_2^{disj}$ is empty, two cases may apply:
• $T_1$ and $T_2$ are not subtypes in $S_2$, in which case the most natural merge is disjoint$(T_1, T_2)$, or
• either $T_1$ or $T_2$ or both are in $S_2$, in which case the most natural merge is empty (no disjointness constraint).

The crudest form of most natural merge is to have no disjointness constraints. Various refinements of the semantics of most natural merge for disjointness constraints may be possible, depending on the data model used and the semantics of subtype relationships. We define the most natural merge of disjointness constraints for our model below. For simplicity, we only consider disjointness constraint specifications of two subtypes here. The extension to multiple disjoint subtypes should be straightforward.

Given a disjointness constraint specification $D$ in the form of disjoint$(T_i, T_j)$, we denote by $D^*$ the closure of disjointness constraints that could be derived from its subtypes.
For instance, a disjointness constraint may be specified for a pair of disjoint types $T_1$ and $T_2$. This implies that all subtypes of $T_1$ are disjoint from all subtypes of $T_2$. Let $D$ be the set of disjointness constraints for an abstract type (the root type) in the database. We denote by $D^*$ the closure of disjointness constraints for that abstract type in the database. The most natural merge of $D_1$ and $D_2$ is
• $D_1^* \cup D_2^*$ if none of the subtypes in $D_1^*$ is in $S_2$, and none of the subtypes in $D_2^*$ is in $S_1$ (i.e. no intersection of subtypes);
• $D_1^*$ if all subtypes in $D_1^*$ are not in $S_2$, but some of the subtypes in $D_2^*$ are in $S_1$;
• $D_2^*$ if all subtypes in $D_2^*$ are not in $S_1$, but some of the subtypes in $D_1^*$ are in $S_2$;
• $D_1^* \cap D_2^*$ otherwise, i.e. the intersection of all disjointness constraints (including implicit disjointness constraints) of the abstract type from the two databases.

The user may specify disjointness constraints for the target that are more restrictive than the most natural merge. In cases where the target disjointness constraints are not violated in the individual sources, the same preference mechanism may be used to remove those subtype memberships that violate the target disjointness constraints. Stating a preferred source for a disjointness constraint during merging causes the preferred source's types to be preferred when the disjointness constraint is violated.

The mechanism also propagates preferences to removal of relationships concerning the removed objects. For example, suppose the user specifies that $F$ and $G$ are disjoint in the target, $R$ is a relationship on $G$ but not on $F$, and $a$ is an element of both $F$ and $G$, thus violating the disjointness constraint. If we prefer $F$ to be true of $a$, then we have to remove the relationship $R$ on $a$ when we remove $a$ from $G$.
The preference mechanism must check the disjointness constraints before the cardinality constraints, and the relationships to be removed (via propagation) inherit the preference (preferred source) of the disjointness constraints.

6.3 Merging in WorldBase - in Practice

This section describes the prototype implementation for merging databases. Merging in WorldBase occurs only in the workspace. The user must load the equivalence and constraint specifications of the source worlds to be merged as well as their schema(s). As described earlier, worlds loaded in the workspace are maintained as views in the workspace. Restoring (or loading) a world affects the workspace database by adding more population (with possible merging) to the workspace database. Updating or saving a world causes the world's population to be consistent with that in the workspace, i.e. the population of the world is rederived from the workspace (extracted by the world closure specification and its seeds when closure-based, or assigned all the instances of its schema when schema-based).

We first describe loading and possible merging of schemas and other specifications. Then we describe merging of instances. Finally, we describe a primitive preference mechanism provided to deal with conflicts.

6.3.1 Merging Specifications

The workspace contains a database with a workspace schema, a workspace instance, workspace equivalence specifications and workspace constraint specifications. Loading a schema into the workspace augments the workspace schema. Likewise, loading an instance into the workspace adds more objects and tuples to the workspace instance. Loading a world equivalence specification into the workspace changes the workspace equivalence specification; and loading a world constraint specification into the workspace changes the constraint specification of the workspace.
The user may also change the workspace (its schema, instance, equivalence specifications and constraint specifications) directly, since all the metadata information is present as objects and relations in the workspace and may be manipulated using the manipulation language provided.

Merging Schemas

As mentioned in Chapter 1, WorldBase supports restoring (loading) a schema structure into the workspace with automatic renaming. The user may provide a list of local (in schema) and corresponding workspace names while restoring a world schema. Types and relations of a schema may be loaded with their original names or they may be renamed in the workspace. A relation can be accessed either by its local name, by providing the schema it is from, or by its workspace name.

WorldBase supports simple schema merging using name equivalence. Two types or relations from different schemas refer to the same type or relation in the workspace if they are restored to the workspace with the same name. Some structural equivalences are checked to detect incompatibility in the parts being merged. Two relations are merge-compatible if
• they have the same type constraints in their attribute list, or
• they have the same derivation formula, if derived.

Two types are merge-compatible if
• they are both abstract types with the same name, or
• they are both subtypes with the same root supertype, and their merging does not cause cycles in the resulting type lattice.

Structures with the same name which are merge-compatible are merged in the workspace. Otherwise, an error is detected and the schema loading process is aborted.

Merging Equivalences

WorldBase supports merging of equivalence specifications in the workspace.
On initial loading, equivalence specifications for types without any equivalence specified are assigned to be the equivalence specification of the workspace. Each additional loading of another world's equivalence specification for the same type changes the workspace equivalence specification to the most natural merge of the world and workspace equivalences (i.e. the intersection of key families). The equivalence specifications of the workspace are used in loading (and merging) new instances, since the workspace is the target database. The user may assert or change the equivalence specifications of the workspace directly, thus associating a different equivalence specification on the target to be used in the merge.

When a merged world is first saved, the workspace equivalence specification is used. When such a world is resaved, since the workspace equivalence specifications are used in identifying objects during loading, the world's equivalence specifications may be outdated. The user may save the workspace equivalence specifications (restricted to those for the schema of the world) as the world's new equivalence specification.

Merging Constraints

WorldBase also supports merging of constraints in the workspace. On initial loading, constraints for specific types and relations are assigned to be the constraints of the workspace types or relations. Each additional loading of another world's constraints for the same type or relation changes the workspace constraints to the most natural merge of the world and workspace constraints.

Again, when a merged world is resaved, since only the workspace constraint specifications are used in the workspace, the world's constraints may be outdated. The user may save the workspace constraint specifications (restricted to those for the schema of the world) as the world's new constraint specification.
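The loading rule for equivalence specifications described above (the first load of a type assigns its key family, each later load intersects key families) can be sketched as follows. The representation of an equivalence specification as a mapping from type name to a set of keys, and all names in the example, are assumptions made for illustration rather than the prototype's actual data structures.

```python
from typing import Dict, FrozenSet, Set

Key = FrozenSet[str]
EquivSpec = Dict[str, Set[Key]]  # hypothetical: type name -> key family


def load_equivalence(workspace: EquivSpec, world: EquivSpec) -> EquivSpec:
    """Return the workspace specification after loading a world's specification."""
    merged = {type_name: set(family) for type_name, family in workspace.items()}
    for type_name, family in world.items():
        if type_name not in merged:
            # Initial load for this type: the world's key family is adopted.
            merged[type_name] = set(family)
        else:
            # Additional load: most natural merge, i.e. intersection of key families.
            merged[type_name] &= set(family)
    return merged


ssn = frozenset({"ssn"})
name = frozenset({"last", "first"})
vin = frozenset({"vin"})

workspace: EquivSpec = {}
workspace = load_equivalence(workspace, {"person": {ssn, name}})
workspace = load_equivalence(workspace, {"person": {ssn}, "car": {vin}})
print(workspace)
```

After both loads, only the key shared by the two worlds remains for `person`, while `car` keeps the key family of the single world that defined it.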
6.3.2 Merging Instances

Merging of worlds is achieved through restoring their instances into the same workspace. The algorithm in Figure 6.1 is used. If no keys are provided in the workspace for types in the world being loaded, the restore simply recreates all objects in the world in the workspace and asserts the tuples (disjoint instances). Otherwise, it uses exist-or-create semantics to find an equivalent object in the workspace or to create a new object if none is found. Once all the objects to be restored are found or created, the tuples are restored.

Once the instance is restored, the user may save it in a world, either an existing world or a newly created one. However, since workspace equivalence specifications were used in the merge, the resulting database in the workspace may not conform to the source worlds’ equivalence specifications. Thus, if the resulting database is saved into the source world(s), its equivalence specification must be saved from the workspace as well.

6.3.3 Enforcing Constraints

A set of preference specifications is specified with the restore operation to deal with some of the constraint violations using the preference mechanism described above. The preference mechanism allows the user to specify a list of cardinality or disjointness constraints and their corresponding preferences. Hooks are provided to specify computations. The preference mechanism (including computation hooks) forms a primitive mechanism for dealing with conflicts in the WorldBase prototype. A filtering mechanism is not provided; however, AP5’s constraint mechanism may be used to interactively remove objects and tuples that violate constraints in the workspace.

Preference Specification Syntax

A preference specification syntax (with computation hooks) is given in Appendix B.
Briefly, there are three classes of specifications dealt with in the prototype: cardinality preference specifications, disjointness preference specifications and computation specifications. A cardinality preference consists of a preferred database, which is encoded as a numeric value of either 1 or 2, a relation name, the slot of the relation being constrained, and its lower and upper bounds (0 or 1, and 1 or -1 (to indicate ∞), respectively). A preferred database of 1 refers to the workspace database, and 2 refers to the world database being loaded. A disjointness preference specification consists of a preferred database and a list of disjoint subtypes. A computation specification consists of a relation name, its slot, and a function name.

Preference Specification Semantics

The preference specifications are sorted and translated into a program which removes tuples from the workspace originating from the non-preferred database, or calls a given computation on data that violates the constraints. For each disjointness constraint specification, the program simply loops to find any object that violates the constraint and removes it from the non-preferred database and the workspace database (if it is not also from the preferred database). This method requires that the preferred database not violate the constraints. Not all disjointness constraints could be implemented using this mechanism, since each source database may already violate the disjointness constraint being specified.

In the case of cardinality constraints, the resulting program loops through each object at the given slot and checks if the object is related to one or more other objects that could violate the constraints. Depending on the preferred database, the offending tuple is removed from the non-preferred one as well as the workspace database (if it is not also from the preferred database).
Note that this method also requires that the preferred database not violate the constraints. An example of a cardinality constraint preference is given in Appendix D.

In the case of computation, the resulting program invokes the named function on the object at the slot, a list of objects it is related to (by the relation), the relation itself, and databases corresponding to 1 and 2, respectively. An example of a computation is presented in Appendix D.

The preference mechanism implemented with its computation hook is primitive. More sophisticated support tools (such as filtering mechanisms) are needed to deal with constraints.

6.4 Summary

This chapter outlines an approach to merging databases and provides a specificational approach to resolve some of the conflicts that may arise. While not all conflicts are resolvable automatically, the WorldBase environment provides the support needed to experiment with various constraints. Although the constraints studied in this thesis are very restricted, it is believed that WorldBase provides a good basis to study other, more general constraints and the effects of merging them.

Chapter 7

Experimental Prototype

This chapter briefly describes the WorldBase prototype. A detailed description of the internal data structures and functions is provided in Appendix C. Detailed transcripts of the prototype in operation are given in Appendix D. The grammars for specifying the various parts of the system are provided in Appendix B.

We first introduce the basic features of the prototype and provide an overview of the functionalities supported. The next section describes the different modules of WorldBase.

7.1 Features of the Prototype

An ideal prototype of WorldBase would require the use of a network of workstations, with each workstation supporting the WorldBase environment.
However, since network interaction is not the focus of this thesis, only one workstation is used in the prototype, and its interaction with persistent memory is through a centralized file system. The prototype is written in Allegro Common Lisp¹ using AP5 [15, 14, 16] and Popart [72]. It was developed at the Information Sciences Institute on an HP 9000 series 300 workstation and tested on small personal databases. The number of lines of code written, while considerable, is not representative of the capabilities of the system since it relies on AP5 and Popart for a number of its tasks. The focus is on functionality rather than efficiency, so portions of the code are suboptimal.

¹ Allegro CL is a trademark of Franz Inc.

The prototype provides the following features:
1. a simple virtual database implementing the WorldBase data model (using AP5);
2. world database support layer;
3. simple world registry;
4. workspace management;
5. transformation support;
6. selection support;
7. merging support.

Compared to the ideal implementation, the prototype has some limitations:
1. Concurrency control is not implemented. Currently, whoever saves the world last has the most up-to-date version. The file system decides the order of saves when two users concurrently save a world.
2. Access control is not supported.
3. Registry support is rudimentary. Consistency of the registry across the network is not supported, although hooks are provided to broadcast changes to other registries.
4. The persistent storage structure of world databases, schemas and specifications is simplistic, using the existing file system.
5. Transformation support is not automatic, although the necessary functionalities are all provided. The user must call the appropriate functions to achieve the desired transformed world database. Higher-level operations will be needed in a realistic environment.
6.
The merge operation is not automatic, although the necessary functionalities are all provided. The user must explicitly load the appropriate worlds (possibly with preferences) to achieve the desired merged workspace database. Again, higher-level operations will be needed in a realistic environment.
7. A restricted set of constraints is supported. More general constraints must be supported in a realistic environment.
8. No filtering mechanism is implemented. The AP5 constraint checker will trigger on any constraint violations, but the repair is left to the user.

The prototype does minimal error checking and leaves most errors to be trapped by AP5 at runtime. The current prototype does no static or semantic type checking. Type violations will cause the AP5 queries to fail or result in error(s) at runtime. The AP5 transaction mechanism is used for each database operation so we can recover from errors and return to a consistent starting state when the transaction aborts from errors (and the effects of the aborted transaction are undone).

The WorldBase prototype assumes:
• The name and owner of a world schema or a world instance uniquely identify the world schema or world instance.
• Pathnames of persistent worlds and their properties are derivable from the name and owner of a world schema or world instance, and they do not change with time.
• Relationships in equivalence specifications (binary) are always in the forward direction (with the object being identified in the first position of the relation).
• Existence of a network that provides support for accessing files from other workstations in the network.
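The third assumption above, that equivalence relationships are binary and place the identified object in the first position, is what keeps the exist-or-create lookup of Section 6.3.2 simple. The following Python sketch illustrates that lookup under stated assumptions; it is not the algorithm of Figure 6.1. The Workspace class, the function exist_or_create, and the representation of a key as a sequence of single-valued relation names are all hypothetical. As a simplification, the sketch asserts the key tuples when it creates an object, whereas the text restores tuples only after all objects are found or created.

```python
import itertools
from typing import Dict, Hashable, Mapping, Optional, Sequence, Set, Tuple


class Workspace:
    """Toy workspace: typed objects plus binary tuples (relation, object, value)."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.types: Dict[int, str] = {}                       # object id -> type name
        self.tuples: Set[Tuple[str, int, Hashable]] = set()   # object in first slot

    def create(self, type_name: str) -> int:
        obj = next(self._next_id)
        self.types[obj] = type_name
        return obj


def exist_or_create(ws: Workspace, type_name: str,
                    key: Optional[Sequence[str]],
                    values: Mapping[str, Hashable]) -> int:
    """Find an object of the type that agrees on every key relation, else create one."""
    if key:
        for obj, obj_type in ws.types.items():
            if obj_type == type_name and all(
                    (rel, obj, values[rel]) in ws.tuples for rel in key):
                return obj                       # an equivalent object already exists
    obj = ws.create(type_name)                   # no key given, or no match found
    for rel in key or ():
        ws.tuples.add((rel, obj, values[rel]))   # simplification: assert key tuples now
    return obj
```

With a key on a hypothetical name relation, restoring the same restaurant twice yields a single workspace object; with no key, each restore creates a fresh object, which corresponds to the disjoint-instances case.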
7.2 WorldBase Modules

WorldBase may be divided into several modules: the registry, which contains information on worlds and world schema specifications; the world environment manager, which manages worlds and their specifications, and the workspace database; and the selection, transformation and merging support, which have been described in Chapters 4, 5 and 6, respectively. Each subsection below describes a module’s functionality, our design choices and major operations supported by the module.

7.2.1 Registry

The registry contains a directory of worlds and world schemas in the system. Specifically, it stores the name, owner and pathname (location) properties. It also stores the relationships between world databases and world schemas. It is a centralized data dictionary replicated at each workstation in a network and updated via broadcast.

We chose to implement the registry in the same way worlds and world schemas are implemented. A registry may have a persistent and a virtual state. A user accesses and queries the virtual registry and may save or restore the registry to and from the persistent store. The pathnames of worlds and world schemas are assumed to be derivable from the name and owner of the world or world schema and do not change through time. World closure specifications, equivalence specifications, and constraint specifications are not stored in the registry. They are automatically assumed to be under the owner’s directory, under subdirectories specifically for the closure, equivalence and constraint specifications. The world’s name is used to identify the location of these specifications (saved in files).

The registry is saved into persistent store periodically via the checkpoint operation. Update-registry updates the workspace registry from the persistent store (this is an additive update: the original virtual registry is not removed).
Clear-registry is used to empty the virtual registry of all information.

The registry supports operations to register or unregister a world schema and a world database. It also supports user queries of information in the registry and returns a value (or a set of values) as the result of the query. The values may be a name, owner, or pathname, etc., of a world schema or a world database, or any combination thereof. The AP5 query language is used to query the virtual registry.

Mechanisms to keep a registry up to date with other registries in the network are not provided in this prototype.

7.2.2 World Environment Manager

The world environment support layer supports creation and manipulation of worlds and their specifications and maintains the workspace database. The various manipulations include changing the states of world database components through operations such as loading, modifying or saving.

We chose to implement a world and its schema as (meta) objects in the workspace. Each world is represented as an object in the workspace; its relationship to its population is also maintained in (meta) relations in the workspace. Each world schema is also represented as an object in the workspace; its relationship to the relations and types (also modeled as objects in the workspace) that make up its structure is maintained in (meta) relations in the workspace. The other specifications are not implemented as objects, but rather as properties (value-based) of the world. A world’s population and its specifications are saved into files (as text), the former using an encoding scheme involving PIDs. The current implementation assumes that the pathnames of a world’s specifications are derivable from the name and owner, and no support is provided to change them.
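The assumption that pathnames are derivable from a world’s name and owner can be made concrete with a small sketch. The Python below is illustrative only: the text states that specifications live under the owner’s directory, in subdirectories for the closure, equivalence and constraint specifications, and are located by the world’s name, but the root directory, the exact subdirectory names, and the "worlds" directory for the instance are assumptions introduced here. The function name world_paths is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict

# Subdirectory names are assumed; the text only says one exists per specification kind.
SPEC_KINDS = ("closure", "equivalence", "constraint")


def world_paths(root: str, owner: str, world_name: str) -> Dict[str, PurePosixPath]:
    """Derive every persistent pathname of a world from its owner and name alone."""
    owner_dir = PurePosixPath(root) / owner
    # "worlds" as the home of the saved instance is a hypothetical choice.
    paths = {"instance": owner_dir / "worlds" / world_name}
    for kind in SPEC_KINDS:
        paths[kind] = owner_dir / kind / world_name
    return paths
```

Because the mapping is a pure function of owner and name, nothing beyond those two properties has to be recorded for the specifications, which is consistent with their absence from the registry. The cost is the one the text notes: a world’s files cannot be moved without breaking the derivation.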
In the prototype, a closure-based world is populated using the traversal algorithm on the world closure specification and its set of seeds. A schema-based world is populated by generating a closure specification for the world and populating it with all the tuples of the world’s schema as its seeds. Below, we describe the major operations available for each world database component.

World Schema

The user may create and register a world schema by supplying a schema specification, and its name and owner (may default to the logged-in user). Operations to load or unload a world schema create or delete a workspace object corresponding to the virtual world schema. Restore or remove world schema recreates (subject to sharing) the schema structure in the workspace or removes the structures from the workspace. A world’s schema structure may be renamed while being restored to the workspace. Automatic translation of queries that use local names (of some schema) into queries on the workspace, by substituting workspace names for local names, is provided. The user may also use the buffer names directly.

Primitive operations to add or remove a relation to and from an existing world schema are provided (this is a restricted form of schema modification). However, removing a relation may cause inconsistencies (not checked by the prototype) because other persistent worlds may be dependent on the relation, or other relations in the schema may be dependent on it (only subtypes are removed with a type).

The operation save-worldschema saves the declarations of a world schema from the workspace to persistent store.

Operations to clear the world schema of instances are provided; however, shared relations and types are not cleared unless a reference count indicates that the relations are not shared. Also, the domain may not be cleared if there are worlds depending on the schema in the workspace.
These restrictions may be overridden. An operation to kill a world schema is also provided, as long as the world schema specification is not referred to or used by any world.

Since the world schema properties are in the virtual store, the AP5 query language is used to query the various properties of a world schema.

World Closure

Operations to restore or save a world’s closure specification are provided. Since the closure specification has a virtual state, AP5’s data manipulation language and query language are used to manipulate and query the closure specification of a world database.

World Equivalence Specification

Operations to restore or save a world’s equivalence specification are provided. Restoring a world equivalence specification affects the equivalence specification of the workspace, causing the workspace equivalence specification to be the most natural merge of the world and workspace equivalences (i.e. the intersection). A world’s equivalence specification is derived from the environment at save time, i.e. a relevant subset of the equivalence specification of the environment is used by the world. Operations are provided to add or remove a family of equivalence specifications from a world equivalence specification.

Since the world equivalence specification has a virtual state, AP5’s data manipulation language and query language are used to manipulate and query the equivalence specification of a world database.

World Constraint Specification

Operations to restore or save a world’s constraint specification are provided. Restoring a world constraint specification affects the constraints of the workspace, causing the workspace constraints to be the most natural merge of the world and workspace constraints. We only deal with cardinality and disjointness constraints in this prototype. A world’s constraint specification is derived from the workspace at save time; i.e.
a relevant subset of the constraint specification of the workspace is used by the world.

Since a world constraint specification has a virtual state, AP5’s data manipulation language and query language are used to manipulate and query the constraint specification of a world database.

World Instance

The user may create and register a world by supplying a schema specification (its name and owner), and an optional world closure specification. If a world closure specification is specified, the world is closure-based. Otherwise, it is assumed to be schema-based. Operations to load or unload a world create or delete a workspace object corresponding to the virtual world.

Restore-instance restores an image of the population of a world. Objects whose types have equivalence specifications specified (in the workspace) are identified by their relationships as specified by the equivalence specification; otherwise, new objects are created to correspond to the persistent identifiers. Preferences and computations may be specified. The specifications are transformed into a program that retracts some of the facts restored to conform to certain constraints, or calls a function to perform some user-specified task. Only three types of specifications are allowed. Cardinality constraint specifications restrict the cardinality of binary relations; preferences may be specified to remove certain tuples that violate the constraints. Disjointness constraint specifications behave in a similar way; they may be specified to remove certain relationships that may violate the constraint. A computation specification is transformed into a call to a function that accepts 4 arguments: object, list of related objects, workspace database, and the world being loaded. A parameter to the restore-instance operation decides if the rules in the workspace are turned on after the restore to maintain the constraints in the workspace.
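The effect of a cardinality preference during restore-instance can be illustrated with a short Python sketch. It is not the program the prototype generates, and several details are assumptions: each restored tuple is tagged with the set of databases it came from (1 for the workspace, 2 for the world being loaded, following the encoding of Section 6.3.3), the slot is given as index 0 or 1 of a binary relation, and only the upper bound is enforced, since removing tuples cannot repair a lower bound. The name enforce_upper_bound is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str, str]   # (relation, first slot, second slot)


def enforce_upper_bound(facts: Dict[Fact, Set[int]], relation: str, slot: int,
                        upper: int, preferred: int) -> Dict[Fact, Set[int]]:
    """Drop tuples not backed by the preferred database where the bound is exceeded.

    facts maps each tuple to its origins: 1 = workspace, 2 = world being loaded.
    upper is the bound, with -1 standing for "unbounded" as in the text.
    """
    kept = dict(facts)
    if upper == -1:
        return kept                                # nothing to enforce
    groups: Dict[str, List[Fact]] = defaultdict(list)
    for fact in facts:
        if fact[0] == relation:
            groups[fact[1 + slot]].append(fact)    # group by the object at the slot
    for group in groups.values():
        if len(group) > upper:                     # this object violates the bound
            for fact in group:
                if preferred not in facts[fact]:   # keep tuples the preferred side has
                    del kept[fact]
    return kept
```

The sketch also shows why the text requires that the preferred database not violate the constraint itself: tuples originating from the preferred database are never removed, so a violation already present there survives the pass.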
Operations to assert an object or tuple into a world are provided. An operation to clear the world of its population is also provided; the population is removed from the world, but is not deleted from the workspace.

Save-instance saves an image of the world’s population into persistent store. In the case of a schema-based world, the seeds of the world are specified to be the set of tuples that form an instance of the world’s schema. In the prototype, all worlds are populated using the traversal algorithm.

Remove-instance is not provided because the world’s population may be shared by other worlds. Instead, the user may clear the domain of the world’s schema.

Operations to unload and kill a world, or kill all world instances of a world schema, are provided. A world may be unloaded (i.e. removed from virtual store) if it has no population. A world may be killed (and unregistered) if there are no other worlds dependent on it; it is first unloaded from the workspace, and then removed from the registry.

Since the world and its population are in the virtual store, AP5’s data manipulation and query language are used to manipulate and query the various properties and population of a world.

Workspace Database

The environment manager also manages the workspace database. It keeps track of worlds, world schemas and specifications that are loaded in the workspace, since they affect the workspace database. The environment manager also keeps track of reference counts to relations shared by world schemas loaded in the workspace.

Because the workspace is a database in its own right, a user may assert information into or retract information from the workspace directly. Besides the operations provided by the environment manager mentioned previously, operations to change the actual data in the workspace are supported by AP5.
Since meta-data is also represented as objects and relationships in the workspace, a user may change it at will. Operations to add, replace, remove, clear, and query object equivalence specifications for object types in the workspace are supported. Operations to query worlds and world schemas loaded in the workspace are also provided.

Functions to create a new world schema or world database from the workspace are also provided. Copy-env-into-schema copies all relations in the workspace into a new world schema, registers the new world schema, and saves it to persistent store. Copy-env-into-world creates a new world of a specified schema (typically the new copy of the workspace schema). It uses the closure specification (of all closure specifications loaded) in the workspace as the closure specification for the world, if indicated. Otherwise, it is a schema-based world. The world equivalence specification is also extracted from the workspace, if indicated; likewise, its constraint specifications. The world may be populated and saved like any other world. If it is a schema-based world, saving the world saves the database contents of the workspace (not including the meta-data). Enforce-env-constraints asserts the constraints into the workspace and turns them on.

Chapter 8

Conclusions and Future Directions

We introduced the WorldBase paradigm, a novel architecture for multiple database access for workstations appropriate to the information processing needs of the 90’s. The architecture is built around the concept of worlds; these are concern-based data aggregations which can be freely constructed, modified and saved in a manner similar to files in a word-processing environment. Although presented using a semantic data model in this thesis, the underlying philosophy of the WorldBase paradigm transcends the details of any data model or modeling paradigm.
We also introduced the WorldBase sharing framework for access and storage of multiple, possibly inconsistent databases. The framework provides for selection, transformation and merging of worlds. We also discussed the implementation of a prototype system which realizes many of the proposed ideas.

This chapter concludes the thesis with evaluations and directions for future research for the overall paradigm, the selection, transformation and merging components, and the prototype.

8.1 WorldBase Paradigm

In this section, we evaluate the contributions of the WorldBase paradigm, discuss its disadvantages, and present future research directions that will alleviate some of its problems.

8.1.1 Focusing, caching, and communication

A fundamental advantage of the WorldBase paradigm, not shared by previous approaches to distributed information sharing, is that worlds provide a natural and convenient mechanism for focusing on relevant data sets. Unlike most view definition methodologies, WorldBase permits the creation of worlds formed by selection in a transitive, concern-based manner. Thus, users can form a world which includes a given set of relevant objects and relationships, and all other objects and relationships which are needed to “understand” that initial set.

It is our belief that for a variety of database applications, including personal information management, administrative data management, some forms of commercial data management and also engineering design applications, worlds provide a natural unit of data for most forms of interaction. Because worlds are focused and relatively small, they typically fit into the virtual memory of a workstation, and can thus offer (after initial loading) a high speed interface for most data interactions. Indeed, in some cases, they could be loaded and manipulated using lap-top computers.
Under WorldBase, users can freely cache all worlds that they have created or manipulated. A typical user will maintain many worlds, and call upon them for different data intensive tasks as needed. This is convenient in those cases where the initial creation of worlds involved highly complex operations. It is also convenient in the case of long transactions, such as those that arise in engineering design applications. In that arena, users can use worlds to store intermediate results and versions.

Because worlds can be defined to be focused and “closed” over related data objects and relationships, they can serve as a natural unit of communication between users. This is illustrated in part by the extended example of Chapter 1: a world concerning the Japanese restaurants in Joe’s database forms a natural and self-contained data set which can be interpreted and understood by other users. This perspective is further supported because worlds include their specifications, thus giving much of the larger contextual information needed to interpret the stored data correctly. Of course, no communication can take place without some agreement on conventions and terminology; in the case of worlds, some mechanism is needed so that the intent behind entity types and relationships is understood.

8.1.2 Autonomy

Autonomy is highly prized in the real world. However, it was not a major concern in traditional database applications where centralized control is prevalent. The current trend towards a distributed network of personal computers and workstations [8], however, signifies a move away from centralized control towards a realization of the value of autonomy.

Database integration approaches provided very limited autonomy to local databases. Each local database may modify its data locally, but is very restricted in its structural changes.
The Federated approach [33] supports limited autonomy in component databases. The Federation is a centralized agency that restricts changes to exported schemas that are imported by other components. The DPDM approach [49, 48] supports more autonomy for a database at the expense of sharing support. For instance, it is hard for a user to deduce what structure has been imposed on data in remote databases. WorldBase lies at the extreme end of the spectrum with regard to providing autonomy in a distributed database environment. WorldBase supports independent databases with independent schemas. The WorldBase paradigm forces a user to completely define a database (or database subset) he created. Because the instance is completely materialized, a world’s schema, specifications and instance, whether transformed or merged from other worlds, are independent.

The only “central” control in WorldBase is provided by the registry, which contains names and owners of sharable worlds, and informs others of their existence. There is no central authority figure that restricts a user’s actions on the databases he has access to.

Autonomy is not without price. The user has the burden of manipulation, transformation and integration of his worlds. He is provided with the necessary tools, but their invocation is controlled by the user. This autonomy may result in duplicated data and wasted space, and may cause consistency problems, unless policies are initiated to share some of the information in a coordinated manner.

An interesting research direction is to investigate and classify different sharing policies that may be imposed on the WorldBase environment. Different sharing policies may be decided by a group of users for a collection of world databases, instead of being imposed on all databases and all users in the WorldBase environment.
These sharing policies should resemble sharing policies instituted in the real world.

8.1.3 Living with Inconsistency

A fundamental problem of distributed information sharing is that of inconsistency between the underlying databases, at both the schema and data levels. In general, it is difficult to get all constituent databases to agree to a single integrated schema. Forcing a single integrated schema on all existing databases in the world may be inefficient as well. The WorldBase paradigm is unique in that it provides the initial layer of an architecture for providing useful access to inconsistent data, and a foundation for further research into this area.

Inconsistency arises at the schema level because different organizations have different perspectives and needs concerning overlapping data sets. A fundamental kind of inconsistency here results from differences in the underlying taxonomy of objects and relationships. For example, one group may view enrollments as relationships between students and courses, while another may view them as abstract entities. More complex variations can arise, e.g. a painter is interested in wall surfaces, while an electrician is more concerned with wall interiors. Other, simpler kinds of inconsistencies arise from the use of different units of measurement, and from different conventions concerning how information is tallied (e.g., whether meal price includes tip and/or alcoholic beverages). Through schema transformations and merging, WorldBase resolves a fundamental aspect of the overall problem of inconsistencies at the schema level. In particular, users can transform conflicting schemas and their associated data into compatible schemas.
However, this raises the more general research issues of how the assumptions underlying a given schema can be articulated and made available to both user and computer, how different schemas may be presented to users, and what computer aids should be provided for the transformation process.

WorldBase also provides some basic tools for dealing with inconsistencies at the data level. In particular, WorldBase permits the merging of selected subsets of databases (in worlds) having consistent structure. Currently, it is assumed that users can identify subsets of the databases and worlds which are structurally consistent, and select them using the WorldBase selection mechanism. Finally, while worlds must be internally consistent, different worlds may hold inconsistent data. Thus, users can hold two or more inconsistent worlds distinct (e.g. by renaming their schemas to be disjoint) in their workspaces.

As with the case for schema inconsistency, the machinery provided by WorldBase for data inconsistency provides only the first layer of a complete architecture, and raises a number of interesting research issues. For example, it would be useful to provide automatic or semi-automatic mechanisms for locating the data inconsistencies between different databases and worlds. Also, given two inconsistent worlds, what functionalities should be provided in this context? For example, what are natural analogs of the “cut and paste” capabilities found in word-processing? Tools representing guidelines, methodologies, or heuristics are needed to aid users in their dealings with inconsistent information.

8.1.4 Sharing in WorldBase

The WorldBase paradigm identifies three separable, essentially independent aspects of the database sharing problem, namely: database selection, transformation and merging. Each step is isolated and tools to resolve problems specific to the steps are provided.
Thus, the sharing support allows users to focus on different steps of database manipulations, each of which may be quite complex.

In accordance with the different aspects of sharing support, WorldBase divides a database into several parts: its schema, closure specification, equivalence specification, constraint specification and the instance itself. The bundling of meta-data information into different specifications facilitates sharing and integration since the specifications help in communicating the underlying interpretation for the world instance. The specifications themselves may also be shared or reused by other users. The separation of a world into several specificational subparts is found to be useful because interactions of the parts during merging are made explicit. The parts which are not needed during the different manipulation steps need not be specified.

One problem new to the WorldBase paradigm is the issue of what tools to provide to individual users for managing large numbers of interrelated worlds. In the prototype, this is achieved by the registry, which is organized as a UNIX-style directory structure. Should a database be used instead? Also, the user may want to save and re-use ILOG± programs for data transformations. The histories of how worlds were created and manipulated should also be readily available.

An important concept that any organization scheme for WorldBase should support is that of a world group. This refers to a collection of worlds that are compatible with each other, and whose members are generally loaded into the same workspace. For example, a world group might be composed of worlds for different ethnicities of restaurants. These worlds may be used either individually or in combination without conflict.
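As a rough illustration of the world group idea, the Python sketch below records group membership in a registry and checks whether a set of worlds can share a workspace. It is not the prototype's registry: the Registry class, its method names and the directory-style world paths are all hypothetical, and "compatible" is simplified to mean "members of one common group".

```python
class Registry:
    """Toy registry: maps world group names to sets of world paths."""

    def __init__(self):
        self._groups = {}  # group name -> set of world paths

    def register(self, group, world_path):
        """Record that the world stored at world_path belongs to group."""
        self._groups.setdefault(group, set()).add(world_path)

    def compatible(self, *world_paths):
        """True if some single group contains all of the given worlds."""
        wanted = set(world_paths)
        return any(wanted <= members for members in self._groups.values())


registry = Registry()
registry.register("restaurants", "/worlds/restaurants/thai")
registry.register("restaurants", "/worlds/restaurants/italian")
registry.register("payroll", "/worlds/payroll/1990")

# Worlds from one group may be loaded together; worlds from different
# groups are not assumed to be compatible.
assert registry.compatible("/worlds/restaurants/thai",
                           "/worlds/restaurants/italian")
assert not registry.compatible("/worlds/restaurants/thai",
                               "/worlds/payroll/1990")
```

Whether such information is better kept in a directory structure or in a registry database is exactly the open question raised above; the sketch takes no position on it.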
Assuming that information on worlds, their properties and relationships to other worlds (such as groups) is stored in a registry database, there is still the question of what world relationships to save. There may be different kinds of world databases: one that is defined in terms of other databases, such as a merged world; one that is dependent on another database, such as a transformed world; or one that is derived from another world in other ways (e.g. through copy). The different relationships to be supported vary depending on the support given to maintaining consistency among worlds. The more consistency support is needed, the more relationships must be maintained in the registry.

Because WorldBase allows multiple users to access the same world at the same time, some synchronization mechanism must be provided to prevent multiple users from modifying the same world concurrently. Since users explicitly cache worlds into the workspace, a world may be kept in the workspace for a long time. A strict locking mechanism may be inconvenient; some alternatives involve allowing users to broadcast a request to change a world when changes are desired, or allowing different versions of the same world to co-exist.

8.1.5 Distributed Information Sharing System

The system outlined in this thesis is only a first step in the investigation of a new way to organize and share information. The next steps lie in making WorldBase the sole interface to distributed information.

The WorldBase paradigm may be extended to provide a uniform, full-function interface for distributed information sharing. Under this extended paradigm, all database access and update would be through WorldBase. Users would create worlds from underlying databases and other worlds; each world would hold data relevant to a specific context.
A typical user would maintain many worlds, and call upon them for different data-intensive tasks as needed. Each user would maintain a library of personal worlds, itself organized in a database or a directory structure. Under WorldBase, users can freely cache all worlds that they have created or manipulated.

However, there are several issues that must be resolved before this may happen. In its present form, the only kind of update supported by WorldBase is to save a world, thereby overwriting whatever was there before. In the case of small worlds, this is analogous to rewriting files, and is not especially expensive or inefficient. However, if WorldBase is to serve as the sole interface to multiple databases, then mechanisms must also be provided for propagating updates from a world to the world(s) it was defined from, and to the underlying databases. In its full form, this is the classical view update problem for the relational model, extended by the presence of OIDs in data transformations, and by recursion in the closure specifications.

A partial solution to the update propagation problem in WorldBase could be provided by extending the approaches of [40, 52, 65]. These utilize both syntactic and semantic information to propagate updates under various conditions in the context of the relational model. The syntactic information includes the structure of the databases and view definitions, while the semantic information is embodied in integrity constraints and other data properties. Extending these approaches to encompass the presence of OID creation in WorldBase will be simplified by the form of the translation language ILOG±. In particular, the Skolem-functor based semantics of ILOG± will provide a convenient mechanism for finding and maintaining “threads” relating the OIDs created in a world to the OIDs and values causing their creation.
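The idea of such threads can be sketched in a few lines of Python. This is an illustration of Skolem-functor style OID creation in general, not of the ILOG± implementation: the class SkolemOIDs, the functor name f_enrollment and the string format of the generated OIDs are all assumptions made for the example. Each distinct functor term yields exactly one OID, and the term is remembered so that an update to a created object can be traced back to the values that caused its creation.

```python
from itertools import count


class SkolemOIDs:
    """Create one OID per distinct Skolem term and remember its origin."""

    def __init__(self):
        self._next = count(1)
        self._by_term = {}  # (functor, args) -> OID
        self._thread = {}   # OID -> (functor, args)

    def oid(self, functor, *args):
        """Return the OID for functor(args), creating it on first use."""
        term = (functor, args)
        if term not in self._by_term:
            new_oid = f"oid{next(self._next)}"
            self._by_term[term] = new_oid
            self._thread[new_oid] = term
        return self._by_term[term]

    def origin(self, oid):
        """Follow the thread from a created OID back to its source values."""
        return self._thread[oid]


oids = SkolemOIDs()
first = oids.oid("f_enrollment", "ann", "cs101")
again = oids.oid("f_enrollment", "ann", "cs101")
assert first == again  # same source values, same object
assert oids.origin(first) == ("f_enrollment", ("ann", "cs101"))
```

Because the mapping is kept in both directions, a change to a created object can be translated into a change on the source tuple named by its thread, which is the starting point for update propagation.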
Extending the approaches to incorporate the recursion in closure specifications will present new challenges.

A different aspect to this update propagation problem is the establishment of links between worlds to propagate updates, similar to capabilities of MRDSM (via manipulation dependency) and the Federated approach (user-provided update functions). The links propagate updates specific to an object or a group of objects.

Besides update propagation, updates to schemas must also be supported and changes propagated to some or all of their world instances. WorldBase provides only the very basic relation and type addition operations, allowing users to augment the schema. Changes to a schema may be simulated by specifying a modified (target) schema and transforming (some or all) instances of the old schema into instances of the new schema. Higher level support for schema modifications is desirable.

One limitation to the current WorldBase system is that worlds cannot be too large in population, because they may not fit into the virtual memory of the workspace (of course, this is an ever-diminishing problem [57]). In any case, too large a world defeats the “focusing” mechanism of WorldBase. If WorldBase is to be the sole interface to multiple databases, however, providing access to large databases is essential. Extraction of focused subsets from large databases without loading them into the workspace should be supported. Update propagation from these subsets to the large databases is an important aspect of the problem.

Current research on WorldBase has focused on providing access to the structural portion of database schemas. The tremendous recent interest in the object-oriented approach, with its interleaving of structural and behavioral components, raises the provocative research area of developing extensions to the WorldBase paradigm which incorporate behavior at a fundamental level.
Another interesting extension would be to use the WorldBase paradigm to accommodate heterogeneous databases with varying data models. Different workspaces may support different data models. All worlds loaded into the same workspace must be of the same data model; thus consistency of the workspace is maintained. The closure mechanism could be generalized to different data models to load a subset of a database into a workspace. The transformation language could be extended for use with different models, allowing queries to the source database to be in the query language of the different models, and the target to be of another model. Worlds may be transformed from one data model to another. However, only worlds of the same model and compatible schemas may be merged.

A final concern in providing a single full-function interface to distributed information is suitable user interfaces. The issue of providing suitable user interfaces for WorldBase is multi-faceted. Although not implemented in the context of WorldBase, the issue of providing a graphical interface to the schemas and instances of various worlds is largely resolved by the interfaces to semantic models that have been developed in the past few years (see [35]).

8.2 Closure Specification

The closure mechanism presented is a stylized view definition mechanism. It is based on the notion of annotated templates which direct the selection mechanism. The template language is simple, with easy-to-understand primitives. The concern-based emphasis of the closure mechanism appears to be unique among view definition mechanisms. The traversal algorithm, which may include recursion, is an interesting and appropriate focusing mechanism. Also, as noted above, it justifies our focus on main memory manipulations.

The closure specification concept is extensible to other applications.
It may be used as a specification of complex object boundaries. Also, the semantics described in this thesis could be refined or made more complex to accommodate different applications.

A useful direction for future research would be to focus on implementation efficiency. The current version of WorldBase recomputes the closure of a world at each explicit world population update or save. This may be inefficient when the world is large and changes are minor. For implementation efficiency, incremental algorithms may be provided to save only incremental updates when the changes are minor, and allow saving of the whole image of the world instance after major changes.

An incremental version of the closure mechanism is needed. We do not have a good sense of incremental update with the closure mechanism yet. To support incremental updates, changes to a world's population must be tracked; the incremental closure should find only those objects and tuples affected. One issue in incremental save is in maintaining the PIDs used by the original world's population, and a syntax for indicating changes to the world in persistent store. Such changes may include removal of objects, removal of tuples, addition of objects, or addition of tuples.

It would be useful to apply the closure mechanism as a view definition mechanism in the context of relational databases as well, allowing users to extract relevant clusters of data sets from relational databases. An important aspect of the problem is to implement the closure mechanism in the context of large, indexed databases. One approach is to view the closure mechanism in terms of Datalog rules and study the natural mappings of closure specifications to Datalog programs. This direction of research will solidify the connection between the closure mechanism and Datalog, and with deductive databases in general.
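To suggest what a Datalog reading of a closure might look like, consider the two rules in_closure(X) :- root(X) and in_closure(Y) :- in_closure(X), tuple(R, X, Y), follow(R). The Python sketch below evaluates them bottom-up in a semi-naive style, expanding only the objects reached in the previous round. It is a simplified stand-in under stated assumptions, not the WorldBase closure algorithm: tuples are assumed to be binary, the annotated templates are reduced to a plain set of relation names to follow, and the function name closure and the sample data are hypothetical.

```python
def closure(roots, tuples, follow):
    """Compute the objects reachable from roots.

    tuples: set of (relation, source, destination) triples.
    follow: set of relation names that the closure may traverse.
    """
    reached = set(roots)
    frontier = set(roots)
    while frontier:
        # Apply the recursive rule only to objects found in the last round.
        new = {dst for (rel, src, dst) in tuples
               if rel in follow and src in frontier} - reached
        reached |= new
        frontier = new
    return reached


tuples = {
    ("serves", "r1", "d1"),
    ("uses", "d1", "i1"),
    ("owned_by", "r1", "p1"),
    ("serves", "r2", "d2"),
}
# Starting from restaurant r1 and following only "serves" and "uses",
# the owner p1 and the other restaurant's dish d2 stay outside the closure.
assert closure({"r1"}, tuples, {"serves", "uses"}) == {"r1", "d1", "i1"}
```

The loop terminates on cyclic data because objects already reached are never added to the frontier again, which is also the property an incremental version would need to preserve.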
8.3 ILOG±

ILOG± is a simple but powerful language to transform data of one structure to another. It is specificational in nature, and thus can be used to specify most transformations. ILOG± has expressive power comparable to the relational calculus, extended to permit the inclusion of user-defined functions in an external language. The language is also amenable to formal analysis (see [38]).

The language can be used in many different contexts, e.g. federated, view creation, schema transformation, and schema integration. The transformation language may be used with different data models, or to deal with heterogeneous databases. It may be used with any data model in which sets are not first class citizens, that is, any data model that has a natural simulation in the relational model.

A combination of transformation with closure is presented in Chapter 5. The combination allows selective transformation of database subsets, based on the closure mechanism.

The language, as described, is cryptic and may be tedious to specify. One research direction is to provide collections of natural, basic transformations (macros) that provide a higher level interface to ILOG±. More user-friendly interfaces such as specificational, operational or graphical tools may be provided for novice users that allow them to use ILOG± for the more complex transformations.

The current implementation of ILOG± does not support type checking or consistency checking. An important issue is developing compile time checks on typing and constraint satisfaction. Another interesting research direction is to apply transformation specifications to constraints, and study the effects. A similar approach to merging constraints may be used, such as the definition of most natural transformations of constraints using ILOG±.
A useful direction for a more efficient system is to provide incremental updates for transformations. Currently, the transformation process recomputes the instances of the target at each transformation. An algorithm that transforms only incremental updates on the source to the target will improve efficiency, especially when transforming large databases. Propagation in the reverse direction may be implemented by specifying a correspondence specification for the reverse direction and computing the new population of the source. Developing higher level tools to support update propagation in the reverse direction would also be useful. Similar extensions for incremental updates and a more efficient implementation are needed for the closure-transform algorithm.

Extensions to the language provide other avenues for future research. The language may be extended to be used to transform multiple sources into a single target. This extension may involve extending the semantics of object creation in intermediate relations to incorporate the object equivalence semantics. Another extension may involve supporting transformation of databases from one data model to another in heterogeneous database environments. Another interesting extension to the language is to allow sets as first class citizens.

A fundamental problem that must be addressed is how to detect if ILOG± specifications violate any of the intentions of a source database. While the question of addressing intentions of schema components may never be resolved, some advances would be made if some intentions could be communicated through world specifications.

8.4 Database Merging

We have articulated a family of relatively orthogonal abstractions for specifying object-base merging.
The key components are: schema merging, object merging and merging of remaining tuples, and constraint resolution aided by the preference mechanism. This separation makes it easier to specify a desired merge. Also, the modular perspective on the various components of a merge makes it more amenable to debugging, modification, and formal analysis.

The problem and resolution for schema merging are simple. Using the framework provided by ILOG±, a user can transform incompatible schemas into compatible forms. The renaming of schema structures supported during loading also provides a means to facilitate schema merging.

The world equivalence specification uses an extension on the notion of keys in relational databases to merge two worlds. Our approach does not require a world equivalence specification at world creation, only when worlds are to be merged. We also require keys only for object types which the user wishes to merge in the target; no keys are required for those not merged in the target. Multiple keys may be specified; the overlap of key families of two worlds is used as the key for the merging process. Thus, different keys may be used to merge a world with other worlds. We identified a fundamental role which keys can play in the context of an object-oriented environment.

A useful generalization will be to extend object equivalence mechanisms to incorporate the various ways of identifying equivalent objects mentioned in Chapter 6. One such extension is to allow for undefined keys. Objects may exist in the workspace without their key relationships. Equivalence of objects with undefined keys is not determined until more information is provided by the user. With this method, not all potentially equivalent objects are merged at world instance loading time.
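The role of key families and undefined keys in object merging can be sketched as follows. The Python code is a simplified illustration, not the WorldBase equivalence mechanism: worlds are assumed to be dictionaries from OIDs to attribute dictionaries, a key is assumed to be a tuple of attribute names, and the function names and sample restaurant data are hypothetical. Only keys declared by both worlds are used, and an object whose key is undefined (an attribute is missing or None) is left unmerged.

```python
def key_value(obj, key):
    """Return the object's value for a key, or None if the key is undefined."""
    values = tuple(obj.get(attr) for attr in key)
    return None if None in values else values


def equivalent_pairs(world_a, world_b, keys_a, keys_b):
    """Pair up OIDs from two worlds that agree on a key both worlds declare."""
    pairs = set()
    for key in set(keys_a) & set(keys_b):  # overlap of the key families
        index = {}
        for oid_a, obj in world_a.items():
            kv = key_value(obj, key)
            if kv is not None:
                index.setdefault(kv, []).append(oid_a)
        for oid_b, obj in world_b.items():
            kv = key_value(obj, key)
            # An undefined key (None) never appears in the index.
            for oid_a in index.get(kv, []):
                pairs.add((oid_a, oid_b))
    return pairs


world_a = {"a1": {"name": "Chez Ann", "city": "LA"},
           "a2": {"name": "Thai Hut"}}  # city unknown: key undefined
world_b = {"b1": {"name": "Chez Ann", "city": "LA", "phone": "555"},
           "b2": {"name": "Thai Hut", "city": None}}
keys_a = {("name", "city"), ("phone",)}
keys_b = {("name", "city")}

assert equivalent_pairs(world_a, world_b, keys_a, keys_b) == {("a1", "b1")}
```

The two "Thai Hut" objects are potentially equivalent but stay separate until the user supplies the missing city, which mirrors the deferred decision described above.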
The WorldBase approach also provides the user the flexibility of (re)defining resulting constraints of the merged database. We identified a class of constraints for which natural preferences exist. We also identified acceptable combinations of constraints and unacceptable ones.

A preference mechanism can be used to resolve some of the conflicts that result from merging two databases. This mechanism also allows users to specify computations to resolve the conflicts. The preference mechanism is just the tip of the iceberg in the problem of aiding conflict resolution during merging. Other tools are needed to deal with other kinds of merging conflicts. The tools must provide the user with more automated support for merging, yet retain the flexibility of allowing the user to specify computations to deal with those conflicts for which no tool is appropriate.

We have the freedom to explore preference mechanisms because the merged database is materialized rather than virtual. However, the constraints studied in the thesis are very restricted. Also, the treatment for disjointness constraints is not very refined. More general forms of constraints need to be studied.

As with the other sharing tools, the efficiency of the overall merging support needs to be improved. In the thesis, we only address merging between two databases with two-way preferences. A multiple database merge with multiple preferences would be an interesting direction for future research. Another direction that could be taken is to consider the issues of merging constraints from a theoretical point of view, isolated to the extent that one can embark on theoretical investigations.

8.5 The Prototype

The prototype exists as proof of the WorldBase concept. It also provides a platform for further experimentation.
Our preliminary evaluation of the overall system reveals that the separation of support into selection, transformation and merging fits well with the overall framework. The languages (world specifications, ILOG±, and preference specification language) make it possible to have conceptually straightforward algorithms in most cases. Also, AP5 provides nice virtual memory database support and is convenient to use.

The prototype also provides insights on how to transfer parts or all of the WorldBase technology to other platforms and environments. A main concern is that WorldBase relies heavily on AP5, extensively utilizing the virtual memory database. It seems likely that the full computational power of a programming language and a virtual memory database is needed to implement an ideal WorldBase system, since it uses recursion in the closure mechanism, supports associative access and OID creation in ILOG±, and allows retraction of data from the database in the preference mechanism. Below, we outline some ideas on how different parts of WorldBase may be incorporated into existing environments.

Smalltalk [29] has a virtual memory object-base and provides a powerful programming language. The OID creation and preference mechanism could be implemented in a straightforward manner on a Smalltalk environment. However, we cannot implement the WorldBase paradigm and sharing support directly on a Smalltalk environment because Smalltalk supports a different object model. The Smalltalk data model is more object-centered than WDM. Our model allows for concern-based focusing based on tuples as well as objects; on the other hand, Smalltalk tuples are retrieved by methods. A variation of the model that places emphasis on objects more than relations can be used with the closure mechanism to save objects in Smalltalk.
Similar model considerations should be applied to the transformation and merging mechanisms.

Prolog [13] supports associative access and recursion. Implementation of the WorldBase sharing tools is straightforward for the closure mechanism and OID creation. However, the preference mechanism will be difficult to implement.

Transferring the WorldBase technology to relational databases is more problematic since most relational query languages do not support recursion, nor do they support OID creation. Moreover, translating preference specifications into relational queries will be difficult, and may result in a very convoluted query. Also, there are no virtual memory manipulations in most relational systems. On the other hand, some kind of selection mechanism may be supported by the relational database manager that works with modified transformation and merging mechanisms (without preferences). The WorldBase sharing paradigm may still be used, even if each of the sharing tools is modified.

Reference List

[1] S. Abiteboul and R. Hull. IFO: A formal semantic database model. ACM Trans. on Database Systems, 12(4):525-565, Dec. 1987.
[2] S. Abiteboul and P. Kanellakis. Object identity as a query language primitive. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 159-173, 1989.
[3] M.P. Atkinson and R. Morrison. Polymorphic names, types, constancy and magic in a type secure persistent object store. In Proceedings of the Workshop on Persistent Object Systems: their design, implementation and use, pages 1-12, August 1987.
[4] F. Bancilhon. Object-oriented database systems. In Proceedings of the ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, pages 152-162, 1988.
[5] F. Bancilhon, S. Cluet, and C. Delobel. A query language for the O2 object-oriented database system.
In Proceedings of the Second International Workshop on Database Programming Languages, pages 122-138, June 1989.
[6] C. Batini, M. Lenzerini, and S.B. Navathe. A comparative analysis of methodologies for database schema integration. ACM Computing Surveys, 18(4):323-364, December 1986.
[7] C. Beeri. Formal models for object oriented databases. In Proceedings of the First International Conference on Deductive and Object-Oriented Databases, pages 370-395, 1989.
[8] William M. Bulkeley. PC networks begin to oust mainframes in some companies. The Wall Street Journal, Wednesday, May 23:A1, 1990.
[9] R. Carrick and R. Cooper (Editors). Proceedings to the workshop on Persistent Object Systems: their design, implementation, and use. University of St. Andrews, Department of Computational Science, August 1987.
[10] D. D. Chamberlin, J. N. Gray, and I. L. Traiger. Views, authorization, and locking in a relational database system. In Proceedings of the National Computer Conference, pages 425-430. AFIPS, June 1975.
[11] P.P. Chen. The entity-relationship model - toward a unified view of data. ACM Transactions on Database Systems, 1(1):9-36, March 1976.
[12] Weidong Chen and David S. Warren. C-Logic of complex objects. In Proceedings of the ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, pages 369-378, 1989.
[13] W. F. Clocksin and C. S. Mellish. Programming in Prolog. Springer Verlag, Berlin Heidelberg, Germany, 1981.
[14] D. Cohen. Automatic compilation of logical specifications into efficient programs. In AAAI, June 1986.
[15] D. Cohen. Ap5 reference manual. Technical report, USC/Information Sciences Institute, 1987.
[16] D. Cohen. Compiling complex database transition triggers. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 225-234, 1989.
[17] S. S. Cosmadakis and C. H. Papadimitriou. Updates of relational views.
Journal of the ACM, 31(4), October 1984.
[18] Scott Danforth, Setrag Khoshafian, and Patrick Valduriez. FAD, a database programming language. Technical Report ACA-ST-151-85, Rev. 3, MCC, January 1989. To appear in ACM TODS.
[19] U. Dayal. Queries and views in an object-oriented data model. In Proceedings of the Second International Workshop on Database Programming Languages, pages 80-102, June 1989.
[20] U. Dayal and P. Bernstein. On the correct translation of update operations on relational views. ACM Transactions on Database Systems, 3(3):381-416, Sept 1982.
[21] U. Dayal and P. A. Bernstein. On the updatability of relational views. In Proc. of Intl. Conf. on Very Large Data Bases. IEEE, September 1978.
[22] U. Dayal and H.Y. Hwang. View definition and generalization for database integration in a multidatabase system. IEEE Transactions on Software Engineering, SE-10(6):628-644, 1984.
[23] Umeshwar Dayal and John M. Smith. Probe: A knowledge-oriented database management system. In On Knowledge Base Management Systems, pages 227-257. Springer-Verlag, 1986.
[24] A. Dearle, R. Connor, F. Brown, and R. Morrison. Napier88 - a database programming language? In Proceedings of the Second International Workshop on Database Programming Languages, pages 179-195, June 1989.
[25] R. A. DiPaola. The recursive unsolvability of the decision problem for a class of definite formulas. J. ACM, 16(2):324-327, 1969.
[26] M. M. Astrahan et al. System R: A relational database management system. ACM Transactions on Database Systems, 1(2), June 1976.
[27] D. H. Fishman, D. Beech, H. P. Cate, E.C. Chow, et al. Iris: An object-oriented database management system. ACM Transactions on Office Information Systems, 5(1):48-69, January 1987.
[28] David B. Garlan. Views for tools in integrated environments.
In Proceedings of the International Workshop on Advanced Programming Environments, June 1986.
[29] A. Goldberg and D. Robson. Smalltalk-80: The Language and its Implementation. Addison-Wesley, Reading, MA, 1983.
[30] Persistent Programming Research Group. PS-Algol reference manual. Technical report, University of Glasgow, Department of Computing Science, University of St Andrews, Department of Computational Science, 1987.
[31] M. Gyssens, J. Paredaens, and D. Van Gucht. A graph oriented object database model. In Proceedings of the ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, 1990. To appear.
[32] M. Hammer and D. McLeod. Database description with SDM: A semantic database model. ACM Transactions on Database Systems, 6(3):351-386, September 1981.
[33] D. Heimbigner and D. McLeod. A federated architecture for information systems. ACM Transactions on Office Information Systems, 3(3):253-278, July 1985.
[34] M. F. Hornick and S. B. Zdonik. A shared, segmented memory system for an object-oriented database. ACM Transactions on Office Information Systems, 5(1):70-95, 1987.
[35] R. Hull and R. King. Semantic database modeling: Survey, applications, and research issues. ACM Computing Surveys, 19(3):201-260, September 1987.
[36] R. Hull, R. Morrison, and D. Stemple (Editors). Proceedings of Second International Workshop on Database Programming Languages. Morgan Kaufmann, June 1989.
[37] R. Hull, S. Widjojo, and D. S. Wile. A specificational approach to database transformation. Technical report, USC/Information Sciences Institute, February 1990.
[38] R. Hull and M. Yoshikawa. ILOG: Declarative creation and manipulation of object identifiers. Technical report, Computer Science Dept., University of Southern California, 1990. To appear in VLDB 1990.
[39] A. M. Keller. Updating Relational Databases through View.
PhD thesis, Department of Computer Science, Stanford University, 1985.
[40] A. M. Keller. The role of semantics in translating view updates. IEEE Computer, pages 63-73, January 1986.
[41] S. Khoshafian and G. Copeland. Object identity. In Proc. ACM Conf. on Object-Oriented Programming Systems, Languages, and Applications, pages 406-416, 1986.
[42] M. Kifer and James Wu. A logic for object-oriented logic programming (Maier's o-logic revisited). In Proceedings of the ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, pages 379-393, 1989.
[43] R. King and D. McLeod. A database design methodology and tool for information systems. ACM Transactions on Office Information Systems, 3(1), January 1985.
[44] Gabriel M. Kuper and Moshe Y. Vardi. A new approach to database logic. In Proceedings of the ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, pages 86-96, 1984.
[45] C. Lecluse, P. Richard, and F. Velez. O2: An object-oriented formal data model. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, June 1988.
[46] W. Litwin. An overview of the multidatabase system MRDSM. In Proceedings of the ACM National Conference, Denver. ACM, Oct 1985.
[47] W. Litwin and A. Abdellatif. Multidatabase interoperability. IEEE Computer, Dec 1986.
[48] P. Lyngbaek. Information Modelling and Sharing in Highly Autonomous Database Systems. PhD thesis, University of Southern California, Los Angeles, CA, August 1984.
[49] P. Lyngbaek and D. McLeod. Object sharing in distributed information systems. ACM Transactions on Office Information Systems, 2(2):96-122, April 1984.
[50] P. Lyngbaek and V. Vianu. Mapping a semantic database model in the relational model. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1987.
[51] D. Maier. A logic for objects.
In Workshop on Foundations of Deductive Databases and Logic Programming, pages 6-26, Washington, D.C., August 1986.
[52] Y. Masunaga. A relational database view update translation mechanism. In Proc. of Intl. Conf. on Very Large Data Bases, pages 309-320, 1984.
[53] Y. Masunaga. Object identity, equality and relational concept. In Proceedings of the First International Conference on Deductive and Object-Oriented Databases, pages 170-187, 1989.
[54] A. Motro. Superviews: Virtual integration of multiple databases. IEEE Transactions on Software Engineering, SE-13(7), July 1987.
[55] A. Motro and P. Buneman. Constructing superviews. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM SIGMOD, April 1981.
[56] S. B. Navathe and J. P. Fry. Restructuring for large databases: Three levels of abstraction. ACM Transactions on Database Systems, June 1976.
[57] The Laguna Beach Participants. Future directions in dbms research. Sigmod Record, March 1989.
[58] N. W. Paton and P. M. D. Gray. Identification of database objects by key. In Proceedings of Second International Workshop on Object-Oriented Database Systems, Bad Munster am Stein-Ebernburg, FRG, September 1988.
[59] CLF Project. Ap5 training manual. Technical report, USC/Information Sciences Institute, 1989.
[60] Alan Purdy, Bruce Schuchardt, and David Maier. Integrating an object server with other worlds. ACM Transactions on Office Information Systems, 5(1):27-47, January 1987.
[61] S. P. Reiss. Pecan: Program development systems that support multiple views. IEEE Transactions on Software Engineering, SE-11(3):276-285, March 1985.
[62] J. E. Richardson, M. J. Carey, D. J. DeWitt, and D. T. Schuh. Persistence in exodus. In Proceedings of the Workshop on Persistent Object Systems: their design, implementation and use, pages 96-113, August 1987.
[63] J. Rosenburg and R. Morrison (Editors).
Proceedings to the workshop on Persistent Object Systems: their design, implementation, and use. University of New Castle, Australia, Jan 1989.
[64] G. M. Shaw and S. B. Zdonik. An object-oriented query algebra. In Proceedings of the Second International Workshop on Database Programming Languages, pages 103-112, June 1989.
[65] A. Sheth, J. Larson, and E. Walkins. Tailor: A tool for updating views. In Proc. of Intl. Conf. on Extending Data Base Technology, 1988.
[66] D. Shipman. The functional model and the data language DAPLEX. ACM Transactions on Database Systems, 6(1):140-173, March 1981.
[67] N. C. Shu, B. C. Housel, and V. Y. Lum. Convert: A high level translation definition language for data conversion. Communications of the ACM, Oct 1975.
[68] N. C. Shu, B. C. Housel, R. W. Taylor, S. P. Ghosh, and V. Y. Lum. Express: A data extraction, processing, and restructuring system. ACM Transactions on Database Systems, June 1977.
[69] M. Stefik and D. G. Bobrow. Object-oriented programming: Themes and variations. The AI Magazine, 1985.
[70] F. Wai. Distribution and persistence. In Proceedings of Workshop on Persistent Object Systems: their design, implementation and use, August 1987.
[71] S. Widjojo, R. Hull, and D. S. Wile. A specificational approach to database merging. Technical report, USC/Information Sciences Institute, 1990.
[72] D. Wile. Popart: Producers of parsers and related tools, reference manual. Technical report, USC/Information Sciences Institute, 1989.
[73] D. S. Wile. Organizing programming knowledge into syntax-directed experts. In Proceedings of the International Workshop on Advanced Programming Environments, June 1986.
[74] D. S. Wile and D. G. Allard. Worlds: an organizing structure for object-bases. In Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, December 1986.
[75] D. S. Wile and D.
G. Allard. Aggregation, persistence, and identity in Worlds. In Proceedings of Workshop on Persistent Object Systems: their design, implementation and use, Jan 1989.
[76] D. S. Wile, N. M. Goldman, and D. G. Allard. Maintaining object persistence in the common lisp framework. In Proceedings of Workshop on Persistent Object Systems: their design, implementation and use, August 1987.

Appendix A
AP5

This appendix describes salient features of AP5 that are used by WorldBase. For a more complete description, the reader is referred to [15, 59]. Since AP5 is an extension of Common Lisp, its syntax is Lisp-like.

A.1 AP5 Data Model

AP5 supports a data model similar to the Entity-Relationship model [11]. Entities (also called objects) refer to any value that can be stored in a variable in Lisp (AP5 uses the same notion of objects as Common Lisp). Relations are named abstractions for organizing data. Each relation contains sequences of objects of the same length. The length of the sequences is called the arity of the relation. A tuple is an instance of a relation consisting of a sequence of objects.

Types are unary relations that denote (possibly infinite) sets of objects. All atomic types in Common Lisp (such as character, integer, etc.) are recognized as types in AP5. AP5 supports and maintains subtype and supertype relationships in a type lattice. The root of the type lattice is a special type Entity which contains the set of all AP5 objects. A special subclass of Entity represents abstract objects in the virtual store. Instances of this subclass are called dbobjects.

AP5 supports two different types of rules: consistency rules and automation rules. Consistency rules state invariant conditions on data in the database and monitor changes to the database. Type constraints and cardinality constraints are subclasses of consistency rules.
Automation rules provide for automatic invocation of programs; they are triggered in response to specified transitions of the database. Constraints attached to worlds (in the world constraint specification) are translated to consistency rules at explicit user commands.

AP5 also supports the notion of atomic transitions and transactions. An atomic transition allows a programmer to group updates so the database moves from one meaningful state to the next. A transaction allows a programmer to group updates so that an abort-transaction within the transaction leaves the database state unchanged.

A.2 Data Definition Language

A wff (well-formed formula) is an expression in first-order logic. The simplest wffs are the constants True and False. A primitive wff is a list consisting of a relation name and a sequence of objects (or variables) of length equal to the arity of the relation. Compound wffs can be constructed from primitive wffs by the addition of logical connectives and/or logical quantifiers. A description is a list of the form (vars s.t. wff), where vars is a list of free variables in wff.

AP5 distinguishes three kinds of relations: stored relations, derived relations and computed relations. The contents of a stored relation are determined by explicit assertions and retractions. The contents of a derived relation are defined by a computation that depends on the contents of other relations. A computed relation contains an invariant set of tuples that is determined by a computation independent of the contents of the database.

A stored relation may be defined using the defrelation macro of the form below by providing its name and arity. Type constraints, equivalences, and other parameters may also be supplied.

(DefRelation <name> :arity <integer> :types <list of types> :equivs <list of equivalences> ...
)

Derived relations are defined using the same macro by providing the name and derivation formula. The arity is not required. Type constraints, equivalences, and other parameters may also be supplied.

(defrelation <name> :arity <integer> :definition :derivation ...)

A computed relation is defined using the same macro, but with the :computation keyword specified:

(defrelation <name> :arity .. :definition .. :derivation .. :computation ...)

In both derived and computed relations, a Lisp form, Lisp predicate or Lisp function may be used in specifying the derivation or computation. Examples of derived relations are specified in one of the transcripts in Appendix D. Other annotations may be specified for relations to improve performance.

A.3 Data Manipulation Language

AP5 provides simple assertion, retraction and querying mechanisms for relations.
• (?? <relation-name> <sequence of objects>)
  returns true if the sequence of objects is true of the relation.
• (++ <relation-name> <sequence of objects>)
  asserts the sequence of objects to be true of the relation.
• (-- <relation-name> <sequence of objects>)
  retracts the sequence of objects from the relation to make it false.

A number of predefined relations exist in AP5. Some of these are relational analogs of Lisp functions for object comparison such as eq, eql, equal, string-equal, =, etc.

AP5 provides generators for certain relations so that programs can iterate through tuples of these relations. However, not all relations can be generated (e.g. trying to generate members of infinite types). One of the generators has the following form:

(loop for <description> )

where <description> is of the form (vars s.t. wff).
For each iteration, the variables in the vars list of <description> are bound to some tuple that satisfies the description. These variables can be accessed within the loop body.

Another generator macro used to retrieve sequences of objects has the form:

(listof <description>)

If there is only one variable in the vars list in <description>, the macro generates a list of objects that satisfy the description. Otherwise, it generates a list of sequences of objects (corresponding to elements in vars) that satisfy the description. Other macros provided for generation can be found in [15, 59].

Appendix B
Grammars

The following grammars are described in POPART [72] format. Popart (also known as Producers Of Parsers And Related Tools) is used to build parsers for the grammars. These parsers produce parse trees when given a string of the grammar; the parse trees are then translated to Lisp forms. The grammars specified and used in examples in the body of the thesis may differ slightly from the actual implementation and use in the transcripts. The languages used in the prototype are influenced by AP5 and Lisp.

Briefly, the notations are similar to BNF notation, and can be interpreted as follows:
• terminals are quoted; nonterminals are unquoted symbols.
• { ... } means “optional clause.”
• <nonterm> ^ ', means “one or more occurrences of <nonterm> separated by comma.”
• #x names the nonterminal node with x.
• + means “one or more instances of.”
• | means “or.”
• lexeme refers to the leaf of the parse tree.
• <| provides the name of a lexical analyzer function to interpret leaves (lexemes) of the parse tree.
• || is a Popart directive (of no concern to the reader).
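The separated-list notation in the list above ("one or more occurrences of a nonterminal separated by comma") can be illustrated with a small recognizer. This is only a sketch in Python, not Popart itself: the function names parse_separated and is_name, the token-list input, and the use of an alphanumeric test for names are all assumptions made for illustration.

```python
def parse_separated(tokens, is_item, sep=","):
    """Recognize one or more items separated by `sep` at the front of `tokens`.

    Returns (items, rest), where `rest` is the unconsumed tail.
    Raises ValueError when the input does not start with an item.
    """
    if not tokens or not is_item(tokens[0]):
        raise ValueError("expected at least one item")
    items = [tokens[0]]
    i = 1
    # Consume further (separator, item) pairs for as long as they are present.
    while i + 1 < len(tokens) and tokens[i] == sep and is_item(tokens[i + 1]):
        items.append(tokens[i + 1])
        i += 2
    return items, tokens[i:]


def is_name(tok):
    # Hypothetical stand-in for the `alphanumeric` lexical analyzer function.
    return tok.replace("-", "").isalnum()
```

Under these assumptions, the token list person , student ) yields the two names and leaves the closing parenthesis for the enclosing rule to consume. A dangling separator is left unconsumed rather than treated as an error, which is one possible design choice among several.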
B.1 Syntax for Schema Specification

<schema-specification> :=
    <name> <owner> <rel-or-type-declaration> + ;
<rel-or-type-declaration> :=
    '( <status> <impl-type-declaration> ') ;
<status> :=
    'internal | 'shared ;
<impl-type-declaration> :=
    <stored-type-decl> | <stored-relation-decl> | <defined-type-decl> | <defined-relation-decl> ;
<stored-type-decl> :=
    stored-type name#type { name#supertype , } ;
<stored-relation-decl> :=
    stored-relation name#rel name#type + ;
<defined-type-decl> :=
    defined-type name rel-wff-defn ;
<defined-relation-decl> :=
    defined-relation name rel-wff-defn ;
<wff-defn> :=
    (vars 's.t. wff) ;

The grammar for wff is the same one specified in the correspondence language grammar in Appendix B.3.

B.2 Syntax for Closure Specification

<closure specification> :=
    ( <object closure spec> | <relation closure spec> ) + ;
<object closure spec> :=
    <type name> '( <relation pattern> ^ ', ;
<relation pattern> :=
    <relation name> '( <annotation> + ') ;
<relation closure spec> :=
    <relation name> '( <annotation> + ') ;
<annotation> :=
    '@ | '$ | '!
    | '* ;

B.3 Correspondence Language Syntax

correspondence-spec :=
    'TRANSFORM id#source-schema id#target-schema '( local-correspondence ^ '; ') ;
local-correspondence :=
    wff-lhs#target ( wff#source { 'AND intermediate-wff#long } | intermediate-wff#short ', ) ;
wff-lhs :=
    id#target-type '[ { '( '* id#constraint ') } { typed-var#type + } ']
    | id#target-rel '( var#rel + ') ;
typed-var :=
    var | '( var id#type-constraint ') ;
intermediate-wff :=
    id '[ var + '] ;
wff :=
    exist-wff | forall-wff | and-wff | or-wff | not-wff | base-wff ;
exist-wff :=
    '( 'Exists '( var + ') '| wff ') ;
forall-wff :=
    '( 'Forall '( var + ') '| wff ') ;
and-wff :=
    '( 'and wff + ') ;
or-wff :=
    '( 'or wff + ') ;
not-wff :=
    '( 'not wff ') ;
base-wff :=
    '( id var-or-const + ') ;
var-or-const :=
    var | const || ;
const :=
    string | integer | bool ;
bool :=
    'T | 'nil ;
string :=
    lexeme <| stringp ;
integer :=
    lexeme <| integerp ;
id :=
    lexeme <| alphanumeric ;
var :=
    lexeme <| alphanumeric

B.4 Preference Specification Syntax

preference-specification :=
    cardinality-preference | disjointness-preference | computation || ;
cardinality-preference :=
    'card-pref <pref-db> <name> <slot> <lower-bound> <upper-bound> ;
disjointness-preference :=
    'disjoint-pref <pref-db> <name> ^ ', ;
computation :=
    'comp <name> <slot> <function-name> ;
<pref-db> :=
    '1 | '2 ;
<lower-bound> :=
    '0 | '1 ;
<upper-bound> :=
    '1 | '-1 ;
<name> :=
    lexeme <| alphanumeric ;
<slot> :=
    lexeme <i> := lexeme <| alphanumeric

Appendix C
WorldBase Prototype Implementation

This appendix
describes internal data structures and operations supported by the prototype.

C.1 Internal Data Structures

This section describes types and relations used internally by the WorldBase prototype. The types and relations are implemented as AP5 relations. As such, they may be accessed directly using the AP5 query language. Each relation is described in the following format: first, the relation name and a descriptive list of parameters for that relation are provided; next, the use of the relation is explained. The actual type constraints on the relations are different from the descriptive list of parameters, although, in some cases, they may be deduced. Where relevant, the types of the parameters are given in parentheses in the explanation. All types and relations are assumed to be stored unless specified to be derived or computed.

C.1.1 Registry Data Structures

The relations below hold information on the virtual state of the registry (in the workspace). They relate values to other values, and are written out to specific files in the workstation.
• schema-registry (schema-name schema-owner schema-path)
  relates a schema, identified by schema-name and schema-owner (both of type symbol), to its persistent store state, which is a file (pathname).
• world-registry (world-name world-owner world-type world-path)
  relates a world, identified by world-name and world-owner (both of type symbol), to its type (either “closure-based” or “schema-based”) and its persistent store state, which is a file (pathname).
• world-and-schema-registry (w-name w-owner s-name s-owner)
  relates a world, identified by its w-name and w-owner, to its schema, identified by its s-name and s-owner.

C.1.2 World Database Support Structures

The following types and relations contain information on schemas and their properties that are loaded in a workspace.
• schema
  abstract type that contains abstract objects corresponding to schemas loaded in the workspace (virtual schemas).
• schema-name (schema symbol)
  relates a schema object to its name (a symbol).
• schema-owner (schema symbol)
  relates a schema object to its owner (a symbol).
• relations-in-schema (schema local-rel-name status type)
  relates a schema to the relations and types that form its structure; status and type refer to the properties of relations and types in the schema. Types are treated as unary relations here.
• schema-rel-name-to-buffer-name (schema local-rel-name buffer-rel-name)
  relates a relation name local to schema (local-rel-name of type symbol) to its workspace name.
• schema-rel-name-and-atts (schema local-rel-name attribute-list)
  relates a relation name local to schema to its list of attributes (list of symbols) when the relation is stored.
• schema-rel-name-and-defs (schema local-rel-name definition)
  relates a relation name local to schema to its definition (an AP5 wff) when the relation is defined.
• schema-type-lattice (schema local-sub-type local-super-type)
  stores the type and subtype relationships of types local to schema.

The following types and relations contain information on worlds and their properties that are loaded in a workspace.
• world
  abstract type that contains abstract objects corresponding to worlds loaded in the workspace (virtual worlds).
• world-name (world symbol)
  relates a world object to its name (a symbol).
• world-owner (world symbol)
  relates a world object to its owner.
• world-schema (world schema)
  relates a world object to its schema (object of type schema). This relationship has a many-to-one restriction: each world may be related to only one schema, but many worlds may have the same schema.
• world-type-equivs (world name-of-buffer-type equiv-families-for-type)
  relates a type (symbol) in a world (abstract object of type world) to a family of equivalence specifications (a list of lists of relations).
• world-type-closure (world name-of-buffer-type closure-for-type)
  relates a type (symbol) in a world to its closure specification (a list).
• world-rel-closure (world name-of-buffer-rel closure-for-rel)
  relates a relation (symbol) in a world to its closure specification.
• world-rel-constraints (world buffer-rel-name constrained-slot lower-bound upper-bound)
  relates a relation in a world to its cardinality constraints (integers) and the slot being constrained.
• world-disjoint-constraints (world abstract-type list-of-subtypes)
  relates an abstract type name (symbol) in a world to disjointness constraints of its subtypes (list of symbols).

A world's population is stored in the following relations:
• seed-obj-in-world (world dbobject)
  relates a world (of type world) to its seed objects.
• seed-rel-in-world (world list)
  relates a world to its seed relationships.
• dependent-obj-in-world (world dbobject)
  relates a world to its dependent objects.
• dependent-rel-in-world (world list)
  relates a world to its dependent relationships.
• values-in-world (world entity)
  relates a world to values in the world.
• maxid (world integer)
  relates a world to its maximum persistent identifier (PID). This relationship is used to generate new PIDs to be assigned to objects.
• inworld (world dbobject)
  derived relation that relates a world to objects in the world (by being a seed or a dependent in the world).
• inworld-tup (world tuple)
  derived relation that relates a world to relationships in the world (by being a seed or a dependent in the world).

Workspace¹ data structures:
• relation-refcount (buffer-rel-name integer)
  relates a workspace relation name to the number of schemas that share it.
• relation-and-buffer-names (ap5:relation buffer-rel-name)
  relates a relation object to its workspace name. This keeps track of the relation's identity even after its name is changed in the workspace.
• buffer-type-equivs (buffer-type-name buffer-equiv-family)
  relates a type (symbol) in the workspace to its equivalence specifications. This value is usually affected when loading the equivalence specification for a world, but users may change it directly. The equivalence specification in this relation is used in merging objects (not the equivalence specifications related to a world).
• buffer-rel-constraints (buffer-rel-name slot lb ub)
  relates a relation in the workspace to its cardinality constraint specifications. This value is usually affected when loading the cardinality constraint specification for a world, but users may change it directly. The constraint specification in this relation is used in the enforcement of cardinality constraints on information in the workspace.
• buffer-disjoint-constraints (buffer-super-most-type d-list)
  relates an abstract type in the workspace to a list of its disjoint subtypes. The disjoint specification in this relation is used in the enforcement of disjointness constraints on information in the workspace.

Population property:
• obid (world entity integer)
  relates an abstract object (entity) to its PID (integer) in world.

¹ Note: the word “buffer” is synonymous with “workspace” in the prototype.

C.2 Supported Functions

C.2.1 The Registry

The following operations are supported by the registry:
• clear-all-registry ()
  removes all tuples in registry relations in the workspace; i.e. clears the virtual registry.
• checkpoint-registry ()
  saves the state of the virtual registry into persistent store; i.e. writes it to files.
• update-registry ()
  restores the registry from persistent store into workspace registry relations; i.e.
  reads the registry files and restores them to the virtual store. This is an additive update: no registry information in virtual memory is removed, unless the registry is cleared with clear-all-registry.
• broadcast (name :positive message)
  informs others of newly created or newly removed components (by a message). If :positive is true, the message indicates a world database or world schema being asserted. If :positive is nil, the message indicates a retraction. This function is called each time a world or schema is created or killed.
• register-worldschema (name owner body &optional (force nil))
  registers a worldschema, commits the body to persistent store, and broadcasts this information to update other registries in the network. If there is another world schema with the same name and owner, this operation will abort, unless force is true, in which case the existing world schema is replaced.
• unregister-worldschema (name owner)
  removes worldschema from the registry only if no worlds are dependent on it. Also broadcasts the retraction to other registries.
• register-world (world-name world-owner schema-name schema-owner &optional (closure-spec nil) (force nil))
  registers a world. If closure-spec is provided, the world is closure-based; otherwise it is schema-based. If there is another world with the same name and owner, this operation will abort, unless force is true, in which case the existing world is replaced. If the operation does not abort, information about the newly registered world is broadcast to update other registries in the network.
• unregister-world (name owner)
  removes the world information from the registry. Also broadcasts the retraction to other registries.
• list-registered-schemas ()
  lists name and owner of world schemas in the virtual registry.
• list-registered-worlds ()
  lists name and owner of worlds in the virtual registry.
• get-schema-path-in-registry (name owner)
  finds the location (pathname) of a schema from the registry given its name and owner.
• get-world-path-in-registry (name owner)
  finds the location (pathname) of a world from the registry given its name and owner.
• get-world-type-in-registry (name owner)
  finds the type of a world from the registry given its name and owner.
• get-schema-name-given-world-in-registry (name owner)
  finds the schema name of a world from the registry.
• get-schema-owner-given-world-in-registry (name owner)
  finds the schema owner of a world from the registry.
• get-schema-pair-given-world-in-registry (name owner)
  returns a list of a list containing a schema name and owner of a world.
• registered-schema (name owner)
  true if a world schema with name and owner is registered, nil otherwise.
• registered-world (name owner)
  true if a world with name and owner is registered, nil otherwise.

Some of the operations listed above (those that query properties in the registry) are not strictly necessary, since the user may use the AP5 query language to query the structure of the registry directly. They are provided as a convenient shorthand.

C.2.2 WorldBase Workspace Support

World Schema Properties Accessors

The following are world schema property accessors. Some of these accessors may not be necessary since the user may use AP5 to query the structures directly; however, they are provided for convenience.
• make-schema (name owner)
  creates a schema in the workspace; fails if a schema with name and owner already exists in the workspace.
• unmake-schema (schema)
  removes schema (object) from the workspace.
• find-schema-name (schema)
  returns the name of schema.
• find-schema-owner (schema)
  returns the owner of schema.
• find-schema (name owner)
  returns the schema named name with owner owner.
• find-buffer-name-of-rel-name (schema local-rel-name)
  returns the workspace name corresponding to local-rel-name of a relation in schema.
• find-rel-name-of-buffer-name (buffer-name &optional (schema nil))
  returns the local-rel-name corresponding to the buffer-name of a relation in the workspace. This operation may return different values if the schema containing the relation is not provided. If schema is provided, it returns a unique symbol corresponding to the local name of the relation in the schema.
• find-schema-and-rel-name-of-buffer-name (buffer-name)
  returns a list of pairs (schema and local-rel-name) corresponding to the buffer-name of a relation.
• find-schema-of-buffer-name (buffer-name)
  returns a schema that has a local relation corresponding to buffer-name.
• find-status-and-type-of-rel-name-in-schema (schema local-rel-name)
  returns a list of the status and type of the relation named local-rel-name in schema.
• find-status-of-rel-name-in-schema (schema local-rel-name)
  returns the status of the local-rel-name relation in schema.
• find-type-of-rel-name-in-schema (schema local-rel-name)
  returns the type of local-rel-name in schema.
• find-base-and-sub-types-in-schema (schema)
  returns a list of base (abstract) types and subtypes in schema.
• find-base-types-in-schema (schema)
  returns a list of base (abstract) types in schema.
• find-sub-types-in-schema (schema)
  returns a list of subtypes in schema.
• find-base-relation-in-schema (schema)
  returns a list of base (stored) relations in schema.
• find-all-relations-in-schema (schema)
  returns a list of relations and types in schema.
• find-rel-def-in-schema (schema local-rel-name)
  returns the definition (derivation rule) of the derived relation named local-rel-name in schema.
• find-rel-att-in-schema (schema local-rel-name)
  returns the attribute specification of local-rel-name in schema.
• find-rel-supertype-in-schema (schema local-rel-name)
  finds the supertype of local-rel-name in schema.
• find-super-most-type (schema subtype)
  finds the highest supertype (an abstract type) of subtype in schema.
• find-all-super-types (schema subtype)
  finds all supertypes of subtype in schema.
• find-all-sub-types (schema supertype)
  finds all subtypes of supertype in schema.
• type-of-ob (schema object)
  returns a list of types (in schema) that object is classified under.
• buffer-type-names-of-ob (schema object)
  returns a list of workspace type names that object is classified under in schema.
• local-types-of-ob (schema object)
  returns a list of local type names that object is classified under in schema.

World Properties Accessors

The following are world property accessors:
• make-world (name owner schema)
  creates a world in the workspace; fails if a world with name and owner already exists in the workspace.
• unmake-world (world)
  removes world (object) from the workspace.
• find-world-name (world)
  returns the name of world.
• find-world-owner (world)
  returns the owner of world.
• find-world (name owner)
  returns the world named name with owner owner.
• schema-of (world)
  returns the schema (object) of world.
• find-closure-of-rel-in-world (world buffer-rel-name)
  returns the closure specification of relation (buffer-rel-name) in world.
• find-closure-of-type-in-world (world buffer-type-name)
  returns the closure specification of type (buffer-type-name) in world.
• assert-world-type-closure (world buffer-type-name new-closure)
  replaces any existing closure specification of type buffer-type-name in world with new-closure.
• assert-world-rel-closure (world buffer-rel-name new-closure)
  replaces any existing closure specification of relation buffer-rel-name in world with new-closure.
• clear-world-closure (world)
  clears closure properties of world in the workspace.
• add-to-world-type-equivs (world buffer-type-name equivs)
  adds an equivalence specification (equivs) to the family of equivalence specifications for buffer-type-name in world.
• remove-world-type-equivs (world buffer-type-name equivs)
  removes an equivalence specification (equivs) from the family of equivalence specifications for buffer-type-name in world.
• clear-world-equivs (world)
  clears the equivalence specifications of (types in) world in the workspace.
• clear-world-constraints (world)
  clears the cardinality constraint specifications of (relations in) world in the workspace.
• assert-world-rel-constraints (world relname slot lower-bound upper-bound)
  asserts a relation cardinality constraint specification (replacing existing ones) of relation in world.

World Schema Functions

The following functions are used to create and manipulate world schemas.
• compute-buffer-name-of-rel (schema-name schema-user rel-name name-pairs status)
  computes the workspace name of a local relation (locally named rel-name) in a given schema. If rel-name has an associated name in name-pairs, the associated name is used as the workspace name. Otherwise, if the relation status is internal, the associated name is a concatenation of the schema-name and schema-user; if shared, rel-name is used as the workspace name.
• adjust-to-buffer-names (schema expression)
  returns an expression with all local relation and type names of schema in expression adjusted to workspace relation and type names.
• adjust-to-local-names (schema expression)
  returns an expression with all workspace names adjusted to local names in schema.
• check-relation-equivalence (buffer-rel-name schema-1 att-or-def-1 status-1 type-1)
  checks if a relation to be restored is equivalent to one already existing in the workspace (loaded from another schema). Two relations are equivalent if:
  (1) they are of the same type (different status generates a warning).
  (2) if defined-relation, the definitions must be equivalent (in the prototype, equivalence is decided by equality of the definition strings).
  (3) if stored-relation, all attribute types must be equivalent (i.e. identical after renaming).
  (4) if subtype, they must have the same super-most-type, and the resulting type lattice must contain no cycles.
• load-worldschema (name owner)
  creates a new schema object named name with owner owner in the workspace. Note that this operation does not restore the schema structure into the workspace.
• unload-worldschema (schema &optional (force nil))
  removes schema from the workspace. This operation removes the unshared schema structure from the workspace, and decrements the reference counts of shared relations.
• restore-worldschema (schema &optional (name-pairs nil))
  restores the schema defined in schema into the workspace, renaming the structures with their workspace names using the given name-pairs table. This operation actually creates or looks up the types and relations corresponding to the workspace names.
• remove-worldschema (schema)
  removes the schema defined in schema; decrements reference counts of shared relations. This operation is called by unload-worldschema.
• remove-a-relation-from-schema (schema local-rel-name)
  removes a relation from schema in the workspace; may cause inconsistencies, since other relations may depend on it but are not checked before removal (subtypes are also removed when a supertype is removed).
• add-relation-to-schema (schema declaration &optional (name-pairs nil))
  adds a new relation to schema in the workspace; name-pairs allows the user to save a relation with a name different from its workspace name.
• clear-worldschema-domain (schema &optional (force nil) (shared-relations nil))
  tries to remove all instances of schema. The operation will abort if there are worlds loaded in the workspace, unless force is true.
  The operation will not remove tuples from shared relations (whose refcounts > 1) unless shared-relations is true.

World Manipulation Functions

The following functions are used to create and manipulate worlds. Note again that in the prototype, the word "buffer" is synonymous with "workspace".

• load-world (name owner)
  creates a world named name with owner owner in the workspace.
• compare-equivs-to-buffer-equivs (world buffer-type-name equiv-family)
  changes workspace equivalence specifications of buffer-type-name to include world's equivalence specifications (most natural merge).
• compare-card-constraint-to-buffer-constraint (world buffer-rel-name slot lower-bound upper-bound)
  changes workspace cardinality constraint specifications of buffer-rel-name to include world's cardinality constraint (most natural merge).
• compare-disjoint-constraint-to-buffer-constraint (world buffer-super-most-type disjoint-types)
  adds disjoint constraints to the workspace (not most natural merge).
• add-to-world-type-equivs (world buffer-type-name equiv-family)
  adds equiv-family to existing equivalence specifications of buffer-type-name in world.
• restore-world-closure (world)
  restores world's closure specification from persistent store.
• restore-world-equivs (world)
  restores world's equivalence specifications from persistent store; affects workspace equivalence specifications.
• restore-world-constraints (world)
  restores world's constraint declarations from persistent store; affects workspace constraint specifications.
• restore-world (world &optional (preference-spec nil))
  does restore-world-closure; restore-world-equivs; restore-world-constraints; and restore-instance with preference-spec.
• unload-world (world &optional (force nil))
  removes world (object) from the workspace if there are no objects or relationships (population) in the world. When force is true, the world is removed regardless of objects or relationships in world.
• unload-all-worlds-under-schema (schema &optional (force nil))
  unloads all worlds whose schema is schema in the workspace.
• clear-domain-of-other-worlds (world)
  removes all other worlds in the workspace of the same schema as world; removes all tuples of the schema of world (removes all instances of the world's schema).
• incorporate-buffer-equivs-to-world (world)
  world's equivalence specification is reset to the appropriate equivalence specifications from the workspace.
• incorporate-buffer-constraints-to-world (world)
  world's constraint specifications are reset to the appropriate constraints from the workspace.
• save-world-closure (world)
  commits world's closure specification to persistent store.
• save-world-equivs (world)
  commits world's equivalence specifications to persistent store; world's equivalence specifications are derived from the workspace.
• save-world-constraints (world)
  commits world's constraint specifications to persistent store; world's constraints are derived from the workspace.
• unload-schema-and-worlds (schema)
  unloads worlds under schema; unloads schema.
• kill-worldschema (schema)
  unloads schema and unregisters it, provided there are no worlds dependent on it in the registry.
• kill-world (world)
  clears world's population; unloads world; unregisters world.
• kill-loaded-worlds-of-schema (schema)
  kills all loaded worlds (in the workspace) whose schema is schema.
• kill-worlds-of-schema (schema)
  kills all worlds whose schema is schema (loaded, and unloaded in the registry).
• kill-schema-and-worlds (schema)
  kill-worlds-of-schema schema; kill schema.

World Save Functions

• clear-obid-for-world (world)
  clears obid relationships for world.
• assignid (world object)
  assigns a pid (persistent identifier) to object in world. The pid is unique with respect to world.
• save-instance (world)
  saves an image of world's population into persistent store using the traversal algorithm; assigns pids to objects in world.
• save-world (world)
  does save-world-closure (world); save-world-equivs (world); save-world-constraints (world); and save-instance (world).

World Base Merging Support

The following functions are used in merging worlds:

• find-type-of-buffer-relation (schema buffer-rel-name)
  finds type of buffer-rel-name in schema.
• find-defs-of-buffer-relation (schema buffer-rel-name)
  finds definition (derivation formula) of derived relation or type whose workspace name is buffer-rel-name in schema.
• find-atts-of-buffer-relation (schema buffer-rel-name)
  finds attributes of buffer-rel-name in schema.
• find-supertype-of-buffer-relation (schema buffer-rel-name)
  finds supertype of buffer-rel-name in schema.
• adjust-to-local-named-pairs (expression pairs)
  returns an expression with all names that appear in pairs adjusted to the names associated with them in pairs (renames).
• assert-relations-and-schemas (new-schema types-and-relations rel-info-list)
  asserts types-and-relations to be of new-schema; rel-info-list is a list of triples that provides the correspondence of the workspace names of types and relations to their local names and statuses.
• copy-buffer-into-schema (schema-name schema-owner &optional (relation-info-list nil) (force nil))
  copies all relations in the workspace into a new schema with the given schema-name and schema-owner; registers the new schema, and saves it to persistent store.
• save-worldschema (schema)
  saves schema's specification in the workspace into persistent store.
• copy-buffer-into-world (world-name world-owner schema-name schema-owner &key (use-closure nil) (use-equivs nil) (use-constraints nil) (force nil))
  creates a new world named world-name, owner world-owner, of the specified schema (corresponding to schema-name and schema-owner); uses the closure declaration in the workspace as the closure for the new world if use-closure is true; uses equivalences from the workspace as the equivalences for the new world if use-equivs is true; uses constraints from the workspace as the constraints for the new world if use-constraints is true.

The following functions are used to merge instances:

• restore-instance (world &key (preference-specification nil) (rule-enforcement nil))
  restores the image of the population of world; objects whose types have equivalence specifications are identified by the equivalence of the specified relationships; otherwise, new objects are created to correspond to the persistent identifiers. preference-specification is translated into a program that retracts some of the facts restored to conform to certain constraints. The program is called when the restore completes its two-phase restore. If rule-enforcement is true, the rules in the workspace are turned on after the restore to maintain the constraints in the workspace. This operation is invoked by restore-world.
• translate-preferences (world list-of-preferences)
  translates preferences into a program that does the desired actions; cardinality constraint preferences are templates that may be used with certain relations; disjointness constraint preferences behave in a similar way; computations may be specified with a function that accepts 5 arguments: object, relation being computed, list-of-related-objects, db1 and db2.
• assert-constraint-of-rel (relname slot lower-bound upper-bound &optional (remove-other-constraints-of-rel nil))
  asserts a cardinality constraint in the workspace (generates AP5 rules).
• assert-disjoint-types (list-of-disjoint-types)
  asserts a disjointness constraint among all the disjoint types (generates AP5 rules).
• enforce-buffer-constraints ()
  translates workspace constraint specifications into AP5 rules and turns them on.

A merging operation could be provided to merge two worlds into a new workspace by:

• Merge (world-1 world-2 world-3 &optional preferences)

This has the effect of first loading world-1 and its schema into a new workspace, merging world-2 and its schema using the specified preferences and computations, and saving the resulting workspace database into world-3.

World Base Selection Support

• add-seed-object (world object)
  adds a seed object to world.
• add-seed-rel (world tuple)
  adds a seed tuple to world.
• add-seeds (world list)
  adds a set of seeds (objects and tuples) to world.
• remove-seed-object (world object)
  removes seed object from world.
• remove-seed-rel (world tuple)
  removes seed tuple from world.
• remove-dependent-object (world object)
  removes dependent object from world.
• remove-dependent-rel (world tuple)
  removes dependent tuple from world.
• remove-value-object (world entity)
  removes value (entity) from world.
• remove-seeds (world list-of-seeds)
  removes a set of seeds (objects and tuples in list-of-seeds) from world.
• remove-dependents (world list-of-dependents)
  removes a set of dependents (objects and tuples in list-of-dependents) from world.
• remove-all-seeds (world)
  removes all seeds from world.
• remove-all-dependents (world)
  removes all dependents from world.
• clear-world (world)
  removes all seeds and dependents from world.
• traverse (world &key object-prefix object-postfix tuple-postfix seed-objects seed-tuples)
  traverses objects and relationships based on the closure specifications (patterns) for world, starting from seed-objects and seed-tuples. Calls object-prefix on each object initially encountered and calls object-postfix when returning from visiting the object. Calls tuple-postfix after a tuple is visited.
• populate-with-seeds (world seeds)
  calls traverse with appropriate prefix and postfix functions to put the closure of seed objects and tuples in world.
• update-world (world)
  if world is closure-based, then world is repopulated with the seeds in the world. If it is schema-based, the world is assigned a closure and given all the tuples in the world's schema as its seeds, and its population is updated.

World Base Transformation Support

• bulk-transform (source-schema target-schema list-of-correspondences)
  where source-schema and target-schema are each a list of schema name and owner. This operation transforms instances of source-schema into instances of target-schema based on the correspondences.
• closure-transform (source-schema target-schema list-of-correspondences seed-correspondences target-id)
  where source-schema and target-schema are lists of schema name and owner. This operation assumes that the schema defined by target-schema is disjoint² from the source schema. It assumes that the target schema is empty (no instances present). It transforms instances of the source schema into instances of the target schema based on seed-correspondences. It traverses and transforms the necessary objects and relationships related to the closure of the resulting seeds into the target.

² The minimum requirement is that there will be no interference with the transformation process.
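The callback protocol described for traverse and populate-with-seeds can be illustrated with a small sketch. It is written in Python, not in the prototype's Lisp/AP5, and it is only an approximation under stated assumptions: the closure specification is reduced to a set of relation names to follow, the workspace is a plain list of (relation, source, target) triples, and values are treated like objects. The function bodies and the data layout are hypothetical; only the callback names come from the description above.

```python
def traverse(tuples, follow, seed_objects,
             object_prefix, object_postfix, tuple_postfix):
    """Depth-first walk from the seeds over tuples whose relation is in `follow`.

    `follow` stands in for a closure specification (an assumption of this sketch).
    """
    visited = set()

    def visit(obj):
        if obj in visited:
            return
        visited.add(obj)
        object_prefix(obj)              # called when the object is first encountered
        for tup in tuples:
            rel, src, dst = tup
            if src == obj and rel in follow:
                visit(dst)
                tuple_postfix(tup)      # called after the tuple has been visited
        object_postfix(obj)             # called when returning from the object

    for seed in seed_objects:
        visit(seed)
    return visited


def populate_with_seeds(tuples, follow, seeds):
    """Collect the closure of the seeds, in the spirit of populate-with-seeds."""
    world_objects, world_tuples = [], []
    traverse(tuples, follow, seeds,
             object_prefix=world_objects.append,
             object_postfix=lambda obj: None,
             tuple_postfix=world_tuples.append)
    return world_objects, world_tuples


# Hypothetical workspace contents, for illustration only.
workspace = [
    ("r-type", "minato", "japanese"),
    ("name", "minato", "Minato"),
    ("r-type", "diner", "american"),
    ("location", "minato", "addr1"),
    ("city", "addr1", "LA"),
]
objects, kept = populate_with_seeds(workspace, {"r-type", "location", "city"}, ["minato"])
```

In this sketch the tuple for the relation name is left out of the world because name is not in the followed set, and the object diner is never reached because it is not related to a seed.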
World Base Workspace Management

The following functions access or manipulate properties of the workspace:

• add-to-buffer-type-equivs (buffer-type-name equiv-family)
  adds equivalence specification equiv-family to the family of equivalence specifications for buffer-type-name.
• replace-buffer-type-equivs (buffer-type-name equiv-family)
  replaces the existing equivalence specification for buffer-type-name by equiv-family, a family of equivalence specifications.
• equived-typep (buffer-type-name)
  returns true if buffer-type-name has an equivalence specification specified in the workspace.
• find-equived-types-from-list (buffer-type-list)
  returns a list of the types from buffer-type-list which have equivalence specifications specified.
• clear-buffer-equivs ()
  clears the equivalence specifications of the workspace.
• clear-buffer-constraints ()
  clears the workspace constraint specifications.
• clear-buffer-domain ()
  removes all tuples in relations in the workspace.
• new-buffer ()
  removes all worlds, schemas and their corresponding relations and tuples in the workspace.
• exist-relationp (buffer-rel-name)
  returns true if relation buffer-rel-name has a positive reference count.
• add-relation-count (buffer-rel-name)
  increments reference count of buffer-rel-name by 1.
• decrement-relation-count (buffer-rel-name)
  decrements reference count of buffer-rel-name by 1; if reference count = 0, then relation buffer-rel-name is deleted from the workspace.
• list-buffer-worlds ()
  returns all worlds in the workspace.
• list-buffer-schemas ()
  returns all schemas in the workspace.

C.2.3 Miscellaneous Library Functions

The following are functions that manipulate sets (to identify equivalence, difference and union of equivalence specifications and families of equivalence specifications).

• concatenate-symbols (&rest symbols)
  produces a hyphenated symbol; used to generate workspace names.
• set-equivalent (set1 set2)
  determines equivalence of nested (1 level) sets.
• intersection-equivalence (set1 set2)
  finds the intersection of two nested (1 level) sets.
• difference-equivalence (set1 set2)
  finds the difference of two nested (1 level) sets.
• union-equivalence (set1 set2)
  finds the union of two nested (1 level) sets.

The following functions create paths for worlds and their properties.

• create-component-path (type name owner)
  creates a pathname according to type, name and owner.
• create-registry-path (name)
  creates a path for registry relation name.
• create-schema-path (name owner)
  creates a path for world schema named name with owner owner.
• create-world-path (name owner)
  creates a path for world named name with owner owner.
• create-closure-path (name owner)
  creates a path for closure specification of world named name with owner owner.
• create-equivs-path (name owner)
  creates a path for equivalence specifications of world named name with owner owner.
• create-constraints-path (name owner)
  creates a path for constraint specification of world named name with owner owner.
• create-transform-path (name)
  creates a path for transformation specification of name name.
• write-schema (schema-path schema-body)
  commits schema-body to persistent store (file at schema-path).
• write-world-equivs (world equiv-decls)
  commits equivalence declarations (equiv-decls) of world to persistent store.
• write-world-constraints (world constraint-specs)
  commits constraint-specs of world to persistent store.
• write-world-closure (world closure-specs)
  commits closure-specs of world to persistent store.

A generally useful function removes all tuples from a given relation in the workspace. This function is called by clear-worldschema-domain and other functions.

• remove-tuples-in-rel (relation)
  removes tuples in relation (relation refers to an actual relation object, not its name).
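The set functions listed above operate on one-level nested sets, such as a family of equivalence specifications like ((NAME PHONE) (NAME ADDRESS)). The following Python sketch shows one plausible reading of them. It assumes that order is irrelevant both within each inner set and across the family; the source only gives a one-line description of each function, so the exact semantics here are an assumption, and the helper normalize is hypothetical.

```python
def normalize(family):
    """Convert a list of lists into a set of frozensets, ignoring order at both levels."""
    return {frozenset(inner) for inner in family}


def set_equivalent(family1, family2):
    """True if both families contain the same inner sets."""
    return normalize(family1) == normalize(family2)


def union_equivalence(family1, family2):
    """Inner sets that occur in either family."""
    return normalize(family1) | normalize(family2)


def intersection_equivalence(family1, family2):
    """Inner sets that occur in both families."""
    return normalize(family1) & normalize(family2)


def difference_equivalence(family1, family2):
    """Inner sets of family1 that do not occur in family2."""
    return normalize(family1) - normalize(family2)


# Illustrative data modeled on the restaurant equivalence specification.
world_equivs = [["name", "phone"], ["name", "address"]]
buffer_equivs = [["address", "name"], ["phone", "name"]]
same = set_equivalent(world_equivs, buffer_equivs)
```

Using frozensets for the inner level is a design choice of the sketch: it makes (name phone) and (phone name) compare equal without sorting, which matches the idea that an equivalence specification is a set of relationships.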
Appendix D

The Prototype in Operation

This appendix provides an actual WorldBase prototype execution of the example presented in Chapter 1.

The steps taken in this example are illustrated in Figure D.1. Each box in the figure represents a world. The first line in each box is the world name, the second is the world schema, and the third is the owner. We assume that entertainment, recommendation and restaurant-guide are schemas pictorially depicted in Figures 1.2, 1.3, 1.4, respectively.

Each bold line connecting different boxes indicates a derivation of some sort; the derivation operation is described in the figure. For instance, the line connecting two boxes on the upper left-hand corner of the figure depicts the user extracting Japanese restaurants from entertainment-world into a joe-entertainment-world. Worlds that have no incoming lines (e.g. entertainment-world, recommendation-world and restaurant-info) are created and populated by user assertions.

In this example, the user is interested in Japanese restaurants from Joe's entertainment-world database whose schema (entertainment) is shown in Figure 1.2; highly recommended restaurants of Paul's recommendation-world database whose schema (recommendation) is shown in Figure 1.3; and a public restaurant guide database called restaurant-info whose schema (restaurant-guide) is shown in Figure 1.4.

To view the fragments from Joe's and Paul's databases, the user first extracts Japanese restaurants from Joe's entertainment-world database and saves them into joe-entertainment-world. He also extracts highly recommended restaurants from Paul's recommendation-world database into paul-recommendation-world. Because
Because 184 closure transform (seeds: transformed japanese restaurants) extract japanese restaurants extract highly recommended restaurants > t bulk transform merge worlds with identical schemas; save into my-rec workspace database (user may create new world schema and world o f the schema and data in the workspace) merge worlds with joe-rec-2 recommendation user joe-rec-1 recommendation user restaurant-info restaurant-guide public my-rec recommendation user entertainment-world entertainment joe recommendation-world recommendation paul joe-entertainment-world entertainment user paul-recommendation-world recommendation user 2 L - l L compatible schemas in workspace Figure D .l: An Extended Exam ple using W orldBase 185 they are of incom patible schem as, one of them m ust be transform ed before they can be viewed together in th e sam e world. In this exam ple, th e user transform s joe-entertainment-world to be an instance of recommendation schem a. T he result is tem porarily saved into joe-rec-1. W orld- Base also su p p o rts closure and transform at th e sam e tim e. T he user can use closure-transform to selectively transform Japanese restau ran ts into a world of recommendation schem a. He calls this world joe-rec-2. U ltim ately, th e user w ants to view joe-rec-1 and paul-recommendation-world si­ m ultaneously. He does this by loading them into th e sam e w orkspace, effectively m erging them . T he user m ay save the result in a different w orld, call it my-rec. T his world contains highly recom m ended restau ran ts from recommendation-world, and transform ed Japanese restau ran ts from entertainment-world. T he user m ay load this world together w ith restaurant-info since th eir schem as m ay be merge- able. Note th a t although there are some stru ctu ral incom patibilities betw een rec­ ommendation and restaurant-guide, as long as they are distinctly nam ed (such as r - a d d r e s s , vs. lo c a t i o n ) , the two schem as are still m ergeable. 
Sections D.1, D.2, and D.3 show actual transcripts of user interactions with WorldBase. In particular, Section D.1 shows how worlds and their schemas are created, populated and saved. Section D.1 is divided into three subsections. The first shows how entertainment-world is created and populated, and Japanese restaurants extracted into joe-entertainment-world. The second shows how restaurant-info is created and populated. The third shows how recommendation-world is created and populated, and highly recommended restaurants extracted into paul-recommendation-world.

Section D.2 shows transcripts of the two kinds of transformations supported in WorldBase. The first is a bulk-transform. It transforms all instances in entertainment-world into instances of recommendation, which the user then saves into joe-rec-1. The second is a closure-transform. It transforms only those related to transformed Japanese restaurants (seeds in joe-rec-2), and populates joe-rec-2 at the same time.

Section D.3 shows transcripts of two cases of merging: merging with identical schemas (merge joe-rec-1 and paul-recommendation-world of recommendation schema), and merging with compatible schemas (merge my-rec and restaurant-info). Merging is shown with the preference mechanism that supports preference and computation specifications to resolve conflicts.

We assume the same workspace is used throughout, and operations to clear the workspace are shown as part of the transcript when relevant. The operations have been decoupled to a more primitive level, so interactions of different parts may be seen.
For instance, the operation to load a world is performed in several steps: load-world creates a virtual world object, restore-world-equivs restores the world equivalence specification, restore-world-constraints restores the world constraint specification, and restore-instance restores the world population. Likewise, the operation to load a world schema is separated into load-worldschema, which creates a virtual schema object, and restore-worldschema, which recreates (and merges) the schema structure in the workspace.

The transcripts have been edited in several places to reduce the volume of output and make them more understandable, but they still present accurate portrayals of the capabilities of the prototype.

WorldBase supports a verbose mode and a non-verbose mode of execution. The verbose mode provides informative messages to the user to indicate the steps taken in the operations. In Sections D.1, D.2 and D.3, we assume a non-verbose mode of execution. Section D.4 presents more verbose executions of some of the more complex operations. Section D.5 provides persistent forms of the world schemas and databases in the form of text files.

The transcripts use the following format. Briefly, the user inputs an s-expression (in the form of AP5, Lisp or WorldBase functions) at the <cl> prompt. The system evaluates the expression and returns a value or a set of values in the next line(s) and provides another prompt. Comments in the actual transcript are preceded by one or more semicolons. In cases where parts have been edited or removed, ellipses and comments are provided to help the reader.

The values returned may be an object pointer, in which case it is denoted as #,(DBO DBOBJECT <type> <loc>), where <loc> may change when the object is moved around after garbage collection.
A user may set a variable to contain the object, but he cannot reference an object directly using the above format. When an object is declassified, its type is set to unclassified, and it may be garbage collected. Schemas are represented as objects in the environment and are printed as SCHEMA:<name><owner>; worlds, also represented as objects, are printed as WORLD:<name><owner>.

D.1 Create and Populate World Databases

This section provides transcripts of how a user creates, populates, and saves worlds and their schemas.

D.1.1 Create Entertainment Schema and Worlds

In this subsection we show how a user deals with entertainment information from scratch. First, a schema (entertainment) is specified and registered. Then, a world (entertainment-world) is created and registered. To populate the world, the user first loads the schema and the world into the workspace, populates it with instances asserted to the schema in the workspace, and saves it. Next, we show how data in the workspace is extracted (specifically, Japanese restaurants) and saved in a newly created world (joe-entertainment-world).

Create Entertainment Schema

A world schema is created by registering it with the registry.
<cl> (register-worldschema 'entertainment 'tini
       '((shared stored-type entertainment)
         (shared stored-type restaurant entertainment)
         (shared stored-type theatre entertainment)
         (shared stored-relation name entertainment string)
         (shared stored-relation rating entertainment string)
         (shared stored-relation phone entertainment string)
         (internal stored-relation specialty restaurant string)
         (internal stored-relation avg-price restaurant string)
         (internal stored-relation r-type restaurant string)
         (internal stored-relation current-show theatre string)
         (internal stored-relation t-type theatre string))
       t)
T
<cl> (list-registered-schemas)
((ENTERTAINMENT TINI)) ;; assuming no other schemas are registered

Create Entertainment-World

Entertainment-world is created by registering it without a closure specification; t is a directive to register-world to replace any existing world with the same name and owner.

<cl> (register-world 'entertainment-world 'tini 'entertainment 'tini nil t)
T

Load Entertainment

The world schema entertainment is loaded into the workspace in two steps: load-worldschema creates a schema object, and restore-worldschema restores the schema's structure into the workspace. Note that the schema is restored with some of its types and relationships renamed. The world instance entertainment-world is also loaded.
<cl> (setf ent (load-worldschema 'entertainment 'tini))
SCHEMA:ENTERTAINMENTTINI
<cl> (restore-worldschema ent
       '((restaurant joe-restaurant)
         (specialty specialty)
         (current-show showing)))
SCHEMA:ENTERTAINMENTTINI
<cl> (setf ent-world (load-world 'entertainment-world 'tini))
WORLD:ENTERTAINMENT-WORLDTINI

Populate Entertainment Domain

Once the schema is restored, the user may assert new data into it. Because of the renaming of schema structures, the user may refer to the workspace names of types and relations, or he may use the names in the schema specification with the function adjust-to-buffer-names to translate the names.

The user then creates a few objects and relationships as instances of entertainment.

<cl> (eval (adjust-to-buffer-names
             (find-schema 'entertainment 'tini)
             '(let ((r (make-dbobject)))
                (atomic
                  (++ entertainment r)
                  (++ restaurant r)
                  (++ name r "ma-ri-na")
                  (++ phone r "578-5050")
                  (++ rating r "****")
                  (++ specialty r "inari")
                  (++ avg-price r "$$$")
                  (++ r-type r "japanese"))
                r)))
#,(DBO DBOBJECT JOE-RESTAURANT 25570257)
<cl> .... ;; create an American restaurant
#,(DBO DBOBJECT JOE-RESTAURANT 25361937)
<cl> .... ;; create a theatre
#,(DBO DBOBJECT THEATRE 25694129)
<cl> (eval (adjust-to-buffer-names
             (find-schema 'entertainment 'tini)
             '(let ((r (make-dbobject)))
                (atomic
                  (++ entertainment r)
                  (++ restaurant r)
                  (++ name r "kifune")
                  (++ phone r "822-1595")
                  (++ rating r "***")
                  (++ specialty r "teriyaki")
                  (++ avg-price r "$$")
                  (++ r-type r "japanese"))
                r)))
#,(DBO DBOBJECT JOE-RESTAURANT 25867537)
<cl> ....
;; create a Mexican restaurant
#,(DBO DBOBJECT JOE-RESTAURANT 25993225)
<cl> (eval (adjust-to-buffer-names
             (find-schema 'entertainment 'tini)
             '(let ((r (make-dbobject)))
                (atomic
                  (++ entertainment r)
                  (++ restaurant r)
                  (++ name r "minato")
                  (++ phone r "305-1104")
                  (++ rating r "*****")
                  (++ specialty r "sukiyaki")
                  (++ specialty r "sashimi")
                  (++ avg-price r "$$")
                  (++ r-type r "japanese")))))
#,(DBO DBOBJECT JOE-RESTAURANT 26120817)
<cl> .... ;; create a Thai restaurant
#,(DBO DBOBJECT JOE-RESTAURANT 26250833)

Since entertainment-world is schema-based, saving the world saves the instance of the schema in the workspace. No seeds need to be specified. At this point, there are 7 entertainment objects, three of which are Japanese restaurants. The persistent version of entertainment-world is provided in Section D.5.

<cl> (save-instance ent-world)
NIL

Create Joe-Entertainment-World

The user is interested only in Japanese restaurants of entertainment-world. He creates a new closure-based world (joe-entertainment-world), loads it, populates it by asserting Japanese restaurants as its seeds, and saves it. The operation load-world creates a world object in the workspace and restore-world-closure restores the world's closure specification to the workspace. If the world has a population, restore-instance restores its population to the workspace.

<cl> (register-world 'joe-entertainment-world 'tini 'entertainment 'tini
       '((entertainment (name @ $) (rating @ !) (phone @ !))
         (restaurant (specialty @ !) (avg-price @ !) (r-type @ !
))) t)
T
<cl> (setf joe-ent (load-world 'joe-entertainment-world 'tini))
WORLD:JOE-ENTERTAINMENT-WORLDTINI
<cl> (restore-world-closure joe-ent)
(NIL NIL)
<cl> (add-seeds joe-ent
       (eval (adjust-to-buffer-names
               (find-schema 'entertainment 'tini)
               '(listof r s.t. (and (restaurant r)
                                    (r-type r "japanese"))))))
(NIL NIL NIL)
<cl> (save-instance joe-ent)
NIL

Save-instance traverses the objects and relationships in the workspace and saves the instances into a file, assigning PIDs (integers) to objects. The PIDs are only meaningful within a particular world. The same PID may be used by other worlds. The file containing the persistent state of joe-entertainment-world is in Section D.5. The traversal process can be seen in the verbose version of a save-instance operation in Section D.4.

The state of the database in the workspace can be queried. There are still 6 restaurants and 1 theatre; three restaurants are in joe-entertainment-world. Only tuples related to Japanese restaurants, as specified by the closure specification of joe-entertainment-world, are in the world.

<cl> (eval (adjust-to-buffer-names
             (find-schema 'entertainment 'tini)
             '(listof (x) s.t. (restaurant x))))
(#,(DBO DBOBJECT JOE-RESTAURANT 22445457)
 #,(DBO DBOBJECT JOE-RESTAURANT 22445473)
 #,(DBO DBOBJECT JOE-RESTAURANT 22445489)
 #,(DBO DBOBJECT JOE-RESTAURANT 22445505)
 #,(DBO DBOBJECT JOE-RESTAURANT 22445521)
 #,(DBO DBOBJECT JOE-RESTAURANT 22407321))
<cl> (listof x s.t. (inworld joe-ent x))
(#,(DBO DBOBJECT JOE-RESTAURANT 22445521)
 #,(DBO DBOBJECT JOE-RESTAURANT 22445505)
 #,(DBO DBOBJECT JOE-RESTAURANT 22445473))
<cl> (listof x s.t.
(inworld-tup joe-ent x))
((JOE-RESTAURANT #,(DBO DBOBJECT JOE-RESTAURANT 22445473))
 (ENTERTAINMENT #,(DBO DBOBJECT JOE-RESTAURANT 22445473))
 (ENTERTAINMENT-TINI-R-TYPE #,(DBO DBOBJECT JOE-RESTAURANT 22445473) "japanese")
 (ENTERTAINMENT-TINI-AVG-PRICE #,(DBO DBOBJECT JOE-RESTAURANT 22445473) "$$")
 (SPECIALTY #,(DBO DBOBJECT JOE-RESTAURANT 22445473) "sukiyaki")
 (SPECIALTY #,(DBO DBOBJECT JOE-RESTAURANT 22445473) "sashimi")
 (PHONE #,(DBO DBOBJECT JOE-RESTAURANT 22445473) "305-1104")
 (RATING #,(DBO DBOBJECT JOE-RESTAURANT 22445473) "*****")
 (NAME #,(DBO DBOBJECT JOE-RESTAURANT 22445473) "minato")
 (JOE-RESTAURANT #,(DBO DBOBJECT JOE-RESTAURANT 22445505))
 (ENTERTAINMENT #,(DBO DBOBJECT JOE-RESTAURANT 22445505))
 (ENTERTAINMENT-TINI-R-TYPE #,(DBO DBOBJECT JOE-RESTAURANT 22445505) "japanese")
 (ENTERTAINMENT-TINI-AVG-PRICE #,(DBO DBOBJECT JOE-RESTAURANT 22445505) "$$")
 (SPECIALTY #,(DBO DBOBJECT JOE-RESTAURANT 22445505) "teriyaki")
 (PHONE #,(DBO DBOBJECT JOE-RESTAURANT 22445505) "822-1595")
 (RATING #,(DBO DBOBJECT JOE-RESTAURANT 22445505) "***")
 (NAME #,(DBO DBOBJECT JOE-RESTAURANT 22445505) "kifune")
 (JOE-RESTAURANT #,(DBO DBOBJECT JOE-RESTAURANT 22445521))
 (ENTERTAINMENT #,(DBO DBOBJECT JOE-RESTAURANT 22445521))
 (ENTERTAINMENT-TINI-R-TYPE #,(DBO DBOBJECT JOE-RESTAURANT 22445521) "japanese")
 (ENTERTAINMENT-TINI-AVG-PRICE #,(DBO DBOBJECT JOE-RESTAURANT 22445521) "$$$")
 (SPECIALTY #,(DBO DBOBJECT JOE-RESTAURANT 22445521) "inari")
 (PHONE #,(DBO DBOBJECT JOE-RESTAURANT 22445521) "578-5050")
 (RATING #,(DBO DBOBJECT JOE-RESTAURANT 22445521) "****")
 (NAME #,(DBO DBOBJECT JOE-RESTAURANT 22445521) "ma-ri-na"))

D.1.2 Create Restaurant Schema and World

Register Schema and World

The following steps are taken in creating and loading rest-guide, and in creating and populating a schema-based world restaurant-info.
<cl> (register-worldschema 'rest-guide 'tini
       '((shared stored-type restaurant)
         (shared stored-type fast-food restaurant)
         (shared stored-type american restaurant)
         (shared stored-type foreign restaurant)
         (shared stored-type address)
         (shared stored-relation name restaurant string)
         (shared stored-relation location restaurant address)
         (shared stored-relation phone restaurant string)
         (shared stored-relation avg-price restaurant integer)
         (shared stored-relation hours restaurant string)
         (shared stored-relation classification restaurant string)
         (shared stored-relation specialties restaurant string)
         (shared stored-relation facilities restaurant string)
         (internal stored-relation street address string)
         (internal stored-relation city address string))
       t)
T

A schema or world may be registered with force, i.e., an existing schema or world is replaced. The following transcript shows what happens when there is an existing world of the same name and owner.

<cl> (register-world 'restaurant-info 'tini 'rest-guide 'tini nil t)
Warning: replacing world named RESTAURANT-INFO creator TINI
T

Load Schema and World

The world restaurant-info is provided with equivalence specifications for its objects of type restaurant and address.
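As a rough illustration of what such a specification can express, the sketch below treats each inner list of an equivalence specification as a key: two objects are taken to be equivalent when they agree on every attribute of at least one key. This reading is an assumption made for illustration, not a definition taken from the system. Objects are modelled as plain property lists rather than AP5 objects, the address is simplified to a string, and the names agree-on-key-p, equivalent-p and the sample data are hypothetical.

```lisp
;; Hypothetical sketch in plain Common Lisp; no AP5 or world operations are used.

(defun agree-on-key-p (a b key)
  "True when plists A and B hold EQUAL, non-NIL values for every attribute in KEY."
  (every (lambda (attr)
           (let ((va (getf a attr))
                 (vb (getf b attr)))
             (and va vb (equal va vb))))
         key))

(defun equivalent-p (a b equiv-spec)
  "True when A and B agree on at least one key of EQUIV-SPEC (assumed semantics)."
  (some (lambda (key) (agree-on-key-p a b key)) equiv-spec))

;; Mirrors the shape of ((name phone) (name address)) given for restaurant.
(defparameter *restaurant-equivs* '((:name :phone) (:name :address)))

(defparameter *minato-a*
  '(:name "minato" :phone "305-1104" :address "4676 Admiralty Way"))
(defparameter *minato-b*
  '(:name "minato" :phone "305-1104"))
(defparameter *kifune*
  '(:name "kifune" :phone "822-1595"))
```

Under this reading the two minato descriptions are equivalent through the (name phone) key even though one of them carries no address, while minato and kifune are not equivalent under either key.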
<cl> (setf rest (load-worldschema 'rest-guide 'tini))
SCHEMA:REST-GUIDETINI

<cl> (restore-worldschema rest
       '((classification paul-r-type)
         (avg-price cost)
         (name r-name)
         (phone r-phone)))
SCHEMA:REST-GUIDETINI

<cl> (setf rest-info (load-world 'restaurant-info 'tini))
WORLD:RESTAURANT-INFOTINI

<cl> (add-to-world-type-equivs rest-info 'restaurant
       '((name phone) (name address)))
NIL

<cl> (add-to-world-type-equivs rest-info 'address
       '((street city)))
NIL

<cl> (save-world-equivs rest-info :incorporate-from-buffer nil)
Saving equivalence spec of world WORLD:RESTAURANT-INFOTINI as
((ADDRESS ((STREET CITY))) (RESTAURANT ((NAME PHONE) (NAME ADDRESS))))
NIL

Populate Restaurant-info

Since restaurant-info is a schema-based world, no seeds need to be specified.

<cl> (eval (adjust-to-buffer-names
            (find-schema 'rest-guide 'tini)
            '(let ((r (make-dbobject))
                   (a (make-dbobject)))
               (atomic (++ address a)
                       ....
                       ;; street and city of address
                       (++ restaurant r)
                       (++ foreign r)
                       (++ location r a)
                       (++ name r "minato")
                       (++ phone r "305-1104")
                       (++ avg-price r 18)
                       (++ specialties r "sushi")
                       (++ hours r "Sun-W 5-9; Th-Sat 5-10")
                       (++ classification r "japanese")
                       (++ facilities r "banquet"))
               r)))
#,(DBO DBOBJECT FOREIGN 23640529)

<cl> (eval (adjust-to-buffer-names
            (find-schema 'rest-guide 'tini)
            '(let ((r (make-dbobject))
                   (a (make-dbobject)))
               (atomic (++ address a)
                       (++ street a "4356 Lincoln Blvd")
                       (++ city a "Marina Del Rey")
                       (++ restaurant r)
                       (++ american r)
                       (++ name r "marie callenders")
                       (++ location r a)
                       (++ phone r "822-5956")
                       (++ avg-price r 8)
                       (++ classification r "american")
                       (++ hours r "M-Th 4-11; F-Sat 3-10; Sun 10-3")
                       (++ specialties r "pies")
                       (++ facilities r "bakery"))
               r)))
#,(DBO DBOBJECT AMERICAN 20250457)

<cl> (eval (adjust-to-buffer-names
            (find-schema 'rest-guide 'tini)
            '(let ((r (make-dbobject))
                   (a (make-dbobject)))
               (atomic (++ address a)
                       (++ street a "4371 Glencoe Ave")
                       (++ city a "Marina Del Rey")
                       (++ restaurant r)
                       (++ foreign r)
                       (++ name r "ma-ri-na")
                       (++ location r a)
                       (++ phone r "578-5050")
                       (++ avg-price r 24)
                       (++ hours r "M-Sat 5-12")
                       (++ classification r "japanese")
                       (++ specialties r "tempura")
                       (++ facilities r "karaoke"))
               r)))
#,(DBO DBOBJECT FOREIGN 20551209)

<cl> (eval (adjust-to-buffer-names
            (find-schema 'rest-guide 'tini)
            '(let ((r (make-dbobject))
                   (a (make-dbobject)))
               (atomic (++ address a)
                       (++ street a "405 Washington St.")
                       (++ city a "Venice")
                       (++ restaurant r)
                       (++ foreign r)
                       (++ name r "kifune")
                       (++ location r a)
                       (++ phone r "822-1595")
                       (++ avg-price r 17)
                       (++ hours r "M-F 5-11; Sat-Sun 5-10")
                       (++ classification r "japanese")
                       (++ specialties r "teriyaki")
                       (++ facilities r "banquet"))
               r)))
#,(DBO DBOBJECT FOREIGN 20730849)

<cl> (eval (adjust-to-buffer-names
            (find-schema 'rest-guide 'tini)
            '(let ((r (make-dbobject))
                   (a (make-dbobject)))
               (atomic (++ address a)
                       (++ street a "8360 Manchester Ave.")
                       (++ city a "Playa Del Rey")
                       (++ restaurant r)
                       (++ name r "acapulco")
                       (++ location r a)
                       (++ phone r "822-4031")
                       (++ avg-price r 17)
                       (++ classification r "mexican")
                       (++ hours r "M-F 10-10; Sat-Sun 11-11")
                       (++ specialties r "tacos")
                       (++ facilities r "happy hour 4-8"))
               r)))
#,(DBO DBOBJECT RESTAURANT 20909057)

<cl> (eval (adjust-to-buffer-names
            (find-schema 'rest-guide 'tini)
            '(let ((r (make-dbobject))
                   (a (make-dbobject)))
               (atomic (++ address a)
                       (++ street a "4676 Admiralty Way")
                       (++ city a "Marina Del Rey")
                       (++ restaurant r)
                       (++ foreign r)
                       (++ name r "minato")
                       (++ location r a)
                       (++ phone r "305-1104")
                       (++ avg-price r 19)
                       (++ hours r "M-F 5-9:30; Sat-Sun 5-10")
                       (++ classification r "japanese")
                       (++ specialties r "crab roll")
                       (++ facilities r "banquet"))
               r)))
#,(DBO DBOBJECT FOREIGN 21059041)

<cl> (eval (adjust-to-buffer-names
            (find-schema 'rest-guide 'tini)
            '(let ((r (make-dbobject))
                   (a (make-dbobject)))
               (atomic (++ address a)
                       (++ street a "2928 Washington Blvd.")
                       (++ city a "Venice")
                       (++ restaurant r)
                       (++ foreign r)
                       (++ name r "eastwind cafe")
                       (++ location r a)
                       (++ phone r "823-9678")
                       (++ avg-price r 9)
                       (++ hours r "4-11")
                       (++ classification r "thai")
                       (++ specialties r "pad thai"))
               r)))
#,(DBO DBOBJECT FOREIGN 21236849)

<cl> (eval (adjust-to-buffer-names
            (find-schema 'rest-guide 'tini)
            '(let ((r (make-dbobject))
                   (a (make-dbobject)))
               (atomic (++ address a)
                       (++ street a "13515 Washington Blvd.")
                       (++ city a "Venice")
                       (++ restaurant r)
                       (++ name r "miami spice")
                       (++ location r a)
                       (++ phone r "306-7979")
                       (++ avg-price r 25)
                       (++ hours r "M-F 5-2; Sat-Sun 7-12")
                       (++ classification r "cuban")
                       (++ specialties r "jazz")
                       (++ facilities r "dancing, ambiance"))
               r)))
#,(DBO DBOBJECT RESTAURANT 21410569)

<cl> (eval (adjust-to-buffer-names
            (find-schema 'rest-guide 'tini)
            '(let ((r (make-dbobject))
                   (a (make-dbobject)))
               (atomic (++ address a)
                       (++ street a "13715 Fiji Way")
                       (++ city a "Marina Del Rey")
                       (++ restaurant r)
                       (++ name r "el torito")
                       (++ location r a)
                       (++ phone r "823-8941")
                       (++ avg-price r 10)
                       (++ classification r "mexican")
                       (++ hours r "M-F 5-11; Sat-Sun 5-2")
                       (++ specialties r "quesadilla")
                       (++ facilities r "harbor view"))
               r)))
#,(DBO DBOBJECT RESTAURANT 21559065)

<cl> (eval (adjust-to-buffer-names
            (find-schema 'rest-guide 'tini)
            '(let ((r (make-dbobject))
                   (a (make-dbobject)))
               (atomic (++ address a)
                       (++ street a "590 Washington St.")
                       (++ city a "Marina Del Rey")
                       (++ restaurant r)
                       (++ foreign r)
                       (++ name r "akbar")
                       (++ location r a)
                       (++ phone r "822-4116")
                       (++ avg-price r 24)
                       (++ hours r "M-F 5-11; Sat-Sun 5-10")
                       (++ classification r "indian")
                       (++ specialties r "tandoori"))
               r)))
#,(DBO DBOBJECT FOREIGN 23265113)

<cl> (save-instance rest-info)
NIL

The persistent states of the schema specification rest-guide and the world instance restaurant-info are in section D.5.

Clear Restaurant Guide

Clear-worldschema-domain removes instances of rest-guide. The operation can be forced, i.e., the instances are removed even if worlds of the schema exist. Worlds may be unloaded (the corresponding object is removed from the workspace), leaving the objects that made up their population unaffected.

<cl> (clear-worldschema-domain rest)
Cannot clear domain of SCHEMA:REST-GUIDETINI until all dependent worlds
(WORLD:RESTAURANT-INFOTINI) are unloaded.
NIL

<cl> (unload-all-worlds-under-schema rest)
Cannot delete WORLD:RESTAURANT-INFOTINI. There are objects in this world.
(NIL)

<cl> (unload-all-worlds-under-schema rest t)
Deleting world WORLD:RESTAURANT-INFOTINI from workspace.
(NIL)

<cl> (clear-worldschema-domain rest)
NIL

D.1.3 Create Recommendation Schema and Worlds

This subsection shows transcripts of how a user creates, populates and saves recommendation-world, and extracts highly recommended restaurants into paul-recommendation-world.

Register Schema and World

Recommendation relates a restaurant to a person who recommended it and an integer rating from 1 to 10.
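The shape of this ternary relation, and the "highly recommended" selection used later in this subsection, can be sketched in plain Common Lisp without AP5. In the sketch, each tuple is a list (restaurant person rating); the symbols r1 to r5 and p1 to p4 stand in for database objects, and the ratings mirror those asserted in the transcript of this subsection. The names valid-rating-p and good-recommendations are hypothetical, and the default threshold of 7 is taken from the user's stated interest in ratings higher than 7.

```lisp
;; Hypothetical sketch: tuples of the recommendation relation as plain lists.
(defparameter *recommendations*
  '((r1 p1 6) (r2 p1 8) (r3 p1 7) (r5 p1 10)
    (r2 p2 8) (r3 p2 9)
    (r3 p3 10)
    (r2 p4 9) (r3 p4 5) (r5 p4 9)))

(defun valid-rating-p (tuple)
  "True when the third component of TUPLE is an integer from 1 to 10."
  (let ((rating (third tuple)))
    (and (integerp rating) (<= 1 rating 10))))

(defun good-recommendations (tuples &optional (threshold 7))
  "Return the tuples whose rating is strictly greater than THRESHOLD."
  (remove-if-not (lambda (tuple) (> (third tuple) threshold)) tuples))
```

With this data, seven tuples pass the filter; the tuple (r3 p1 7) is excluded because the comparison is strict.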
<cl> (register-worldschema 'recommendation 'tini
       '((shared stored-type restaurant)
         (shared stored-type person)
         (shared stored-relation r-name restaurant string)
         (shared stored-relation r-phone restaurant string)
         (shared stored-relation specialties restaurant string)
         (internal stored-relation r-type restaurant string)
         (internal stored-relation r-address restaurant string)
         (internal stored-relation cost restaurant integer)
         (shared stored-relation recommendation restaurant person integer)
         (shared stored-relation p-name person string))
       t)
T

<cl> (register-world 'recommendation-world 'tini 'recommendation 'tini nil t)
T

Load Schema and World

When recommendation is loaded, some of its relationships are renamed to merge with those of rest-guide. A verbose version of restore-schema is shown in section D.4.

<cl> (setf rec (load-worldschema 'recommendation 'tini))
SCHEMA:RECOMMENDATIONTINI

<cl> (restore-worldschema rec
       '((r-type paul-r-type)
         (r-address paul-r-address)
         (cost cost)))
SCHEMA:RECOMMENDATIONTINI

<cl> (setf rec-world (load-world 'recommendation-world 'tini))
WORLD:RECOMMENDATION-WORLDTINI

Populate Recommendation

Instances of the recommendation schema are asserted into the workspace.

<cl> (setf p1 (let ((p (make-dbobject)))
                (atomic (++ person p)
                        (++ p-name p "larry")
                        p)))))
#,(DBO DBOBJECT PERSON 22068457)

<cl> .... ;; p2 = create person named "yingsha"
#,(DBO DBOBJECT PERSON 22187425)

<cl> .... ;; p3 = create person named "michael"
#,(DBO DBOBJECT PERSON 22250041)

<cl> ....
;; p4 = create person named "nancy"
#,(DBO DBOBJECT PERSON 22312657)

<cl> (setf r1 (eval (adjust-to-buffer-names
                     (find-schema 'recommendation 'tini)
                     '(let ((r (make-dbobject)))
                        (atomic (++ restaurant r)
                                (++ r-name r "miami spice")
                                (++ r-phone r "306-7979")
                                (++ r-type r "cuban")
                                (++ r-address r "13515 Washington Blvd Venice")
                                (++ cost r 22)
                                r)))))
#,(DBO DBOBJECT RESTAURANT 24051369)

<cl> (setf r2
       .... ;; create thai restaurant named "eastwind cafe"
       .... ;; phone "823-9678", specialties "ice coffee",
       .... ;; cost 8, and its r-address.
       )
#,(DBO DBOBJECT RESTAURANT 24237657)

<cl> (setf r3
       .... ;; create japanese restaurant named "minato",
       .... ;; phone "305-1104", specialties "soba",
       .... ;; cost 18, and its r-address.
       )
#,(DBO DBOBJECT RESTAURANT 24333593)

<cl> (setf r4
       .... ;; create mexican restaurant named "el torito",
       .... ;; phone "823-8941", specialties "quesadillas",
       .... ;; cost 13, and its r-address.
       )
#,(DBO DBOBJECT RESTAURANT 24428097)

<cl> (setf r5
       .... ;; create indian restaurant named "akbar",
       .... ;; phone "822-4116", specialties "tandoori",
       .... ;; cost 22, and its r-address.
       )
#,(DBO DBOBJECT RESTAURANT 24522593)

<cl> (eval (adjust-to-buffer-names
            (find-schema 'recommendation 'tini)
            '(progn
               (++ recommendation r1 p1 6)
               (++ recommendation r2 p1 8)
               (++ recommendation r3 p1 7)
               (++ recommendation r5 p1 10)
               (++ recommendation r2 p2 8)
               (++ recommendation r3 p2 9)
               (++ recommendation r3 p3 10)
               (++ recommendation r2 p4 9)
               (++ recommendation r3 p4 5)
               (++ recommendation r5 p4 9))))
NIL

Changes may be made to the database using AP5.

<cl> (setf larry (any p s.t.
(and (person p) (p-name p "larry")))
#,(DBO DBOBJECT PERSON 24070745)

<cl> p1
#,(DBO DBOBJECT PERSON 24070745)

<cl> (-- p-name p1 "larry")
NIL

<cl> (++ p-name p1 "paul")
NIL

<cl> (listof x s.t. (p-name p1 x))
("paul")

Recommendation-world is saved, causing the changes to persist.

<cl> (save-instance rec-world)
NIL

Create Paul-recommendation-World

The user is interested in highly recommended restaurants, that is, restaurants with a rating higher than 7. He creates a closure-based world, paul-recommendation-world, and populates it with recommendation relationships as its seeds. Since there is a relation closure specified for recommendation, the closure of the seeds is saved in the world.

<cl> (register-world 'paul-recommendation-world 'tini 'recommendation 'tini
       '((restaurant (r-name @ $)
                     (r-phone @ !)
                     (r-type @ !)
                     (r-address @ !)
                     (specialties @ !)
                     (cost @ !))
         (recommendation ! ! !)
         (person (p-name @ !)))
       t)
T

<cl> (setf paul-rec (load-world 'paul-recommendation-world 'tini))
WORLD:PAUL-RECOMMENDATION-WORLDTINI

<cl> (restore-world-closure paul-rec)
Asserting type closure RESTAURANT: ((R-NAME @ $) (R-PHONE @ !)
  (PAUL-R-TYPE @ !) (PAUL-R-ADDRESS @ !) (SPECIALTIES @ !) (COST @ !))
Asserting relation closure (RECOMMENDATION ! ! !)
Asserting type closure PERSON: ((P-NAME @ !))
(NIL NIL NIL)

<cl> (defun generate-good-recommendations ()
       (let ((tempvalues
              (eval (adjust-to-buffer-names
                     (find-schema 'recommendation 'tini)
                     '(listof (x y z) s.t.
                         (and (recommendation x y z) (> z 7))))))
             finalvalues)
         (setf finalvalues
               (mapcar (function
                        (lambda (list)
                          `(recommendation ,(first list)
                                           ,(second list)
                                           ,(third list))))
                       tempvalues))
         finalvalues))
GENERATE-GOOD-RECOMMENDATIONS

<cl> (add-seeds paul-rec
       (generate-good-recommendations))
(NIL NIL NIL NIL NIL NIL NIL)

<cl> (save-instance paul-rec)
NIL

<cl> (listof x s.t. (inworld paul-rec x))
(#,(DBO DBOBJECT PERSON 21089593)
 #,(DBO DBOBJECT PERSON 21089609)
 #,(DBO DBOBJECT RESTAURANT 21291105)
 #,(DBO DBOBJECT PERSON 21089625)
 #,(DBO DBOBJECT PERSON 21089641)
 #,(DBO DBOBJECT RESTAURANT 21291857)
 #,(DBO DBOBJECT RESTAURANT 21291881))

<cl> (listof x s.t. (inworld-tup paul-rec x))
((PERSON #,(DBO DBOBJECT PERSON 21089593))
 (P-NAME #,(DBO DBOBJECT PERSON 21089593) "nancy")
 (PERSON #,(DBO DBOBJECT PERSON 21089609))
 (P-NAME #,(DBO DBOBJECT PERSON 21089609) "michael")
 (RESTAURANT #,(DBO DBOBJECT RESTAURANT 21291105))
 (R-NAME #,(DBO DBOBJECT RESTAURANT 21291105) "minato")
 .... ;; its cost, specialties, r-address, r-type, r-phone
 (PERSON #,(DBO DBOBJECT PERSON 21089625))
 (P-NAME #,(DBO DBOBJECT PERSON 21089625) "yingsha")
 (RESTAURANT #,(DBO DBOBJECT RESTAURANT 21291857))
 (R-NAME #,(DBO DBOBJECT RESTAURANT 21291857) "akbar")
 .... ;; its cost, specialties, r-address, r-type, r-phone
 (PERSON #,(DBO DBOBJECT PERSON 21089641))
 (P-NAME #,(DBO DBOBJECT PERSON 21089641) "paul")
 (RESTAURANT #,(DBO DBOBJECT RESTAURANT 21291881))
 (R-NAME #,(DBO DBOBJECT RESTAURANT 21291881) "eastwind cafe")
 ....
 ;; its cost, specialties, r-address, r-type, r-phone
 (RECOMMENDATION #,(DBO DBOBJECT RESTAURANT 21291881)
                 #,(DBO DBOBJECT PERSON 21089641) 8)
 (RECOMMENDATION #,(DBO DBOBJECT RESTAURANT 21291857)
                 #,(DBO DBOBJECT PERSON 21089641) 10)
 (RECOMMENDATION #,(DBO DBOBJECT RESTAURANT 21291881)
                 #,(DBO DBOBJECT PERSON 21089625) 8)
 (RECOMMENDATION #,(DBO DBOBJECT RESTAURANT 21291105)
                 #,(DBO DBOBJECT PERSON 21089625) 9)
 (RECOMMENDATION #,(DBO DBOBJECT RESTAURANT 21291105)
                 #,(DBO DBOBJECT PERSON 21089609) 10)
 (RECOMMENDATION #,(DBO DBOBJECT RESTAURANT 21291881)
                 #,(DBO DBOBJECT PERSON 21089593) 9)
 (RECOMMENDATION #,(DBO DBOBJECT RESTAURANT 21291857)
                 #,(DBO DBOBJECT PERSON 21089593) 9))

D.2 Transforming Worlds

To view Japanese restaurants together with highly recommended restaurants, the user extracts the Japanese restaurants and transforms them into an instance of recommendation. The user can either transform the extracted world joe-entertainment-world into an instance of recommendation using the bulk-transform operation, or he may transform and extract at the same time, using the closure-transform operation.

The first subsection shows a transcript of the user transforming joe-recommendation-schema using bulk transform, and saving the result in joe-rec-1. The next subsection shows a transcript of the user extracting and transforming a subset of entertainment-world, and saving the result in joe-rec-2. The closure specified on joe-rec-2 ensures that it is isomorphic to joe-rec-1 in population.

D.2.1 Bulk Transform

Since the same workspace is used, the user must clear the domain of recommendation to prevent existing data from interfering with the results of the transformation.
clear-worldschema-domain can be used to remove instances from the world schema; the second parameter forces the removal of instances even though worlds that depend on the schema exist in the workspace; the third parameter forces removal of tuples in shared relations. Since some of the relations in recommendation are shared with rest-guide, the user gets warning messages when tuples of shared relations are being removed.

Setting Up for Bulk Transform

<cl> (clear-worldschema-domain rec t t)
Warning: removing tuples in shared relation #,(DBO RELATION COST).
Warning: removing tuples in shared relation #,(DBO RELATION PAUL-R-TYPE).
Warning: removing tuples in shared relation #,(DBO RELATION SPECIALTIES).
Warning: removing tuples in shared relation #,(DBO RELATION R-PHONE).
Warning: removing tuples in shared relation #,(DBO RELATION R-NAME).
Warning: removing tuples in shared relation #,(DBO TYPE RESTAURANT).
NIL

To ensure that only instances in joe-entertainment-world are transformed, the user first unloads other worlds of the same schema and clears the domain of entertainment. Then the world joe-entertainment-world is restored. The user may query the workspace to ensure that only Japanese restaurant information is restored in the workspace. Note the two different ways a user may reference relations: by their local names using adjust-to-buffer-names, or by their workspace names.

<cl> (clear-domain-of-other-worlds joe-ent)
WORLD:JOE-ENTERTAINMENT-WORLDTINI

<cl> (restore-instance joe-ent)
NIL

;; checking instance in entertainment schema
<cl> (listof (x) s.t.
(joe-restaurant x))))
(#,(DBO DBOBJECT JOE-RESTAURANT 26332033)
 #,(DBO DBOBJECT JOE-RESTAURANT 26250569)
 #,(DBO DBOBJECT JOE-RESTAURANT 26171169))

<cl> (listof (x y) s.t. (name x y))))
((#,(DBO DBOBJECT JOE-RESTAURANT 26332033) "ma-ri-na")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26250569) "kifune")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26171169) "minato"))

<cl> (listof x s.t. (inworld joe-ent x))
(#,(DBO DBOBJECT JOE-RESTAURANT 26332033)
 #,(DBO DBOBJECT JOE-RESTAURANT 26250569)
 #,(DBO DBOBJECT JOE-RESTAURANT 26171169))

<cl> (listof (x y) s.t. (rating x y))))
((#,(DBO DBOBJECT JOE-RESTAURANT 26332033) "****")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26250569) "***")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26171169) "*****"))

<cl> (eval (adjust-to-buffer-names
            (find-schema 'entertainment 'tini)
            '(listof (x y) s.t. (r-type x y))))
((#,(DBO DBOBJECT JOE-RESTAURANT 26332033) "japanese")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26250569) "japanese")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26171169) "japanese"))

<cl> (eval (adjust-to-buffer-names
            (find-schema 'entertainment 'tini)
            '(listof (x y) s.t. (avg-price x y))))
((#,(DBO DBOBJECT JOE-RESTAURANT 26332033) "$$$")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26250569) "$$")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26171169) "$$"))

;; checking that target schema is empty
<cl> (eval (adjust-to-buffer-names
            (find-schema 'recommendation 'tini)
            '(listof (x) s.t. (restaurant x))))
NIL

<cl> (listof (x) s.t. (person x))))
NIL

.... ;; etc.

<cl> (eval (adjust-to-buffer-names
            (find-schema 'recommendation 'tini)
            '(listof (x y) s.t. (p-name x y))))
NIL

Transform Functions

The transformation requires that avg-price, modeled as strings, be transformed to integer values.
Also, the rating, modeled as strings (number of stars), is transformed to integers by the functions provided below. The functions are defined by the user. The relations avg-price-to-cost and star-to-integer are the relational forms of these Lisp functions. They are used in ILOG* specifications.

<cl> (defun convert-avg-price-to-cost (price)
       (cond ((string-equal price "$") 10)
             ((string-equal price "$$") 20)
             ((string-equal price "$$$") 30)
             ((string-equal price "$$$$") 40)
             ((string-equal price "$$$$$") 50)
             (t 100)))
CONVERT-AVG-PRICE-TO-COST

;; define functions as relations to transform values
<cl> (defrelation avg-price-to-cost
       :arity 2
       :types (string integer)
       :derivation (Lisp-Function convert-avg-price-to-cost 1 1)
       :generator ((simplegenerator (a output)
                     (and (stringp a)
                          (convert-avg-price-to-cost a)))))
#,(DBO RELATION AVG-PRICE-TO-COST)
T

<cl> (defun convert-star-to-integer (star)
       (cond ((string-equal star "*") 2)
             ((string-equal star "**") 4)
             ((string-equal star "***") 6)
             ((string-equal star "****") 8)
             ((string-equal star "*****") 10)
             (t 0)))
CONVERT-STAR-TO-INTEGER

<cl> (defrelation star-to-integer
       :arity 2
       :types (string integer)
       :derivation (Lisp-Function
                    convert-star-to-integer 1 1)
       :generator ((simplegenerator (a output)
                     (and (stringp a)
                          (convert-star-to-integer a)))))
#,(DBO RELATION STAR-TO-INTEGER)
T

Transformation Specification

The transformation specification used to transform instances of entertainment to instances of recommendation is given in a file called ent-to-rec.tr, which is shown below.

transform entertainment tini recommendation tini
(i-person [(* person) (p string)] :-
     (equal p "joe");
 p-name (p n) :-
     i-person [p n];
 i-restaurant [(* restaurant) (n string) (p string)] :-
     (and (restaurant x)
          (name x n)
          (phone x p));
 r-name (r n) :-
     i-restaurant [r n p];
 r-phone (r p) :-
     i-restaurant [r n p];
 specialties (r s) :-
     (and (restaurant x)
          (name x n)
          (phone x p)
          (specialty x s))
     and i-restaurant [r n p];
 r-type (r y) :-
     (and (restaurant x)
          (name x n)
          (phone x p)
          (r-type x y))
     and i-restaurant [r n p];
 cost (r c) :-
     (and (restaurant x)
          (name x n)
          (phone x ph)
          (avg-price x amt)
          (avg-price-to-cost amt c))
     and i-restaurant [r n ph];
 recommendation (r p rti) :-
     (and (restaurant x)
          (name x rn)
          (phone x ph)
          (rating x rt)
          (star-to-integer rt rti)
          (string-equal pn "joe"))
     and i-restaurant [r rn ph], i-person [p pn]
)

Transform Operation

First, the transformation specification is parsed and translated to Lisp forms as shown below. Then bulk-transform is called with the translated specifications.
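The two value conversions can also be exercised outside AP5. The sketch below is a compact restatement in plain Common Lisp, applied to the three Japanese restaurants of joe-entertainment-world. The names price-to-cost, stars-to-integer and transform-row are hypothetical, chosen so as not to clash with the user-defined functions above, and the row layout (name, price, stars) is an assumption made for illustration; it is not how the transformation itself represents tuples.

```lisp
;; Hypothetical sketch: plain Common Lisp, no AP5 relations involved.

(defun price-to-cost (price)
  "Map a string of one to five dollar signs to 10..50; anything else to 100."
  (if (and (<= 1 (length price) 5)
           (every (lambda (c) (char= c #\$)) price))
      (* 10 (length price))
      100))

(defun stars-to-integer (stars)
  "Map a string of one to five asterisks to 2..10; anything else to 0."
  (if (and (<= 1 (length stars) 5)
           (every (lambda (c) (char= c #\*)) stars))
      (* 2 (length stars))
      0))

;; (name avg-price rating) as they appear in joe-entertainment-world.
(defparameter *japanese-restaurants*
  '(("minato" "$$" "*****")
    ("kifune" "$$" "***")
    ("ma-ri-na" "$$$" "****")))

(defun transform-row (row)
  "Return (name cost rating) with both string values converted to integers."
  (destructuring-bind (name price stars) row
    (list name (price-to-cost price) (stars-to-integer stars))))

(mapcar #'transform-row *japanese-restaurants*)
;; => (("minato" 20 10) ("kifune" 20 6) ("ma-ri-na" 30 8))
```

The resulting costs (20, 20, 30) and ratings (10, 6, 8) agree with the values returned by the cost and recommendation queries on the transformed instance.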
;;; parse and translate correspondence grammar
<cl> (setf list-of-ent-to-rec-corrs
       (eval (fl::lisp-to-lists
              (corr-to-lisp
               (POE:MustParseFromfile
                (create-transform-path 'ent-to-rec)
                'cr:correspondence-spec 'cr::corr)))))
Reading file "/users/tini/newworld/demo/ent-to-rec.tr"
((ENTERTAINMENT TINI) (RECOMMENDATION TINI)
 ((T I-PERSON (* PERSON) (P STRING)) (P) (EQUAL P "joe") NIL)
 ((R P-NAME P N) NIL NIL ((I-PERSON P N)))
 ((T I-RESTAURANT (* RESTAURANT) (N STRING) (P STRING)) (N X P)
  (AND (RESTAURANT X) (NAME X N) (PHONE X P)) NIL)
 ((R R-NAME R N) NIL NIL ((I-RESTAURANT R N P)))
 ((R R-PHONE R P) NIL NIL ((I-RESTAURANT R N P)))
 ((R SPECIALTIES R S) (N P X S)
  (AND (RESTAURANT X) (NAME X N) (PHONE X P) (SPECIALTY X S))
  ((I-RESTAURANT R N P)))
 ((R R-TYPE R Y) (N P X Y)
  (AND (RESTAURANT X) (NAME X N) (PHONE X P) (R-TYPE X Y))
  ((I-RESTAURANT R N P)))
 ((R COST R C) (N PH X AMT C)
  (AND (RESTAURANT X) (NAME X N) (PHONE X PH) (AVG-PRICE X AMT)
       (AVG-PRICE-TO-COST AMT C))
  ((I-RESTAURANT R N PH)))
 ((R RECOMMENDATION R P RTI) (RN PH X RT RTI PN)
  (AND (RESTAURANT X) (NAME X RN) (PHONE X PH) (RATING X RT)
       (STAR-TO-INTEGER RT RTI) (STRING-EQUAL PN "joe"))
  ((I-RESTAURANT R RN PH) (I-PERSON P PN))))

<cl> (bulk-transform list-of-ent-to-rec-corrs)
Transformation done
NIL

The user may check the resulting instance of recommendation. In particular, cost and recommendation ratings have been transformed accordingly.

;; querying the domain of recommendation
<cl> (eval (adjust-to-buffer-names
            (find-schema 'recommendation 'tini)
            '(listof (x) s.t. (restaurant x))))
(#,(DBO DBOBJECT RESTAURANT 23127185)
 #,(DBO DBOBJECT RESTAURANT 23127217)
 #,(DBO DBOBJECT RESTAURANT 23127201))

<cl> (listof (x) s.t.
(person x))))
(#,(DBO DBOBJECT PERSON 23194617))

<cl> (eval (adjust-to-buffer-names
            (find-schema 'recommendation 'tini)
            '(listof (x y) s.t. (r-name x y))))
((#,(DBO DBOBJECT RESTAURANT 23127201) "ma-ri-na")
 (#,(DBO DBOBJECT RESTAURANT 23127217) "kifune")
 (#,(DBO DBOBJECT RESTAURANT 23127185) "minato"))

<cl> (eval (adjust-to-buffer-names
            (find-schema 'recommendation 'tini)
            '(listof (x y) s.t. (specialties x y))))
((#,(DBO DBOBJECT RESTAURANT 23127201) "inari")
 (#,(DBO DBOBJECT RESTAURANT 23127217) "teriyaki")
 (#,(DBO DBOBJECT RESTAURANT 23127185) "sukiyaki")
 (#,(DBO DBOBJECT RESTAURANT 23127185) "sashimi"))

<cl> (eval (adjust-to-buffer-names
            (find-schema 'recommendation 'tini)
            '(listof (x y) s.t. (r-address x y))))
NIL

.... ;; checking resulting instance

<cl> (eval (adjust-to-buffer-names
            (find-schema 'recommendation 'tini)
            '(listof (x y) s.t. (cost x y))))
((#,(DBO DBOBJECT RESTAURANT 23127185) 20)
 (#,(DBO DBOBJECT RESTAURANT 23127217) 20)
 (#,(DBO DBOBJECT RESTAURANT 23127201) 30))

<cl> (eval (adjust-to-buffer-names
            (find-schema 'recommendation 'tini)
            '(listof (x y z) s.t. (recommendation x y z))))
((#,(DBO DBOBJECT RESTAURANT 23127185)
  #,(DBO DBOBJECT PERSON 23194617) 10)
 (#,(DBO DBOBJECT RESTAURANT 23127217)
  #,(DBO DBOBJECT PERSON 23194617) 6)
 (#,(DBO DBOBJECT RESTAURANT 23127201)
  #,(DBO DBOBJECT PERSON 23194617) 8))

<cl> (eval (adjust-to-buffer-names
            (find-schema 'recommendation 'tini)
            '(listof (x y) s.t.
               (p-name x y))))
((#,(DBO DBOBJECT PERSON 23194617) "joe"))

Save Results to Joe-rec-1

The user may then create a world (joe-rec-1) to contain the newly transformed data.

<cl> (register-world 'joe-rec-1 'tini 'recommendation 'tini)
T

<cl> (setf joe-rec-1 (load-world 'joe-rec-1 'tini))
WORLD:JOE-REC-1TINI

<cl> (save-instance joe-rec-1)
NIL

D.2.2 Closure Transform

To do a closure transform, the user first sets up the workspace so that only the population of entertainment-world is in the workspace. Next, he creates a target closure-based world (joe-rec-2) with an appropriate world closure specification. He then calls closure-transform with seed correspondences for joe-rec-2 and a correspondence specification that specifies a transformation of entertainment to recommendation. The transcripts are shown below.

Setting Up for Closure Transform

First, ensure that the domains are cleared and that entertainment-world is loaded. No keys need to be specified if entertainment is first cleared.

<cl> (clear-worldschema-domain (find-schema 'entertainment 'tini) t t)
(NIL NIL (NIL NIL NIL) ...)

<cl> (clear-worldschema-domain (find-schema 'recommendation 'tini) t t)
Warning: removing tuples in shared relation #,(DBO RELATION COST).
Warning: removing tuples in shared relation #,(DBO RELATION PAUL-R-TYPE).
Warning: removing tuples in shared relation #,(DBO RELATION SPECIALTIES).
Warning: removing tuples in shared relation #,(DBO RELATION R-PHONE).
Warning: removing tuples in shared relation #,(DBO RELATION R-NAME).
Warning: removing tuples in shared relation #,(DBO TYPE RESTAURANT).
((NIL) (NIL NIL NIL) ...)

<cl> (setf ent-world (load-world 'entertainment-world 'tini))
WORLD:ENTERTAINMENT-WORLDTINI

<cl> (restore-instance ent-world)
NIL

Check that the entertainment domain contains the data to be transformed, and that the recommendation domain is empty.

;; check domain of entertainment
<cl> (listof (x) s.t. (entertainment x))))
(#,(DBO DBOBJECT THEATRE 26429961)
 #,(DBO DBOBJECT JOE-RESTAURANT 26418257)
 #,(DBO DBOBJECT JOE-RESTAURANT 26429985)
 #,(DBO DBOBJECT JOE-RESTAURANT 26430009)
 #,(DBO DBOBJECT JOE-RESTAURANT 26430033)
 #,(DBO DBOBJECT JOE-RESTAURANT 26410441)
 #,(DBO DBOBJECT JOE-RESTAURANT 26430057))

.... ;; etc.

<cl> (eval (adjust-to-buffer-names
            (find-schema 'entertainment 'tini)
            '(listof (x y) s.t. (rating x y))))
((#,(DBO DBOBJECT JOE-RESTAURANT 26418257) "***")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26429985) "****")
 (#,(DBO DBOBJECT THEATRE 26429961) "***")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26430009) "***")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26430033) "***")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26410441) "*****")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26430057) "****"))

<cl> (eval (adjust-to-buffer-names
            (find-schema 'entertainment 'tini)
            '(listof (x y) s.t. (r-type x y))))
((#,(DBO DBOBJECT JOE-RESTAURANT 26418257) "american")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26429985) "japanese")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26430009) "japanese")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26430033) "mexican")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26410441) "japanese")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26430057) "thai"))

<cl> (eval (adjust-to-buffer-names
            (find-schema 'entertainment 'tini)
            '(listof (x y) s.t. (avg-price x y))))
((#,(DBO DBOBJECT JOE-RESTAURANT 26418257) "$")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26429985) "$$$")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26430009) "$$")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26430033) "$$")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26410441) "$$")
 (#,(DBO DBOBJECT JOE-RESTAURANT 26430057) "$"))

;; check domain of recommendation is empty
<cl> (eval (adjust-to-buffer-names
            (find-schema 'recommendation 'tini)
            '(listof (x) s.t. (restaurant x))))
NIL

<cl> (eval (adjust-to-buffer-names
            (find-schema 'recommendation 'tini)
            '(listof (x) s.t. (person x))))
NIL

<cl> (eval (adjust-to-buffer-names
            (find-schema 'recommendation 'tini)
            '(listof (x y) s.t. (r-name x y))))
NIL

.... ;; etc.

Create Target World

Next, a closure-based target world (joe-rec-2) is created and loaded.

<cl> (register-world 'joe-rec-2 'tini 'recommendation 'tini
       '((restaurant (r-name @ $)
                     (r-phone @ !)
                     (r-type @ !)
                     (r-address @ !)
                     (specialties @ !)
                     (cost @ !)
                     (recommendation @ ! !))
         (person (p-name @ !))))
T

<cl> (setf joe-rec-2 (load-world 'joe-rec-2 'tini))
WORLD:JOE-REC-2TINI

<cl> (restore-world-closure joe-rec-2)
(NIL NIL)

Transformation Specification

The same transformation specification (in ent-to-rec.tr) is used as the main correspondence specification. However, seeds must be supplied for the target world. These are provided in terms of ILOG* (in the file seed-ent-to-rec.tr), given below.
transform entertainment tini recommendation tini
(i-restaurant [(* restaurant) (n string) (p string)]
  (and (restaurant x) (name x n) (phone x p)
       (r-type x "japanese")))

Parse and Translate Specifications

The correspondence and seed specifications are parsed and translated to Lisp forms.

<cl> (setf list-of-ent-to-rec-corrs
       (eval (fl::lisp-to-lists
              (corr-to-lisp
               (POE:MustParseFromfile
                (create-transform-path 'ent-to-rec)
                'cr:correspondence-spec 'cr::corr)))))
Reading file "/users/tini/newworld/demo/ent-to-rec.tr"
((ENTERTAINMENT TINI) (RECOMMENDATION TINI)
 ((T I-PERSON (* PERSON) (P STRING)) (P) (EQUAL P "joe") NIL)
 ((R P-NAME P N) NIL NIL ((I-PERSON P N)))
 ((T I-RESTAURANT (* RESTAURANT) (N STRING) (P STRING))
  (N X P) (AND (RESTAURANT X) (NAME X N) (PHONE X P)) NIL)
 ((R R-NAME R N) NIL NIL ((I-RESTAURANT R N P)))
 ((R R-PHONE R P) NIL NIL ((I-RESTAURANT R N P)))
 ((R SPECIALTIES R S) (N P X S)
  (AND (RESTAURANT X) (NAME X N) (PHONE X P) (SPECIALTY X S))
  ((I-RESTAURANT R N P)))
 ((R R-TYPE R Y) (N P X Y)
  (AND (RESTAURANT X) (NAME X N) (PHONE X P) (R-TYPE X Y))
  ((I-RESTAURANT R N P)))
 ((R COST R C) (N PH X AMT C)
  (AND (RESTAURANT X) (NAME X N) (PHONE X PH)
       (AVG-PRICE X AMT) (AVG-PRICE-TO-COST AMT C))
  ((I-RESTAURANT R N PH)))
 ((R RECOMMENDATION R P RTI) (RN PH X RT RTI PN)
  (AND (RESTAURANT X) (NAME X RN) (PHONE X PH) (RATING X RT)
       (STAR-TO-INTEGER RT RTI) (STRING-EQUAL PN "joe"))
  ((I-RESTAURANT R RN PH) (I-PERSON P PN))))
<cl> (setf seed-of-ent-to-rec-corrs
       (rest (rest
              (eval (fl::lisp-to-lists
                     (corr-to-lisp
                      (POE:MustParseFromfile
                       (create-transform-path 'seed-ent-to-rec)
                       'cr:correspondence-spec 'cr::corr)))))))
Reading file "/users/tini/newworld/demo/seed-ent-to-rec.tr"
(((T I-RESTAURANT (* RESTAURANT) (N STRING) (P STRING))
  (N P X)
  (AND (RESTAURANT X) (NAME X N) (PHONE X P) (R-TYPE X "japanese"))
  NIL))

Closure Transform Operation

Closure-transform is called with the correspondence specification, the seed specification, and the target world. Once the transformation is done, the results in joe-rec-2 are saved.

<cl> (closure-transform list-of-ent-to-rec-corrs
       seed-of-ent-to-rec-corrs '(joe-rec-2 tini))
Transformation done
NIL
;; save target world
<cl> (save-instance joe-rec-2)
NIL

The resulting instance under recommendation may be queried. The reader may also look at the saved version of joe-rec-2 in section D.5. Note that r-address is empty since it is not transformed from entertainment.

<cl> (eval (adjust-to-buffer-names
       (find-schema 'recommendation 'tini)
       '(listof (x) s.t. (restaurant x))))
(#,(DBO DBOBJECT RESTAURANT 23752745)
 #,(DBO DBOBJECT RESTAURANT 23753985)
 #,(DBO DBOBJECT RESTAURANT 23754001))
<cl> (eval (adjust-to-buffer-names
       (find-schema 'recommendation 'tini)
       '(listof (x) s.t. (person x))))
(#,(DBO DBOBJECT PERSON 23992289))
<cl> (eval (adjust-to-buffer-names
       (find-schema 'recommendation 'tini)
       '(listof (x y) s.t. (p-name x y))))
((#,(DBO DBOBJECT PERSON 23992289) "joe"))
<cl> (eval (adjust-to-buffer-names
       (find-schema 'recommendation 'tini)
       '(listof (x y) s.t.
(r-name x y))))
((#,(DBO DBOBJECT RESTAURANT 23754001) "ma-ri-na")
 (#,(DBO DBOBJECT RESTAURANT 23753985) "kifune")
 (#,(DBO DBOBJECT RESTAURANT 23752745) "minato"))
<cl> (eval (adjust-to-buffer-names
       (find-schema 'recommendation 'tini)
       '(listof (x y) s.t. (specialties x y))))
((#,(DBO DBOBJECT RESTAURANT 23754001) "inari")
 (#,(DBO DBOBJECT RESTAURANT 23753985) "teriyaki")
 (#,(DBO DBOBJECT RESTAURANT 23752745) "sashimi")
 (#,(DBO DBOBJECT RESTAURANT 23752745) "sukiyaki"))
.... ;; etc.
<cl> (eval (adjust-to-buffer-names
       (find-schema 'recommendation 'tini)
       '(listof (x y) s.t. (r-address x y))))
NIL
<cl> (eval (adjust-to-buffer-names
       (find-schema 'recommendation 'tini)
       '(listof (x y) s.t. (cost x y))))
((#,(DBO DBOBJECT RESTAURANT 23754001) 30)
 (#,(DBO DBOBJECT RESTAURANT 23753985) 20)
 (#,(DBO DBOBJECT RESTAURANT 23752745) 20))
<cl> (eval (adjust-to-buffer-names
       (find-schema 'recommendation 'tini)
       '(listof (x y z) s.t. (recommendation x y z))))
((#,(DBO DBOBJECT RESTAURANT 23754001) #,(DBO DBOBJECT PERSON 23992289) 8)
 (#,(DBO DBOBJECT RESTAURANT 23753985) #,(DBO DBOBJECT PERSON 23992289) 6)
 (#,(DBO DBOBJECT RESTAURANT 23752745) #,(DBO DBOBJECT PERSON 23992289) 10))
<cl> (listof x s.t. (inworld joe-rec-2 x))
(#,(DBO DBOBJECT PERSON 23992289)
 #,(DBO DBOBJECT RESTAURANT 23752745)
 #,(DBO DBOBJECT RESTAURANT 23753985)
 #,(DBO DBOBJECT RESTAURANT 23754001))

D.3 Merging Worlds

The user has already loaded three schemas into the workspace.
Renaming parts of their structures during loading causes a schema subset to be merged (see loading recommendation verbosely in section D.4).

In this section, the user loads multiple worlds into the same workspace, merging the objects in the worlds using equivalence specifications. In restoring/merging a world instance, the equivalence specifications in the workspace are used. The workspace equivalence is asserted indirectly by loading world equivalence specifications into the workspace. This is achieved in several steps. First, the world's equivalence specification is asserted in the workspace. Next, the world's equivalence specification is saved; this save does not incorporate the workspace equivalence specification into the world equivalence specification. Finally, the world's equivalence specification is restored to the workspace and incorporated into the workspace equivalence specification. WorldBase warns the user if the workspace equivalence specification differs from the world equivalence specification. Constraints are restored in the same way.

D.3.1 Merging: Identical Schema

In this example, the user merges either joe-rec-1 or joe-rec-2 (see footnote 1; in this transcript, joe-rec-2) with paul-recommendation-world. Both worlds are of the recommendation schema. The merge is done by first restoring one world into the workspace (e.g. joe-rec-2), and then restoring the second world into the same workspace. The merge is invoked with preferences, i.e. some of the relationships are preferred from either the newly loaded database or the workspace.

Setting up for Merging

The user removes all other worlds and instances except for joe-rec-2. In the resulting merge, a weighted average is used to compute the cost of restaurants; the function is given below.
Intuitively, 2 is added to cost in one of the worlds, and the average of all the costs is computed and assigned to the restaurant object. All other cost relationships are removed except for the newly assigned value. Also, the equivalences for the two worlds are provided and restored to the workspace.

Footnote 1: They are equivalent.

<cl> (list-buffer-worlds)
(WORLD:JOE-REC-2TINI WORLD:ENTERTAINMENT-WORLDTINI
 WORLD:JOE-ENTERTAINMENT-WORLDTINI)
<cl> (unload-world ent-world t)
NIL
<cl> (unload-world joe-ent t)
NIL
<cl> (list-buffer-worlds)
(WORLD:JOE-REC-2TINI)
<cl> (add-to-world-type-equivs joe-rec-2 'restaurant
       '((r-name r-phone)))
NIL
<cl> (add-to-world-type-equivs joe-rec-2 'person '((p-name)))
NIL
<cl> (save-world-equivs joe-rec-2 :incorporate-from-buffer nil)
NIL
<cl> (restore-world-equivs joe-rec-2)
(NIL NIL)
<cl> (defun compute-avg-of-numbers (relname ob results db1 db2)
       ;; weighted avg? - must be more than 0 results
       (let ((sum 0) new-tup avg-val)
         (when debugging
           (format t "~&Results for object ~S are ~S~%" ob results))
         (loop for res in results do
           (let ((tup `(,relname ,ob ,res)))
             (cond ((is-tup-in db1 tup) (setf sum (+ sum res 2)))
                   (t (setf sum (+ sum res))))
             (remove-tup-in db2 tup)
             (remove-tup-in db1 tup t)))
         (setf avg-val (round (/ sum (length results))))
         (setf new-tup `(,relname ,ob ,avg-val))
         (when debugging
           (format t "~&New tuple is ~S~%" new-tup))
         (eval `(++ ,relname ',ob ,avg-val))))
T
<cl> (setf paul-rec (load-world 'paul-recommendation-world 'tini))
WORLD:PAUL-RECOMMENDATION-WORLDTINI
<cl> (add-to-world-type-equivs paul-rec 'restaurant
       '((r-name r-phone)))
NIL
<cl> (add-to-world-type-equivs paul-rec 'person '((p-name)))
NIL
<cl> (save-world-equivs paul-rec :incorporate-from-buffer nil)
NIL
<cl> (restore-world-equivs paul-rec)
(NIL NIL)

Restore and Merge

The transcripts below show the merge of paul-recommendation-world with the buffer database. The preference specification specifies that specialties in the resulting database is constrained 0 to u>. However, since the preference is specified as 1, the values from the workspace are preferred. This preference removes a relationship only when the resulting target contains information from both the workspace and the newly restored database. Matching values, or values left undefined in the preferred database, do not cause tuples to be removed from the workspace.
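The arithmetic performed by compute-avg-of-numbers can be checked in isolation. The following is a minimal sketch in plain Common Lisp, not part of WorldBase: weighted-average and its in-first-db-p argument are hypothetical names, with the predicate standing in for the is-tup-in test against the first database. It assumes, as in the cost example for this merge, that exactly one of the two values comes from the first database.

```lisp
;; Hypothetical stand-alone sketch; the names below are illustrative
;; and are not WorldBase functions.
(defun weighted-average (numbers in-first-db-p)
  "Add 2 to every number satisfying IN-FIRST-DB-P, then return the
rounded mean of all the numbers. NUMBERS must not be empty."
  (assert (plusp (length numbers)) () "need at least one result")
  (let ((sum 0))
    (dolist (n numbers)
      ;; Values found in the first database receive the extra weight of 2.
      (incf sum (if (funcall in-first-db-p n) (+ n 2) n)))
    (round (/ sum (length numbers)))))

;; Costs 18 and 20, assuming only 18 is found in the first database:
;; (18 + 2 + 20) / 2 = 20.
(weighted-average '(18 20) (lambda (n) (= n 18)))
```

With the two costs 18 and 20, the sum is 40 whichever of the two receives the extra weight, so the rounded mean is 20 in both cases.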
<cl> (restore-instance paul-rec
       :preference-spec
       '((card-pref 1 specialties 1 0 -1)
         (comp cost 1 compute-avg-of-numbers)))
Removing (SPECIALTIES #,(DBO DBOBJECT RESTAURANT 23752745) "soba") from the worlds WORLD:PAUL-RECOMMENDATION-WORLDTINI.
Removing (SPECIALTIES #,(DBO DBOBJECT RESTAURANT 23752745) "soba") from the workspace.
Results for object #,(DBO DBOBJECT RESTAURANT 23752745) are (18 20)
Removing (COST #,(DBO DBOBJECT RESTAURANT 23752745) 18) from the worlds WORLD:PAUL-RECOMMENDATION-WORLDTINI.
Removing (COST #,(DBO DBOBJECT RESTAURANT 23752745) 18) from the worlds (WORLD:JOE-REC-2TINI).
Removing (COST #,(DBO DBOBJECT RESTAURANT 23752745) 18) from the workspace.
Removing (COST #,(DBO DBOBJECT RESTAURANT 23752745) 20) from the worlds WORLD:PAUL-RECOMMENDATION-WORLDTINI.
Removing (COST #,(DBO DBOBJECT RESTAURANT 23752745) 20) from the worlds (WORLD:JOE-REC-2TINI).
Removing (COST #,(DBO DBOBJECT RESTAURANT 23752745) 20) from the workspace.
New tuple is (COST #,(DBO DBOBJECT RESTAURANT 23752745) 20)
NIL

The user may query the resulting workspace domain to ensure that the population of paul-recommendation-world is loaded and merged. Below, he specifically queries a restaurant object that has been merged, and examines its results. Note that the specialties relationships are taken from the buffer (preferred db = 1), and specialties from paul-recommendation-world are removed. Note also that the cost is the result of a weighted average.

<cl> (eval (adjust-to-buffer-names
       (find-schema 'recommendation 'tini)
       '(listof (x) s.t.
(restaurant x))))
(#,(DBO DBOBJECT RESTAURANT 27206577)
 #,(DBO DBOBJECT RESTAURANT 27217833)
 #,(DBO DBOBJECT RESTAURANT 23752745)
 #,(DBO DBOBJECT RESTAURANT 23753985)
 #,(DBO DBOBJECT RESTAURANT 23754001))
<cl> (eval (adjust-to-buffer-names
       (find-schema 'recommendation 'tini)
       '(listof (x) s.t. (person x))))
(#,(DBO DBOBJECT PERSON 27217857)
 #,(DBO DBOBJECT PERSON 27217881)
 #,(DBO DBOBJECT PERSON 27217905)
 #,(DBO DBOBJECT PERSON 27217929)
 #,(DBO DBOBJECT PERSON 23992289))
<cl> (eval (adjust-to-buffer-names
       (find-schema 'recommendation 'tini)
       '(listof (x y) s.t. (r-name x y))))
((#,(DBO DBOBJECT RESTAURANT 27206577) "eastwind cafe")
 (#,(DBO DBOBJECT RESTAURANT 27217833) "akbar")
 (#,(DBO DBOBJECT RESTAURANT 23754001) "ma-ri-na")
 (#,(DBO DBOBJECT RESTAURANT 23753985) "kifune")
 (#,(DBO DBOBJECT RESTAURANT 23752745) "minato"))
<cl> (setf mnt
       (first (eval (adjust-to-buffer-names
                     (find-schema 'recommendation 'tini)
                     '(listof x s.t.
                        (and (restaurant x)
                             (r-name x "minato")))))))
#,(DBO DBOBJECT RESTAURANT 23752745)
<cl> (eval (adjust-to-buffer-names
       (find-schema 'recommendation 'tini)
       '(listof x s.t. (specialties mnt x))))
("sashimi" "sukiyaki")
<cl> (eval (adjust-to-buffer-names
       (find-schema 'recommendation 'tini)
       '(listof x s.t. (cost mnt x))))
(20)

The user may create new world(s) to contain the resulting database. In this case, he creates a schema-based world (paul-plus-joe-1) and a closure-based world (paul-plus-joe-2), populates the closure-based world with all restaurant objects, and saves them. A persistent form of the two worlds is presented in section D.5.
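Populating a closure-based world from seed objects can be pictured as a reachability computation. The sketch below is an illustration in plain Common Lisp under simplifying assumptions, not the WorldBase algorithm: closure-objects and *tuples* are hypothetical names, objects are modelled as symbols and attribute values as strings or numbers, and the per-relation annotations of a real world closure specification are ignored.

```lisp
;; Hypothetical stand-alone sketch; objects are symbols, and every tuple
;; has the shape (relation subject arg ...).
(defun closure-objects (seeds tuples)
  "Return the objects reachable from SEEDS by following TUPLES whose
subject is already in the world. Symbol arguments count as objects."
  (let ((world (copy-list seeds))
        (frontier (copy-list seeds)))
    (loop while frontier do
      (let ((obj (pop frontier)))
        (dolist (tup tuples)
          (when (eq (second tup) obj)
            (dolist (arg (cddr tup))
              ;; Pull in each object mentioned by a tuple about OBJ.
              (when (and (symbolp arg) (not (member arg world)))
                (push arg world)
                (push arg frontier)))))))
    world))

(defparameter *tuples*
  '((r-name minato "minato")
    (recommendation minato joe 10)
    (p-name joe "joe")
    (r-name akbar "akbar")
    (p-name paul "paul")))

;; Seeding with MINATO pulls in JOE through the recommendation tuple,
;; but neither AKBAR nor PAUL, which no tuple about MINATO mentions.
(closure-objects '(minato) *tuples*)
```

Seeding such a world with restaurant objects therefore also brings in the persons who recommended them, which is the kind of population the closure-based world is expected to have.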
<cl> (register-world 'paul-plus-joe-1 'tini 'recommendation 'tini)
T
<cl> (setf pj1 (load-world 'paul-plus-joe-1 'tini))
WORLD:PAUL-PLUS-JOE-1TINI
<cl> (save-instance pj1)
NIL
;; creating a second world (closure based)
<cl> (register-world 'paul-plus-joe-2 'tini 'recommendation 'tini
       '((restaurant (r-name @ $) (r-phone @ !) (r-type @ !)
                     (r-address @ !) (specialties @ !) (cost @ !)
                     (recommendation @ ! !))
         (person (p-name @ !))))
T
<cl> (setf pj2 (load-world 'paul-plus-joe-2 'tini))
WORLD:PAUL-PLUS-JOE-2TINI
<cl> (restore-world-closure pj2)
(NIL NIL)
<cl> (add-seeds pj2
       (eval (adjust-to-buffer-names
              (find-schema 'recommendation 'tini)
              '(listof r s.t. (restaurant r)))))
(NIL NIL NIL NIL NIL)
<cl> (save-instance pj2)
NIL
;; checking population of closure-based world
<cl> (listof x s.t. (inworld pj2 x))
(#,(DBO DBOBJECT PERSON 24281457)
 #,(DBO DBOBJECT PERSON 24281409)
 #,(DBO DBOBJECT PERSON 24281441)
 #,(DBO DBOBJECT PERSON 24281425)
 #,(DBO DBOBJECT PERSON 23992289)
 #,(DBO DBOBJECT RESTAURANT 23754001)
 #,(DBO DBOBJECT RESTAURANT 23753985)
 #,(DBO DBOBJECT RESTAURANT 23752745)
 #,(DBO DBOBJECT RESTAURANT 24281377)
 #,(DBO DBOBJECT RESTAURANT 24281393))

D.3.2 Merging: Overlapping Schemas

The user would like to view the merged world (paul-plus-joe-1) under recommendation together with the restaurant-info world. Since the two schemas overlap for type restaurant, the user may merge restaurant objects from the two worlds by specifying their equivalence specifications. Merging restaurant objects may cause other relationships to have multiple, perhaps ambiguous, values.
In this case, we provide preference specifications to deal with the resulting ambiguities in the tuples.

To merge the two worlds, the user must set up the workspace for the merge. Assuming the schemas are already loaded and merged, he first clears the workspace. Then he loads the first world (restaurant-info) into the workspace, sets up the workspace equivalence specification for the merge, and loads and merges the second world (paul-plus-joe-1) into the same workspace. Once the objects are merged, the tuples are reconciled to their constraints using preference specifications.

Setting up for Restore/Merge

The following transcripts clear the workspace; set up the equivalence specification for the workspace (which is the intersection of the equivalence specifications of the two worlds to be merged); set up functions that deal with conflicts in the resulting merge; and restore the first world (restaurant-info).

<cl> (clear-worldschema-domain (find-schema 'recommendation 'tini) t t)
Warning: removing tuples in shared relation #,(DBO RELATION COST).
Warning: removing tuples in shared relation #,(DBO RELATION PAUL-R-TYPE).
Warning: removing tuples in shared relation #,(DBO RELATION SPECIALTIES).
Warning: removing tuples in shared relation #,(DBO RELATION R-PHONE).
Warning: removing tuples in shared relation #,(DBO RELATION R-NAME).
Warning: removing tuples in shared relation #,(DBO TYPE RESTAURANT).
((NIL) (NIL NIL NIL) ...)
;; previously loaded restaurant-info world
<cl> rest-info
WORLD:RESTAURANT-INFOTINI

The workspace equivalence is asserted indirectly by loading world equivalence specifications into the workspace. This is achieved in several steps. First, the world's equivalence specification is asserted in the workspace.
Next, the world's equivalence specification is saved; this save does not incorporate the workspace equivalence specification into the world equivalence specification. Finally, the world's equivalence specification is restored to the workspace and incorporated into the workspace equivalence specification.

<cl> (add-to-world-type-equivs rest-info 'restaurant
       '((name phone) (name address)))
NIL
<cl> (add-to-world-type-equivs rest-info 'address '((street city)))
NIL
<cl> (save-world-equivs rest-info :incorporate-from-buffer nil)
Saving equivalence spec of world WORLD:RESTAURANT-INFOTINI as
((ADDRESS ((STREET CITY)))
 (RESTAURANT ((NAME PHONE) (NAME ADDRESS))))
NIL
<cl> (restore-world-equivs rest-info)
(NIL NIL)
<cl> (restore-instance rest-info)
NIL

The transcripts below set up the next world to be loaded, and the function required to compute the maximum of numbers. The function is provided as the computation that deals with conflicts in the resulting merge. Because load-world is decoupled from restore to show the various component interactions, the user must first load the world and its equivalence specification before restoring the world instance.
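The effect of resolving conflicting numeric values by their maximum can be sketched without any WorldBase primitives. The function below is a hypothetical illustration in plain Common Lisp: collapse-to-max is not a WorldBase function, objects are modelled as symbols, and tuples are plain lists of the form (relation object number).

```lisp
;; Hypothetical stand-alone sketch; not a WorldBase function.
(defun collapse-to-max (relname tuples)
  "TUPLES are lists (relation object number). Return one tuple per
object for relation RELNAME, carrying the largest number seen."
  (let ((best '()))                     ; alist of (object . max-so-far)
    (dolist (tup tuples)
      (destructuring-bind (rel ob val) tup
        (when (eq rel relname)
          (let ((entry (assoc ob best)))
            (if entry
                (setf (cdr entry) (max (cdr entry) val))
                (push (cons ob val) best))))))
    ;; Rebuild tuples in the order the objects were first seen.
    (mapcar (lambda (entry) (list relname (car entry) (cdr entry)))
            (nreverse best))))

;; Two cost values per restaurant collapse to the larger one:
;; returns ((COST KIFUNE 20) (COST EASTWIND 9)).
(collapse-to-max 'cost '((cost kifune 20) (cost kifune 17)
                         (cost eastwind 8) (cost eastwind 9)))
```

Each object ends up with exactly one tuple for the relation, which is the situation a merge aims for when an object has conflicting values from two worlds.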
<cl> (setf paul-plus-joe (load-world 'paul-plus-joe-1 'tini))
WORLD:PAUL-PLUS-JOE-1TINI
<cl> (add-to-world-type-equivs paul-plus-joe 'restaurant
       '((r-name r-phone)))
NIL
<cl> (add-to-world-type-equivs paul-plus-joe 'person '((p-name)))
NIL
<cl> (save-world-equivs paul-plus-joe :incorporate-from-buffer nil)
NIL
<cl> (restore-world-equivs paul-plus-joe)
(NIL NIL)
<cl> (defun compute-max-of-numbers (relname ob results db1 db2)
       (let (new-tup max-val)
         (when debugging
           (format t "~&Results for object ~S are ~S~%" ob results))
         (loop for res in results do
           (let ((tup `(,relname ,ob ,res)))
             (remove-tup-in db2 tup)
             (remove-tup-in db1 tup t)))
         (setf max-val (apply #'max results))
         (setf new-tup `(,relname ,ob ,max-val))
         (when debugging
           (format t "~&New tuple is ~S~%" new-tup))
         (eval `(++ ,relname ',ob ,max-val))))
COMPUTE-MAX-OF-NUMBERS

Restore and Merge

Restoring the instance of paul-plus-joe-1 causes it to be merged with the existing workspace database (which contains only rest-info). The preference specification causes some of the tuples of the merged objects to be removed, both from the world itself and from the workspace database. For instance, since specialties is specified to prefer the world being loaded (preferred database = 2), tuples from rest-info which do not match those in paul-plus-joe are removed from the world and the workspace. Those that match, or those not specified in paul-plus-joe, remain.
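The preference rule just described can be sketched for a single merged object. This is a minimal illustration in plain Common Lisp, not the WorldBase implementation: reconcile-values is a hypothetical name, each database is reduced to the list of values it holds for one object and one relation, and the cardinality bounds of the preference specification are not modelled.

```lisp
;; Hypothetical stand-alone sketch; not a WorldBase function.
(defun reconcile-values (preferred other)
  "PREFERRED and OTHER are the value lists one object has in the
preferred and the non-preferred database. Return two values: the
values kept, and the values removed from OTHER."
  (if (null preferred)
      ;; Nothing specified in the preferred database: OTHER stays as is.
      (values (copy-list other) '())
      ;; Otherwise non-matching values of OTHER are removed.
      (values (copy-list preferred)
              (set-difference other preferred :test #'equal))))

;; The preferred database says "inari", the other says "tempura":
;; "inari" is kept and "tempura" is removed.
(reconcile-values '("inari") '("tempura"))
```

When the preferred database holds no value for the object, as in (reconcile-values '() '("tacos")), nothing is removed and the existing value remains.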
<cl> (restore-instance paul-plus-joe
       :preference-spec
       (adjust-to-buffer-names
        (find-schema 'recommendation 'tini)
        '((card-pref 2 specialties 1 0 -1)
          (comp cost 1 compute-max-of-numbers))))
To remove (SPECIALTIES #,(DBO DBOBJECT FOREIGN 13166521) "tempura") from the worlds (WORLD:RESTAURANT-INFOTINI).
To remove (SPECIALTIES #,(DBO DBOBJECT FOREIGN 13166521) "tempura") from the workspace.
To remove (SPECIALTIES #,(DBO DBOBJECT FOREIGN 13166201) "crab roll") from the worlds (WORLD:RESTAURANT-INFOTINI).
To remove (SPECIALTIES #,(DBO DBOBJECT FOREIGN 13166201) "crab roll") from the workspace.
To remove (SPECIALTIES #,(DBO DBOBJECT FOREIGN 13195281) "pad thai") from the worlds (WORLD:RESTAURANT-INFOTINI).
To remove (SPECIALTIES #,(DBO DBOBJECT FOREIGN 13195281) "pad thai") from the workspace.
Results for object #,(DBO DBOBJECT FOREIGN 13166489) are (20 17)
To remove (COST #,(DBO DBOBJECT FOREIGN 13166489) 20) from the worlds WORLD:MY-REC-PLUS-JOE1TINI.
To remove (COST #,(DBO DBOBJECT FOREIGN 13166489) 20) from the worlds (WORLD:RESTAURANT-INFOTINI).
To remove (COST #,(DBO DBOBJECT FOREIGN 13166489) 20) from the workspace.
To remove (COST #,(DBO DBOBJECT FOREIGN 13166489) 17) from the worlds WORLD:MY-REC-PLUS-JOE1TINI.
To remove (COST #,(DBO DBOBJECT FOREIGN 13166489) 17) from the worlds (WORLD:RESTAURANT-INFOTINI).
To remove (COST #,(DBO DBOBJECT FOREIGN 13166489) 17) from the workspace.
New tuple is (COST #,(DBO DBOBJECT FOREIGN 13166489) 20)
Results for object #,(DBO DBOBJECT FOREIGN 13166521) are (30 24)
To remove (COST #,(DBO DBOBJECT FOREIGN 13166521) 30) from the worlds WORLD:MY-REC-PLUS-JOE1TINI.
To remove (COST #,(DBO DBOBJECT FOREIGN 13166521) 30) from the worlds (WORLD:RESTAURANT-INFOTINI).
To remove (COST #,(DBO DBOBJECT FOREIGN 13166521) 30) from the workspace.
To remove (COST #,(DBO DBOBJECT FOREIGN 13166521) 24) from the worlds WORLD:MY-REC-PLUS-JOE1TINI.
To remove (COST #,(DBO DBOBJECT FOREIGN 13166521) 24) from the worlds (WORLD:RESTAURANT-INFOTINI).
To remove (COST #,(DBO DBOBJECT FOREIGN 13166521) 24) from the workspace.
New tuple is (COST #,(DBO DBOBJECT FOREIGN 13166521) 30)
Results for object #,(DBO DBOBJECT FOREIGN 13195281) are (8 9)
To remove (COST #,(DBO DBOBJECT FOREIGN 13195281) 8) from the worlds WORLD:MY-REC-PLUS-JOE1TINI.
To remove (COST #,(DBO DBOBJECT FOREIGN 13195281) 8) from the worlds (WORLD:RESTAURANT-INFOTINI).
To remove (COST #,(DBO DBOBJECT FOREIGN 13195281) 8) from the workspace.
To remove (COST #,(DBO DBOBJECT FOREIGN 13195281) 9) from the worlds WORLD:MY-REC-PLUS-JOE1TINI.
To remove (COST #,(DBO DBOBJECT FOREIGN 13195281) 9) from the worlds (WORLD:RESTAURANT-INFOTINI).
To remove (COST #,(DBO DBOBJECT FOREIGN 13195281) 9) from the workspace.
New tuple is (COST #,(DBO DBOBJECT FOREIGN 13195281) 9)
Results for object #,(DBO DBOBJECT FOREIGN 13195177) are (22 24)
To remove (COST #,(DBO DBOBJECT FOREIGN 13195177) 22) from the worlds WORLD:MY-REC-PLUS-JOE1TINI.
To remove (COST #,(DBO DBOBJECT FOREIGN 13195177) 22) from the worlds (WORLD:RESTAURANT-INFOTINI).
To remove (COST #,(DBO DBOBJECT FOREIGN 13195177) 22) from the workspace.
To remove (COST #,(DBO DBOBJECT FOREIGN 13195177) 24) from the worlds WORLD:MY-REC-PLUS-JOE1TINI.
To remove (COST #,(DBO DBOBJECT FOREIGN 13195177) 24) from the worlds (WORLD:RESTAURANT-INFOTINI).
To remove (COST #,(DBO DBOBJECT FOREIGN 13195177) 24) from the workspace.
New tuple is (COST #,(DBO DBOBJECT FOREIGN 13195177) 24)
Results for object #,(DBO DBOBJECT FOREIGN 13166201) are (20 19)
To remove (COST #,(DBO DBOBJECT FOREIGN 13166201) 20) from the worlds WORLD:MY-REC-PLUS-JOE1TINI.
To remove (COST #,(DBO DBOBJECT FOREIGN 13166201) 20) from the worlds (WORLD:RESTAURANT-INFOTINI).
To remove (COST #,(DBO DBOBJECT FOREIGN 13166201) 20) from the workspace.
To remove (COST #,(DBO DBOBJECT FOREIGN 13166201) 19) from the worlds WORLD:MY-REC-PLUS-JOE1TINI.
To remove (COST #,(DBO DBOBJECT FOREIGN 13166201) 19) from the worlds (WORLD:RESTAURANT-INFOTINI).
To remove (COST #,(DBO DBOBJECT FOREIGN 13166201) 19) from the workspace.
New tuple is (COST #,(DBO DBOBJECT FOREIGN 13166201) 20)
NIL

The resulting workspace database can be queried by the user. The values of avg-price and specialties of merged restaurants should be of particular interest.

<cl> (eval (adjust-to-buffer-names
       (find-schema 'rest-guide 'tini)
       '(listof (x y) s.t. (name x y))))
((#,(DBO DBOBJECT FOREIGN 13195177) "akbar")
 (#,(DBO DBOBJECT RESTAURANT 13195209) "el torito")
 (#,(DBO DBOBJECT RESTAURANT 13195241) "miami spice")
 (#,(DBO DBOBJECT FOREIGN 13195281) "eastwind cafe")
 (#,(DBO DBOBJECT FOREIGN 13166201) "minato")
 (#,(DBO DBOBJECT RESTAURANT 13166457) "acapulco")
 (#,(DBO DBOBJECT FOREIGN 13166489) "kifune")
 (#,(DBO DBOBJECT FOREIGN 13166521) "ma-ri-na")
 (#,(DBO DBOBJECT AMERICAN 13166561) "marie callenders")
 (#,(DBO DBOBJECT FOREIGN 13141145) "minato"))
<cl> (eval (adjust-to-buffer-names
       (find-schema 'rest-guide 'tini)
       '(listof (x y) s.t.
(avg-price x y))))
((#,(DBO DBOBJECT FOREIGN 13166201) 20)
 (#,(DBO DBOBJECT FOREIGN 13195177) 24)
 (#,(DBO DBOBJECT FOREIGN 13195281) 9)
 (#,(DBO DBOBJECT FOREIGN 13166521) 30)
 (#,(DBO DBOBJECT FOREIGN 13166489) 20)
 (#,(DBO DBOBJECT RESTAURANT 13195209) 10)
 (#,(DBO DBOBJECT RESTAURANT 13195241) 25)
 (#,(DBO DBOBJECT RESTAURANT 13166457) 17)
 (#,(DBO DBOBJECT AMERICAN 13166561) 8)
 (#,(DBO DBOBJECT FOREIGN 13141145) 18))
<cl> (eval (adjust-to-buffer-names
       (find-schema 'rest-guide 'tini)
       '(listof (x y) s.t. (specialties x y))))
((#,(DBO DBOBJECT FOREIGN 13166521) "inari")
 (#,(DBO DBOBJECT FOREIGN 13166201) "sukiyaki")
 (#,(DBO DBOBJECT FOREIGN 13166201) "sashimi")
 (#,(DBO DBOBJECT FOREIGN 13195281) "ice coffee")
 (#,(DBO DBOBJECT FOREIGN 13195177) "tandoori")
 (#,(DBO DBOBJECT RESTAURANT 13195209) "quesadilla")
 (#,(DBO DBOBJECT RESTAURANT 13195241) "jazz")
 (#,(DBO DBOBJECT RESTAURANT 13166457) "tacos")
 (#,(DBO DBOBJECT FOREIGN 13166489) "teriyaki")
 (#,(DBO DBOBJECT AMERICAN 13166561) "pies")
 (#,(DBO DBOBJECT FOREIGN 13141145) "sushi"))

Saving the Workspace

The user may create a new world as a copy of the workspace database. He first creates a new world schema (restaurant-and-recommendation) to contain a copy of the workspace schema. Next, he creates a new world instance (rest-and-rec-1) of the new schema and populates it with the workspace instance. The resulting instance can be viewed in its persistent form in section D.5 under rest-and-rec-1.
<cl> (setf debugging t)
T
<cl> (copy-buffer-into-schema 'restaurant-and-recommendation 'tini)
Including base relation P-NAME atts: (PERSON STRING)
Including base relation RECOMMENDATION atts: (RESTAURANT PERSON INTEGER)
Including base relation COST atts: (RESTAURANT INTEGER)
Including base relation PAUL-R-ADDRESS atts: (RESTAURANT STRING)
Including base relation PAUL-R-TYPE atts: (RESTAURANT STRING)
Including base relation SPECIALTIES atts: (RESTAURANT STRING)
Including base relation R-PHONE atts: (RESTAURANT STRING)
Including base relation R-NAME atts: (RESTAURANT STRING)
Including base type PERSON
Including base type RESTAURANT
Including base relation REST-GUIDE-TINI-CITY atts: (ADDRESS STRING)
Including base relation REST-GUIDE-TINI-STREET atts: (ADDRESS STRING)
Including base relation FACILITIES atts: (RESTAURANT STRING)
Including base relation HOURS atts: (RESTAURANT STRING)
Including base relation LOCATION atts: (RESTAURANT ADDRESS)
Including base type ADDRESS
Including subtype FOREIGN supertype RESTAURANT
Including subtype AMERICAN supertype RESTAURANT
Including subtype FAST-FOOD supertype RESTAURANT
NIL
<cl> (setf debugging nil)
NIL
<cl> (setf rnr (find-schema 'restaurant-and-recommendation 'tini))
SCHEMA:RESTAURANT-AND-RECOMMENDATIONTINI
<cl> (copy-buffer-into-world 'rest-and-rec-1 'tini
       'restaurant-and-recommendation 'tini
       :use-closure nil :use-equivs t)
Asserting equiv of world WORLD:REST-AND-REC-1TINI type RESTAURANT: ((R-NAME R-PHONE))
Asserting equiv of world WORLD:REST-AND-REC-1TINI type PERSON: ((P-NAME))
Asserting equiv of world
WORLDREST-AND-REC-1TINI type ADDRESS: ((REST-GUIDE-TINI-STREET REST-GUIDE-TINI-CITY))
Saving equivs of world WORLDREST-AND-REC-1TINI
NIL
<cl> (save-instance (find-world 'rest-and-rec-1 'tini))
NIL

D.4 Verbose Mode Operations

This section provides transcripts of some of the above operations in verbose mode. Verbose mode is useful for understanding the steps taken by the operations, since it prints informative messages while they run.

D.4.1 Restore Schema

This subsection provides transcripts of the restore-worldschema operation for entertainment, recommendation and rest-guide in verbose mode.

Restore Entertainment Schema

The following is a transcript of restore-worldschema of entertainment in verbose mode.

<cl> (setf debugging t)
T
<cl> (restore-worldschema ent '((restaurant joe-restaurant)
                                (specialty specialty)
                                (current-show showing)))
Basetype declaration: (DEFRELATION ENTERTAINMENT :DERIVATION BASETYPE)
Basetype declaration: (DEFRELATION JOE-RESTAURANT :DERIVATION BASETYPE)
supertype declaration: (++ SUBTYPE (RELATIONP 'JOE-RESTAURANT) (RELATIONP 'ENTERTAINMENT))
Basetype declaration: (DEFRELATION THEATRE :DERIVATION BASETYPE)
supertype declaration: (++ SUBTYPE (RELATIONP 'THEATRE) (RELATIONP 'ENTERTAINMENT))
Basetype declaration: (DEFRELATION NAME :TYPES (ENTERTAINMENT STRING))
Basetype declaration: (DEFRELATION RATING :TYPES (ENTERTAINMENT STRING))
Basetype declaration: (DEFRELATION PHONE :TYPES (ENTERTAINMENT STRING))
Basetype declaration: (DEFRELATION SPECIALTY :TYPES (JOE-RESTAURANT STRING))
Basetype declaration: (DEFRELATION ENTERTAINMENT-TINI-AVG-PRICE :TYPES (JOE-RESTAURANT STRING))
Basetype declaration: (DEFRELATION ENTERTAINMENT-TINI-R-TYPE :TYPES (JOE-RESTAURANT STRING))
Basetype declaration: (DEFRELATION SHOWING :TYPES (THEATRE STRING))
Basetype declaration: (DEFRELATION ENTERTAINMENT-TINI-T-TYPE :TYPES (THEATRE STRING))
SCHEMA:ENTERTAINMENTTINI

Restore Restaurant Schema

The following is a transcript of restore-worldschema of rest-guide in verbose mode.

<cl> (restore-worldschema rest '((classification paul-r-type)
                                 (avg-price cost)
                                 (name r-name)
                                 (phone r-phone)))
Basetype declaration: (DEFRELATION RESTAURANT :DERIVATION BASETYPE)
Basetype declaration: (DEFRELATION FAST-FOOD :DERIVATION BASETYPE)
supertype declaration: (++ SUBTYPE (RELATIONP 'FAST-FOOD) (RELATIONP 'RESTAURANT))
Basetype declaration: (DEFRELATION AMERICAN :DERIVATION BASETYPE)
supertype declaration: (++ SUBTYPE (RELATIONP 'AMERICAN) (RELATIONP 'RESTAURANT))
Basetype declaration: (DEFRELATION FOREIGN :DERIVATION BASETYPE)
supertype declaration: (++ SUBTYPE (RELATIONP 'FOREIGN) (RELATIONP 'RESTAURANT))
Basetype declaration: (DEFRELATION ADDRESS :DERIVATION BASETYPE)
Basetype declaration: (DEFRELATION R-NAME :TYPES (RESTAURANT STRING))
Basetype declaration: (DEFRELATION LOCATION :TYPES (RESTAURANT ADDRESS))
Basetype declaration: (DEFRELATION R-PHONE :TYPES (RESTAURANT STRING))
Basetype declaration: (DEFRELATION COST :TYPES (RESTAURANT INTEGER))
Basetype declaration: (DEFRELATION HOURS :TYPES (RESTAURANT STRING))
Basetype declaration: (DEFRELATION PAUL-R-TYPE :TYPES (RESTAURANT STRING))
Basetype declaration: (DEFRELATION SPECIALTIES :TYPES (RESTAURANT STRING))
Basetype declaration: (DEFRELATION FACILITIES :TYPES (RESTAURANT STRING))
Basetype declaration: (DEFRELATION REST-GUIDE-TINI-STREET :TYPES (ADDRESS STRING))
Basetype declaration: (DEFRELATION REST-GUIDE-TINI-CITY :TYPES (ADDRESS STRING))
SCHEMA:REST-GUIDETINI

Restore Recommendation Schema

The following is a transcript of restore-worldschema of recommendation in verbose mode. Note that not all relations are created; some are found in the workspace and shared.

<cl> (restore-worldschema rec '((r-type paul-r-type)
                                (r-address paul-r-address)
                                (cost cost)))
Stored type RESTAURANT status SHARED exists - sharing it.
Basetype declaration: (DEFRELATION PERSON :DERIVATION BASETYPE)
Stored relation R-NAME status SHARED exists - sharing it.
Stored relation R-PHONE status SHARED exists - sharing it.
Stored relation SPECIALTIES status SHARED exists - sharing it.
Warning: status of existing relation PAUL-R-TYPE of schema SCHEMA:REST-GUIDETINI
  is not equal to that of schema SCHEMA:RECOMMENDATIONTINI.
Stored relation PAUL-R-TYPE status INTERNAL exists - sharing it.
Basetype declaration: (DEFRELATION PAUL-R-ADDRESS :TYPES (RESTAURANT STRING))
Warning: status of existing relation COST of schema SCHEMA:REST-GUIDETINI
  is not equal to that of schema SCHEMA:RECOMMENDATIONTINI.
Stored relation COST status INTERNAL exists - sharing it.
Basetype declaration: (DEFRELATION RECOMMENDATION :TYPES (RESTAURANT PERSON INTEGER))
Basetype declaration: (DEFRELATION P-NAME :TYPES (PERSON STRING))
SCHEMA:RECOMMENDATIONTINI

D.4.2 Clearing Domains

The following are transcripts of clear-worldschema-domain in verbose mode.

Unshared Schema

The following is a verbose mode transcript of clearing the domain of a world schema. Since the schema is not shared with other schemas, all the tuples of the schema are removed.

<cl> (clear-worldschema-domain rest)
Removing tuples in REST-GUIDE-TINI-CITY.
Removing tuples in REST-GUIDE-TINI-STREET.
Removing tuples in FACILITIES.
Removing tuples in SPECIALTIES.
Removing tuples in PAUL-R-TYPE.
Removing tuples in HOURS.
Removing tuples in COST.
Removing tuples in R-PHONE.
Removing tuples in LOCATION.
Removing tuples in R-NAME.
Removing tuples in ADDRESS.
Removing tuples in FOREIGN.
Removing tuples in AMERICAN.
Removing tuples in FAST-FOOD.
Removing tuples in RESTAURANT.
NIL

Shared Schema

Clear-worldschema-domain can be evaluated with the second parameter true to indicate removal of the instance regardless of any world depending on it. However, it does not remove tuples of relations shared by other schemas.

<cl> (clear-worldschema-domain (find-schema 'rest-guide 'tini) t nil)
Removing tuples in REST-GUIDE-TINI-CITY.
Removing tuples in REST-GUIDE-TINI-STREET.
Removing tuples in FACILITIES.
Not removing tuples in shared relation #,(DBO RELATION SPECIALTIES).
Not removing tuples in shared relation #,(DBO RELATION PAUL-R-TYPE).
Removing tuples in HOURS.
Not removing tuples in shared relation #,(DBO RELATION COST).
Not removing tuples in shared relation #,(DBO RELATION R-PHONE).
Removing tuples in LOCATION.
Not removing tuples in shared relation #,(DBO RELATION R-NAME).
Removing tuples in ADDRESS.
Removing tuples in FOREIGN.
Removing tuples in AMERICAN.
Removing tuples in FAST-FOOD.
Not removing tuples in shared relation #,(DBO TYPE RESTAURANT).
((NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL) ...)

The third parameter to clear-worldschema-domain is used to indicate removal of tuples in shared relations. Tuples of shared relations are removed with a warning.

<cl> (clear-worldschema-domain (find-schema 'recommendation 'tini) t t)
Removing tuples in P-NAME.
Removing tuples in RECOMMENDATION.
Warning: removing tuples in shared relation #,(DBO RELATION COST).
Removing tuples in COST.
Removing tuples in PAUL-R-ADDRESS.
Warning: removing tuples in shared relation #,(DBO RELATION PAUL-R-TYPE).
Removing tuples in PAUL-R-TYPE.
Warning: removing tuples in shared relation #,(DBO RELATION SPECIALTIES).
Removing tuples in SPECIALTIES.
Warning: removing tuples in shared relation #,(DBO RELATION R-PHONE).
Removing tuples in R-PHONE.
Warning: removing tuples in shared relation #,(DBO RELATION R-NAME).
Removing tuples in R-NAME.
Removing tuples in PERSON.
Warning: removing tuples in shared relation #,(DBO RELATION RESTAURANT).
Removing tuples in RESTAURANT.
((NIL NIL NIL NIL NIL) ...)

D.4.3 Bulk Transform

The following is a transcript of bulk-transform in verbose mode.

<cl> (bulk-transform list-of-ent-to-rec-corrs)
Sorting local correspondences...
Evaluating local correspondence
  ((T I-PERSON (* PERSON) (P STRING)) (P) (EQUAL P "joe") NIL)
Creating intermediate relation IREL-I-PERSON
Asserting intermediate relation
  att vars are ((* PERSON) (P STRING))
  restuple is ("joe")
Evaluating local correspondence
  ((T I-RESTAURANT (* RESTAURANT) (N STRING) (P STRING)) (N X P)
   (AND (RESTAURANT X) (NAME X N) (PHONE X P)) NIL)
Creating intermediate relation IREL-I-RESTAURANT
Asserting intermediate relation
  att vars are ((* RESTAURANT) (N STRING) (P STRING))
  restuple is ("ma-ri-na" "578-5050")
Asserting intermediate relation
  att vars are ((* RESTAURANT) (N STRING) (P STRING))
  restuple is ("kifune" "822-1595")
Asserting intermediate relation
  att vars are ((* RESTAURANT) (N STRING) (P STRING))
  restuple is ("minato" "305-1104")
Evaluating local correspondence
  ((R P-NAME P N) NIL NIL ((I-PERSON P N)))
asserting to target rel P-NAME
  tuple (#,(DBO DBOBJECT PERSON 23522905) "joe")
Evaluating local correspondence
  ((R R-NAME R N) NIL NIL ((I-RESTAURANT R N P)))
asserting to target rel R-NAME
  tuple (#,(DBO DBOBJECT RESTAURANT 25895489) "minato")
asserting to target rel R-NAME
  tuple (#,(DBO DBOBJECT RESTAURANT 25895513) "kifune")
asserting to target rel R-NAME
  tuple (#,(DBO DBOBJECT RESTAURANT 25895537) "ma-ri-na")
Evaluating local correspondence
  ((R R-PHONE R P) NIL NIL ((I-RESTAURANT R N P)))
asserting to target rel R-PHONE
  tuple (#,(DBO DBOBJECT RESTAURANT 25895489) "305-1104")
asserting to target rel R-PHONE
  tuple (#,(DBO DBOBJECT RESTAURANT 25895513) "822-1595")
asserting to target rel R-PHONE
  tuple (#,(DBO DBOBJECT RESTAURANT 25895537) "578-5050")
Evaluating local correspondence
  ((R SPECIALTIES R S) (N P X S)
   (AND (RESTAURANT X) (NAME X N) (PHONE X P) (SPECIALTY X S))
   ((I-RESTAURANT R N P)))
asserting to target rel SPECIALTIES
  tuple (#,(DBO DBOBJECT RESTAURANT 25895489) "sashimi")
asserting to target rel SPECIALTIES
  tuple (#,(DBO DBOBJECT RESTAURANT 25895489) "sukiyaki")
asserting to target rel SPECIALTIES
  tuple (#,(DBO DBOBJECT RESTAURANT 25895513) "teriyaki")
asserting to target rel SPECIALTIES
  tuple (#,(DBO DBOBJECT RESTAURANT 25895537) "inari")
Evaluating local correspondence
  ((R R-TYPE R Y) (N P X Y)
   (AND (RESTAURANT X) (NAME X N) (PHONE X P) (R-TYPE X Y))
   ((I-RESTAURANT R N P)))
asserting to target rel PAUL-R-TYPE
  tuple (#,(DBO DBOBJECT RESTAURANT 23143745) "japanese")
asserting to target rel PAUL-R-TYPE
  tuple (#,(DBO DBOBJECT RESTAURANT 23143769) "japanese")
asserting to target rel PAUL-R-TYPE
  tuple (#,(DBO DBOBJECT RESTAURANT 23142097) "japanese")
Evaluating local correspondence
  ((R COST R C) (N PH X AMT C)
   (AND (RESTAURANT X) (NAME X N) (PHONE X PH) (AVG-PRICE X AMT)
        (AVG-PRICE-TO-COST AMT C))
   ((I-RESTAURANT R N PH)))
asserting to target rel COST
  tuple (#,(DBO DBOBJECT RESTAURANT 25893865) 30)
asserting to target rel COST
  tuple (#,(DBO DBOBJECT RESTAURANT 25895545) 20)
asserting to target rel COST
  tuple (#,(DBO DBOBJECT RESTAURANT 25895521) 20)
Evaluating local correspondence
  ((R RECOMMENDATION R P RTI) (RN PH X RT RTI PN)
   (AND (RESTAURANT X) (NAME X RN) (PHONE X PH) (RATING X RT)
        (STAR-TO-INTEGER RT RTI) (STRING-EQUAL PN "joe"))
   ((I-RESTAURANT R RN PH) (I-PERSON P PN)))
asserting to target rel RECOMMENDATION
  tuple (#,(DBO DBOBJECT RESTAURANT 23127201) #,(DBO DBOBJECT PERSON 23194617) 8)
asserting to target rel RECOMMENDATION
  tuple (#,(DBO DBOBJECT RESTAURANT 23127217) #,(DBO DBOBJECT PERSON 23194617) 6)
asserting to target rel RECOMMENDATION
  tuple (#,(DBO DBOBJECT RESTAURANT 23127185) #,(DBO DBOBJECT PERSON 23194617) 10)
Removing intermediate relations...
Transformation done
NIL

D.4.4 Closure Transform

The following is a transcript of closure-transform in verbose mode.

<cl> (closure-transform list-of-ent-to-rec-corrs
                        seed-of-ent-to-rec-corrs
                        '(joe-rec-2 tini))
Unloading all other worlds of schema SCHEMA:RECOMMENDATIONTINI except WORLD:JOE-REC-2TINI
Deleting world WORLD:RECOMMENDATION-WORLDTINI from workspace.
Deleting world WORLD:PAUL-RECOMMENDATION-WORLDTINI from workspace.
Deleting world WORLD:JOE-REC-1TINI from workspace.
Removing tuples in P-NAME.
Removing tuples in RECOMMENDATION.
Removing tuples in COST.
Warning: Removing tuples from shared relation #,(DBO RELATION COST)
Removing tuples in PAUL-R-ADDRESS.
Removing tuples in PAUL-R-TYPE.
Warning: Removing tuples from shared relation #,(DBO RELATION PAUL-R-TYPE)
Removing tuples in SPECIALTIES.
Warning: Removing tuples from shared relation #,(DBO RELATION SPECIALTIES)
Removing tuples in R-PHONE.
Warning: Removing tuples from shared relation #,(DBO RELATION R-PHONE)
Removing tuples in R-NAME.
Warning: Removing tuples from shared relation #,(DBO RELATION R-NAME)
Removing tuples in PERSON.
Removing tuples in RESTAURANT.
Warning: Removing tuples from shared relation #,(DBO TYPE RESTAURANT)
Creating intermediate relation IREL-I-PERSON
Creating intermediate relation IREL-I-RESTAURANT
Evaluating seed correspondence
  ((T I-RESTAURANT (* RESTAURANT) (N STRING) (P STRING)) (N P X)
   (AND (RESTAURANT X) (NAME X N) (PHONE X P) (R-TYPE X "japanese")) NIL)
Asserting seed to intermediate rel.
  att vars are ((* RESTAURANT) (N STRING) (P STRING))
  result tuple is ("ma-ri-na" "578-5050")
Asserting seed to intermediate rel.
  att vars are ((* RESTAURANT) (N STRING) (P STRING))
  result tuple is ("kifune" "822-1595")
Asserting seed to intermediate rel.
  att vars are ((* RESTAURANT) (N STRING) (P STRING))
  result tuple is ("minato" "305-1104")
Calling prefix function on object #,(DBO DBOBJECT RESTAURANT 27426473) and world WORLD:JOE-REC-2TINI
Finding subtuples via pattern (R-NAME @ $)
To generate target tuples for relation R-NAME
To generate target tuples for relation IREL-I-RESTAURANT
asserting to target relation R-NAME
  tuple (#,(DBO DBOBJECT RESTAURANT 27426473) "minato")
Applying tuple postfix to tuple (R-NAME #,(DBO DBOBJECT RESTAURANT 27426473) "minato")
Finding subtuples via pattern (R-PHONE @ !)
To generate target tuples for relation R-PHONE
To generate target tuples for relation IREL-I-RESTAURANT
asserting to target relation R-PHONE
  tuple (#,(DBO DBOBJECT RESTAURANT 23920313) "305-1104")
Applying tuple postfix to tuple (R-PHONE #,(DBO DBOBJECT RESTAURANT 23920313) "305-1104")
Finding subtuples via pattern (PAUL-R-TYPE @ !)
To generate target tuples for relation PAUL-R-TYPE
To generate target tuples for relation IREL-I-RESTAURANT
asserting to target relation PAUL-R-TYPE
  tuple (#,(DBO DBOBJECT RESTAURANT 23920313) "japanese")
Applying tuple postfix to tuple (PAUL-R-TYPE #,(DBO DBOBJECT RESTAURANT 23920313) "japanese")
Finding subtuples via pattern (PAUL-R-ADDRESS @ !)
Finding subtuples via pattern (SPECIALTIES @ !)
To generate target tuples for relation SPECIALTIES
To generate target tuples for relation IREL-I-RESTAURANT
asserting to target relation SPECIALTIES
  tuple (#,(DBO DBOBJECT RESTAURANT 26672881) "sukiyaki")
asserting to target relation SPECIALTIES
  tuple (#,(DBO DBOBJECT RESTAURANT 26672881) "sashimi")
Applying tuple postfix to tuple (SPECIALTIES #,(DBO DBOBJECT RESTAURANT 26672881) "sashimi")
Applying tuple postfix to tuple (SPECIALTIES #,(DBO DBOBJECT RESTAURANT 26672881) "sukiyaki")
Finding subtuples via pattern (COST @ !)
To generate target tuples for relation COST
To generate target tuples for relation IREL-I-RESTAURANT
asserting to target relation COST
  tuple (#,(DBO DBOBJECT RESTAURANT 23920481) 20)
Applying tuple postfix to tuple (COST #,(DBO DBOBJECT RESTAURANT 23920481) 20)
Finding subtuples via pattern (RECOMMENDATION @ ! !)
To generate target tuples for relation RECOMMENDATION
To generate target tuples for relation IREL-I-RESTAURANT
To generate target tuples for relation IREL-I-PERSON
Asserting intermediate relation
  att vars are ((* PERSON) (P STRING))
  restuple is ("joe")
asserting to target relation RECOMMENDATION
  tuple (#,(DBO DBOBJECT RESTAURANT 23752745) #,(DBO DBOBJECT PERSON 23992289) 10)
Calling prefix function on object #,(DBO DBOBJECT PERSON 23992289) and world WORLD:JOE-REC-2TINI
Asserting #,(DBO DBOBJECT PERSON 23992289) dependent in world WORLD:JOE-REC-2TINI
Finding subtuples via pattern (P-NAME @ !)
To generate target tuples for relation P-NAME
To generate target tuples for relation IREL-I-PERSON
asserting to target relation P-NAME
  tuple (#,(DBO DBOBJECT PERSON 23992289) "joe")
Applying tuple postfix to tuple (P-NAME #,(DBO DBOBJECT PERSON 23992289) "joe")
Applying postfix to #,(DBO DBOBJECT PERSON 23992289)
Applying tuple postfix to tuple (RECOMMENDATION #,(DBO DBOBJECT RESTAURANT 23752745) #,(DBO DBOBJECT PERSON 23992289) 10)
Applying postfix to #,(DBO DBOBJECT RESTAURANT 23752745)
Calling prefix function on object #,(DBO DBOBJECT RESTAURANT 23753985) and world WORLD:JOE-REC-2TINI
Finding subtuples via pattern (R-NAME @ $)
To generate target tuples for relation R-NAME
To generate target tuples for relation IREL-I-RESTAURANT
asserting to target relation R-NAME
  tuple (#,(DBO DBOBJECT RESTAURANT 23753985) "kifune")
Applying tuple postfix to tuple (R-NAME #,(DBO DBOBJECT RESTAURANT 23753985) "kifune")
Finding subtuples via pattern (R-PHONE @ !)
To generate target tuples for relation R-PHONE
To generate target tuples for relation IREL-I-RESTAURANT
asserting to target relation R-PHONE
  tuple (#,(DBO DBOBJECT RESTAURANT 23753985) "822-1595")
Applying tuple postfix to tuple (R-PHONE #,(DBO DBOBJECT RESTAURANT 23753985) "822-1595")
To generate target tuples for relation PAUL-R-TYPE
To generate target tuples for relation IREL-I-RESTAURANT
asserting to target relation PAUL-R-TYPE
  tuple (#,(DBO DBOBJECT RESTAURANT 23753985) "japanese")
Applying tuple postfix to tuple (PAUL-R-TYPE #,(DBO DBOBJECT RESTAURANT 23753985) "japanese")
Finding subtuples via pattern (PAUL-R-ADDRESS @ !)
Finding subtuples via pattern (SPECIALTIES @ !)
To generate target tuples for relation SPECIALTIES
To generate target tuples for relation IREL-I-RESTAURANT
asserting to target relation SPECIALTIES
  tuple (#,(DBO DBOBJECT RESTAURANT 23753985) "teriyaki")
Applying tuple postfix to tuple (SPECIALTIES #,(DBO DBOBJECT RESTAURANT 23753985) "teriyaki")
Finding subtuples via pattern (COST @ !)
To generate target tuples for relation COST
To generate target tuples for relation IREL-I-RESTAURANT
asserting to target relation COST
  tuple (#,(DBO DBOBJECT RESTAURANT 23753985) 20)
Applying tuple postfix to tuple (COST #,(DBO DBOBJECT RESTAURANT 23753985) 20)
Finding subtuples via pattern (RECOMMENDATION @ ! !)
To generate target tuples for relation RECOMMENDATION
To generate target tuples for relation IREL-I-RESTAURANT
To generate target tuples for relation IREL-I-PERSON
Asserting tuple ("joe") to relation IREL-I-PERSON
Asserting intermediate relation
  att vars are ((* PERSON) (P STRING))
  restuple is ("joe")
Object already asserted in table IREL-I-PERSON
asserting to target relation RECOMMENDATION
  tuple (#,(DBO DBOBJECT RESTAURANT 23753985) #,(DBO DBOBJECT PERSON 23992289) 6)
Applying tuple postfix to tuple (RECOMMENDATION #,(DBO DBOBJECT RESTAURANT 23753985) #,(DBO DBOBJECT PERSON 23992289) 6)
Applying postfix to #,(DBO DBOBJECT RESTAURANT 23753985)
Calling prefix function on object #,(DBO DBOBJECT RESTAURANT 23754001) and world WORLD:JOE-REC-2TINI
Finding subtuples via pattern (R-NAME @ $)
To generate target tuples for relation R-NAME
To generate target tuples for relation IREL-I-RESTAURANT
asserting to target relation R-NAME
  tuple (#,(DBO DBOBJECT RESTAURANT 23754001) "ma-ri-na")
Applying tuple postfix to tuple (R-NAME #,(DBO DBOBJECT RESTAURANT 23754001) "ma-ri-na")
Finding subtuples via pattern (R-PHONE @ !)
To generate target tuples for relation R-PHONE
To generate target tuples for relation IREL-I-RESTAURANT
asserting to target relation R-PHONE
  tuple (#,(DBO DBOBJECT RESTAURANT 23754001) "578-5050")
Applying tuple postfix to tuple (R-PHONE #,(DBO DBOBJECT RESTAURANT 23754001) "578-5050")
Finding subtuples via pattern (PAUL-R-TYPE @ !)
To generate target tuples for relation PAUL-R-TYPE
To generate target tuples for relation IREL-I-RESTAURANT
asserting to target relation PAUL-R-TYPE
  tuple (#,(DBO DBOBJECT RESTAURANT 23754001) "japanese")
Applying tuple postfix to tuple (PAUL-R-TYPE #,(DBO DBOBJECT RESTAURANT 23754001) "japanese")
Finding subtuples via pattern (PAUL-R-ADDRESS @ !)
Finding subtuples via pattern (SPECIALTIES @ !)
To generate target tuples for relation SPECIALTIES
To generate target tuples for relation IREL-I-RESTAURANT
asserting to target relation SPECIALTIES
  tuple (#,(DBO DBOBJECT RESTAURANT 23754001) "inari")
Applying tuple postfix to tuple (SPECIALTIES #,(DBO DBOBJECT RESTAURANT 23754001) "inari")
Finding subtuples via pattern (COST @ !)
To generate target tuples for relation COST
To generate target tuples for relation IREL-I-RESTAURANT
asserting to target relation COST
  tuple (#,(DBO DBOBJECT RESTAURANT 23754001) 30)
Applying tuple postfix to tuple (COST #,(DBO DBOBJECT RESTAURANT 23754001) 30)
Finding subtuples via pattern (RECOMMENDATION @ ! !)
To generate target tuples for relation RECOMMENDATION
To generate target tuples for relation IREL-I-RESTAURANT
To generate target tuples for relation IREL-I-PERSON
Asserting tuple ("joe") to relation IREL-I-PERSON
Asserting intermediate relation
  att vars are ((* PERSON) (P STRING))
  restuple is ("joe")
Object already asserted in table IREL-I-PERSON
asserting to target relation RECOMMENDATION
  tuple (#,(DBO DBOBJECT RESTAURANT 23754001) #,(DBO DBOBJECT PERSON 23992289) 8)
Applying tuple postfix to tuple (RECOMMENDATION #,(DBO DBOBJECT RESTAURANT 23754001) #,(DBO DBOBJECT PERSON 23992289) 8)
Applying postfix to #,(DBO DBOBJECT RESTAURANT 23754001)
Transformation done
NIL

D.5 Persistent Store

This section provides the persistent forms of various world schemas and world instances created in the example.

D.5.1 World Schemas

Each world schema is stored as a text file. Each line is a relation or type specification. Stored types may be specified with their supertype(s). Stored relations are specified with the type restrictions on their slots.
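As a rough illustration of the file layout just described, the following Python sketch reads a schema listing into records. It is not the system's own loader, which the transcripts show is written in Lisp. The names `SchemaEntry` and `parse_schema` are hypothetical, the entry shape (STATUS KIND NAME ARG...) is inferred from the listings in this section, and the sample is a subset of the Entertainment schema.

```python
import re
from dataclasses import dataclass


@dataclass
class SchemaEntry:
    status: str   # "SHARED" or "INTERNAL"
    kind: str     # "STORED-TYPE" or "STORED-RELATION"
    name: str
    args: tuple   # supertypes for a type, slot types for a relation


def parse_schema(text):
    """Parse a listing of the assumed form ((STATUS KIND NAME ARG...) ...)."""
    entries = []
    # Each innermost parenthesized group is one type or relation specification.
    for group in re.findall(r"\(([^()]*)\)", text):
        fields = group.split()
        if len(fields) < 3:
            continue  # not a complete specification
        status, kind, name, *args = fields
        entries.append(SchemaEntry(status, kind, name, tuple(args)))
    return entries


# Subset of the Entertainment world schema.
sample = """((SHARED STORED-TYPE ENTERTAINMENT)
 (SHARED STORED-TYPE THEATRE ENTERTAINMENT)
 (SHARED STORED-RELATION NAME ENTERTAINMENT STRING)
 (INTERNAL STORED-RELATION T-TYPE THEATRE STRING))"""

entries = parse_schema(sample)
for entry in entries:
    role = "supertypes" if entry.kind == "STORED-TYPE" else "slot types"
    print(entry.status, entry.name, role, entry.args)
```

Keeping the status field on each record makes it easy to separate the SHARED relations, which other schemas may reuse, from the INTERNAL ones.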
Entertainment

((SHARED STORED-TYPE ENTERTAINMENT)
 (SHARED STORED-TYPE RESTAURANT ENTERTAINMENT)
 (SHARED STORED-TYPE THEATRE ENTERTAINMENT)
 (SHARED STORED-RELATION NAME ENTERTAINMENT STRING)
 (SHARED STORED-RELATION RATING ENTERTAINMENT STRING)
 (SHARED STORED-RELATION PHONE ENTERTAINMENT STRING)
 (INTERNAL STORED-RELATION SPECIALTY RESTAURANT STRING)
 (INTERNAL STORED-RELATION AVG-PRICE RESTAURANT STRING)
 (INTERNAL STORED-RELATION R-TYPE RESTAURANT STRING)
 (INTERNAL STORED-RELATION CURRENT-SHOW THEATRE STRING)
 (INTERNAL STORED-RELATION T-TYPE THEATRE STRING))

Recommendation

((SHARED STORED-TYPE RESTAURANT)
 (SHARED STORED-TYPE PERSON)
 (SHARED STORED-RELATION R-NAME RESTAURANT STRING)
 (SHARED STORED-RELATION R-PHONE RESTAURANT STRING)
 (SHARED STORED-RELATION SPECIALTIES RESTAURANT STRING)
 (INTERNAL STORED-RELATION R-TYPE RESTAURANT STRING)
 (INTERNAL STORED-RELATION R-ADDRESS RESTAURANT STRING)
 (INTERNAL STORED-RELATION COST RESTAURANT INTEGER)
 (SHARED STORED-RELATION RECOMMENDATION RESTAURANT PERSON INTEGER)
 (SHARED STORED-RELATION P-NAME PERSON STRING))

Rest-Guide

((SHARED STORED-TYPE RESTAURANT)
 (SHARED STORED-TYPE FAST-FOOD RESTAURANT)
 (SHARED STORED-TYPE AMERICAN RESTAURANT)
 (SHARED STORED-TYPE FOREIGN RESTAURANT)
 (SHARED STORED-TYPE ADDRESS)
 (SHARED STORED-RELATION NAME RESTAURANT STRING)
 (SHARED STORED-RELATION LOCATION RESTAURANT ADDRESS)
 (SHARED STORED-RELATION PHONE RESTAURANT STRING)
 (SHARED STORED-RELATION AVG-PRICE RESTAURANT INTEGER)
 (SHARED STORED-RELATION HOURS RESTAURANT STRING)
 (SHARED STORED-RELATION CLASSIFICATION RESTAURANT STRING)
 (SHARED STORED-RELATION SPECIALTIES RESTAURANT STRING)
 (SHARED STORED-RELATION FACILITIES RESTAURANT STRING)
 (INTERNAL STORED-RELATION STREET ADDRESS STRING)
 (INTERNAL STORED-RELATION CITY ADDRESS STRING))

D.5.2 World Instances

The world's population is stored as a text file and can be interpreted as follows:

• each line depicts an object or a tuple;
• the first item of each line is a code:
  - D for dependent object,
  - S for seed object,
  - V for value,
  - R for seed relationship,
  - T for dependent relationship.

Objects have the following format: <D/S/V>, <PID>, <list of types of object, or a value>.
Relationships have the following format: <T/R>, <tuple with PIDs>.

Entertainment-world

D 1 (ENTERTAINMENT RESTAURANT)
V 2 "marie Callenders"
R (NAME 1 2)
D 3 (ENTERTAINMENT RESTAURANT)
V 4 "ma-ri-na"
R (NAME 3 4)
D 5 (ENTERTAINMENT THEATRE)
V 6 "UA Marina"
R (NAME 5 6)
D 7 (ENTERTAINMENT RESTAURANT)
V 8 "kifune"
R (NAME 7 8)
D 9 (ENTERTAINMENT RESTAURANT)
V 10 "acapulco"
R (NAME 9 10)
D 11 (ENTERTAINMENT RESTAURANT)
V 12 "minato"
R (NAME 11 12)
D 13 (ENTERTAINMENT RESTAURANT)
V 14 "eastwind cafe"
R (NAME 13 14)
V 15 "***"
R (RATING 1 15)
V 16 "****"
R (RATING 3 16)
R (RATING 5 15)
R (RATING 7 15)
R (RATING 9 15)
V 17 "*****"
R (RATING 11 17)
R (RATING 13 16)
V 18 "822-5956"
R (PHONE 1 18)
V 19 "578-5050"
R (PHONE 3 19)
V 20 "823-3957"
R (PHONE 5 20)
V 21 "822-1595"
R (PHONE 7 21)
V 22 "822-4031"
R (PHONE 9 22)
V 23 "305-1104"
R (PHONE 11 23)
V 24 "823-9678"
R (PHONE 13 24)
V 25 "strawberry pie"
R (SPECIALTY 1 25)
V 26 "apple cobbler"
R (SPECIALTY 1 26)
V 27 "inari"
R (SPECIALTY 3 27)
V 28 "teriyaki"
R (SPECIALTY 7 28)
V 29 "chorizos"
R (SPECIALTY 9 29)
V 30 "sukiyaki"
R (SPECIALTY 11 30)
V 31 "sashimi"
R (SPECIALTY 11 31)
V 32 "panang"
R (SPECIALTY 13 32)
V 33 "$"
R (AVG-PRICE 1 33)
V 34 "$$$"
R (AVG-PRICE 3 34)
V 35 "$$"
R (AVG-PRICE 7 35)
R (AVG-PRICE 9 35)
R (AVG-PRICE 11 35)
R (AVG-PRICE 13 33)
V 36 "american"
R (R-TYPE 1 36)
V 37 "Japanese"
R (R-TYPE 3 37)
R (R-TYPE 7 37)
V 38 "mexican"
R (R-TYPE 9 38)
R (R-TYPE 11 37)
V 39 "thai"
R (R-TYPE 13 39)
V 40 "teenage mutant ninja turtles"
R (CURRENT-SHOW 5 40)
V 41 "cinema"
R (T-TYPE 5 41)

Joe-Entertainment-world

V 2 "ma-ri-na"
T (NAME 1 2)
V 3 "****"
T (RATING 1 3)
V 4 "578-5050"
T (PHONE 1 4)
V 5 "inari"
T (SPECIALTY 1 5)
V 6 "$$$"
T (AVG-PRICE 1 6)
V 7 "Japanese"
T (R-TYPE 1 7)
S 1 (ENTERTAINMENT RESTAURANT)
V 9 "kifune"
T (NAME 8 9)
V 10 "***"
T (RATING 8 10)
V 11 "822-1595"
T (PHONE 8 11)
V 12 "teriyaki"
T (SPECIALTY 8 12)
V 13 "$$"
T (AVG-PRICE 8 13)
T (R-TYPE 8 7)
S 8 (ENTERTAINMENT RESTAURANT)
V 15 "minato"
T (NAME 14 15)
V 16 "*****"
T (RATING 14 16)
V 17 "305-1104"
T (PHONE 14 17)
V 18 "sashimi"
T (SPECIALTY 14 18)
V 19 "sukiyaki"
T (SPECIALTY 14 19)
T (AVG-PRICE 14 13)
T (R-TYPE 14 7)
S 14 (ENTERTAINMENT RESTAURANT)

Recommendation-world

D 1 (RESTAURANT)
V 2 "miami spice"
R (R-NAME 1 2)
D 3 (RESTAURANT)
V 4 "eastwind cafe"
R (R-NAME 3 4)
D 5 (RESTAURANT)
V 6 "minato"
R (R-NAME 5 6)
D 7 (RESTAURANT)
V 8 "el torito"
R (R-NAME 7 8)
D 9 (RESTAURANT)
V 10 "akbar"
R (R-NAME 9 10)
V 11 "306-7979"
R (R-PHONE 1 11)
V 12 "823-9678"
R (R-PHONE 3 12)
V 13 "305-1104"
R (R-PHONE 5 13)
V 14 "823-8941"
R (R-PHONE 7 14)
V 15 "822-4116"
R (R-PHONE 9 15)
V 16 "ice coffee"
R (SPECIALTIES 3 16)
V 17 "soba"
R (SPECIALTIES 5 17)
V 18 "quesadillas"
R (SPECIALTIES 7 18)
V 19 "tandoori"
R (SPECIALTIES 9 19)
V 20 "cuban"
R (R-TYPE 1 20)
V 21 "thai"
R (R-TYPE 3 21)
V 22 "japanese"
R (R-TYPE 5 22)
V 23 "mexican"
R (R-TYPE 7 23)
V 24 "indian"
R (R-TYPE 9 24)
V 25 "13515 Washington Blvd Venice"
R (R-ADDRESS 1 25)
V 26 "2928 Washington Blvd Venice"
R (R-ADDRESS 3 26)
V 27 "4676 Admiralty Way Marina Del
R (R-ADDRESS 5 27)
V 28 "13715 Fiji Way Marina Del Rey"
R (R-ADDRESS 7 28)
V 29 "590 Washington St. Marina Del
R (R-ADDRESS 9 29)
V 30 22
R (COST 1 30)
V 31 8
R (COST 3 31)
V 32 18
R (COST 5 32)
V 33 13
R (COST 7 33)
R (COST 9 30)
D 34 (PERSON)
V 35 6
R (RECOMMENDATION 1 34 35)
R (RECOMMENDATION 3 34 31)
V 36 7
R (RECOMMENDATION 5 34 36)
V 37 10
R (RECOMMENDATION 9 34 37)
D 38 (PERSON)
R (RECOMMENDATION 3 38 31)
V 39 9
R (RECOMMENDATION 5 38 39)
D 40 (PERSON)
R (RECOMMENDATION 5 40 37)
D 41 (PERSON)
R (RECOMMENDATION 3 41 39)
V 42 5
R (RECOMMENDATION 5 41 42)
R (RECOMMENDATION 9 41 39)
V 43 "yingsha"
R (P-NAME 38 43)
V 44 "michael"
R (P-NAME 40 44)
V 45 "nancy"
R (P-NAME 41 45)
V 46 "paul"
R (P-NAME 34 46)

Paul-Recommendation-World

V 2 "eastwind cafe"
T (R-NAME 1 2)
V 3 "823-9678"
T (R-PHONE 1 3)
V 4 "thai"
T (R-TYPE 1 4)
V 5 "2928 Washington Blvd Venice"
T (R-ADDRESS 1 5)
V 6 "ice coffee"
T (SPECIALTIES 1 6)
V 7 8
T (COST 1 7)
D 1 (RESTAURANT)
V 9 "paul"
T (P-NAME 8 9)
D 8 (PERSON)
R (RECOMMENDATION 1 8 7)
V 11 "akbar"
T (R-NAME 10 11)
V 12 "822-4116"
T (R-PHONE 10 12)
V 13 "indian"
T (R-TYPE 10 13)
V 14 "590 Washington St. Marina Del
T (R-ADDRESS 10 14)
V 15 "tandoori"
T (SPECIALTIES 10 15)
V 16 22
T (COST 10 16)
D 10 (RESTAURANT)
V 17 10
R (RECOMMENDATION 10 8 17)
V 19 "yingsha"
T (P-NAME 18 19)
D 18 (PERSON)
R (RECOMMENDATION 1 18 7)
V 21 "minato"
T (R-NAME 20 21)
V 22 "305-1104"
T (R-PHONE 20 22)
V 23 "japanese"
T (R-TYPE 20 23)
V 24 "4676 Admiralty Way Marina Del
T (R-ADDRESS 20 24)
V 25 "soba"
T (SPECIALTIES 20 25)
V 26 18
T (COST 20 26)
D 20 (RESTAURANT)
V 27 9
R (RECOMMENDATION 20 18 27)
V 29 "michael"
T (P-NAME 28 29)
D 28 (PERSON)
R (RECOMMENDATION 20 28 17)
V 31 "nancy"
T (P-NAME 30 31)
D 30 (PERSON)
R (RECOMMENDATION 1 30 27)
R (RECOMMENDATION 10 30 27)

Joe-rec-1

D 1 (RESTAURANT)
V 2 "minato"
R (R-NAME 1 2)
D 3 (RESTAURANT)
V 4 "kifune"
R (R-NAME 3 4)
D 5 (RESTAURANT)
V 6 "ma-ri-na"
R (R-NAME 5 6)
V 7 "305-1104"
R (R-PHONE 1 7)
V 8 "822-1595"
R (R-PHONE 3 8)
V 9 "578-5050"
R (R-PHONE 5 9)
V 10 "sashimi"
R (SPECIALTIES 1 10)
V 11 "sukiyaki"
R (SPECIALTIES 1 11)
V 12 "teriyaki"
R (SPECIALTIES 3 12)
V 13 "inari"
R (SPECIALTIES 5 13)
V 14 "japanese"
R (R-TYPE 1 14)
R (R-TYPE 3 14)
R (R-TYPE 5 14)
V 15 30
R (COST 5 15)
V 16 20
R (COST 3 16)
R (COST 1 16)
D 17 (PERSON)
V 18 8
R (RECOMMENDATION 5 17 18)
V 19 6
R (RECOMMENDATION 3 17 19)
V 20 10
R (RECOMMENDATION 1 17 20)
V 21 "joe"
R (P-NAME 17 21)

Joe-rec-2

V 2 "minato"
T (R-NAME 1 2)
V 3 "305-1104"
T (R-PHONE 1 3)
V 4 "japanese"
T (R-TYPE 1 4)
V 5 "sashimi"
T (SPECIALTIES 1 5)
V 6 "sukiyaki"
T (SPECIALTIES 1 6)
V 7 20
T (COST 1 7)
V 9 "joe"
T (P-NAME 8 9)
D 8 (PERSON)
V 10 10
T (RECOMMENDATION 1 8 10)
S 1 (RESTAURANT)
V 12 "kifune"
T (R-NAME 11 12)
V 13 "822-1595"
T (R-PHONE 11 13)
T (R-TYPE 11 4)
V 14 "teriyaki"
T (SPECIALTIES 11 14)
T (COST 11 7)
V 15 6
T (RECOMMENDATION 11 8 15)
S 11 (RESTAURANT)
V 17 "ma-ri-na"
T (R-NAME 16 17)
V 18 "578-5050"
T (R-PHONE 16 18)
T (R-TYPE 16 4)
V 19 "inari"
T (SPECIALTIES 16 19)
V 20 30
T (COST 16 20)
V 21 8
T (RECOMMENDATION 16 8 21)
S 16 (RESTAURANT)

Paul-Plus-Joe-1
D 1 (RESTAURANT)
V 2 "minato"
R (R-NAME 1 2)
D 3 (RESTAURANT)
V 4 "kifune"
R (R-NAME 3 4)
D 5 (RESTAURANT)
V 6 "ma-ri-na"
R (R-NAME 5 6)
D 7 (RESTAURANT)
V 8 "akbar"
R (R-NAME 7 8)
D 9 (RESTAURANT)
V 10 "eastwind cafe"
R (R-NAME 9 10)
V 11 "305-1104"
R (R-PHONE 1 11)
V 12 "822-1595"
R (R-PHONE 3 12)
V 13 "578-5050"
R (R-PHONE 5 13)
V 14 "822-4116"
R (R-PHONE 7 14)
V 15 "823-9678"
R (R-PHONE 9 15)
V 16 "sukiyaki"
R (SPECIALTIES 1 16)
V 17 "sashimi"
R (SPECIALTIES 1 17)
V 18 "teriyaki"
R (SPECIALTIES 3 18)
V 19 "inari"
R (SPECIALTIES 5 19)
V 20 "tandoori"
R (SPECIALTIES 7 20)
V 21 "ice coffee"
R (SPECIALTIES 9 21)
V 22 "japanese"
R (R-TYPE 1 22)
R (R-TYPE 3 22)
R (R-TYPE 5 22)
V 23 "indian"
R (R-TYPE 7 23)
V 24 "thai"
R (R-TYPE 9 24)
V 25 "4676 Admiralty Way Marina Del Rey"
R (R-ADDRESS 1 25)
V 26 "590 Washington St. Marina Del Rey"
R (R-ADDRESS 7 26)
V 27 "2928 Washington Blvd Venice"
R (R-ADDRESS 9 27)
V 28 20
R (COST 3 28)
V 29 30
R (COST 5 29)
V 30 22
R (COST 7 30)
V 31 8
R (COST 9 31)
R (COST 1 28)
D 32 (PERSON)
V 33 10
R (RECOMMENDATION 1 32 33)
V 34 6
R (RECOMMENDATION 3 32 34)
R (RECOMMENDATION 5 32 31)
D 35 (PERSON)
V 36 9
R (RECOMMENDATION 7 35 36)
R (RECOMMENDATION 9 35 36)
D 37 (PERSON)
R (RECOMMENDATION 1 37 33)
D 38 (PERSON)
R (RECOMMENDATION 1 38 36)
R (RECOMMENDATION 9 38 31)
D 39 (PERSON)
R (RECOMMENDATION 7 39 33)
R (RECOMMENDATION 9 39 31)
V 40 "joe"
R (P-NAME 32 40)
V 41 "nancy"
R (P-NAME 35 41)
V 42 "michael"
R (P-NAME 37 42)
V 43 "yingsha"
R (P-NAME 38 43)
V 44 "paul"
R (P-NAME 39 44)

Paul-Plus-Joe-2
V 2 "ma-ri-na"
T (R-NAME 1 2)
V 3 "578-5050"
T (R-PHONE 1 3)
V 4 "japanese"
T (R-TYPE 1 4)
V 5 "inari"
T (SPECIALTIES 1 5)
V 6 30
T (COST 1 6)
V 8 "joe"
T (P-NAME 7 8)
D 7 (PERSON)
V 9 8
T (RECOMMENDATION 1 7 9)
S 1 (RESTAURANT)
V 11 "kifune"
T (R-NAME 10 11)
V 12 "822-1595"
T (R-PHONE 10 12)
T (R-TYPE 10 4)
V 13 "teriyaki"
T (SPECIALTIES 10 13)
V 14 20
T (COST 10 14)
V 15 6
T (RECOMMENDATION 10 7 15)
S 10 (RESTAURANT)
V 17 "minato"
T (R-NAME 16 17)
V 18 "305-1104"
T (R-PHONE 16 18)
T (R-TYPE 16 4)
V 19 "4676 Admiralty Way Marina Del Rey"
T (R-ADDRESS 16 19)
V 20 "sashimi"
T (SPECIALTIES 16 20)
V 21 "sukiyaki"
T (SPECIALTIES 16 21)
T (COST 16 14)
V 23 "yingsha"
T (P-NAME 22 23)
D 22 (PERSON)
V 24 9
T (RECOMMENDATION 16 22 24)
V 26 "michael"
T (P-NAME 25 26)
D 25 (PERSON)
V 27 10
T (RECOMMENDATION 16 25 27)
T (RECOMMENDATION 16 7 27)
S 16 (RESTAURANT)
V 29 "akbar"
T (R-NAME 28 29)
V 30 "822-4116"
T (R-PHONE 28 30)
V 31 "indian"
T (R-TYPE 28 31)
V 32 "590 Washington St. Marina Del Rey"
T (R-ADDRESS 28 32)
V 33 "tandoori"
T (SPECIALTIES 28 33)
V 34 22
T (COST 28 34)
V 36 "paul"
T (P-NAME 35 36)
D 35 (PERSON)
T (RECOMMENDATION 28 35 27)
V 38 "nancy"
T (P-NAME 37 38)
D 37 (PERSON)
T (RECOMMENDATION 28 37 24)
S 28 (RESTAURANT)
V 40 "eastwind cafe"
T (R-NAME 39 40)
V 41 "823-9678"
T (R-PHONE 39 41)
V 42 "thai"
T (R-TYPE 39 42)
V 43 "2928 Washington Blvd Venice"
T (R-ADDRESS 39 43)
V 44 "ice coffee"
T (SPECIALTIES 39 44)
T (COST 39 9)
T (RECOMMENDATION 39 35 9)
T (RECOMMENDATION 39 22 9)
T (RECOMMENDATION 39 37 24)
S 39 (RESTAURANT)

Restaurant-Info
D 1 (RESTAURANT FOREIGN)
V 2 "minato"
R (NAME 1 2)
D 3 (RESTAURANT AMERICAN)
V 4 "marie callenders"
R (NAME 3 4)
D 5 (RESTAURANT FOREIGN)
V 6 "ma-ri-na"
R (NAME 5 6)
D 7 (RESTAURANT FOREIGN)
V 8 "kifune"
R (NAME 7 8)
D 9 (RESTAURANT)
V 10 "acapulco"
R (NAME 9 10)
D 11 (RESTAURANT FOREIGN)
R (NAME 11 2)
D 12 (RESTAURANT FOREIGN)
V 13 "eastwind cafe"
R (NAME 12 13)
D 14 (RESTAURANT)
V 15 "miami spice"
R (NAME 14 15)
D 16 (RESTAURANT)
V 17 "el torito"
R (NAME 16 17)
D 18 (RESTAURANT FOREIGN)
V 19 "akbar"
R (NAME 18 19)
D 20 (ADDRESS)
R (LOCATION 1 20)
D 21 (ADDRESS)
R (LOCATION 3 21)
D 22 (ADDRESS)
R (LOCATION 5 22)
D 23 (ADDRESS)
R (LOCATION 7 23)
D 24 (ADDRESS)
R (LOCATION 9 24)
D 25 (ADDRESS)
R (LOCATION 11 25)
D 26 (ADDRESS)
R (LOCATION 12 26)
D 27 (ADDRESS)
R (LOCATION 14 27)
D 28 (ADDRESS)
R (LOCATION 16 28)
D 29 (ADDRESS)
R (LOCATION 18 29)
V 30 "305-1104"
R (PHONE 1 30)
V 31 "822-5956"
R (PHONE 3 31)
V 32 "578-5050"
R (PHONE 5 32)
V 33 "822-1595"
R (PHONE 7 33)
V 34 "822-4031"
R (PHONE 9 34)
R (PHONE 11 30)
V 35 "823-9678"
R (PHONE 12 35)
V 36 "306-7979"
R (PHONE 14 36)
V 37 "823-8941"
R (PHONE 16 37)
V 38 "822-4116"
R (PHONE 18 38)
V 39 18
R (AVG-PRICE 1 39)
V 40 8
R (AVG-PRICE 3 40)
V 41 24
R (AVG-PRICE 5 41)
V 42 17
R (AVG-PRICE 7 42)
R (AVG-PRICE 9 42)
V 43 19
R (AVG-PRICE 11 43)
V 44 9
R (AVG-PRICE 12 44)
V 45 25
R (AVG-PRICE 14 45)
V 46 10
R (AVG-PRICE 16 46)
R (AVG-PRICE 18 41)
V 47 "Sun-W 5-9; Th-Sat 5-10"
R (HOURS 1 47)
V 48 "M-Th 4-11; F-Sat 3-10; Sun 10-3"
R (HOURS 3 48)
V 49 "M-Sat 5-12"
R (HOURS 5 49)
V 50 "M-F 5-11; Sat-Sun 5-10"
R (HOURS 7 50)
V 51 "M-F 10-10; Sat-Sun 11-11"
R (HOURS 9 51)
V 52 "M-F 5-9:30; Sat-Sun 5-10"
R (HOURS 11 52)
V 53 "4-11"
R (HOURS 12 53)
V 54 "M-F 5-2; Sat-Sun 7-12"
R (HOURS 14 54)
V 55 "M-F 5-11; Sat-Sun 5-2"
R (HOURS 16 55)
R (HOURS 18 50)
V 56 "japanese"
R (CLASSIFICATION 1 56)
V 57 "american"
R (CLASSIFICATION 3 57)
R (CLASSIFICATION 5 56)
R (CLASSIFICATION 7 56)
V 58 "mexican"
R (CLASSIFICATION 9 58)
R (CLASSIFICATION 11 56)
V 59 "thai"
R (CLASSIFICATION 12 59)
V 60 "cuban"
R (CLASSIFICATION 14 60)
R (CLASSIFICATION 16 58)
V 61 "indian"
R (CLASSIFICATION 18 61)
V 62 "sushi"
R (SPECIALTIES 1 62)
V 63 "pies"
R (SPECIALTIES 3 63)
V 64 "tempura"
R (SPECIALTIES 5 64)
V 65 "teriyaki"
R (SPECIALTIES 7 65)
V 66 "tacos"
R (SPECIALTIES 9 66)
V 67 "crab roll"
R (SPECIALTIES 11 67)
V 68 "pad thai"
R (SPECIALTIES 12 68)
V 69 "jazz"
R (SPECIALTIES 14 69)
V 70 "quesadilla"
R (SPECIALTIES 16 70)
V 71 "tandoori"
R (SPECIALTIES 18 71)
V 72 "banquet"
R (FACILITIES 1 72)
V 73 "bakery"
R (FACILITIES 3 73)
V 74 "karaoke"
R (FACILITIES 5 74)
R (FACILITIES 7 72)
V 75 "happy hour 4-8"
R (FACILITIES 9 75)
R (FACILITIES 11 72)
V 76 "dancing, ambiance"
R (FACILITIES 14 76)
V 77 "harbor view"
R (FACILITIES 16 77)
V 78 "4676 Admiralty Way"
R (STREET 20 78)
V 79 "4356 Lincoln Blvd"
R (STREET 21 79)
V 80 "4371 Glencoe Ave"
R (STREET 22 80)
V 81 "405 Washington St."
R (STREET 23 81)
V 82 "8360 Manchester Ave."
R (STREET 24 82)
R (STREET 25 78)
V 83 "2928 Washington Blvd."
R (STREET 26 83)
V 84 "13515 Washington Blvd
R (STREET 27 84)
V 85 "13715 Fiji Way"
R (STREET 28 85)
V 86 "590 Washington St."
R (STREET 29 86)
V 87 "Marina Del Rey"
R (CITY 20 87)
R (CITY 21 87)
R (CITY 22 87)
V 88 "Venice"
R (CITY 23 88)
V 89 "Playa Del Rey"
R (CITY 24 89)
R (CITY 25 87)
R (CITY 26 88)
R (CITY 27 88)
R (CITY 28 87)
R (CITY 29 87)

Restaurant-and-Recommendation
D 1 (PERSON)
V 2 "nancy"
R (P-NAME 1 2)
D 3 (PERSON)
V 4 "michael"
R (P-NAME 3 4)
D 5 (PERSON)
V 6 "yingsha"
R (P-NAME 5 6)
D 7 (PERSON)
V 8 "larry"
R (P-NAME 7 8)
D 9 (PERSON)
V 10 "joe"
R (P-NAME 9 10)
D 11 (RESTAURANT FOREIGN)
V 12 9
R (RECOMMENDATION 11 1 12)
D 13 (RESTAURANT FOREIGN)
R (RECOMMENDATION 13 1 12)
D 14 (RESTAURANT FOREIGN)
V 15 10
R (RECOMMENDATION 14 3 15)
R (RECOMMENDATION 14 5 12)
V 16 8
R (RECOMMENDATION 13 5 16)
R (RECOMMENDATION 11 7 15)
R (RECOMMENDATION 13 7 16)
D 17 (RESTAURANT FOREIGN)
R (RECOMMENDATION 17 9 16)
D 18 (RESTAURANT FOREIGN)
V 19 6
R (RECOMMENDATION 18 9 19)
R (RECOMMENDATION 14 9 15)
D 20 (RESTAURANT FOREIGN)
V 21 18
R (COST 20 21)
D 22 (RESTAURANT AMERICAN)
R (COST 22 16)
D 23 (RESTAURANT)
V 24 17
R (COST 23 24)
D 25 (RESTAURANT)
V 26 25
R (COST 25 26)
D 27 (RESTAURANT)
R (COST 27 15)
V 28 20
R (COST 18 28)
V 29 30
R (COST 17 29)
R (COST 13 12)
V 30 24
R (COST 11 30)
R (COST 14 28)
V 31 "590 Washington St. Marina Del
R (PAUL-R-ADDRESS 11 31)
V 32 "2928 Washington Blvd Venice"
R (PAUL-R-ADDRESS 13 32)
V 33 "4676 Admiralty Way Marina Del
R (PAUL-R-ADDRESS 14 33)
V 34 "japanese"
R (PAUL-R-TYPE 20 34)
V 35 "american"
R (PAUL-R-TYPE 22 35)
R (PAUL-R-TYPE 17 34)
R (PAUL-R-TYPE 18 34)
V 36 "mexican"
R (PAUL-R-TYPE 23 36)
R (PAUL-R-TYPE 14 34)
V 37 "thai"
R (PAUL-R-TYPE 13 37)
V 38 "cuban"
R (PAUL-R-TYPE 25 38)
R (PAUL-R-TYPE 27 36)
V 39 "indian"
R (PAUL-R-TYPE 11 39)
V 40 "sushi"
R (SPECIALTIES 20 40)
V 41 "pies"
R (SPECIALTIES 22 41)
V 42 "teriyaki"
R (SPECIALTIES 18 42)
V 43 "tacos"
R (SPECIALTIES 23 43)
V 44 "jazz"
R (SPECIALTIES 25 44)
V 45 "quesadilla"
R (SPECIALTIES 27 45)
V 46 "tandoori"
R (SPECIALTIES 11 46)
V 47 "ice coffee"
R (SPECIALTIES 13 47)
V 48 "sashimi"
R (SPECIALTIES 14 48)
V 49 "sukiyaki"
R (SPECIALTIES 14 49)
V 50 "inari"
R (SPECIALTIES 17 50)
V 51 "305-1104"
R (R-PHONE 20 51)
V 52 "822-5956"
R (R-PHONE 22 52)
V 53 "578-5050"
R (R-PHONE 17 53)
V 54 "822-1595"
R (R-PHONE 18 54)
V 55 "822-4031"
R (R-PHONE 23 55)
R (R-PHONE 14 51)
V 56 "823-9678"
R (R-PHONE 13 56)
V 57 "306-7979"
R (R-PHONE 25 57)
V 58 "823-8941"
R (R-PHONE 27 58)
V 59 "822-4116"
R (R-PHONE 11 59)
V 60 "minato"
R (R-NAME 20 60)
V 61 "marie Callenders"
R (R-NAME 22 61)
V 62 "ma-ri-na"
R (R-NAME 17 62)
V 63 "kifune"
R (R-NAME 18 63)
V 64 "acapulco"
R (R-NAME 23 64)
R (R-NAME 14 60)
V 65 "eastwind cafe"
R (R-NAME 13 65)
V 66 "miami spice"
R (R-NAME 25 66)
V 67 "el torito"
R (R-NAME 27 67)
V 68 "akbar"
R (R-NAME 11 68)
D 69 (ADDRESS)
V 70 "Marina Del Rey"
R (REST-GUIDE-TINI-CITY 69 70)
D 71 (ADDRESS)
R (REST-GUIDE-TINI-CITY 71 70)
D 72 (ADDRESS)
R (REST-GUIDE-TINI-CITY 72 70)
D 73 (ADDRESS)
V 74 "Venice"
R (REST-GUIDE-TINI-CITY 73 74)
D 75 (ADDRESS)
V 76 "Playa Del Rey"
R (REST-GUIDE-TINI-CITY 75 76)
D 77 (ADDRESS)
R (REST-GUIDE-TINI-CITY 77 70)
D 78 (ADDRESS)
R (REST-GUIDE-TINI-CITY 78 74)
D 79 (ADDRESS)
R (REST-GUIDE-TINI-CITY 79 74)
D 80 (ADDRESS)
R (REST-GUIDE-TINI-CITY 80 70)
D 81 (ADDRESS)
R (REST-GUIDE-TINI-CITY 81 70)
V 82 "4676 Admiralty Way"
R (REST-GUIDE-TINI-STREET 69 82)
V 83 "4356 Lincoln Blvd"
R (REST-GUIDE-TINI-STREET 71 83)
V 84 "4371 Glencoe Ave"
R (REST-GUIDE-TINI-STREET 72 84)
V 85 "405 Washington St."
R (REST-GUIDE-TINI-STREET 73 85)
V 86 "8360 Manchester Ave."
R (REST-GUIDE-TINI-STREET 75 86)
R (REST-GUIDE-TINI-STREET 77 82)
V 87 "2928 Washington Blvd."
R (REST-GUIDE-TINI-STREET 78 87)
V 88 "13515 Washington Blvd."
R (REST-GUIDE-TINI-STREET 79 88)
V 89 "13715 Fiji Way"
R (REST-GUIDE-TINI-STREET 80 89)
V 90 "590 Washington St."
R (REST-GUIDE-TINI-STREET 81 90)
V 91 "banquet"
R (FACILITIES 20 91)
V 92 "bakery"
R (FACILITIES 22 92)
V 93 "karaoke"
R (FACILITIES 17 93)
R (FACILITIES 18 91)
V 94 "happy hour 4-8"
R (FACILITIES 23 94)
R (FACILITIES 14 91)
V 95 "dancing, ambiance"
R (FACILITIES 25 95)
V 96 "harbor view"
R (FACILITIES 27 96)
V 97 "Sun-W 5-9; Th-Sat 5-10"
R (HOURS 20 97)
V 98 "M-Th 4-11; F-Sat 3-10; Sun 10-3"
R (HOURS 22 98)
V 99 "M-Sat 5-12"
R (HOURS 17 99)
V 100 "M-F 5-11; Sat-Sun 5-10"
R (HOURS 18 100)
V 101 "M-F 10-10; Sat-Sun 11-11"
R (HOURS 23 101)
V 102 "M-F 5-9:30; Sat-Sun 5-10"
R (HOURS 14 102)
V 103 "4-11"
R (HOURS 13 103)
V 104 "M-F 5-2; Sat-Sun 7-12"
R (HOURS 25 104)
V 105 "M-F 5-11; Sat-Sun 5-2"
R (HOURS 27 105)
R (HOURS 11 100)
R (LOCATION 20 69)
R (LOCATION 22 71)
R (LOCATION 17 72)
R (LOCATION 18 73)
R (LOCATION 23 75)
R (LOCATION 14 77)
R (LOCATION 13 78)
R (LOCATION 25 79)
R (LOCATION 27 80)
R (LOCATION 11 81)</i>
Abstract (if available)
This dissertation provides a framework to study the problem of sharing information in a network of highly autonomous workstations. It focuses on providing the necessary basic support for accessing shared information, and provides a framework to study other forms of synchronized sharing. This thesis introduces WorldBase, a system for storage and access of distributed, possibly inconsistent, possibly overlapping information. WorldBase provides flexible structure manipulations, allowing users to conceptually group related objects and relations together, and share them among people on the network.
The WorldBase environment supports storage of clustered, structured information. The unit of information clustering is called a world database (or simply a world). A world consists of a collection of objects and relationships. It is a unit for conceptual grouping, focusing, communicating, and sharing of information. Each world has a world schema specification and a collection of other specifications that constrain the world. A world schema specification specifies the schema of objects and relationships in the world. The world schema specification and the other specifications constrain a world's population and express some of its semantics.
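The idea of a world whose population is constrained by its schema can be illustrated with a small sketch. This is not WorldBase's actual representation or API: the `World` class, the schema form (relation name mapped to expected argument types), and the `STRING`/`NUMBER` type names are all assumptions made for illustration. The relation names are borrowed from the sample worlds in the transcript.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class World:
    """A named cluster of objects and relationships, constrained by a schema."""
    name: str
    # Assumed schema form: relation name -> expected argument types.
    schema: Dict[str, Tuple[str, ...]]
    objects: Dict[int, str] = field(default_factory=dict)  # object id -> type
    facts: Set[tuple] = field(default_factory=set)         # (relation, *args)

    def add_object(self, obj_id: int, obj_type: str) -> None:
        self.objects[obj_id] = obj_type

    def assert_fact(self, relation: str, *args) -> None:
        """Add a relationship only if it conforms to the world schema."""
        if relation not in self.schema:
            raise ValueError(f"{relation} is not in the schema of {self.name}")
        expected = self.schema[relation]
        if len(args) != len(expected):
            raise ValueError(f"{relation} expects {len(expected)} arguments")
        for arg, type_name in zip(args, expected):
            if not self._has_type(arg, type_name):
                raise ValueError(f"{arg!r} is not a {type_name}")
        self.facts.add((relation, *args))

    def _has_type(self, arg, type_name: str) -> bool:
        if type_name == "STRING":
            return isinstance(arg, str)
        if type_name == "NUMBER":
            return isinstance(arg, (int, float))
        # Any other type name refers to an object type declared in this world.
        return self.objects.get(arg) == type_name


world = World(
    "joe-rec",
    {"R-NAME": ("RESTAURANT", "STRING"), "COST": ("RESTAURANT", "NUMBER")},
)
world.add_object(1, "RESTAURANT")
world.assert_fact("R-NAME", 1, "minato")
world.assert_fact("COST", 1, 20)
```

Asserting a fact about an undeclared object, such as `world.assert_fact("COST", 2, 30)`, raises `ValueError` in this sketch, which is one simple way a schema can constrain a world's population.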
WorldBase provides a "file-like" paradigm to deal with worlds. It uses the virtual memory of a workstation to provide a WorldBase workspace with which the user interacts. The user has access to collections of worlds and their specifications in persistent store but cannot access their contents directly. Loading a world from persistent memory to a workspace activates the world so its contents can be accessed by the user. The user may load multiple worlds into a workspace, effectively merging them. Each workspace contains a working database consisting of a merge of all loaded worlds. Different combinations of worlds (loaded and merged in the workspace) may result in different, possibly conflicting, working databases.
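A minimal sketch of this load-and-merge behavior follows, assuming a world can be reduced to a set of `(relation, object, value)` facts and that merging is a plain set union. The `Workspace` class, the `PERSISTENT_STORE` dictionary, and the choice of `COST` as a single-valued relation are hypothetical. Objects are keyed by name here for brevity, and the two costs for "minato" echo the sample worlds in the transcript.

```python
# Hypothetical persistent store: world name -> set of (relation, object, value) facts.
PERSISTENT_STORE = {
    "recommendation-world": {("R-TYPE", "minato", "japanese"), ("COST", "minato", 18)},
    "joe-rec": {("R-TYPE", "minato", "japanese"), ("COST", "minato", 20)},
    "restaurant-info": {("PHONE", "minato", "305-1104")},
}


class Workspace:
    def __init__(self, store):
        self._store = store
        self.loaded = []         # names of activated worlds, in load order
        self.working_db = set()  # merge (here: union) of all loaded worlds

    def load(self, world_name):
        """Activate a world by merging its facts into the working database."""
        self.working_db |= self._store[world_name]
        self.loaded.append(world_name)

    def conflicts(self, single_valued=("COST",)):
        """Report (relation, object) pairs that carry more than one value."""
        seen = {}
        for relation, obj, value in self.working_db:
            if relation in single_valued:
                seen.setdefault((relation, obj), set()).add(value)
        return {key: values for key, values in seen.items() if len(values) > 1}


a = Workspace(PERSISTENT_STORE)
a.load("recommendation-world")
a.load("restaurant-info")

b = Workspace(PERSISTENT_STORE)
b.load("recommendation-world")
b.load("joe-rec")
```

Workspace `a` ends up with a consistent working database, while workspace `b` holds two different costs for the same restaurant, so the combination of loaded worlds determines both the content and the consistency of the working database.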
Support for manipulating worlds in workspaces is provided. The user may create worlds, create and assert information into them, and save them to persistent store. WorldBase provides a selection mechanism to extract a database subset of interest when creating or adding to a world. It supports transformation of information in worlds. It supports merging of worlds in the workspace such that two objects in distinct worlds representing the same real-world object may be merged into a single object in the workspace. These operations effectively support user access to multiple worlds.
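To make the object-merging step concrete, here is a sketch in which two objects are treated as the same real-world object when they share a value for a key relation. Using `R-NAME` as that key, and the `merge_worlds` function itself, are assumptions for illustration; the abstract does not say how WorldBase decides equivalence. Object ids are local to each world, as in the sample data.

```python
def merge_worlds(worlds, key_relation="R-NAME"):
    """Merge facts from several worlds, unifying objects that share a key value.

    worlds: dict of world name -> list of (relation, local_object_id, value).
    Returns a dict mapping each merged object's key to its (relation, value) facts.
    """
    merged = {}
    for world_name, facts in worlds.items():
        # Map each local object id to its key value (e.g. the restaurant name).
        key_of = {obj: value for relation, obj, value in facts
                  if relation == key_relation}
        for relation, obj, value in facts:
            # Objects without a key stay separate under a world-qualified id.
            key = key_of.get(obj, (world_name, obj))
            merged.setdefault(key, set()).add((relation, value))
    return merged


worlds = {
    "paul": [
        ("R-NAME", 20, "minato"),
        ("R-PHONE", 20, "305-1104"),
        ("SPECIALTIES", 20, "soba"),
    ],
    "joe": [
        ("R-NAME", 1, "minato"),
        ("R-PHONE", 1, "305-1104"),
        ("SPECIALTIES", 1, "sashimi"),
        ("R-NAME", 3, "kifune"),
    ],
}
merged = merge_worlds(worlds)
```

Object 20 in the first world and object 1 in the second collapse into a single "minato" entry carrying the facts from both, while "kifune" remains a separate object.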
A prototype of WorldBase is implemented in AP5, a database programming language extension of Common Lisp developed at USC/Information Sciences Institute. The prototype provides an effective platform for accessing multiple object-bases and for studying various forms of sharing synchronizations and update propagation among autonomous object-bases. The focus of the WorldBase system is to provide the basic primitives for dealing with object-base collections; a sophisticated user interface is not provided. This is appropriate because each environment in which it is embedded will present a different facade to the user, and a different organization of primitives into common, yet complex operations. World specifications are also separated into their primitive components: schema specifications, closure specifications, equivalence specifications, and constraint specifications. These define the semantics of an object-base. By avoiding overcommittal to complex functionality and specifications, WorldBase provides a sound basis on which to build sharable workstation environments.
The support WorldBase provides forms the basic primitives with which to access object-bases. The set of operations WorldBase provides can be used to implement a more complex set of operations on collections of object-bases. The primitives help isolate the problems with each specific step, and allow users to focus on each particular step before going on to the next step.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Toward a multi-formalism specification environment
Robust loop closures for multi-robot SLAM in unstructured environments
Edge indexing in a grid for highly dynamic virtual environments
Sensory acquisition for emergent body representations in neuro-robotic systems
A learning-based object-oriented framework for conceptual database evolution
A modular approach to hardware-accelerated deformable modeling and animation
Feature-preserving simplification and sketch-based creation of 3D models
Intelligent near-optimal resource allocation and sharing for self-reconfigurable robotic and other networks
Speeding up multi-objective search algorithms
A syntax-based statistical translation model
Linking eyes to mouth: a schema-based computational model for describing visual scenes
Constraint based analysis for persistent memory programs
Schema evolution for scientific asset management
A notation for rapid specification of information visualization
Execution monitoring in multi-agent environments
A script-based approach to modifying knowledge-based systems
Heterogeneous view integration and its automation
Extendible tracking: Dynamic tracking range extension in vision-based augmented reality tracking systems
Structural indexing for object recognition
MOVNet: a framework to process location-based queries on moving objects in road networks
Asset Metadata
Creator Widjojo, Surjatini (author) 
Core Title Sharing persistent object-bases in a workstation environment 
Contributor ProQuest (digitizer) 
School Graduate School 
Degree Doctor of Philosophy 
Degree Program Computer Science 
Degree Conferral Date 1990-08 
Publication Date 08/31/1990 
Publisher University of Southern California (original), University of Southern California. Libraries (digital) 
Tag OAI-PMH Harvest 
Language English
Advisor Wile, David A. (committee chair), Hull, Rick (committee member), Parker, Alice C. (committee member) 
Permanent Link (DOI) https://doi.org/10.25549/usctheses-oUC11257160 
Unique identifier UC11257160 
Identifier DP22809.pdf (filename) 
Legacy Identifier DP22809 
Document Type Dissertation 
Rights Widjojo, Surjatini 
Internet Media Type application/pdf 
Type texts
Access Conditions The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author. 
Repository Name University of Southern California Digital Library
Repository Location USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email cisadmin@lib.usc.edu