OBJECT IDENTIFIERS AND DATABASE UPDATE LANGUAGES

by Jianwen Su

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Computer Science)

April 1991

Copyright 1991 Jianwen Su

UMI Number: DP22835

UNIVERSITY OF SOUTHERN CALIFORNIA, THE GRADUATE SCHOOL, UNIVERSITY PARK, LOS ANGELES, CALIFORNIA 90089-4015

This dissertation, written by Jianwen Su under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Acknowledgements

I am greatly indebted to my advisor, Rick Hull, for his continuous guidance and support throughout my graduate study at USC. In particular, I thank him for encouraging me to pursue the graduate program at USC and sharing his professional insight with me. I would like to thank Professor Seymour Ginsburg for various advice and especially for offering the year-round seminar series on database theory at USC.
For the stimulating discussions which I enjoyed in the seminar, I am also grateful to him and other participants of this seminar: Dr. Guozhu Dong, Stephen Kurtzman, Xiaoyang Wang, Dan Tian, Y. Cho, Amy Chen, Jason Chen et al. I also benefited greatly from discussions with Professor Dennis McLeod. His profound knowledge of database systems helped me to shape a balanced view of the whole area of database research. Finally, I would like to thank ISI for providing a wonderful environment for the research in the final two years, part of which eventually led to this thesis. This research was supported in part by NSF grant IRI-87-19875.

Contents

Acknowledgements
List Of Figures
Abstract
1 Introduction
2 Preliminaries
2.1 A Semantic Data Model
2.2 A Simple Transaction Language SL
3 Dependency Preservation
3.1 Functional and Inclusion Dependencies
3.2 The Relational Data Model
3.3 Decidability of Dependency Preservation Problems
3.3.1 Reducing to the Relational Framework
3.3.2 Functional Dependencies
3.3.3 Acyclic Inclusion Dependencies
3.3.4 Functional and Acyclic Inclusion Dependencies
3.4 Complexity
4 Object Migration
4.1 Migration Inventories
4.2 SL Transactions Schema And Inventories
5 Extended Manipulation Languages
5.1 CSL and CSL+
5.2 Migration Inventories
6 Applications
Reference List

List Of Figures

2.1 A Database Schema D
2.2 An Instance d = (o, a, o6) of D
3.1 A Relational Simulation oF e l
3.2 Relationship of f_rel and g_rel
3.3 Relationship of f_sem and g_rel
4.1 A Class Hierarchy For Four Operations
4.2 An Example of Object Life Cycle
4.3 A Database Schema
4.4 The Migration Graph for P(QQP)*
5.1 A “Pure” Encoding of a Computation of M

Abstract

Update languages for an object-based data model are studied. The model includes object identifiers, classes, class hierarchies, and attributes which range over atomic or “printable” values. A simple update language is examined which has five operators for creating, deleting, modifying, and “migrating” objects in a database. Two research directions are pursued: one is on transactions and static database constraints while the other is on transactions and a new family of dynamic database constraints. The two classes of static constraints considered are functional and acyclic inclusion dependencies.
Although these dependencies are originally defined over relational databases, they are generalized to the context of object-based models. The problem of dependency preservation for transactions is studied. It is shown that testing functional and acyclic inclusion dependency preservation by transactions on semantic databases is decidable. The proof is based on translating semantic schemas, databases, transactions, and dependencies into the corresponding schemas, databases, transactions, and dependencies in the relational context, where the problem is already known to be decidable. The translations yield considerable insight into the semantic constructs of object identifiers and class hierarchies. Several lemmas reveal the relationships among these translations. For the first time, complexity results concerning dependency preservation problems are presented. Dependency preservation testing is shown to be co-NP-complete, even in the simplest cases.

The second direction of the study concerns object migration. In a class hierarchy, a “role set” is a set of classes in which objects can possibly reside at a time instant. For a given set of transactions, a “migration pattern” is a sequence of role sets that some object could pass through during its “life cycle.” Migration inventories, which are sets of migration patterns, restrict the patterns of objects to be updated. They are proposed here as a new kind of dynamic database constraint. The relationships between transactions and migration inventories are studied. A set of transactions is “sound” with respect to an inventory if objects updated by the transactions follow the patterns in the inventory; it is “complete” if each pattern in the inventory can be the migration pattern of some object. An essential problem is to characterize the sound and complete set of migration patterns corresponding to a set of transactions.
Four kinds of migration patterns are investigated with respect to both the above mentioned simple language and two extensions of it: one which incorporates positive testing conditions and another which incorporates both positive and negative conditions. It is shown that the sets of migration patterns produced by transactions in the simple language correspond to regular sets; and each regular set of patterns can be generated by some transactions in the simple language. When conditionals are incorporated, the set of migration patterns generated by these transactions is recursively enumerable but not necessarily recursive; and conversely, each recursively enumerable set of patterns can always be produced or simulated by transactions with only positive testing conditions. As a consequence, soundness and completeness are decidable for the simple language and undecidable for the two extensions.

Chapter 1 Introduction

As current database applications become more and more complex, it is increasingly desirable to develop techniques to model, organize, and manipulate behaviors and to incorporate behaviors into databases. The growing popularity of object-oriented databases is evidence of this trend. Important work on dynamic aspects of databases includes practically-oriented research on behavior modeling and transaction design [Bro81, MBW80, BR84, KM85, NCL+87, BMSW89]; encapsulating behaviors and structural data, e.g., object-oriented databases including Gemstone [CM84], Vbase [AH87b], O2 [LRV88, LR89], IRIS [Bee88], etc.; and theoretical studies on transactions as specification languages [AV89, AV88a], and on dynamic constraints [Via87, Via88]. In this paper, we conduct a theoretical investigation on transactions and their relationships to static and dynamic database constraints.
The framework and techniques developed here can provide part of the basis for reasoning about types and classes in transactions [HJ90], transaction design [BR84, KM85, BMSW89], and study of methods in OODBs; and lead to a new view of behavior specifications and dynamic constraints which extends and generalizes [AV89, AV88a, Via87, Via88].

In our investigation, a simple semantic data model is considered which includes “object identifiers,” classes, class hierarchies, and attributes ranging over printable values. The model can be viewed as a proper subset of many semantic models, e.g., IFO [AH87a], SDM [HM81], TAXIS [MBW80], GSM [HK87], etc. Several transaction languages are studied. The simplest language among those, named SL, contains five operators to support data manipulations on the semantic model: the create and delete operators create and delete objects in the database (respectively); the modify operator changes the attribute values of objects; the operator generalize removes specified objects from a class, but those objects are not completely deleted from the database (it cannot be applied to objects in “root” classes); and finally the operator specialize adds a set of specified objects into a (nonroot) class. Operators and hence transactions are parameterized, i.e., they can have variables. The variables are assigned constants before the operators and transactions can be applied. These operators are natural adaptations of a relational transaction language (which consists of insert, delete, and modify operators) studied by Abiteboul and Vianu [AV88b, AV89]. The major difference between the relational language used by Abiteboul-Vianu and the one studied here is that the languages here include extensions to incorporate objects and allow object-based manipulations.

The study presented here on transactions for semantic databases consists of two different themes.
The first theme of the investigation is related to static database constraints, specifically, functional and acyclic inclusion dependencies. The problem of the preservation of dependencies is examined. In the second theme of the research, we propose a new kind of dynamic constraint based on the migration patterns of objects during their “life cycles.” We initiate the theoretical investigation of these dynamic constraints with respect to transactions for semantic databases. We briefly explain each theme in the following.

The notions of a functional dependency [Cod70] and an inclusion dependency [CFP84] over a semantic database schema are defined in the natural manner. Given an SL transaction and a set of functional and/or acyclic inclusion dependencies, the preservation problem is to tell whether the transaction has the following property: if the database satisfies every dependency in the given set of dependencies, then any application of the transaction to the database will also satisfy these dependencies. The problem of the preservation of a set of functional and acyclic inclusion dependencies under transactions in the context of relational databases is studied in [AV89], where it is shown to be decidable. In this study of the dependency preservation problems, it is shown that the problems of preservation of a set of functional and/or acyclic inclusion dependencies in a semantic framework are still decidable. The complexity of dependency preservation problems in the contexts of both relational and semantic databases is also considered, which extends the work of [AV89]. It is shown here that these problems are co-NP-complete [GJ79] in the following simplest cases. (1) There is a single non-trivial functional dependency, a single class with two attributes, and a transaction consisting of only create and delete operations.
(2) There is a single acyclic inclusion dependency, two classes with only one attribute defined on each class, and a transaction consisting of only create and delete operations.

One of the contributions of this investigation is in the techniques developed for proving the decidability results of preservation problems. Instead of direct proofs, “reductions” to translate the problems in the semantic context into the relational context are used. These reductions involve the translations of semantic schemas, database instances, SL transactions, and dependencies into the corresponding relational schemas, relational database instances, relational transactions, and dependencies. In each case, the translation of database instances and the translation of transactions satisfy a form of commutativity relationship. This turns out to be an important property for the reductions to work. Three reductions are presented, one for functional dependencies, one for acyclic inclusion dependencies, and one for both functional and acyclic inclusion dependencies. Detailed constructions and proofs for the first two reductions are provided, and the last reduction is illustrated with a detailed construction and a proof sketch. These translations and reductions, which simulate object-based semantic databases and transactions using the relational model, lead to a deep insight into fundamental aspects of object identity and semantic constructs such as classes and class hierarchies in the context of database updates.

The second theme extends the spirit of the first by studying the interaction of transactions and a novel family of dynamic constraints. It focuses on “object migrations,” a notion which is emerging as an important functionality that should be supported by object-oriented database systems [AH87a, HK87, HJ90, Per90, DL90].
Our study provides a theoretical analysis on the patterns of how objects can migrate, and on specifying migration patterns as dynamic constraints on databases. In a class hierarchy, an object can belong to several classes simultaneously and objects can migrate or move between classes. The set of classes in which an object lives at a time instant is a role set. During the life span of an object, the sequence of role sets through which the object passes is called a migration pattern. In many database applications, the possible migration patterns of an object are often restricted. For example, in an operating system, processes which are ready to run will be put into the ready queue. Only processes in the ready queue can directly become running processes. This restricts the sequences of states of processes. Based on these considerations, a new kind of dynamic constraint, named migration inventory, is proposed. A migration inventory is essentially a set of migration patterns through which objects are allowed to migrate. A given set of transactions is sound with respect to a migration inventory if objects updated by the transactions only migrate according to the patterns in the inventory; it is complete if each migration pattern in the inventory can be the migration pattern of some object. Soundness and completeness for transactions are examined. The fundamental problem is to characterize the set of all migration patterns with respect to which a set of (parameterized) transactions is both sound and complete. The simple transaction language SL and two extensions, CSL and CSL+, are studied here. Basically, CSL and CSL+ extend SL with conditionals before each update operation. Conditionals in CSL+ are positive whereas both positive and negative conditionals are allowed in CSL. Specifically, we study four kinds of migration pattern, based on the two independent factors of laziness and immediate-start. 
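The notions of role set, migration pattern, and soundness with respect to an inventory can be sketched on the process example. This is only an illustrative Python sketch: the class names PROCESS, READY, and RUNNING, the dictionary representation of an instance, and all function names are assumptions made here, not constructs of the thesis. It checks one observed sequence of instances rather than deciding soundness for a set of transactions, and it skips instants at which the object is absent, which may differ from the formal definitions of Chapter 4.

```python
def role_set(instance, obj):
    """Set of classes in which obj lives in one database instance.
    An instance is modeled (hypothetically) as {class_name: set_of_objects}."""
    return frozenset(c for c, members in instance.items() if obj in members)


def migration_pattern(history, obj):
    """Sequence of role sets obj passes through over successive instances.
    Assumption: instants where obj is not in the database are skipped."""
    pattern = []
    for instance in history:
        roles = role_set(instance, obj)
        if roles:
            pattern.append(roles)
    return tuple(pattern)


def follows_inventory(history, inventory):
    """True if every object seen in the history migrates along a pattern
    listed in the inventory (a set of tuples of role sets)."""
    objects = set()
    for instance in history:
        for members in instance.values():
            objects |= members
    return all(migration_pattern(history, o) in inventory for o in objects)


# Hypothetical inventory: a process must be ready before it runs.
READY = frozenset({"PROCESS", "READY"})
RUNNING = frozenset({"PROCESS", "RUNNING"})
inventory = {(READY,), (READY, RUNNING)}

ok_history = [
    {"PROCESS": {1}, "READY": {1}, "RUNNING": set()},
    {"PROCESS": {1}, "READY": set(), "RUNNING": {1}},
]
bad_history = [
    {"PROCESS": {1}, "READY": set(), "RUNNING": {1}},
]
print(follows_inventory(ok_history, inventory))
print(follows_inventory(bad_history, inventory))
```

The first history respects the inventory, while the second places a process directly among the running processes and is rejected.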
A lazy pattern “discards” consecutively repeated role sets, i.e., it “records” only when an object migrates to a different role set. The idea of immediate start is to focus only on patterns in which objects are created at the first step starting from the empty database. Note that a migration inventory, which is the set of possible migration patterns, is a set closed under taking prefixes. In characterizing migration patterns, the following results are shown. In terms of each of the four kinds of migration patterns, the set of migration patterns generated by each finite set of transactions in the language SL is always regular; and conversely, every regular migration inventory can be generated by some finite set of transactions in the language SL. In terms of each of the four kinds of migration patterns, the set of migration patterns generated by each finite set of transactions in CSL (and hence CSL+) is recursively enumerable. In terms of nonimmediate-start patterns (lazy or nonlazy), every recursively enumerable migration inventory can be generated by a finite set of transactions in CSL+ (and hence CSL). In terms of immediate-start patterns (lazy or nonlazy), every recursively enumerable migration inventory is a left quotient of the set of migration patterns generated by a finite set of transactions in CSL+ (and hence CSL) by a regular set. In other words, each pattern in the inventory can be generated with a padding. If such padding is not allowed, then the exact characterizations for immediate-start patterns of CSL and CSL+ transactions are open. However, it is shown that each context-free migration inventory can be generated by a finite set of transactions in CSL (or CSL+) in terms of (lazy or nonlazy) immediate-start migration patterns (without padding).

The final topic of the thesis illustrates how the results obtained on object migration can be applied in the context of transaction design methodologies of [KM85, NCL+87, BMSW89].
Two different transaction models are presented, both restricting the orders in which update transactions on databases can be applied. The techniques developed in our study on object migration are then used to study these practical transaction models.

The paper is organized as follows. The simple formal semantic model and the simple update language SL for the data model are introduced in Chapter 2. The problem of the preservation of functional and acyclic inclusion dependencies in the semantic model is studied in Chapter 3. The formal notions of migration patterns and migration inventories as dynamic constraints are given in Chapter 4, and the characterization of SL transactions in terms of migration inventories is provided. The two extensions to SL, named CSL+ and CSL, are introduced in Chapter 5, and the results concerning them are presented. The application of the techniques is illustrated in Chapter 6.

Since the two themes investigated in this thesis are not strongly related, readers who are interested in the part on dependency preservation need read only Chapters 2 and 3; readers who are interested in the part on object migration can ignore Chapter 3.

Chapter 2 Preliminaries

Basic terminology and definitions are provided in this chapter. In particular, Section 2.1 presents a formal semantic data model. A simple transaction language SL for the semantic model is defined in Section 2.2.

2.1 A Semantic Data Model

The semantic model is object-based, and is a proper subset of many existing semantic data models such as IFO [AH87a], SDM [HM81], GSM [HK87], etc. The model contains classes, class hierarchies, and attributes which range over atomic values.

Let $G = (V, E)$ be a directed graph where $V$ is a (finite) set of vertices and $E$ a set of edges over $V$. A pair of vertices in $V$ is weakly connected if there is an undirected path between them. A subgraph $G' = (V', E')$ of $G$, where $V' \subseteq V$ and $E' \subseteq E$, is a weakly-connected component of $G$ if
1.
each pair of vertices in $V'$ is weakly connected; and
2. for each $v \in V - V'$, $v$ is not weakly connected to any vertex in $V'$.

A directed graph $G = (V, E)$ is a specialization graph if
(1) $G$ is acyclic; and
(2) for each pair of weakly connected vertices $u, v \in V$, there exists a vertex $w \in V$ which has directed paths from both $u$ and $v$.

Intuitively, a specialization graph consists of several weakly connected components, and each component has a root which has directed paths from all other vertices in the component. This notion is motivated by the ISA Rules in IFO [AH87a].

For simplicity, we assume that there is a universal domain of constants. It is easily seen that the results obtained here can be generalized to models having more than one domain. Formally, we assume the existence of the following pairwise disjoint and countably infinite sets:
• $\mathcal{U} = \{a, b, c, \ldots\}$ of constants;
• $\mathcal{C} = \{P, Q, R, \ldots\}$ of class names;
• $\mathcal{A} = \{A, B, C, \ldots\}$ of attribute names;
• $\mathcal{O} = \{o_1, o_2, o_3, \ldots\}$ of abstract objects, with a total ordering $<_{\mathcal{O}}$ such that $o_i <_{\mathcal{O}} o_j$ iff $i < j$;
• $\mathcal{V} = \{x, y, z, \ldots\}$ of variables.

Definition: A (semantic) database schema is a triple $D = (C, \mathit{isa}, \mathbf{A})$ where:
1. $C \subseteq \mathcal{C}$ is a finite set of class names;
2. $\mathit{isa} \subseteq C \times C$ such that $(C, \mathit{isa})$ is a specialization graph. The reflexive and transitive closure of $\mathit{isa}$ is denoted by $\mathit{isa}^*$;
3. $\mathbf{A} : C \to \mathrm{powerset}_{fin}(\mathcal{A})$ is a total mapping such that $\mathbf{A}(P) \cap \mathbf{A}(Q) = \emptyset$ whenever $P \neq Q$.

Intuitively, a database schema consists of a (finite) set of classes, subclass relationships, and attributes which range over $\mathcal{U}$. Due to inheritance, the set of all attributes defined on class $P$ is the set of attributes which are defined on ancestors of $P$ by $\mathbf{A}$, i.e., it is the set $\mathbf{A}^*(P) = \{A \mid \exists Q,\ P\ \mathit{isa}^*\ Q \wedge A \in \mathbf{A}(Q)\}$. Since multiple inheritance is allowed, attributes inherited from different ancestors could possibly yield conflicts.
Hence, the condition “$\mathbf{A}(P) \cap \mathbf{A}(Q) = \emptyset$ whenever $P \neq Q$” for assigning attributes to classes stated in the above definition is used to prevent such conflicts. A semantic database schema consists of $k$ ($\geq 1$) disjoint directed acyclic graphs which are weakly-connected graphs. Due to the requirement that the class hierarchy is a specialization graph, each weakly-connected component has a root.

Notation: Let $D = (C, \mathit{isa}, \mathbf{A})$ be a schema. A class $P \in C$ is an isa-root if there does not exist a class $Q \in C$ such that $P$ isa $Q$.

Definition: Let $D = (C, \mathit{isa}, \mathbf{A})$ be a semantic database schema. A (database) instance of $D$ is a triple $d = (\mathbf{o}, \mathbf{a}, o)$ where
1. $\mathbf{o} : C \to \mathrm{powerset}_{fin}(\mathcal{O})$ such that
(a) $\mathbf{o}(P) \subseteq \mathbf{o}(Q)$ if $P$ isa $Q$, and
(b) $\mathbf{o}(P) \cap \mathbf{o}(Q) = \emptyset$ if $P$ and $Q$ are not weakly connected;
2. $\mathbf{a}$ is a total mapping from $\bigcup_{P \in C} (\mathbf{o}(P) \times \mathbf{A}(P))$ to $\mathcal{U}$;
3. $o \in \mathcal{O}$ such that for each class $P$ in $C$ and each object $o'$, if $o' \in \mathbf{o}(P)$ then $o' <_{\mathcal{O}} o$.
The set of all instances of $D$ is denoted by $\mathit{inst}(D)$.

Informally, a database instance consists of a set $\mathbf{o}(P)$ of abstract objects for each class $P$, a value $\mathbf{a}(o, A)$ for each object $o$ in class $P$ and attribute $A$ of $P$, and a “next” abstract object $o$ such that no object $>_{\mathcal{O}} o$ occurs in any class. The object is used when new object(s) are created in the database. In our model, each abstract object in $\mathcal{O}$ can be “created” into a database at most once. If $o$ is an abstract object and $A$ an attribute, the value $\mathbf{a}(o, A)$ is also denoted as $o.A$.
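The schema conditions above can be checked mechanically. The following Python sketch is illustrative only: the function names and the representation of isa as a set of (subclass, superclass) pairs are assumptions made here, and a vertex is taken to have a zero-length directed path to itself. It tests the two specialization-graph conditions and computes the inherited attribute set $\mathbf{A}^*(P)$ for the schema of Figure 2.1.

```python
def ancestors(cls, isa):
    """All Q with cls isa* Q (reflexive and transitive closure)."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for sub, sup in isa:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def weakly_connected(u, v, isa):
    """True if an undirected path links u and v."""
    seen, stack = {u}, [u]
    while stack:
        current = stack.pop()
        for a, b in isa:
            for x, y in ((a, b), (b, a)):
                if x == current and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return v in seen


def is_specialization_graph(classes, isa):
    # (1) acyclic: no edge (sub, sup) where sub is reachable from sup
    if any(sub in ancestors(sup, isa) for sub, sup in isa):
        return False
    # (2) weakly connected pairs must share a vertex reachable from both
    for u in classes:
        for v in classes:
            if weakly_connected(u, v, isa) and not (
                ancestors(u, isa) & ancestors(v, isa)
            ):
                return False
    return True


def inherited_attributes(cls, isa, attrs):
    """A*(P): union of A(Q) over all Q with P isa* Q."""
    result = set()
    for q in ancestors(cls, isa):
        result |= attrs.get(q, set())
    return result


classes = {"PERSON", "EMPLOYEE", "STUDENT", "GRAD-ASSIST"}
isa = {
    ("GRAD-ASSIST", "EMPLOYEE"), ("GRAD-ASSIST", "STUDENT"),
    ("EMPLOYEE", "PERSON"), ("STUDENT", "PERSON"),
}
attrs = {
    "PERSON": {"SSN", "Name"},
    "EMPLOYEE": {"Salary", "Works-In"},
    "STUDENT": {"Major", "First-Enroll"},
    "GRAD-ASSIST": {"%-Appoint"},
}
print(is_specialization_graph(classes, isa))
print(sorted(inherited_attributes("GRAD-ASSIST", isa, attrs)))
```

A hierarchy in which a class has two unrelated superclasses and no common root, such as C isa A and C isa B alone, fails condition (2).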
[Figure 2.1 diagram: class hierarchy over PERSON, STUDENT, EMPLOYEE, and GRAD-ASSIST, with attributes SSN, Name, Major, Salary, First-Enroll, Works-In, and %-Appoint]

$D = (C, \mathit{isa}, \mathbf{A})$ where
C = {PERSON, EMPLOYEE, STUDENT, GRAD-ASSIST}
isa = {(GRAD-ASSIST, EMPLOYEE), (GRAD-ASSIST, STUDENT), (EMPLOYEE, PERSON), (STUDENT, PERSON)}
A(PERSON) = {SSN, Name}
A(EMPLOYEE) = {Salary, Works-In}
A(STUDENT) = {Major, First-Enroll}
A(GRAD-ASSIST) = {%-Appoint}

Figure 2.1: A Database Schema D

(a) o(PERSON) = {o1, o2, o3, o4, o5}
o(EMPLOYEE) = {o1, o3, o4}
o(STUDENT) = {o1, o2, o4}
o(GRAD-ASSIST) = {o4}

(b) a is shown by the following tables

PERSON   SSN    Name
o1       0302   Charles
o2       3698   David
o3       6657   Faith
o4       9709   Chris
o5       0067   Michelle

EMPLOYEE   Salary   Works-In
o1         150      History
o3         140      CS
o4         200      EE

STUDENT   Major     First-Enroll
o1        History   1986
o2        CS        1988
o4        EE        1989

GRAD-ASSIST   %-Appoint
o4            49%

Figure 2.2: An Instance d = (o, a, o6) of D

Example 2.1: The database schema D shown in Figure 2.1 contains four classes: PERSON, EMPLOYEE, STUDENT, and GRAD-ASSIST. The class hierarchy and attributes for each class are explicitly shown in Figure 2.1. An instance of the schema D which contains five objects is shown in Figure 2.2. □

2.2 A Simple Transaction Language SL

We now introduce the manipulation language SL. The language is similar to the relational transaction language of [AV89], with two major differences:
1. Our operations manipulate “object identifiers” since SL uses an object-based data model, while [AV89] used the relational model and their operations focused on tuple manipulations. For example, the “create” operator in SL always creates an object with a (new) identifier in our model, regardless of whether there already exists an object which has exactly the same attribute values and belongs to the same set of classes. The “insert” operator in the relational language of [AV89] creates a tuple only when the requested tuple does not already exist in the relation (database).
2.
SL contains two new operators, “specialize” and “generalize,” to support object migration.

The language SL, however, does not allow users to manipulate or “grasp” object identifiers in direct ways. Objects are manipulated by a process of selecting the relevant objects according to their attribute values and then performing updates (changing attribute values, deleting, or migrating objects). The following notion of a “condition” is used to select relevant objects.

An atomic condition is an expression of one of the following forms: $A = a$, $A \neq a$, $A = x$, or $A \neq x$, where $A \in \mathcal{A}$ is an attribute name, $a \in \mathcal{U}$ is a constant, and $x \in \mathcal{V}$ is a variable. An atomic condition is ground if it does not contain a variable, i.e., it is of the form either ‘$A = a$’ or ‘$A \neq a$’. A condition is a set of atomic conditions. A condition is ground if it contains only ground atomic conditions. Given a condition $\Gamma$, $Att(\Gamma) = \{A \mid A \in \mathcal{A}$ and $A$ appears in $\Gamma\}$ is the set of attribute names referenced by $\Gamma$. For an attribute $A \in Att(\Gamma)$, $A$ is defined in $\Gamma$ if there exists either a variable $x \in \mathcal{V}$ or a constant $a \in \mathcal{U}$ such that $A = x$ or $A = a$ is an atomic condition in $\Gamma$. Let the set of attributes defined by $\Gamma$, denoted $Att_{def}(\Gamma)$, be the set $\{A \mid A$ is defined in $\Gamma\}$.

Suppose $S \subseteq \mathcal{A}$ is a set of attributes. A tuple over $S$ is a (total) mapping $t$ from $S$ to $\mathcal{U}$, i.e., $t(A) \in \mathcal{U}$ for each $A \in S$. $Tuple(S)$ denotes the set of all tuples over $S$. $t(A)$ is also denoted by ‘$t.A$’ for $A \in S$. Suppose $S$ is a set of attributes and $A \in S$. A tuple $t$ over $S$ satisfies a ground atomic condition $A = a$ (or $A \neq a$), written as $t \models A = a$ (or $t \models A \neq a$), if $t.A = a$ (or $t.A \neq a$). If $\Gamma$ is a ground condition and $Att(\Gamma) \subseteq S$, a tuple $t$ over $S$ satisfies $\Gamma$, denoted $t \models \Gamma$, if $t$ satisfies all atomic conditions in $\Gamma$. The empty condition $\emptyset$, in particular, is satisfied by every tuple. Furthermore, let $Sat(\Gamma) = \bigcup_{Att(\Gamma) \subseteq S} \{t \in Tuple(S) \mid t \models \Gamma\}$. A ground condition $\Gamma$ is satisfiable if $Sat(\Gamma)$ is nonempty.
Non-satisfiable conditions are denoted as $\epsilon$. It is straightforward that $Sat(\Gamma) = \emptyset$ if and only if $Sat(\Gamma) \cap Tuple(Att(\Gamma)) = \emptyset$.

Let $D = (C, \mathit{isa}, \mathbf{A})$ be a semantic database schema and $d = (\mathbf{o}, \mathbf{a}, o_i)$ an instance of $D$. For each class $P \in C$ and each object $o \in \mathbf{o}(P)$, the tuple over $\mathbf{A}^*(P)$ yielded by $o$ in $d$, denoted $\hat{o}$, is defined by $\hat{o}(A) = \mathbf{a}(o, A)$ for each $A \in \mathbf{A}^*(P)$. Now let $\Gamma$ be a ground condition with $Att(\Gamma) \subseteq \mathbf{A}^*(P)$. An object $o \in \mathbf{o}(P)$ satisfies $\Gamma$, denoted $o \models \Gamma$, if $\hat{o} \models \Gamma$. Let $Sat(\Gamma, d, P) = \{o \in \mathbf{o}(P) \mid o \models \Gamma\}$.

Intuitively, an “atomic update” with respect to a database schema is an operation on instances of the schema which satisfies some syntactic restrictions. A “transaction” is then a sequence of atomic updates.

Definition: Let $D = (C, \mathit{isa}, \mathbf{A})$ be a database schema. An atomic update (expression) on $D$ is an expression of one of the following forms:
1. create($P$, $\Gamma$) where
(a) $P \in C$ is an isa-root, and
(b) $\Gamma$ is a condition such that $Att(\Gamma) = Att_{def}(\Gamma) = \mathbf{A}(P)$;
2. delete($P$, $\Gamma$) where
(a) $P \in C$ is an isa-root, and
(b) $\Gamma$ is a condition such that $Att(\Gamma) \subseteq \mathbf{A}(P)$;
3. modify($P$, $\Gamma$, $\Gamma'$) where
(a) $P \in C$, and
(b) $\Gamma$ and $\Gamma'$ are two conditions such that $Att(\Gamma), Att(\Gamma') \subseteq \mathbf{A}^*(P)$ and $Att_{def}(\Gamma') = Att(\Gamma')$;
4. generalize($P$, $\Gamma$) where
(a) $P \in C$ is not an isa-root, and
(b) $\Gamma$ is a condition such that $Att(\Gamma) \subseteq \mathbf{A}^*(P)$;
5. specialize($P$, $Q$, $\Gamma$, $\Gamma'$) where
(a) $P$ and $Q$ are two classes in $C$ such that $Q$ isa $P$, and
(b) $\Gamma$ and $\Gamma'$ are two conditions such that $Att(\Gamma) \subseteq \mathbf{A}^*(P)$ and $Att_{def}(\Gamma') = Att(\Gamma') = \mathbf{A}^*(Q) - \mathbf{A}^*(P)$.
An atomic update is ground if $\Gamma$ (and $\Gamma'$) are ground.

Intuitively, the operator create creates an object using a new object identifier and assigns attribute values specified by the condition $\Gamma$; the operator delete deletes from the database all objects which satisfy the specified condition; modify first selects all objects satisfying the condition $\Gamma$ and then changes their attribute values according to $\Gamma'$.
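The informal reading of ground conditions and of the create, delete, and modify operators can be sketched in Python. Everything about the representation is an assumption made for illustration: a ground condition is a list of (attribute, operator, constant) triples, an instance is a dictionary with keys "o", "a", and "next", and object identifiers are integers. The sketch does not enforce the syntactic restrictions on atomic updates (such as $P$ being an isa-root) and relies on the infinite domain of constants when testing satisfiability.

```python
import copy


def satisfiable(cond):
    """A ground condition fails only if it forces A = a together with
    A = b (a != b) or with A != a."""
    eq = {}
    for attr, op, val in cond:
        if op == "=" and eq.setdefault(attr, val) != val:
            return False
    return not any(
        op == "!=" and attr in eq and eq[attr] == val for attr, op, val in cond
    )


def satisfies(tup, cond):
    """t |= cond for a tuple given as {attribute: value}."""
    for attr, op, val in cond:
        if op == "=" and tup.get(attr) != val:
            return False
        if op == "!=" and tup.get(attr) == val:
            return False
    return True


def sat(d, p, cond):
    """Sat(cond, d, P): objects of class P whose tuple satisfies cond."""
    selected = set()
    for o in d["o"][p]:
        tup = {attr: v for (obj, attr), v in d["a"].items() if obj == o}
        if satisfies(tup, cond):
            selected.add(o)
    return selected


def descendants(p, isa):
    """All classes Q with Q isa* P, including P itself."""
    seen, stack = {p}, [p]
    while stack:
        current = stack.pop()
        for sub, sup in isa:
            if sup == current and sub not in seen:
                seen.add(sub)
                stack.append(sub)
    return seen


def create(d, p, cond):
    if not satisfiable(cond):
        return d
    new = copy.deepcopy(d)
    oid = d["next"]  # always a fresh identifier
    new["o"][p].add(oid)
    for attr, op, val in cond:
        if op == "=":
            new["a"][(oid, attr)] = val
    new["next"] = oid + 1
    return new


def delete(d, isa, p, cond):
    if not satisfiable(cond):
        return d
    gone = sat(d, p, cond)
    new = copy.deepcopy(d)
    for q in descendants(p, isa):
        new["o"][q] -= gone
    new["a"] = {k: v for k, v in new["a"].items() if k[0] not in gone}
    return new


def modify(d, p, cond, assign):
    if not (satisfiable(cond) and satisfiable(assign)):
        return d
    new = copy.deepcopy(d)
    for o in sat(d, p, cond):
        for attr, op, val in assign:
            if op == "=":
                new["a"][(o, attr)] = val
    return new
```

Applying create twice with the same attribute values yields two distinct objects, which reflects the contrast with the relational insert operator noted at the start of this section.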
The last two operators, generalize and specialize, are used to support object migration: generalize cancels the membership of the specified class and its descendant classes of all the objects currently in the class which satisfy the given condition $\Gamma$; specialize adds objects satisfying the condition $\Gamma$ into a specified class, with new attribute values specified by $\Gamma'$. Using the five operators, a transaction is then defined as a sequence of these operations.

Definition: Let $D$ be a database schema. An (SL) transaction $T$ on $D$ is a sequence $\theta_1; \ldots; \theta_n$ where
1. $n \geq 0$; and
2. $\theta_i$ is an atomic update for each $i \in [1..n]$.
$T$ is empty if $n = 0$ and atomic if $n = 1$. $T$ is ground if for each $i \in [1..n]$, $\theta_i$ is ground. $T$ is parameterized if it is not ground. A transaction schema is a finite set of transactions.

In the following, we describe the semantics of atomic updates. Generally, the semantics of each update is a mapping from $\mathit{inst}(D)$ to $\mathit{inst}(D)$. We first consider ground atomic transactions.

Definition: Let $D = (C, \mathit{isa}, \mathbf{A})$ be a database schema. The semantics of a ground atomic update on a schema $D$ is a mapping from $\mathit{inst}(D)$ to $\mathit{inst}(D)$ defined as follows: Let $d = (\mathbf{o}, \mathbf{a}, o_i)$ be a database instance of $D$.

1. $[\![\mathtt{create}(P, \Gamma)]\!](d) = \begin{cases} d & \text{if } \Gamma = \epsilon \\ (\mathbf{o}', \mathbf{a}', o_{i+1}) & \text{otherwise} \end{cases}$
where
$\mathbf{o}'(Q) = \begin{cases} \mathbf{o}(P) \cup \{o_i\} & \text{if } Q = P \\ \mathbf{o}(Q) & \text{otherwise} \end{cases}$
and $\mathbf{a}' = \mathbf{a} \cup \{((o_i, A), a) \mid A \in \mathbf{A}(P),\ A = a \in \Gamma\}$. (Footnote 1: A mapping is also viewed as a binary relation.)

2. $[\![\mathtt{delete}(P, \Gamma)]\!](d) = \begin{cases} d & \text{if } \Gamma = \epsilon \\ (\mathbf{o}', \mathbf{a}', o_i) & \text{otherwise} \end{cases}$
where
$\mathbf{o}'(Q) = \begin{cases} \mathbf{o}(Q) - Sat(\Gamma, d, P) & \text{if } Q\ \mathit{isa}^*\ P \\ \mathbf{o}(Q) & \text{otherwise} \end{cases}$
and $\mathbf{a}' = \mathbf{a} - \{((o, A), a) \mid o \in Sat(\Gamma, d, P),\ A \in \mathcal{A},\ a \in \mathcal{U}\}$.

3. $[\![\mathtt{modify}(P, \Gamma, \Gamma')]\!](d) = \begin{cases} d & \text{if } \Gamma = \epsilon \text{ or } \Gamma' = \epsilon \\ (\mathbf{o}, \mathbf{a}', o_i) & \text{otherwise} \end{cases}$
where $\mathbf{a}' = \mathbf{a} - \{((o, A), a) \mid o \in Sat(\Gamma, d, P),\ A \in Att(\Gamma'),\ a \in \mathcal{U}\} \cup \{((o, A), b) \mid o \in Sat(\Gamma, d, P),\ A = b \in \Gamma'\}$.

4.
|[g e n e ra liz e (P ,r)](tO = j d lf ^ ( (o , a , O i) otherwise where oW ) = / ° W ) - 5“<(r’^ P) * ' Q ^ P ° (Q) otherwise and a ' = a - {((o, A), a) \ o E Sat(T,d,Q),3Q,Q isa* P , A e A(Q)}. rr / ^ ,,-n. ,, ( d if F = e or T7 = e 5. [ [ s p e c ia liz e ^ , Q, T ,r )J(d) = < [ (o , a , O i) otherwise where o'(R) = ! ° (M) U (5“<(r’d’P) ~ ° iQ)) “ Q “ * R [ o(R) otherwise and a' = a — {((o, A), a) \ A E Att(T'), o € Sat(T,d,P ) — o(Q),a E U } U {((o, A ), a )\o E (Sat(F,d, P) - o{Q)), A = a E V}. The semantics of (ground, parameterized) transactions is now given. Intuitively, a ground transaction defines a mapping on databases instances. D efin itio n : The semantics of an empty transaction is the identity mapping on inst(D); the semantics of a ground transaction T = Q\\...; 9n (n > 1) on D is a mapping on inst(D) defined by2 {[T]J = j[^ij o • • • o [[#n]]. When variables are present, a transaction maps from “assignments” to mappings on database instances. Suppose now that Xi,... ,x n are all variables occurring in a transaction T. Then the parameterized transaction T is denoted by T (xi , ..., xn). *fog(x) - g(f(x)). 16 An assignment is a mapping from V to It, i.e., from variables to constants. If a is an assignment, T[ct] is the transaction obtained from T by substituting all occurrences of each variable x by a(x). Thus, T[a] is a ground transaction. D efin itio n : The semantics of a transaction T (x i,..., xn) is a mapping from assignments to mappings on inst(D) defined by: p p , ... ,a:n)J(Q!) = [[TfajJ for all assignments a. 17 Chapter 3 Dependency Preservation In this chapter, we study the relationship between the simple transaction language SL and two kinds of dependencies, namely functional and inclusion dependencies. It is known [AV89] that for a relational transaction language RL which is similar to SL, it is decidable whether an arbitrary set of transactions preserves a set of ■functional and acyclic inclusion dependencies. 
Here we first present a framework which simulates semantic schemas, instances, and SL transactions using relational schemas, databases, and relational transactions. Using this simulation, we present translations to reduce the problem of testing preservation of only functional dependencies or of only acyclic inclusion dependencies in semantic databases to the problem of testing preservation of functional dependencies or of acyclic inclusion dependencies in relational databases (respectively). Finally, we extend the above reductions to the problem of preservation of functional and acyclic inclusion dependencies.

The second objective in this chapter is to characterize the complexity of the problem of testing dependency preservation. We show that the problem is at least co-NP-hard [GJ79] for the general case, and that it is co-NP-complete for the following simple cases:

(1) a single nontrivial functional dependency, a semantic database schema containing at least one class which has at least two attributes, and an SL transaction consisting of only create and delete operations;
(2) a single nontrivial functional dependency, a relational database schema having at least one relation schema with arity at least 2, and a relational transaction consisting of only insert and delete operations;
(3) a single acyclic inclusion dependency, a semantic schema containing at least two classes which have at least one attribute each, and a transaction consisting of only create and delete operations;
(4) a single acyclic inclusion dependency, a relational schema having at least two unary relations, and a relational transaction consisting of only insert and delete operations.

The remainder of this chapter is organized as follows. Section 3.1 introduces the notions of functional and inclusion dependencies and preservation of dependencies, in both relational and semantic contexts.
Section 3.2 briefly reviews the relational data model, functional and inclusion dependencies, a simple transaction language and the decidability result from [AV89]. A translation of semantic databases and SL transactions into relational databases and relational transactions is presented in Section 3.3, along with preservation results for semantic databases. The complexity results are shown in Section 3.4.

3.1 Functional and Inclusion Dependencies

We now introduce the notions of "functional dependency" [Cod70] and "inclusion dependency" [CFP84]. Although the notions originated in relational databases, incorporating them into semantic databases is straightforward.

Definition: Let D = (C, isa, A) be a database schema. A functional dependency (FD) over D is an expression X → Y where X, Y ⊆ A*(P) for some class P ∈ C.

For an instance d = (o, a, o_i) ∈ inst(D), d satisfies the FD X → Y if for each class P ∈ C and each pair of objects o, o' ∈ o(P), XY ⊆ A*(P) and ∀A ∈ X, o.A = o'.A imply ∀A ∈ Y, o.A = o'.A.

Definition: Let D = (C, isa, A) be a database schema as above. An inclusion dependency (IND) over D is an expression P[A_1, ..., A_n] ⊆ Q[B_1, ..., B_n] where n ≥ 1, P, Q ∈ C are class names, A_1, ..., A_n ∈ A*(P) are distinct attributes on P, and B_1, ..., B_n ∈ A*(Q) are distinct attributes on Q.

For an instance d = (o, a, o_i) ∈ inst(D), d satisfies the IND P[A_1, ..., A_n] ⊆ Q[B_1, ..., B_n] if for every object o ∈ o(P), there is an object o' ∈ o(Q) such that ∀i ∈ [1..n], o.A_i = o'.B_i.

Let Σ be a set of dependencies over the semantic schema D. The set {d ∈ inst(D) | d satisfies every σ ∈ Σ} is denoted by Sat_D(Σ). When D is known from the context, it is simply written as Sat(Σ). When Σ = {σ} is a singleton set, Sat(Σ) is also denoted by Sat(σ).

Definition: Let D be a semantic database schema, Σ a set of functional and inclusion dependencies over D, and T an SL transaction on D.
T preserves Σ if d ∈ Sat(Σ) implies [[T[α]]](d) ∈ Sat(Σ) for each instance d ∈ inst(D) and each assignment α : V → U.

Suppose that D is a semantic database schema with k weakly-connected components D_1, ..., D_k and Σ is a set of inclusion dependencies over D. Σ is acyclic if the graph (V, E) is acyclic, where the vertex set V = {D_1, ..., D_k} and the edge set E = {(D_i, D_j) | there exist classes P in the component D_i, Q in the component D_j, and attributes A_1, ..., A_n ∈ A*(P), B_1, ..., B_n ∈ A*(Q), such that P[A_1, ..., A_n] ⊆ Q[B_1, ..., B_n] ∈ Σ}.

3.2 The Relational Data Model

In this section, the relational data model is formally introduced. The notions of functional and inclusion dependency over relational databases are then defined. Familiarity with the relational model (e.g., [Ull82, Mai83]) is assumed. A short description of the relational transaction language of [AV88b, AV89] is presented and the decidability result from [AV89] of dependency preservation with respect to this language is stated.

We start with defining the notions of relation schema, database schema, and instances. Recall that the notion of "a tuple" is defined in Chapter 2.

Definition: A relation schema is a finite set of attribute names, i.e., a finite subset of A. Let R be a relation schema. A relation instance over R is a finite set of tuples over R. The set of all instances of R is denoted as inst(R).

If t is a tuple over a relation schema R and X ⊆ R, we denote by t[X] the tuple over X which is the projection of t onto X.

Definition: A relational database schema D is a finite set of relation schemas. A relational database instance d of D is a mapping from D such that d(R) ∈ inst(R) for each R ∈ D. The set of all instances of D is denoted as inst(D).

Now let D be a relational database schema. A functional dependency (FD) is an expression X → Y where XY ⊆ R for some R ∈ D.
(Following the conventional notation, we use XY to denote X ∪ Y, XA to denote X ∪ {A}, etc.) A relational database instance d ∈ inst(D) is said to satisfy the FD X → Y if for each R ∈ D and every pair of tuples u, v ∈ d(R), XY ⊆ R and u[X] = v[X] imply u[Y] = v[Y].

An inclusion dependency (IND) is an expression R[A_1, ..., A_n] ⊆ S[B_1, ..., B_n] where R, S ∈ D, A_1, ..., A_n are distinct elements of R, and B_1, ..., B_n are distinct elements of S. A relational database instance d ∈ inst(D) is said to satisfy the IND R[A_1, ..., A_n] ⊆ S[B_1, ..., B_n] if for every tuple u ∈ d(R), there exists a tuple v ∈ d(S) such that u(A_i) = v(B_i) for i ∈ [1..n].

Let D be a relational database schema. A set Σ of inclusion dependencies is acyclic if the directed graph (D, E) is acyclic where E = {(R, R') | R, R' ∈ D and there exist {A_1, ..., A_n} ⊆ R and {B_1, ..., B_n} ⊆ R' such that R[A_1, ..., A_n] ⊆ R'[B_1, ..., B_n] ∈ Σ}.

For each set Σ of functional and inclusion dependencies, let Sat_D(Σ) denote the set {d ∈ inst(D) | d satisfies every σ ∈ Σ}. When D is known from the context, it is simply written as Sat(Σ). When Σ = {σ} is a singleton set, Sat(Σ) is also denoted by Sat(σ).

We now briefly describe a relational transaction language used in [AV89] and state a relevant result from there.

Let D be a relational database schema and R be a relation schema in D. An atomic condition on R is an expression of one of the following forms: A = a, A ≠ a, A = x, A ≠ x, where A ∈ R is an attribute, a ∈ U is a constant, and x ∈ V is a variable. A condition on R is defined as a finite set of atomic conditions on R. Similar to conditions in the framework of semantic databases, for a condition Γ, Att(Γ) (Att_def(Γ)) denotes the set of attributes occurring in Γ (in the form A = t for some t ∈ U ∪ V).

An atomic operation on D is an expression of one of the following forms:

1. insert_R(Γ), where R is a relation schema in D and Γ is a condition on R such that Att_def(Γ) = R;
2. delete_R(Γ), where R is a relation schema in D and Γ is a condition on R;
3. modify_R(Γ || Γ'), where R is a relation schema in D and Γ, Γ' are two conditions on R.

A (parameterized) relational transaction (or RL transaction) on a relational database schema D is a sequence of atomic operations on D.

The semantics of relational transactions is defined in analogy to the semantics of SL transactions for semantic databases given in Chapter 2. The formal definition can be found in [AV88b, AV89].

Definition: Let D be a relational database schema, Σ a set of functional and inclusion dependencies over D, Σ' ⊆ Σ, and T a relational transaction on D. T preserves Σ' under Σ if d ∈ Sat(Σ) implies [[T[α]]](d) ∈ Sat(Σ') for each instance d ∈ inst(D) and each assignment α : V → U.

Theorem 3.1: ([AV89], Theorem 6.6) Let D be a relational database schema and Σ be a set of functional and acyclic inclusion dependencies. Suppose Σ_1, ..., Σ_n are subsets of Σ and T_1, ..., T_n are relational transactions. It is decidable whether for each i ∈ [1..n], T_i preserves Σ_i under Σ. □

The proof presented in [AV89] deals with the special case when Σ_i = Σ for every i ∈ [1..n]. However, it is easy to extend the proof to the general case.

3.3 Decidability of Dependency Preservation Problems

In this section, a framework to translate semantic databases and SL transactions into relational databases and relational transactions is presented. Using this translation framework, the following results are shown. Testing if an arbitrary SL transaction preserves a set of functional dependencies or a set of acyclic inclusion dependencies can be reduced to the problem of testing if a set of relational transactions preserves a set of dependencies under another set of dependencies in the relational framework. The reductions are further extended to the case where both functional and acyclic inclusion dependencies are considered.
Since the preservation problem in the relational context is decidable, it follows that the preservation problem in the semantic context is also decidable.

This section is organized into four subsections. Two translations are first presented in § 3.3.1. One translates a semantic schema D into a relational schema D^rel and the other translates each database instance d of D into a corresponding relational database f_rel(d). A mapping g_rel which maps SL transactions to RL transactions is then defined. Lemma 3.4 states that the two mappings f_rel and g_rel essentially "commute" with each other, modulo some homomorphism. Using the mappings f_rel and g_rel, three reductions for the preservation problems are presented subsequently. § 3.3.2 discusses how functional dependency preservation by an SL transaction can be translated into functional dependency preservation by RL transactions. § 3.3.3 demonstrates a reduction for the case of acyclic inclusion dependencies. Finally, § 3.3.4 considers the case where both functional and acyclic inclusion dependencies are present.

3.3.1 Reducing to the Relational Framework

We start with a discussion on how semantic database schemas and instances are translated into relational schemas and instances (respectively); and then describe a simulation of SL transactions using RL transactions.

Let D = (C, isa, A) be a semantic database schema. We assume that D has k (≥ 1) weakly-connected components D_i = (C_i, isa_i, A_i) for i ∈ [1..k]. For each D_i, we construct a relation schema D_i^rel = {A | A ∈ A(P) for some class P ∈ C_i} ∪ {B_P | P ∈ C_i}, where {B_P | P ∈ C_i} ⊆ A is a disjoint (from the set of attributes in D_i) set of distinct new attribute names, called the set of class attributes for D_i. For each class P in D_i, the attribute B_P is called a class attribute for P. The relational database schema simulating D is defined as D^rel = {D_i^rel | i ∈ [1..k]}.
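The schema construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the formal development: the dictionary encoding of a schema, the function names, and the `B_` naming of class attributes are choices made here, and the assignment of attributes to classes and the isa edges in the usage example are hypothetical (the source schema of Example 2.1 is not reproduced in this section). Only the resulting attribute set is taken from Example 3.2.

```python
def weakly_connected_components(classes, isa_edges):
    """Group classes into weakly-connected components of the isa graph.

    isa_edges is a collection of (child, parent) pairs; direction is
    ignored, as required for weak connectivity.
    """
    adj = {c: set() for c in classes}
    for child, parent in isa_edges:
        adj[child].add(parent)
        adj[parent].add(child)
    seen, comps = set(), []
    for c in sorted(classes):
        if c in seen:
            continue
        stack, comp = [c], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def relational_schema(classes, isa_edges, attrs):
    """Return D^rel as a list with one relation schema D_i^rel per component.

    attrs maps each class P to the attributes A(P) defined directly on P.
    Each relation schema is the union of the ordinary attributes of the
    component and one fresh class attribute B_P per class P.
    """
    schemas = []
    for comp in weakly_connected_components(classes, isa_edges):
        ordinary = {a for p in comp for a in attrs.get(p, ())}
        class_attrs = {"B_" + p for p in comp}
        # Class attributes must be new names, disjoint from ordinary ones.
        assert ordinary.isdisjoint(class_attrs)
        schemas.append(frozenset(ordinary | class_attrs))
    return schemas


# Hypothetical encoding of a four-class schema (assumed, see lead-in).
CLASSES = {"PERSON", "STUDENT", "EMPLOYEE", "GRAD-ASSIST"}
ISA = [("STUDENT", "PERSON"), ("EMPLOYEE", "PERSON"),
       ("GRAD-ASSIST", "STUDENT"), ("GRAD-ASSIST", "EMPLOYEE")]
ATTRS = {
    "PERSON": {"Name", "SSN"},
    "EMPLOYEE": {"Salary", "Works-In"},
    "STUDENT": {"Major", "First-Enroll"},
    "GRAD-ASSIST": {"%-Appoint"},
}

SCHEMAS = relational_schema(CLASSES, ISA, ATTRS)
print(sorted(SCHEMAS[0]))
```

Because the isa graph in this usage example is weakly connected, the sketch produces a single relation schema; a schema with several components would yield one relation schema per component.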
Notation: Let D be a semantic database schema with k (≥ 1) weakly-connected components D_1, ..., D_k. Denote by D_i^rel the relation schema corresponding to D_i obtained in the above process for i ∈ [1..k] and denote by D^rel the relational database schema corresponding to the semantic schema D.

Before we describe the translation of instances, we assume the existence of the following:

• a countably infinite set of new constants N = {μ_1, μ_2, μ_3, ...} such that N ∩ U is empty;
• a new constant ν ∉ N ∪ U.

The definitions of relation and database instances are extended to the new domain U ∪ N ∪ {ν} in the natural way. The relational databases simulating the semantic databases to be described below will use the constants from this domain.

Let 1, 0 ∈ U be two distinct constants. Suppose that d = (o, a, o_i) is an instance of D. The relational database instance d^rel corresponding to d is obtained from d as follows, where d_i^rel = d^rel(D_i^rel) is a relation instance of D_i^rel for i ∈ [1..k].

Let h_μ be a one-to-one total function from O × A into N and R_i the isa-root of the component D_i. Unless explicitly mentioned, h_μ is assumed for the remainder of this section. Let t_{h_μ} be the mapping from objects to tuples defined as follows. For each o ∈ o(R_i), t_{h_μ}(o) is a tuple in Tuple(D_i^rel):

\[ t_{h_\mu}(o)(A) = \begin{cases} a & \text{if } P \in C_i,\ A \in A^*(P),\ o \in o(P), \text{ and } o.A = a \\ 1 & \text{if } A = B_P,\ P \in C_i, \text{ and } o \in o(P) \\ h_\mu(o, A) & \text{if } A = B_P,\ P \in C_i, \text{ and } o \notin o(P) \\ h_\mu(o, A) & \text{otherwise.} \end{cases} \]

Then d_i^rel = {t_{h_μ}(o) | o ∈ o(R_i)}. Finally, let f_rel denote the above translation, i.e., a mapping from inst(D) to inst(D^rel) such that f_rel(d) = d^rel.

Notation: Let D be a semantic database schema with k (≥ 1) weakly-connected components D_1, ..., D_k, and d an instance of D. Denote by d^rel (or d_i^rel for i ∈ [1..k]) the relational database instance (relation instance) obtained from the above transformation.

Intuitively, one tuple is created for each object o in o(R_i).
The defined attribute values of o are recorded in the tuple and undefined attribute values are determined by the function h_μ. For each class attribute, the tuple takes value 1 if o is an object in that class; otherwise the value is determined by h_μ. As will be shown in § 3.3.2, each functional dependency on class P will be translated into a functional dependency over the relation schema D_i^rel whose left-hand side includes the class attribute B_P. Thus, distinct values for the class attribute B_P assigned to objects not in class P will not cause a violation of the translated functional dependency.

Example 3.2: Consider again the semantic schema D in Example 2.1. Since D has one weakly-connected component, the corresponding relational schema contains a single relation schema D_1^rel = {Name, SSN, Salary, Works-In, Major, First-Enroll, %-Appoint, B_PERSON, B_STUDENT, B_EMPLOYEE, B_GRAD-ASSIST}. The relation instance corresponding to the semantic instance in Figure 2.2 is shown in Figure 3.1. □

Name      SSN   Sal.  Wor.   Maj.   Fir.-En.  %-A.  B_P  B_E  B_S   B_G
Charles   0302  150   Hist.  Hist.  1986      49    1    1    1     1
David     3698  μ_1   μ_2    CS     1988      μ_3   1    [?]  1     [?]
Faith     6657  140   CS     μ_6    1988      μ_7   1    1    μ_8   [?]
Chris     9709  200   EE     EE     1989      μ_10  1    1    1     μ_11
Michelle  0067  μ_12  μ_13   μ_14   μ_15      μ_16  1    μ_17 μ_18  μ_19

Figure 3.1: A Relational Simulation d^rel

We now consider the translation of SL transactions into relational transactions.

Let D = (C, isa, A) be a semantic database schema with k (≥ 1) weakly-connected components D_i, i ∈ [1..k]. Suppose R_i is the isa-root of D_i for i ∈ [1..k].

Definition: The function g_rel mapping from SL transactions to RL transactions is defined inductively as follows:

1. if T is empty, then g_rel(T) is the empty relational transaction.
2. g_rel(T; θ) = g_rel(T); θ' where θ is an atomic SL transaction and θ' is a relational transaction obtained from θ in the following manner:

Case 1: θ = create(R_i, Γ). Here R_i is the isa-root of D_i.
Let S be the set of attributes which are defined on descendants of R_i but not on R_i, that is, S = {A | A ∉ A(R_i), ∃P ∈ C_i, A ∈ A(P)}. Letting Γ_1 = {A = ν | A ∈ S},
θ' = insert_{D_i^rel}(Γ ∪ Γ_1 ∪ {B_{R_i} = 1} ∪ {B_P = 0 | B_P ∈ D_i^rel, P ≠ R_i}).

Case 2: θ = delete(R_i, Γ). In this case, θ' = delete_{D_i^rel}(Γ).

Case 3: θ = modify(P, Γ, Γ'). If P is a class in the component D_i, then
θ' = modify_{D_i^rel}(Γ ∪ {B_P = 1} || Γ').

Case 4: θ = generalize(P, Γ). Let S be the set of attributes directly defined (i.e., not inherited) on P or any descendant of P. Thus, S = {A | A ∈ A(Q) for some Q isa* P}. Let Γ_1 = {A = ν | A ∈ S}. Then
θ' = modify_{D_i^rel}(Γ ∪ {B_P = 1} || Γ_1 ∪ {B_Q = 0 | B_Q ∈ D_i^rel, Q isa* P}).

Case 5: θ = specialize(P, Q, Γ, Γ').
θ' = modify_{D_i^rel}(Γ ∪ {B_P = 1, B_Q ≠ 1} || Γ' ∪ {B_R = 1 | Q isa* R}).

Lemma 3.3: Let T be an SL transaction on D and α an assignment from variables to U. Then, g_rel(T[α]) and g_rel(T)[α] define the same mapping.

Proof: Obviously, the mapping g_rel does not change any variables in T, nor does it introduce new variables. Hence, it "commutes" with assignments. □

As we shall show below, the translations from semantic databases to relational databases and from semantic transactions to relational transactions behave similarly to "homomorphisms," or in other words, the two translations "commute." However, there is a technical problem in showing the commutativity. This is due to the use of many new values for class attributes in order to translate semantic instances satisfying certain functional dependencies to relational instances satisfying the corresponding functional dependencies. In the following, we introduce the mapping ξ to eliminate those new values for class attributes and then show that the commutativity holds under ξ.
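The five cases of g_rel defined above can be sketched as follows for a schema with a single weakly-connected component. This is a hedged illustration rather than the definitive translation: the tuple encoding of updates, the encoding of a condition as a set of (attribute, operator, value) atoms, the `B_` prefix, the string `"nu"` standing for the constant ν, and all function names are assumptions introduced here. The small two-class schema in the usage example is hypothetical as well.

```python
NU = "nu"  # stands for the special constant ν (an encoding chosen here)


def ancestors(c, parents):
    """All classes R with c isa* R (reflexive and transitive)."""
    out, stack = set(), [c]
    while stack:
        x = stack.pop()
        if x not in out:
            out.add(x)
            stack.extend(parents.get(x, ()))
    return out


def descendants(c, parents):
    """All classes Q with Q isa* c; every class must be a key of parents."""
    return {q for q in parents if c in ancestors(q, parents)}


def class_attr(c):
    """Name of the class attribute B_c (naming convention assumed here)."""
    return "B_" + c


def translate(update, parents, own_attrs):
    """Translate one atomic SL update into one atomic RL operation."""
    kind = update[0]
    if kind == "create":
        _, r, g = update
        # Attributes of the component that are not defined on the root r.
        s = {a for c in parents for a in own_attrs[c]} - set(own_attrs[r])
        extra = {(a, "=", NU) for a in s}
        extra |= {(class_attr(r), "=", 1)}
        extra |= {(class_attr(c), "=", 0) for c in parents if c != r}
        return ("insert", frozenset(g) | extra)
    if kind == "delete":
        _, r, g = update
        return ("delete", frozenset(g))
    if kind == "modify":
        _, p, g, g2 = update
        return ("modify", frozenset(g) | {(class_attr(p), "=", 1)}, frozenset(g2))
    if kind == "generalize":
        _, p, g = update
        desc = descendants(p, parents)
        s = {a for q in desc for a in own_attrs[q]}
        g1 = {(a, "=", NU) for a in s} | {(class_attr(q), "=", 0) for q in desc}
        return ("modify", frozenset(g) | {(class_attr(p), "=", 1)}, frozenset(g1))
    if kind == "specialize":
        _, p, q, g, g2 = update
        sel = frozenset(g) | {(class_attr(p), "=", 1), (class_attr(q), "!=", 1)}
        new = frozenset(g2) | {(class_attr(r), "=", 1) for r in ancestors(q, parents)}
        return ("modify", sel, new)
    raise ValueError("unknown update kind: " + str(kind))


def g_rel(transaction, parents, own_attrs):
    """Translate an SL transaction (a list of updates) operation by operation."""
    return [translate(u, parents, own_attrs) for u in transaction]


# Hypothetical two-class schema: STUDENT isa PERSON.
PARENTS = {"PERSON": set(), "STUDENT": {"PERSON"}}
OWN = {"PERSON": {"Name"}, "STUDENT": {"Major"}}
T = [
    ("create", "PERSON", [("Name", "=", "Ann")]),
    ("specialize", "PERSON", "STUDENT", [("Name", "=", "Ann")], [("Major", "=", "CS")]),
    ("generalize", "STUDENT", [("Name", "=", "Ann")]),
]
RL = g_rel(T, PARENTS, OWN)
```

Since the translation only adds atoms over class attributes and the constants 0, 1, and ν, it neither removes nor introduces variables, which is the observation behind Lemma 3.3.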
Let ξ be a mapping from N to {ν, 0} defined as:

• ξ(x) = 0 if x = h_μ(o, B_P) for a class attribute B_P where P in D;
• ξ(x) = ν if x = h_μ(o, A) for a non-class attribute A;
• ξ(x) = x otherwise.

The mapping ξ is extended to include U ∪ {ν} such that it is the identity mapping on elements in U ∪ {ν}; it is further extended to relations, relational databases, and semantic databases in the natural way.

[Figure 3.2: Relationship of f_rel and g_rel (commutative diagram)]

Lemma 3.4: For any ground SL transaction T and d ∈ inst(D), the following holds (see also Figure 3.2):

ξ(f_rel([[T]](d))) = ξ([[g_rel(T)]](f_rel(d))).

Proof: The proof uses an induction on the number n of atomic updates in T. When n = 0, T is an empty transaction. It is obvious that f_rel([[T]](d)) = f_rel(d) = [[g_rel(T)]](f_rel(d)), and hence the result holds. We assume now that the induction hypothesis holds when T has n atomic updates.

Induction Step: T' = T; θ and T' has n + 1 atomic updates. Now g_rel(T') = g_rel(T); θ'. By the induction hypothesis,

ξ(f_rel([[T]](d))) = ξ([[g_rel(T)]](f_rel(d))).

Suppose that D has k weakly-connected components D_i, i ∈ [1..k] and d ∈ inst(D). Let d_i be the sub-instance of d corresponding to D_i. It is easy to see that

ξ(f_rel([[T']](d_i))) = ξ([[g_rel(T')]](f_rel(d_i))) for i ∈ [1..k]

implies

ξ(f_rel([[T']](d))) = ξ([[g_rel(T')]](f_rel(d))).

Without loss of generality, we assume now that D has one weakly-connected component with isa-root R. In the following context, we also use D^rel to mean the relation schema corresponding to D. We will consider the following five cases, based on what θ is.

Case 1: θ = create(R, Γ).

By the definition of g_rel,

θ' = insert_{D^rel}(Γ ∪ Γ_1 ∪ {B_R = 1} ∪ {B_P = 0 | B_P ∈ D^rel, P ≠ R})

where Γ_1 = {A = ν | A ∈ D^rel, A ∉ A(R), ∀P ∈ C, A ≠ B_P}. Since [[T; θ]](d) = [[θ]]([[T]](d)) and θ is a create operation, there are two possible situations. Let o be the new object created by θ.
(1) There already exists at least one object o' in [[T]](d) such that o' resides only in the class R and o, o' have the same values for all attributes. In this case the two objects o, o' are said to be indistinguishable from each other.
(2) There are no such objects in [[T]](d) which are indistinguishable from o.

The proofs for (1) and (2) are similar, so only the proof for (1) is given.

Since there exists at least one object in [[T]](d) which is indistinguishable from the object o created by θ and indistinguishable objects yield the same tuple under the mapping (f_rel ∘ ξ),

ξ(f_rel([[T']](d))) = ξ(f_rel([[T; θ]](d))) = ξ(f_rel([[T]](d))).

From the induction hypothesis,

ξ(f_rel([[T]](d))) = ξ([[g_rel(T)]](f_rel(d))).

Let o' be one of the objects in [[T]](d) which is indistinguishable from o. From the definition of f_rel (using t_{h_μ}), ξ(t_{h_μ}(o)) = ξ(t_{h_μ}(o')). By the induction hypothesis (since o' is in class R in [[T]](d)),

ξ(t_{h_μ}(o')) ∈ ξ([[g_rel(T)]](f_rel(d))).

On the other hand, it is obvious that the tuple created by θ' is exactly ξ(t_{h_μ}(o)). Hence,

ξ([[g_rel(T')]](f_rel(d))) = ξ([[g_rel(T)]](f_rel(d))) ∪ {ξ(t_{h_μ}(o))} = ξ([[g_rel(T)]](f_rel(d))).

Case 2: θ = delete(P, Γ).

Then, g_rel(T') = g_rel(T); θ' where θ' = delete_{D^rel}(Γ). It remains to show:

ξ(f_rel([[T; θ]](d))) = ξ([[g_rel(T); θ']](f_rel(d))).

The following is a proof for the direction ⊇, the other direction being similar. Suppose ξ(f_rel([[T; θ]](d))) ⊉ ξ([[g_rel(T); θ']](f_rel(d))). Then there exists a u in ξ([[g_rel(T); θ']](f_rel(d))) but not in ξ(f_rel([[T; θ]](d))). Since θ is a delete operation,

ξ(f_rel([[T; θ]](d))) ⊆ ξ(f_rel([[T]](d)))

and

ξ([[g_rel(T); θ']](f_rel(d))) ⊆ ξ([[g_rel(T)]](f_rel(d))),

where the right-hand sides of the above inclusions are equal by the induction hypothesis. This implies u ∈ ξ(f_rel([[T]](d))) − ξ(f_rel([[T; θ]](d))). Thus there is an object o ∈ [[T]](d) with ξ(t_{h_μ}(o)) = u and o ⊨ Γ. It can be easily shown that u ⊨ Γ and hence u ∉ ξ([[g_rel(T); θ']](f_rel(d))), a contradiction.
Cases 3 - 5: The three cases when θ = modify(P, Γ, Γ'), θ = generalize(P, Γ), or θ = specialize(P, Q, Γ, Γ') can be shown by similar arguments. □

3.3.2 Functional Dependencies

We now present the first main theorem of this chapter. It shows a correspondence between the preservation of functional dependencies in semantic databases and the preservation of functional dependencies in relational simulations of semantic databases.

Theorem 3.5: Let D be a semantic schema, X → Y a functional dependency on D, and P the class in D such that XY ⊆ A*(P) and for any other class Q, XY ⊆ A*(Q) implies Q isa* P. Given an SL transaction T on D, there is a relational transaction T_P on D^rel such that:

T preserves X → Y on D iff T_P preserves XB_P → Y on D^rel.

Proof: Without loss of generality, it is assumed that D has only one weakly-connected component. Let

T_P = T_delete; g_rel(T); delete_{D^rel}({B_P ≠ 1}),

where T_delete consists of a sequence of the following delete operations (in any order):

(1) For each A ∈ D^rel, there is an operation: delete({A = ν}).
(2) Let C be the set of classes in D. A set of classes S ⊆ C is classifiable if S is nonempty and ∀P, Q ∈ C, P ∈ S and P isa* Q implies Q ∈ S. Now, for each non-classifiable set of classes S, there is an operation: delete({B_P = 1 | P ∈ S} ∪ {B_P ≠ 1 | P ∉ S}).

Intuitively, a set of classes is classifiable if it corresponds to a role set. The transaction T_delete deletes tuples which do not correspond to objects since the values of these tuples under class attributes are not "well-formed." The transaction T_delete is used to find a witness which corresponds to a semantic database by f_rel whenever there is a witness that T_P does not preserve the functional dependency.

We now show that T preserves X → Y if and only if T_P preserves XB_P → Y.

(If.) T does not preserve X → Y implies T_P does not preserve XB_P → Y.
Since T does not preserve X → Y, there exist a semantic database d = (o, a, o_i) ∈ inst(D) and an assignment α : V → U such that d ∈ Sat(X → Y) but [[T[α]]](d) ∉ Sat(X → Y).

Claim 1. The relational instance f_rel(d) ∈ Sat(XB_P → Y).

Proof of Claim 1: Suppose f_rel(d) does not satisfy XB_P → Y. Then there exist two tuples u, v ∈ f_rel(d) such that u[XB_P] = v[XB_P] but u[A] ≠ v[A] for some A ∈ Y. By the definition of f_rel, there exist two objects o, o' in d such that t_{h_μ}(o) = u and t_{h_μ}(o') = v. Furthermore, from the definition of the function t_{h_μ}, it is obvious that u[B_P] = v[B_P] = 1. This implies o, o' ∈ o(P) and it is easy to conclude that d ∉ Sat(X → Y). This contradicts d ∈ Sat(X → Y). □

Claim 2. [[T_P[α]]](f_rel(d)) ∉ Sat(XB_P → Y).

Proof of Claim 2: Let [[T[α]]](d) = (o', a', o_i'). Since [[T[α]]](d) ∉ Sat(X → Y), there exist two objects o, o' ∈ o'(P) such that ∀A ∈ X, a'(o, A) = a'(o', A) but ∃A ∈ Y, a'(o, A) ≠ a'(o', A). By the definition of f_rel, t_{h_μ}(o) and t_{h_μ}(o') are in f_rel([[T[α]]](d)).

On the other hand, for each object occurring in d, the set of classes in which the object resides is classifiable. This implies that the tuple in f_rel(d) obtained from the object will not be deleted by the transaction T_delete. Also, the definition of f_rel implies that the special constant ν does not appear in f_rel(d). Because T_delete is a ground transaction,

[[T_delete]](f_rel(d)) = f_rel(d)

and also

[[T_delete; g_rel(T[α])]](f_rel(d)) = [[g_rel(T[α])]](f_rel(d)).

By Lemma 3.4,

ξ(f_rel([[T[α]]](d))) = ξ([[g_rel(T[α])]](f_rel(d))).

Since o, o' ∈ o'(P), t_{h_μ}(o)[B_P] = t_{h_μ}(o')[B_P] = 1 and {a | ∃A ∈ XY, a = a'(o, A) or a = a'(o', A)} ⊆ U. Therefore, there exist two tuples u, v ∈ [[g_rel(T[α])]](f_rel(d)) such that

ξ(t_{h_μ}(o)) = ξ(u) and ξ(t_{h_μ}(o')) = ξ(v).

By the definition of ξ, it must be the case that t_{h_μ}(o)[XB_PY] = u[XB_PY] and also that t_{h_μ}(o')[XB_PY] = v[XB_PY].
In particular, u[B_P] = v[B_P] = t_{h_μ}(o)[B_P] = 1 (= t_{h_μ}(o')[B_P]). It follows that u, v ∈ [[g_rel(T[α]); delete({B_P ≠ 1})]](f_rel(d)). Hence, ξ(u), ξ(v) ∈ [[T_P[α]]](f_rel(d)).

Since ∀A ∈ X, a'(o, A) = a'(o', A) and ∃A ∈ Y, a'(o, A) ≠ a'(o', A), it is easy to verify that there exist two tuples u', v' ∈ [[T_P[α]]](f_rel(d)) such that ξ(u') = u, ξ(v') = v, and in particular, u'[XB_P] = v'[XB_P] but u'[Y] ≠ v'[Y]. □

(Only if.) T_P does not preserve XB_P → Y implies T does not preserve X → Y.

Since T_P does not preserve XB_P → Y, there exist a relation instance r ∈ inst(D^rel) and an assignment α : V → U ∪ {ν} such that r ∈ Sat(XB_P → Y) but [[T_P[α]]](r) ∉ Sat(XB_P → Y).

Let r' = [[T_delete]](r). Since [[T_delete]](r') = r',

[[T_P[α]]](r') = [[g_rel(T[α]); delete({B_P ≠ 1})]]([[T_delete]](r'))
= [[g_rel(T[α]); delete({B_P ≠ 1})]]([[T_delete]](r))
= [[T_P[α]]](r).

Hence, [[T_P[α]]](r') ∉ Sat(XB_P → Y). Obviously, r' ∈ Sat(XB_P → Y), because r ∈ Sat(XB_P → Y) and r' ⊆ r. It is also easy to verify that for each tuple t ∈ r':

1. there exists a class Q ∈ C such that t(B_Q) = 1; and
2. t(B_Q) = 1 implies (a) for every class Q' ∈ C, if Q isa* Q' then t(B_Q') = 1; and (b) for every A ∈ A*(Q), t(A) ∈ U.

We now define a family of partial functions, each of which maps inst(D^rel) to inst(D) and behaves like an "inverse" of f_rel.

Let s ∈ inst(D^rel) be a relation instance with m ≥ 0 tuples which satisfy the conditions (1) and (2) above. Suppose h is a one-to-one total function from Tuple(D^rel) to O. Let f_sem^h(s) = (o, a, o_{l+1}), where

• o(Q) = {h(t) | t ∈ s, Q ∈ C, t(B_Q) = 1},
• a(h(t), A) = a if t ∈ s, A ∈ A(Q), t(B_Q) = 1, and t(A) = a, and
• l = maximum index of objects in the set {h(t) | t ∈ s}.

f_sem^h is undefined elsewhere.

[Figure 3.3: Relationship of f_sem^h and g_rel]

A property similar to that of Lemma 3.4 holds for f_sem^h.
The main difference is that the equality relationship there becomes the inclusion relationship here. This reflects the fact that two tuples can collapse if they hold the same values for all attributes whereas two objects can never collapse.

Lemma 3.6: For each given ground SL transaction T and s ∈ inst(D^rel), there is a one-to-one total function h : Tuple(D^rel) → O such that:

f_sem^h([[g_rel(T)]](s)) ⊆ [[T]](f_sem^h(s)).

Proof: The argument is by induction on SL transactions T. It is trivially true when T is an empty transaction.

For the induction step, suppose T; θ is the SL transaction. By the induction hypothesis, there is an h such that (Figure 3.3)

f_sem^h([[g_rel(T)]](s)) ⊆ [[T]](f_sem^h(s)).

Case 1: θ = create(R, Γ).

Suppose g_rel(T; θ) = g_rel(T); θ'. Then

[[g_rel(T; θ)]](s) = [[θ']]([[g_rel(T)]](s)).

The semantics of θ' is to add a tuple into [[g_rel(T)]](s). There are two subcases: either the tuple already exists in [[g_rel(T)]](s), or it is not in [[g_rel(T)]](s).

In the first subcase, letting h' = h, it is straightforward that

f_sem^{h'}([[g_rel(T; θ)]](s)) ⊆ [[T; θ]](f_sem^{h'}(s)).

In the second subcase, let t be the new tuple added by θ'. Suppose o_k is the abstract object used by θ when creating the new object, and h(t) = o_m. Let t' be the tuple such that h(t') = o_k. Since o_m, o_k are not used in the semantic instance f_sem^h([[g_rel(T)]](s)), define h'(t) = o_k, h'(t') = o_m, and h'(x) = h(x) for all other tuples x. It is easy to see that the result holds.

Cases 2 - 5: The cases when θ is delete, modify, generalize, or specialize can be shown by similar arguments with h' = h. □

In order to show that T does not preserve X → Y, we need to construct a witness instance d ∈ inst(D). Recall that r' = [[T_delete]](r). By the above lemma, there exists an h such that

f_sem^h([[g_rel(T[α])]](r')) ⊆ [[T[α]]](f_sem^h(r')).

Let d = f_sem^h(r'). It follows easily that d ∈ Sat(X → Y). It can be shown that [[T[α]]](d) ∉ Sat(X → Y).
Since [[T_P[α]]](r') ∉ Sat(XB_P → Y), there exist two tuples u, v ∈ [[T_P[α]]](r') ⊆ [[g_rel(T[α])]](r') such that u[XB_P] = v[XB_P] but u[Y] ≠ v[Y]. From the construction of T_P, u[B_P] = v[B_P] = 1. Because XY ⊆ A*(P), there exist two objects

o, o' ∈ f_sem^h([[g_rel(T[α])]](r')) ⊆ [[T[α]]](f_sem^h(r')) = [[T[α]]](d),

such that ∀A ∈ X, o.A = o'.A, but ∃A ∈ Y, o.A ≠ o'.A. Hence, T does not preserve X → Y. This concludes the proof of the theorem. □

Theorem 3.7: Given a set Σ of functional dependencies over a semantic database schema D, and an SL transaction T on D, the problem whether T preserves Σ is decidable.

Proof: Suppose that Σ = {X_i → Y_i | 1 ≤ i ≤ n}. Let P_i be the class such that (1) X_iY_i ⊆ A*(P_i) and (2) for every class Q, X_iY_i ⊆ A*(Q) implies Q isa* P_i. Now let σ_i = X_iB_{P_i} → Y_i for i ∈ [1..n], Σ' = {σ_i | 1 ≤ i ≤ n}, and T_i = T_{P_i}.

Using a slight generalization of the proof of Theorem 3.5, it can be shown that:

T preserves Σ if and only if for every i ∈ [1..n], T_{P_i} preserves σ_i under Σ'.

Since the latter is decidable, the problem of functional dependency preservation in semantic databases is decidable. □

3.3.3 Acyclic Inclusion Dependencies

We now consider acyclic inclusion dependencies. It turns out that the reduction presented for preservation of functional dependencies does not work for inclusion dependencies. The reason is that to obtain a "well behaved" witness of not preserving the dependency, say P[A_1, ..., A_n] ⊆ Q[B_1, ..., B_n], deletions of tuples in the relation "containing" the class Q are needed. The deletions, unfortunately, do not guarantee the result being a witness, because an inclusion dependency may be violated as a result of such deletions. However, it is shown that the problem of deciding preservation of acyclic inclusion dependencies on a semantic schema is still decidable. The result is obtained by a modification of the reduction for functional dependencies.
Theorem 3.8: Let $D$ be a semantic schema with $k$ weakly-connected components $D_1, \ldots, D_k$, $P[A_1, \ldots, A_n] \subseteq P'[A'_1, \ldots, A'_n]$ an acyclic inclusion dependency over $D$, where $P$, $P'$ are classes in $D_i$, $D_j$ (respectively), and $T$ an SL transaction on $D$. Suppose further that $D^{rel}$ ($D_i^{rel}$, $D_j^{rel}$) is the corresponding relational database (relation) schema simulating $D$ ($D_i$, $D_j$). Then there exists a relational transaction $T'$ such that $T$ preserves $P[A_1, \ldots, A_n] \subseteq P'[A'_1, \ldots, A'_n]$ iff $T'$ preserves $D_i^{rel}[A_1, \ldots, A_n, B_P] \subseteq D_j^{rel}[A'_1, \ldots, A'_n, B_{P'}]$.

Proof: The relational transaction $T'$ is constructed as follows:

$T' = T_{delete,D_i^{rel}}$; delete$_{D_j^{rel}}(\{B_Q \neq 1 \mid Q \text{ is a class in the component } D_j\})$; $T_{modify,D_j^{rel}}$; $g_{rel}(T)$; delete$_{D_i^{rel}}(\{B_P \neq 1\})$; delete$_{D_j^{rel}}(\{B_{P'} \neq 1\})$

where $T_{delete,D_i^{rel}}$ removes all tuples in the relation instance for $D_i^{rel}$ which do not correspond to objects under mapping $f_{rel}$, similar to the proof of Theorem 3.5, and $T_{modify,D_j^{rel}}$ is a sequence (in any order) of the following operations:

$\{$modify$_{D_j^{rel}}(\{B_Q = 1\}, \{B_{Q'} = 1 \mid Q \text{ isa* } Q'\}) \mid Q \text{ is a class in } D_j\}$.

We now show that $T$ does not preserve $P[A_1, \ldots, A_n] \subseteq P'[A'_1, \ldots, A'_n]$ if and only if $T'$ does not preserve $D_i^{rel}[A_1, \ldots, A_n, B_P] \subseteq D_j^{rel}[A'_1, \ldots, A'_n, B_{P'}]$.

(Only if.) Since $T$ does not preserve $P[A_1, \ldots, A_n] \subseteq P'[A'_1, \ldots, A'_n]$, there exist an instance $d \in \mathrm{inst}(D)$ and a variable assignment $\alpha$, such that $d = (o, a, o_g) \in \mathrm{Sat}(P[A_1, \ldots, A_n] \subseteq P'[A'_1, \ldots, A'_n])$ but $[\![T[\alpha]]\!](d) \notin \mathrm{Sat}(P[A_1, \ldots, A_n] \subseteq P'[A'_1, \ldots, A'_n])$. Let $R$, $R'$ be the isa-roots of the components $D_i$, $D_j$ (respectively). Let $r \in \mathrm{inst}(D^{rel})$ be the relational instance defined by:

1. $r(D_l^{rel}) = f_{rel}(d)(D_l^{rel})$ for $l \neq j$, and
2. $r(D_j^{rel}) = f_{rel}(d)(D_j^{rel}) \cup S$, where $S$ contains a tuple $t$ for each object $o \in o(R) - o(P)$:
(a) $t(A'_m) = h_\mu(o, A_m)$ for $m \in [1..n]$,
(b) $t(B_{P'}) = h_\mu(o, B_{P'})$, and
(c) $t(B) = h_\mu(o, B)$ otherwise.

Claim 1: $r \in \mathrm{Sat}(D_i^{rel}[A_1, \ldots, A_n, B_P] \subseteq D_j^{rel}[A'_1, \ldots, A'_n, B_{P'}])$.
This follows easily from the construction of $r$ and the condition that $d \in \mathrm{Sat}(P[A_1, \ldots, A_n] \subseteq P'[A'_1, \ldots, A'_n])$.

Claim 2: $[\![T'[\alpha]]\!](r) \notin \mathrm{Sat}(D_i^{rel}[A_1, \ldots, A_n, B_P] \subseteq D_j^{rel}[A'_1, \ldots, A'_n, B_{P'}])$.

Obviously, $[\![T_{delete,D_i^{rel}}$; delete$_{D_j^{rel}}(\{B_Q \neq 1 \mid Q \text{ is a class in } D_j^{rel}\})$; $T_{modify,D_j^{rel}}]\!](r) = f_{rel}(d)$. Using an argument similar to that for the case of a functional dependency in the proof of Theorem 3.5, it can be verified that $[\![T'[\alpha]]\!](r)$ does not satisfy the inclusion dependency.

(If.) Suppose $T'$ does not preserve the inclusion dependency $D_i^{rel}[A_1, \ldots, A_n, B_P] \subseteq D_j^{rel}[A'_1, \ldots, A'_n, B_{P'}]$. Then there exist a relational database instance $r \in \mathrm{inst}(D^{rel})$ and a variable assignment $\alpha$ such that $r \in \mathrm{Sat}(D_i^{rel}[A_1, \ldots, A_n, B_P] \subseteq D_j^{rel}[A'_1, \ldots, A'_n, B_{P'}])$ but $[\![T'[\alpha]]\!](r) \notin \mathrm{Sat}(D_i^{rel}[A_1, \ldots, A_n, B_P] \subseteq D_j^{rel}[A'_1, \ldots, A'_n, B_{P'}])$. Clearly, $r' = [\![T_{delete,D_i^{rel}}]\!](r)$ is also a witness of $T'$ not preserving the inclusion dependency. Now consider the semantic database instance $f^h_{sem}(r'')$, where $r'' = [\![$delete$_{D_j^{rel}}(\{B_Q \neq 1 \mid Q \text{ is a class in } D_j^{rel}\})$; $T_{modify,D_j^{rel}}]\!](r')$. From the construction, it is immediately seen that $f^h_{sem}(r'') \in \mathrm{Sat}(P[A_1, \ldots, A_n] \subseteq P'[A'_1, \ldots, A'_n])$. Using a slight generalization of the proof of Theorem 3.5, we can show that $[\![T[\alpha]]\!](f^h_{sem}(r'')) \notin \mathrm{Sat}(P[A_1, \ldots, A_n] \subseteq P'[A'_1, \ldots, A'_n])$. □

We now present the result stating that it is decidable to check if a set of inclusion dependencies is preserved by a given SL transaction.

Theorem 3.9: Let $D$ be a semantic schema and $\Sigma$ a set of acyclic inclusion dependencies. There is an algorithm to decide if an arbitrary SL transaction $T$ on $D$ preserves $\Sigma$.

Proof: For each inclusion dependency $\sigma \in \Sigma$, construct a relational transaction $T_\sigma$ and an inclusion dependency $\sigma'$ in the relational framework. Now the problem of $T$ preserving $\Sigma$ is reduced to $T_\sigma$ preserving $\sigma'$ under $\Sigma' = \{\sigma' \mid \sigma \in \Sigma\}$ for each inclusion dependency in $\Sigma$. The construction of $T_\sigma$ and $\sigma'$ is as in the previous theorem.
□

3.3.4 Functional and Acyclic Inclusion Dependencies

Here we consider the case where the set of dependencies includes both functional and inclusion dependencies. Neither of the translations presented in § 3.3.2 and § 3.3.3 works in this case. Intuitively, the translation for the case of only functional dependencies does not work because, when considering an inclusion dependency $P[A] \subseteq Q[B]$, objects not in the class $P$ would introduce tuples possibly resulting in violations of the inclusion dependency. More importantly, deleting non-classifiable tuples in the relation where $Q$ belongs will not guarantee that the semantic database obtained satisfies the inclusion dependency. On the other hand, the translation for the case of only inclusion dependencies tends to retain as many tuples as possible in the relation where $Q$ belongs. This may cause some functional dependencies to be violated in the semantic database obtained. In the following, another reduction is presented which deals with both functional and inclusion dependencies simultaneously. The main idea is to "pad" the relation schema corresponding to the weakly-connected component containing $P$ with class attributes of the weakly-connected component containing $Q$, in order to ensure that deletions of non-classifiable tuples do not cause violation of any inclusion dependencies.

Definition: Let $D$ be a semantic database schema and $\Sigma$ a set of functional and acyclic inclusion dependencies. Suppose further that $T$ is an SL transaction and $\sigma \in \Sigma$. $T$ preserves $\sigma$ under $\Sigma$ if for every instance $d \in \mathrm{inst}(D)$ and every assignment $\alpha$ of variables, $d \in \mathrm{Sat}(\Sigma)$ implies $[\![T[\alpha]]\!](d) \in \mathrm{Sat}(\sigma)$.

The following proposition is straightforward.

Proposition 3.10: Suppose $D$ is a semantic database schema, $\Sigma$ a set of functional and acyclic inclusion dependencies, and $T$ an SL transaction. $T$ preserves $\Sigma$ if and only if $T$ preserves $\sigma$ under $\Sigma$ for each $\sigma \in \Sigma$.
□

In the following context, let $D$ be a semantic database schema with $k$ ($> 1$) weakly-connected components $D_1, \ldots, D_k$. Suppose that $C_i$ is the set of classes occurring in $D_i$ for $i \in [1..k]$.

Lemma 3.11: Let $\Sigma$ be a set of functional and acyclic inclusion dependencies on $D_1$ and $D_2$. Suppose $\sigma \in \Sigma$ is an inclusion dependency $P[A] \subseteq P'[A']$ and $T$ an SL transaction on $D$. Then there exist:

1. a relational database schema $\bar{D}^{rel} = \{\bar{D}_1^{rel}, \bar{D}_2^{rel}, \ldots\}$,
2. a set of functional and acyclic inclusion dependencies $\bar{\Sigma}$,
3. an inclusion dependency $\bar{\sigma}$ such that the set of all inclusion dependencies in $\bar{\Sigma} \cup \{\bar{\sigma}\}$ is acyclic, and
4. an RL transaction $\bar{T}$,

such that: $T$ preserves $\sigma$ under $\Sigma$ iff $\bar{T}$ preserves $\bar{\sigma}$ under $\bar{\Sigma} \cup \{\bar{\sigma}\}$.

Proof: Suppose $\sigma_1, \ldots, \sigma_n$ are all of the inclusion dependencies in $\Sigma$ and for $i \in [1..n]$, $\sigma_i = P_i[A_i] \subseteq P'_i[A'_i]$. Without loss of generality, we may assume that $\sigma = \sigma_1$ and $\{P_i \mid 1 \le i \le n\} \subseteq C_1$, $\{P'_i \mid 1 \le i \le n\} \subseteq C_2$.

For $i \in [2..k]$, define $\bar{D}_i^{rel} = D_i^{rel}$ to be the relation schema constructed as in § 3.3.1. To define $\bar{D}_1^{rel}$, we introduce the following definitions. Let $\gamma$ be a one-to-one function from $\{1, \ldots, n\} \times C_2$ to $\mathcal{A}$ such that for each $i \in [1..n]$ and each class $Q \in C_2$, $\gamma(i, Q) \notin D_1^{rel}$. In other words, $\gamma$ designates a distinct class attribute for each dependency $\sigma_i$ and each class in $D_2$. Moreover, these attributes do not overlap with the attributes already in the relational simulation $D_1^{rel}$ of $D_1$. Now define

$\bar{D}_1^{rel} = D_1^{rel} \cup \{\gamma(i, Q) \mid 1 \le i \le n, Q \in C_2, Q \neq P'_i\}$ and $\bar{D}^{rel} = \{\bar{D}_i^{rel} \mid 1 \le i \le k\}$.

The set $\{\gamma(i, Q) \mid Q \neq P'_i\}$ will be used in the simulation $\bar{\sigma}_i$ of $\sigma_i$. ($\gamma(i, P'_i)$ is omitted because it will be replaced by $B_{P_i}$ in the simulation.)

If $S$ is a set of class names, we use $\vec{S}$ to denote an enumeration of $S$. The mapping $\gamma$ is extended to enumerations of classes in the natural manner with the order preserved. Also, $B_{\vec{S}}$ denotes an enumeration of the class attributes for classes in the enumeration $\vec{S}$ with the order preserved.
Now for $i \in [1..n]$, let

$\bar{\sigma}_i = \bar{D}_1^{rel}[A_i B_{P_i} \gamma(i, \overrightarrow{C_2 - \{P'_i\}})] \subseteq \bar{D}_2^{rel}[A'_i B_{P'_i} B_{\overrightarrow{C_2 - \{P'_i\}}}]$.

For each functional dependency $\delta = X \to Y \in \Sigma$, we construct a functional dependency $\bar{\delta} = XB_Q \to Y$, where $Q$ is the class such that $XY \subseteq A^*(Q)$ and for any class $Q'$, $XY \subseteq A^*(Q')$ implies $Q'$ isa* $Q$. Now $\bar{\Sigma} = \{\bar{\delta} \mid \delta \in \Sigma\} \cup \{\bar{\sigma}_i \mid 1 \le i \le n\}$.

The inclusion dependency $\bar{\sigma}$ is defined as: $\bar{\sigma} = \bar{D}_1^{rel}[A B_P] \subseteq \bar{D}_2^{rel}[A' B_{P'}]$.

$\bar{T}$ is now constructed as:

$\bar{T} = T_{delete,\bar{D}_1^{rel}}$; $T_{delete,\bar{D}_1^{rel},1}$; $\ldots$; $T_{delete,\bar{D}_1^{rel},n}$; $T_{delete,\bar{D}_2^{rel}}$; $g_{rel}(T)$; delete$_{\bar{D}_1^{rel}}(\{B_P \neq 1\})$; delete$_{\bar{D}_2^{rel}}(\{B_{P'} \neq 1\})$

where

1. $T_{delete,\bar{D}_1^{rel}}$ consists of a sequence of the following delete operations (in any order): for each non-classifiable set of classes $S \subseteq C_1$, there is an operation delete$_{\bar{D}_1^{rel}}(\{B_Q = 1 \mid Q \in S\} \cup \{B_Q \neq 1 \mid Q \notin S\})$;
2. $T_{delete,\bar{D}_2^{rel}}$ is the analog for $\bar{D}_2^{rel}$ of $T_{delete,\bar{D}_1^{rel}}$; and
3. for each $i \in [1..k]$, $T_{delete,\bar{D}_1^{rel},i}$ consists of a sequence of the following delete operations (in any order): for each non-classifiable set of classes $S \subseteq C_2$, if $S$ is nonempty, then there is an operation delete$_{\bar{D}_1^{rel}}(\Gamma)$, where
$\Gamma = (\{B_{P_i} = 1 \mid P'_i \in S\} \cup \{B_{P_i} \neq 1 \mid P'_i \notin S\} \cup \{\gamma(i, Q) = 1 \mid Q \in S, Q \neq P'_i\} \cup \{\gamma(i, Q) \neq 1 \mid Q \notin S, Q \neq P'_i\})$.

Informally, $T_{delete,\bar{D}_1^{rel}}$ and $T_{delete,\bar{D}_2^{rel}}$ delete tuples that cannot be mapped to objects by some mapping $f^h_{sem}$. Consider the proof for the direction that a witness (of not preserving the dependency) in the relational context implies the existence of a witness in the semantic context. In other words, we have to find a semantic database which satisfies the dependencies, and which will violate the dependency after being updated by $T$. In this case, the above deletions in the relation $\bar{D}_2^{rel}$ could possibly result in the corresponding semantic instance not satisfying the inclusion dependencies. To avoid such situations, we use $T_{delete,\bar{D}_1^{rel},i}$ for each inclusion dependency $\sigma_i$.
Intuitively, the deletion by $T_{delete,\bar{D}_1^{rel},i}$ ensures that whenever $T_{delete,\bar{D}_2^{rel}}$ deletes a "bad" tuple (since its class attributes are non-classifiable) the deletion will not cause the corresponding semantic instance (obtained using $f^h_{sem}$) to violate the inclusion dependency $\sigma_i$.

It remains to show that $T$ does not preserve $\sigma$ under $\Sigma$ if and only if $\bar{T}$ does not preserve $\bar{\sigma}$ under $\bar{\Sigma}$. Since the proof is similar to the proofs of the cases for functional dependencies and for inclusion dependencies, only a sketch of the construction is presented.

(If.) $T$ does not preserve $\sigma$ under $\Sigma$ implies $\bar{T}$ does not preserve $\bar{\sigma}$ under $\bar{\Sigma}$.

Since $T$ does not preserve $\sigma$ under $\Sigma$, there exist an instance $d = (o, a, o_m)$ of $D$ and an assignment $\alpha$ of variables such that: $d \in \mathrm{Sat}(\Sigma)$ but $[\![T[\alpha]]\!](d) \notin \mathrm{Sat}(\sigma)$. We now construct an instance $d^{rel}$ of $\bar{D}^{rel}$ such that $d^{rel} \in \mathrm{Sat}(\bar{\Sigma} \cup \{\bar{\sigma}\})$ but $[\![\bar{T}[\alpha]]\!](d^{rel}) \notin \mathrm{Sat}(\bar{\sigma})$.

Let $d_i^{rel} = d^{rel}(\bar{D}_i^{rel})$. Further let $d_i$ be the "sub-instance" of $d$ corresponding to the weakly-connected component $D_i$. For each $i \in [3..k]$, $d_i^{rel} = f_{rel}(d_i)$. The relations $d_1^{rel}$ and $d_2^{rel}$ are constructed as follows. The relation $d_1^{rel}$ includes a tuple $t_1(o)$ for each object $o$ in $d_1$:

1. For each $Q \in C_1$: $t_1(o)(B_Q) = 1$ if $o \in o(Q)$, and $t_1(o)(B_Q) = h_\mu(o, B_Q)$ otherwise.
2. For each attribute $A \in A(Q)$ for some $Q \in C_1$: $t_1(o)(A) = a(o, A)$ if $o \in o(Q)$, and $t_1(o)(A) = h_\mu(o, B_Q)$ otherwise.
3. Finally, for each attribute $A = \gamma(i, Q)$, $Q \in C_2$, $Q \neq P'_i$, we consider two cases:
Case 1: $o \in o(P_i)$. Since $d \in \mathrm{Sat}(P_i[A_i] \subseteq P'_i[A'_i])$, there is an object $o' \in o(P'_i)$ such that $a(o, A_i) = a(o', A'_i)$. Then $t_1(o)(A) = 1$ if $o' \in o(Q)$, and $t_1(o)(A) = h_\mu(o', B_Q)$ otherwise, where $B_Q$ is the class attribute for the class $Q \in C_2$.
Case 2: $o \notin o(P_i)$. Then we simply let $t_1(o)(A) = h_\mu(o, A)$.

Since all inclusion dependencies in $\Sigma$ are from $D_1$ to $D_2$, we need to insert extra tuples into $d_2^{rel}$ so that the inclusion dependencies in $\bar{\Sigma}$ and also $\bar{\sigma}$ are satisfied.
Thus, we construct $d_2^{rel} = f_{rel}(d_2) \cup \bigcup_i r_i$, where for each $i \in [1..k]$, $r_i$ is essentially to make

$\bar{\sigma}_i = \bar{D}_1^{rel}[A_i B_{P_i} \gamma(i, \overrightarrow{C_2 - \{P'_i\}})] \subseteq \bar{D}_2^{rel}[A'_i B_{P'_i} B_{\overrightarrow{C_2 - \{P'_i\}}}]$

satisfied. Formally, for each object $o$ in $d_1$ with $o \notin o(P_i)$, we construct a tuple $t_2(o)$ as follows:

• $t_2(o)(B_Q) = h_\mu(o, \gamma(i, Q))$,
• $t_2(o)(B_{P'_i}) = h_\mu(o, B_{P_i})$, and
• $t_2(o)(A) = h_\mu(o, A)$ for all other attributes $A$.

It is easily seen that $d^{rel}$ satisfies both $\bar{\Sigma}$ and $\bar{\sigma}$. Using the "commutativity" property of $f_{rel}$ and $g_{rel}$, it can be shown that $[\![\bar{T}[\alpha]]\!](d^{rel})$ does not satisfy $\bar{\sigma}$.

(Only if.) $\bar{T}$ does not preserve $\bar{\sigma}$ under $\bar{\Sigma}$ implies $T$ does not preserve $\sigma$ under $\Sigma$.

The proof for this part is relatively easy because of the construction of $\bar{T}$. Since $\bar{T}$ does not preserve $\bar{\sigma}$ under $\bar{\Sigma}$, there exist an instance $r$ of $\bar{D}^{rel}$ and an assignment $\alpha$ of variables such that $r \in \mathrm{Sat}(\bar{\Sigma}) \cap \mathrm{Sat}(\bar{\sigma})$ but $[\![\bar{T}[\alpha]]\!](r) \notin \mathrm{Sat}(\bar{\sigma})$. Let $\bar{T} = T'; g_{rel}(T); T''$ and $r' = [\![T']\!](r)$. It can be verified that for some mapping $h$, $f^h_{sem}(r')$ is a witness that $T$ does not preserve $\sigma$ under $\Sigma$. □

Using the same construction as in the above lemma, it is straightforward to show that the result still holds if $\sigma$ is a functional dependency. Hence,

Lemma 3.12: Let $\Sigma$ be a set of functional and acyclic inclusion dependencies on $D_1$ and $D_2$. Suppose $\sigma \in \Sigma$ is a functional dependency and $T$ an SL transaction on $D$. Then there exist:

1. a relational database schema $\bar{D}^{rel} = \{\bar{D}_1^{rel}, \bar{D}_2^{rel}, \ldots\}$,
2. a set of functional and acyclic inclusion dependencies $\bar{\Sigma}$ over $\bar{D}^{rel}$,
3. a functional dependency $\bar{\sigma}$, and
4. an RL transaction $\bar{T}$

such that: $T$ preserves $\sigma$ under $\Sigma$ iff $\bar{T}$ preserves $\bar{\sigma}$ under $\bar{\Sigma}$. □

The decidability of preservation of functional and acyclic inclusion dependencies follows easily from the above two lemmas.

Theorem 3.13: Let $D$ be a semantic database schema and $\Sigma$ a set of functional and acyclic inclusion dependencies over $D$. There is an algorithm to decide if an arbitrary SL transaction $T$ preserves $\Sigma$.
□

3.4 Complexity

In this section, we study complexity issues concerning testing dependency preservation. We show that even in extremely simple cases where there is only one dependency and a very simple schema, the problem of testing preservation of the dependency is co-NP-complete [GJ79]. Specifically, the problem is co-NP-complete in the following four cases:

(1) there is one nontrivial functional dependency $A \to B$, the semantic schema contains at least one class with two or more attributes (including $A$, $B$), and the SL transaction consists of only create and delete operations;

(2) there is one nontrivial functional dependency $A \to B$, the relation schema has at least two attributes (including $A$, $B$), and the RL transaction consists of only insert and delete operations;

(3) there is one nontrivial acyclic inclusion dependency $P[A] \subseteq Q[B]$, the semantic schema contains at least two classes $P$ and $Q$, each having at least one attribute, and the SL transaction contains only create and delete operations;

(4) there is one nontrivial acyclic inclusion dependency $R_1[A] \subseteq R_2[B]$, the relation schema contains at least two relation schemas $R_1$ and $R_2$, each having at least one attribute, and the relational transaction consists of only insert and delete operations.

The proofs that the above problems are co-NP-hard are all based on reductions from the 3-Satisfiability problem [GJ79], which is stated as follows:

3-Satisfiability: Given a collection $C = \{c_1, \ldots, c_m\}$ of clauses on a finite set $V$ of (propositional) variables such that $|c_i| = 3$ for each $i \in [1..m]$, is there a truth assignment for $V$ that satisfies all the clauses in $C$?

We now present the complexity result of testing functional dependency preservation of SL transactions.

Theorem 3.14: Let $D = (\{P\}, \emptyset, A(P) = \{A, B\})$ be a semantic database schema and $A \to B$ a nontrivial functional dependency over the schema $D$.
Given an SL transaction $T$ on the schema $D$ which consists of only create and delete operations, testing if $T$ preserves $A \to B$ is co-NP-complete.

Proof: We show that the complement problem is NP-complete. In other words, it is NP-complete to find a database instance $d \in \mathrm{inst}(D)$ and an assignment $\alpha : V \to \mathcal{U}$ such that $d \in \mathrm{Sat}(A \to B)$ but $[\![T[\alpha]]\!](d) \notin \mathrm{Sat}(A \to B)$.

We first show that it is in NP. Consider the corresponding FD $AB_P \to B$ on the relational database schema $D^{rel}$ which contains one relation schema $\{A, B, B_P\}$ and the transaction $T'$ as constructed in Theorem 3.5. By the result from [AV89], $T'$ preserves $AB_P \to B$ if and only if it preserves the dependency for every relation containing constants from the set $S = C(T') \cup C_{k+6}$, where $C(T')$ is the set of constants occurring in $T'$, $k$ is the number of variables in $T'$, and $C_{k+6}$ is a set of $k + 6$ new constants, i.e., constants not occurring in $T'$. Thus, we guess a relation $r$ and an assignment $\alpha$ which maps variables in $T'$ to $S$. Then checking if $r$ and $\alpha$ are a witness of $T'$ violating the functional dependency can be done within polynomial time.

We now show that there exists a transaction $T$, consisting of only create and delete operations, such that finding a witness of $T$ not preserving $A \to B$ is NP-hard. The proof is based on a reduction from the 3-Satisfiability problem to this problem.

Now suppose $V = \{x_1, \ldots, x_n\}$ is a set of $n$ propositional variables and suppose $C = \{c_1, \ldots, c_m\}$ is a collection of $m$ clauses on $V$ such that each $c_i$ contains exactly three literals for $i \in [1..m]$. We construct an SL transaction $T$ consisting of only create and delete operations such that $T$ does not preserve the functional dependency $A \to B$ if and only if there is a mapping from $\{x_1, \ldots, x_n\}$ to $\{true, false\}$ such that every clause in $C$ is true, i.e., $C$ is satisfiable. $T$ has $n$ variables $y_1, \ldots, y_n$, corresponding to $x_1, \ldots, x_n$.
$T$ is constructed as follows:

$T$ = delete($P$, $\emptyset$); create($P$, $\{A = 0, B = 0\}$); delete($P$, $\Gamma_1$); delete($P$, $\Gamma_2$); $\ldots$; delete($P$, $\Gamma_m$); create($P$, $\{A = 0, B = 1\}$)

where for each $i \in [1..m]$, $\Gamma_i$ is constructed using $c_i$ as follows. Let $c_i = \{l_{i,1}, l_{i,2}, l_{i,3}\}$. For each $j \in [1..3]$, we construct a condition

$\gamma_{i,j} = \{B = y_k\}$ if $l_{i,j} = x_k$, and $\gamma_{i,j} = \{B \neq y_k\}$ if $l_{i,j} = \neg x_k$.

Then $\Gamma_i = \{\gamma_{i,j} \mid 1 \le j \le 3\}$, so an object satisfies $\Gamma_i$ only if it satisfies all three conditions $\gamma_{i,j}$.

We now show that $T$ does not preserve $A \to B$ if and only if there is a truth assignment such that $C$ is satisfiable.

(If part.) Suppose $C$ is satisfiable. Then there is a mapping from $\{x_1, \ldots, x_n\}$ to $\{true, false\}$ such that $c_i$ is true for each $i \in [1..m]$. Define the assignment $\alpha$ from variables $\{y_1, \ldots, y_n\}$ to constants $\{0, 1\}$ as: $\alpha(y_i) = 1$ if $x_i$ is true, and $\alpha(y_i) = 0$ if $x_i$ is false.

Let $d = (\emptyset, \emptyset, o_1)$ be the empty database. It is easily verified that the database will have only one object $o_1$, where $o_1.A = 0$ and $o_1.B = 0$, after the first two operations (delete and create) of $T$ have been executed.

Now for each $i \in [1..m]$, consider the operation delete($P$, $\Gamma_i$). Since $c_i = \{l_{i,1}, l_{i,2}, l_{i,3}\} \in C$ is satisfied under the truth assignment, there must exist a $j \in [1..3]$ such that $l_{i,j}$ is true. Suppose $l_{i,j} = x_k = true$. Then $\alpha(y_k) = 1$ and $\alpha(\gamma_{i,j}) = \alpha(\{B = y_k\}) = \{B = 1\}$ from the construction. Suppose $l_{i,j} = \neg x_k = true$. Then $\alpha(y_k) = 0$ and $\alpha(\gamma_{i,j}) = \alpha(\{B \neq y_k\}) = \{B \neq 0\}$. In either case, the object $o_1$ does not satisfy $\gamma_{i,j}$ and hence does not satisfy $\Gamma_i$. Therefore, $o_1$ is not deleted by the operation delete($P$, $\Gamma_i$).

Since $o_1$ is not deleted by any delete($P$, $\Gamma_i$) for $i \in [1..m]$, the last create operation of $T$ causes a violation of the functional dependency. Hence, $T$ does not preserve the dependency.

(Only if part.) Now suppose $T$ does not preserve the functional dependency $A \to B$. There exist an instance $d$ and an assignment $\alpha$ of variables $\{y_1, \ldots, y_n\}$ such that $d \in \mathrm{Sat}(A \to B)$ but $[\![T[\alpha]]\!](d) \notin \mathrm{Sat}(A \to B)$. For $i \in [1..n]$, assign true to $x_i$ if $\alpha(y_i) \neq 0$ and false if $\alpha(y_i) = 0$.
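The construction of $T$ from a 3-CNF formula can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the formal SL semantics: literals are encoded as nonzero integers (`k` for $x_k$, `-k` for $\neg x_k$), objects are plain dictionaries, and the names `build_transaction`, `run`, `satisfies_fd`, and `violating_assignment` are hypothetical. The search tries only assignments into $\{0, 1\}$ and starts from the empty database; by the argument in the proof, and because the first operation deletes every object, this is enough to detect whether a witness exists.

```python
from itertools import product


def build_transaction(clauses):
    """Build T as a list of operations on the single class P."""
    ops = [("delete", [])]                       # delete(P, {}) removes every object
    ops.append(("create", {"A": 0, "B": 0}))
    for clause in clauses:                       # one delete(P, Gamma_i) per clause
        gamma = [("B", "==" if lit > 0 else "!=", abs(lit)) for lit in clause]
        ops.append(("delete", gamma))
    ops.append(("create", {"A": 0, "B": 1}))
    return ops


def run(ops, alpha, objects=()):
    """Apply the ground transaction T[alpha] to a list of objects."""
    db = [dict(o) for o in objects]
    for kind, arg in ops:
        if kind == "create":
            db.append(dict(arg))                 # a new object; objects never collapse
        else:
            def hit(o, conds=arg):
                # an object is deleted only if it satisfies every condition
                return all((o[attr] == alpha[var]) == (cmp == "==")
                           for attr, cmp, var in conds)
            db = [o for o in db if not hit(o)]
    return db


def satisfies_fd(db):
    """Check the functional dependency A -> B on a list of objects."""
    seen = {}
    return all(seen.setdefault(o["A"], o["B"]) == o["B"] for o in db)


def violating_assignment(clauses, n):
    """Return an assignment alpha making T violate A -> B, or None."""
    ops = build_transaction(clauses)
    for bits in product((0, 1), repeat=n):
        alpha = {k + 1: bits[k] for k in range(n)}
        if not satisfies_fd(run(ops, alpha)):
            return alpha
    return None


satisfiable = [(1, -2, 3), (-1, 2, 3)]
# All eight sign patterns over x1, x2, x3: an unsatisfiable 3-CNF formula.
unsatisfiable = [tuple(s * v for s, v in zip(signs, (1, 2, 3)))
                 for signs in product((1, -1), repeat=3)]
```

On these two inputs, `violating_assignment(satisfiable, 3)` finds an assignment, while `violating_assignment(unsatisfiable, 3)` returns `None`, matching the claim that $T$ fails to preserve $A \to B$ exactly when $C$ is satisfiable.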
It remains to show that $c_i$ is satisfied for $i \in [1..m]$. Obviously, after the first two operations of $T$, the database has only one object $o$ whose values under attributes $A$ and $B$ are both 0. Since $[\![T[\alpha]]\!](d)$ does not satisfy the functional dependency, it must be the case that for each $i \in [1..m]$, delete($P$, $\Gamma_i$) does not delete the object $o$. For each $i \in [1..m]$, consider the ground condition $\alpha(\Gamma_i)$. There must be a $j \in [1..3]$ such that $\alpha(\gamma_{i,j})$ has one of the following two forms:

• $B = a$ where $a \neq 0$ is a constant, or
• $B \neq 0$.

By the construction of $\gamma_{i,j}$, the literal $l_{i,j}$ must be true in each of the above cases. Hence, $c_i$ is satisfied for all $i \in [1..m]$ and $C$ is satisfiable. □

Using the technique presented in the above proof, it is easy to show that testing if a relational transaction with only insert and delete operations preserves a nontrivial functional dependency is co-NP-complete in the context of relational databases.

Proposition 3.15: Given a relational database schema which contains at least one relation schema $R$ with at least two attributes, a nontrivial functional dependency $\sigma$ over $R$, and a relational transaction $T$ consisting of only insert and delete operations, it is co-NP-complete to decide if $T$ preserves $\sigma$. □

We now consider acyclic inclusion dependencies. Motivated by the proofs for the case of functional dependencies, it is not hard to obtain similar results regarding acyclic inclusion dependencies. In the following we show that checking preservation of a single nontrivial inclusion dependency is co-NP-complete even in an extreme case where there are two classes each with only one attribute.

Theorem 3.16: Let $D$ be a semantic database schema and $P$, $Q$ two classes in $D$ which are not weakly connected. Suppose that each of the classes $P$ and $Q$ contains one attribute ($A$ and $B$, respectively).
Suppose further that $P[A] \subseteq Q[B]$ is an inclusion dependency over $D$ and $T$ is an SL transaction consisting of only create and delete operations. It is co-NP-complete to decide if $T$ preserves $P[A] \subseteq Q[B]$.

Proof: The proof is very similar to the proofs of previous theorems, so only the construction of the reduction from the 3-Satisfiability problem is presented. Let $V = \{x_1, \ldots, x_n\}$ be a set of propositional variables and $C = \{c_1, \ldots, c_m\}$ be a collection of clauses on $V$ such that each clause has exactly three literals. We will construct a transaction $T$ such that $C$ is satisfiable if and only if $T$ does not preserve the dependency. The transaction $T$ is built as follows:

$T$ = delete$_P(\emptyset)$; delete$_Q(\emptyset)$; insert$_P(\{A = 0\})$; delete$_P(\Gamma_1)$; delete$_P(\Gamma_2)$; $\ldots$; delete$_P(\Gamma_m)$

where $\Gamma_i$ is obtained from $c_i$ in the same way as in the proof for a functional dependency. It is easy to show that it is indeed a reduction. □

Slightly modifying the proof for the case of semantic databases, it is easy to obtain the following result for the case of relational databases.

Proposition 3.17: Given a relational database schema containing at least two unary relation schemas, a nontrivial inclusion dependency over the schema, and a relational transaction consisting of only insert and delete operations, checking if the transaction preserves the inclusion dependency is co-NP-complete. □

Chapter 4

Object Migration

In this chapter, we initiate the theoretical study of object migration in object-based models. In a class hierarchy, an object can simultaneously belong to a set of classes, called a role set, and can be migrated to a different role set. During the life span of an object, the sequence of sets of classes that it belonged to forms a "migration pattern" of the object. A new kind of natural dynamic constraint, named "migration inventories," is defined and studied. These constraints restrict the possible patterns through which objects can migrate.
SL transactions are considered here in conjunction with the new dynamic constraints. An essential problem relating to the new dynamic constraints is to ensure that transaction schemas yield only permitted migration patterns (soundness), and that all permissible migration paths can be obtained using the provided transactions (completeness). If a transaction schema is both sound and complete with respect to a migration inventory, the transaction schema generates the inventory. A key issue here is to be able to characterize the migration inventory generated by a given transaction schema.

We first formally introduce the notion of "migration patterns." The new dynamic constraint is then defined. Four different kinds of inventories are introduced and discussed. Theorem 4.9 shows that the set of migration patterns generated by any SL transaction schema is a regular set; and conversely, any regular inventory (regular set of migration patterns) can be "simulated" by some SL transaction schema. The result implies the decidability of the completeness and soundness for SL transaction schemas with respect to the dynamic constraints of inventories.

As noted at the end of Chapter 1, the material discussed here is independent of the discussions presented in Chapter 3.

This chapter is organized into two sections. Migration patterns and inventories are introduced in Section 4.1. The main results are presented in Section 4.2.

4.1 Migration Inventories

In this section, we first define the notions of "role set" and migration pattern. The formal definition of a migration inventory is then provided and four kinds of inventories are discussed. We illustrate the notions through a series of examples.

In the language SL, operations on objects in one class cannot depend on the "content" of other unrelated classes. Furthermore, objects cannot migrate to classes which are not weakly connected.
Hence, we assume without loss of generality that the schema graph (i.e., the class hierarchy) is weakly connected. This assumption is similar to focusing on a single relation in [AV89]. The assumption will be relaxed when we consider richer languages in Chapter 5.

We now provide some definitions. Intuitively, a "role set" corresponds to a set of classes that an object could possibly reside in at the same time. Due to inheritance induced by the class hierarchy, role sets are closed under isa.

Definition: Let $D = (C, \mathit{isa}, A)$ be a weakly-connected semantic schema. A role set on $D$ is a subset $\omega$ of $C$ such that for each class $P$ in $C$, $P$ in $\omega$ implies that all ancestors of $P$ are also in $\omega$, i.e., $\{Q \in C \mid P \text{ isa* } Q\} \subseteq \omega$. The empty role set is denoted by $\omega_\emptyset$. The set of all role sets on $D$ is denoted by $\Omega$. The set of non-empty role sets, i.e., $\Omega - \{\omega_\emptyset\}$, is denoted by $\Omega_+$.

Example 4.1: Consider the database schema $D$ shown in Example 2.1. The set of role sets is $\{\omega_\emptyset, [G], [S], [E], [SE], [P]\}$, where [G] denotes the isa* closure of {GRAD-ASSIST}, ..., and [SE] denotes the isa* closure of {STUDENT, EMPLOYEE}, etc.

In the instance shown in Figure 2.2, the role sets of $o_1$, $o_4$, and $o_5$ are [G], [SE], and [P] (respectively). The role set of $o_6$ is $\omega_\emptyset$. □

Let $D$ be a database schema and $d = (o, a, o_i)$ be an instance of $D$. An object $o$ occurs in $d$ if $o \in o(P)$ for some $P \in C$. If an object $o$ does not occur in $d$, $\mathrm{RoleSet}(o, d) = \omega_\emptyset$. Otherwise, let $\omega$ be the set of all classes $o$ belongs to in $d$, that is, $\omega = \{P \mid o \in o(P)\}$. Obviously, $\omega$ is closed under isa* and therefore a role set in $\Omega$. We then define $\mathrm{RoleSet}(o, d) = \omega$.

The following fact (proof omitted) states that the two operations specialize and generalize are sufficient to migrate objects between role sets.

Proposition 4.2: Let $D$ be a database schema and $\omega_1, \omega_2 \in \Omega_+$ be two nonempty role sets on $D$.
There is a ground transaction $T$ consisting of only {specialize, generalize}-operations such that if $d \in \mathrm{inst}(D)$ and $o \in O$ with $\mathrm{RoleSet}(o, d) = \omega_1$, then $\mathrm{RoleSet}(o, [\![T]\!](d)) = \omega_2$. □

We now consider object migration patterns, i.e., sequences of role sets through which objects can pass in their life cycles, in the context of a given transaction schema. Migration patterns are also viewed as words over the alphabet $\Omega$. In this study, we focus on patterns starting from the empty database $(\emptyset, \emptyset, o_1)$. In general, a migration pattern of an object may start with an element in $\omega_\emptyset^*$ (before being created), be followed by an element in $\Omega_+^*$ (while in the database), and end in an element in $\omega_\emptyset^*$ again (after being deleted).

Definition: Suppose $D$ is a schema and $\Omega$ the set of all role sets on $D$. An object migration pattern is a word over $\Omega$ which is in the set $\omega_\emptyset^* \Omega_+^* \omega_\emptyset^*$.

In many applications, objects migrate between classes following particular patterns. The set of patterns captures some behavioral information regarding the objects in the databases. If the changes to the database are constrained to only these patterns of object migration, they serve as a new kind of dynamic constraint.

Notation: Let $L$ be a language (set of words) over an alphabet $\Sigma$. The initial words of $L$, denoted $\mathrm{INIT}(L)$, is the set $\{x \mid \exists y \in \Sigma^*, xy \in L\}$.

Example 4.3: Consider Example 4.1. Suppose that each person will live through exactly one continuous time period as a student, perhaps receive assistantships from some point on, and eventually be employed. This can be expressed as a set of migration patterns $\mathrm{INIT}(L)$, where (footnotes 1, 2)

$L = \omega_\emptyset^* [P]^* [S]^* [G]^* [E]^+ [P]^* \omega_\emptyset^*$. □

The notion of "migration inventory," which is a language over $\Omega$, is now presented. A migration inventory is viewed as a dynamic constraint on database updates.

Definition: An object migration inventory (over $\Omega$) is a set $L$ of object migration patterns such that $\mathrm{INIT}(L) \subseteq L$.
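The definitions of this section can be made concrete with a small Python sketch. It rests on several assumptions that are not stated in the text: the isa edges in `ISA` are inferred from the role sets listed in Example 4.1 (the schema of Example 2.1 is not reproduced here), the set of migration patterns is read as $\omega_\emptyset^* \Omega_+^* \omega_\emptyset^*$, languages are finite sets of tuples, and all function names are hypothetical.

```python
from itertools import combinations

# Assumed direct isa edges (child -> parents), inferred from Example 4.1.
ISA = {
    "PERSON": set(),
    "STUDENT": {"PERSON"},
    "EMPLOYEE": {"PERSON"},
    "GRAD-ASSIST": {"STUDENT", "EMPLOYEE"},
}
EMPTY = frozenset()  # the empty role set


def closure(classes, isa=ISA):
    """isa* closure: the given classes together with all their ancestors."""
    result, todo = set(), list(classes)
    while todo:
        c = todo.pop()
        if c not in result:
            result.add(c)
            todo.extend(isa[c])
    return frozenset(result)


def role_sets(isa=ISA):
    """All role sets: every upward-closed set is the closure of some subset."""
    names = sorted(isa)
    return {closure(s, isa)
            for r in range(len(names) + 1)
            for s in combinations(names, r)}


def is_migration_pattern(word):
    """Empty role sets may occur only as a prefix and a suffix of the word."""
    i, j = 0, len(word)
    while i < j and word[i] == EMPTY:
        i += 1
    while j > i and word[j - 1] == EMPTY:
        j -= 1
    return all(w != EMPTY for w in word[i:j])


def init(language):
    """INIT(L): all prefixes (initial words) of words in a finite language."""
    return {w[:k] for w in language for k in range(len(w) + 1)}


def is_inventory(language):
    """A set of migration patterns that is closed under taking prefixes."""
    return (all(is_migration_pattern(w) for w in language)
            and init(language) <= language)


P, S = closure({"PERSON"}), closure({"STUDENT"})
L = {(EMPTY, P, S, EMPTY)}
```

Here `L` on its own is not an inventory because it lacks its own prefixes, whereas `init(L)` is one. This mirrors the way the examples build inventories as $\mathrm{INIT}(L)$ for some language $L$ of patterns.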
Before we define the notion of an SL transaction schema "satisfying" a migration inventory, we discuss two orthogonal decisions that allow us to study four different kinds of migration inventories: laziness and immediate- or delay-start.

Laziness concerns whether consecutively repeated role sets are included or not. Since in reality an object may not migrate or even be updated very frequently, this notion allows us to "forget" consecutively repeated role sets and to focus exclusively on changes to role sets. Formally, we define the function ('r'emove 'r'epeats) $f_{rr} : \Omega^* \to \Omega^*$ as (footnote 3):

(1) $f_{rr}(\lambda) = \lambda$;
(2) $f_{rr}(a) = a$ if $a \in \Omega$;
(3) $f_{rr}(waa) = f_{rr}(wa)$ if $a \in \Omega$ and $w \in \Omega^*$; and
(4) $f_{rr}(wab) = f_{rr}(wa)b$ if $a, b \in \Omega$, $w \in \Omega^*$, and $a \neq b$.

The notion of immediate start restricts the focus to patterns where objects are created on the first transaction (or the first role set is not empty).

Definition: Suppose $D$ is a weakly-connected semantic schema and $\Omega$ the set of all role sets on $D$. An object migration pattern $w$ is lazy if $f_{rr}(w) = w$. An object migration pattern $w$ is immediate start if $w \notin \omega_\emptyset \Omega^*$. Finally, every object migration pattern is also a nonlazy and delay-start migration pattern. An object migration inventory $L$ is lazy (nonlazy) and immediate start (delay start) if it is a set of lazy (nonlazy) and immediate-start (delay-start) patterns.

Footnotes: 1. We also use regular expressions to denote languages. 2. $a^+ = aa^*$. 3. $\lambda$ is the empty word.

Figure 4.1: A Class Hierarchy For Four Operations

Example 4.4: Continuing from Example 4.1, [P][S][G][E] is a lazy and also immediate-start migration pattern. [P][S][S][S][G][G] is not a lazy migration pattern, but it is immediate start. Finally, $\omega_\emptyset \omega_\emptyset$[P][P][P][S] is a delay-start (non-immediate-start) migration pattern.

Let $L = \{[P][S]^n[G]^m[E]^k[P] \mid n, m, k \ge 1\}$. Then $\mathrm{INIT}(L)$ is an (immediate-start) migration inventory.
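The remove-repeats function and the two pattern properties can be sketched in Python as follows. This is an illustrative sketch: role sets are represented by short strings, the string `"w0"` is an assumed stand-in for the empty role set, and the function names are hypothetical. The loop collapses each run of equal neighbours into one symbol, which is what clauses (1) to (4) of the definition of $f_{rr}$ describe.

```python
EMPTY = "w0"  # assumed stand-in for the empty role set


def remove_repeats(word):
    """f_rr: collapse every run of consecutively repeated role sets."""
    out = []
    for a in word:
        if not out or out[-1] != a:
            out.append(a)
    return tuple(out)


def is_lazy(word):
    """A pattern is lazy if removing repeats leaves it unchanged."""
    return remove_repeats(word) == tuple(word)


def is_immediate_start(word):
    """A pattern is immediate start if it does not begin with the empty role set."""
    return len(word) == 0 or word[0] != EMPTY


# The three patterns discussed in Example 4.4.
lazy_word = ("P", "S", "G", "E")
repeated_word = ("P", "S", "S", "S", "G", "G")
delayed_word = (EMPTY, EMPTY, "P", "P", "P", "S")
```

With these definitions, `lazy_word` is both lazy and immediate start, `repeated_word` is immediate start but not lazy, and `delayed_word` is neither, in agreement with the classification given in the example.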
$f_{rr}(\mathrm{INIT}(L))$ = {[P], [P][S], [P][S][G], [P][S][G][E], [P][S][G][E][P]} is a lazy migration inventory. □

In concurrent programming, operations on shared resources have to be synchronized to ensure correctness. One of the mechanisms used to control concurrent operations is called "path expressions" [CH74]. Path expressions are regular expressions (over the set of available operations) specifying the order in which operations can be executed without causing inconsistency of resources. The following example illustrates that path expressions are a special case of inventories.

Example 4.5: Let $B$ be an abstract data type with four operations $p$, $q$, $r$, and $s$. A path expression is a regular expression which specifies orders in which the four operations can be executed. Using a schema (Figure 4.1) to represent the four operations by four subclasses of the root, each path expression is converted into a migration inventory in the natural fashion. For example, suppose $(p(q \cup r)s)^*$ is a path expression of the four operations. Then, the nonlazy inventory $L = \mathrm{INIT}(\omega_\emptyset^* \cdot (p(q \cup r)s)^* \cdot \omega_\emptyset^*)$ specifies the restriction that each transaction which simulates one operation has to obey the path expression. □

Definition: Let $D$ be a weakly-connected semantic schema, $\Omega$ the set of role sets over $D$, and $\mathcal{T}$ a transaction schema on $D$. An (object) migration pattern of $\mathcal{T}$ is an element $\omega_1 \cdots \omega_n$ in $\omega_\emptyset^* \Omega_+^* \omega_\emptyset^*$ with the following property:

1. $n \ge 0$; and
2. $\exists\, o \in O$, $T_1, \ldots, T_n \in \mathcal{T}$, and assignments $\alpha_1, \ldots, \alpha_n$ such that $\mathrm{RoleSet}(o, d_i) = \omega_i$ for $i \in [1..n]$, where $d_0 = (\emptyset, \emptyset, o_1)$ and $d_i = [\![T_i[\alpha_i]]\!](d_{i-1})$ for $i \in [1..n]$.

The migration inventory generated by $\mathcal{T}$, denoted $\mathcal{L}(\mathcal{T})$, is the set of all migration patterns of $\mathcal{T}$.

Definition: A migration pattern of $\mathcal{T}$ is immediate start if it is an element of $\Omega_+^* \cdot \omega_\emptyset^*$. A(n immediate-start) migration pattern $u$ of $\mathcal{T}$ is lazy if $u = f_{rr}(u')$ for some (immediate-start) migration pattern $u'$ of $\mathcal{T}$.
The immediate-start inventory generated by $\mathbf{T}$, denoted $\mathcal{L}_{\mathrm{imm}}(\mathbf{T})$, is the set of all immediate-start migration patterns of $\mathbf{T}$. The lazy (or lazy and immediate-start) migration inventory generated by $\mathbf{T}$ is $\mathcal{L}^{\mathrm{lazy}}(\mathbf{T}) = f_{rr}(\mathcal{L}(\mathbf{T}))$ (or $\mathcal{L}^{\mathrm{lazy}}_{\mathrm{imm}}(\mathbf{T}) = f_{rr}(\mathcal{L}_{\mathrm{imm}}(\mathbf{T}))$).

The semantics of the new family of dynamic constraints is now defined as follows:

Definition: Let $D$ be a semantic database schema and $\mathbf{T}$ a transaction schema. Suppose further that $L$ is a migration inventory over $D$. $\mathbf{T}$ is sound with respect to (w.r.t.) $L$ if $\mathcal{L}(\mathbf{T}) \subseteq L$; $\mathbf{T}$ is complete with respect to (w.r.t.) $L$ if $L \subseteq \mathcal{L}(\mathbf{T})$. $\mathbf{T}$ is sound (complete) under lazy (immediate-start) patterns w.r.t. $L$ if $L$ is lazy (immediate start) and the lazy (immediate-start) inventory generated by $\mathbf{T}$ is contained in (contains) $L$.

We now present some examples concerning migration inventories generated by transaction schemas.

Example 4.6: Consider the database schema $D$ shown in Figure 2.1 and assume that each person is uniquely identified by his/her Social Security Number (SSN). The transaction schema $\mathbf{T}$ consists of the following four transactions:

T1(n,ssn,t,m) = create(SSN=ssn,Name=n); specialize(PERSON,STUDENT,{SSN=ssn},{Major=m,First-Enroll=t}).
T2(ssn,p,s,d) = specialize(STUDENT,GRAD-ASSIST,{SSN=ssn},{%-Appoint=p,Salary=s,Work-In=d}).
T3(ssn) = generalize(EMPLOYEE,{SSN=ssn}).
T4(ssn) = delete(PERSON,{SSN=ssn}).

Intuitively, T1 enrolls a student; T2 assigns an assistantship to the student identified by the provided social security number; T3 cancels a graduate assistantship of a specified student; and finally T4 deletes a student (who quits or has graduated) from the database. Suppose that these are the only transactions. It is easily seen that each object will be initially created as a student, may get several assistantships from time to time, and finally will be deleted. Hence the migration inventory generated by $\mathbf{T}$, $\mathcal{L}(\mathbf{T})$, is the set of all initial words of $\omega_\emptyset^*([S]^+[G]^*)^*\omega_\emptyset^*$.
Also, $\mathcal{L}^{\mathrm{lazy}}(\mathbf{T})$ = {uw^[S], u;*[S][G], « 4 S ] [ % }. □

The following example shows that for a regular expression on role sets, it is possible to design a transaction schema to generate only those migration patterns which are prefixes of the words in the given regular language.

Example 4.7: The Ph.D. program in some department contains "screening" and "qualifying" examinations. A Ph.D. student, therefore, is either unscreened, screened, or a candidate. These three phases, however, are sequentially ordered as shown in Figure 4.2(a). Suppose that the schema for the Ph.D. students is shown as in Figure 4.2(b) and ID is an attribute defined on the class STUDENT which identifies each student.

Figure 4.2: An Example of Object Life Cycle. (a) The Life Cycle of A Ph.D. Student (UNSCREENED, SCREENED, CANDIDATE). (b) The Database Schema (STUDENT, UNSCREENED, SCREENED, CANDIDATE).

The transaction schema $\mathbf{T}$ can be designed to preserve the sequential order. In particular, $\mathbf{T}$ contains the following parameterized transactions as the only operations to migrate objects in the class STUDENT.

T1(sid) = create(STUDENT,{ID=sid}); specialize(STUDENT,UNSCREENED,{ID=sid}).
T2(sid) = generalize(UNSCREENED,STUDENT,{ID=sid}); specialize(STUDENT,SCREENED,{ID=sid}).
T3(sid) = generalize(SCREENED,STUDENT,{ID=sid}); specialize(STUDENT,CANDIDATE,{ID=sid}). □

Example 4.5 demonstrates that each path expression can be expressed as a migration inventory. The following example illustrates that it can further be incorporated into the design of transactions which will automatically obey the path expression. (Conversely, it will be shown in the next section that given a set of transactions (concurrent processes), it is possible to figure out the corresponding path expression.)

Figure 4.3: A Database Schema (attributes A, B)

Example 4.8: Consider the database schema shown in Figure 4.3.
There are three classes $P$, $Q$, $R$ and two attributes $A$ and $B$. Now consider the regular expression $P(QQP)^*$, where $P$ ($Q$) is a role set containing a singleton class $P$ ($Q$). Furthermore, let $L = \mathrm{INIT}(\omega_\emptyset^* P(QQP)^* \omega_\emptyset^*)$. $L$ is now a (nonlazy and delay-start) migration inventory. We can design a transaction schema $\mathbf{T}$ to generate the inventory $L$. Specifically, $\mathbf{T}$ consists of the following transaction:

T(x) = T0(x); T1(x); T2; T3; T4(x)

where

T0(x) = modify(Q, {A = c, B = x}, {A = d}); delete(R, {A = d})
T1(x) = generalize(Q, {A = c, B ≠ x}); modify(R, {A = c}, {A = a}); specialize(R, P, {A = a}, ∅)
T2 = modify(Q, {A = b}, {A = c})
T3 = generalize(P, {A = a}); specialize(R, Q, {A = a}, ∅); modify(Q, {A = a}, {A = b})
T4(x) = create(R, {A = a, B = x}); specialize(R, P, {A = a}, ∅).

Here $a, b, c, d \in \mathcal{U}$ are constants. These constants are used to "control" migration of objects. Intuitively, the transaction T4 will create an object in the class $P$. The transaction T3 will migrate object(s) in the class $P$ to $Q$ and T2 will let object(s) stay in $Q$. Transaction T1 will finally migrate those objects whose attribute $B$ values are not $x$ to $P$ to enter another migration cycle. For those objects having $x$ as their value for attribute $B$, the transaction T0 simply deletes them from the database. The parameter $x$ is used to "randomly" determine whether objects will continue to migrate or be deleted.

As another example, consider the regular expression $\omega_\emptyset^*(PQ^* \cup QP^*)\omega_\emptyset^*$. The transaction schema $\mathbf{T} = \{T\}$ will generate the initial language of this regular expression, where

T(x) = delete(R, {B = x});
  generalize(Q, {A = 1}); specialize(R, P, {A = 1}, ∅);
  generalize(P, {A ≠ 1}); specialize(R, Q, {A ≠ 1}, ∅);
  create(R, {A = x, B = x});
  specialize(R, P, {A ≠ 1}, ∅); specialize(R, Q, {A = 1}, ∅).

Intuitively, $T$ creates an object with the attribute value provided by the parameter $x$.
Depending on whether $x = 1$ or not, the object created will follow the migration pattern $QP^*$ or $PQ^*$. The attribute $B$ is again used to delete objects randomly. □

4.2 SL Transaction Schemas And Inventories

We now present the main results regarding SL transaction schemas. It turns out that for each of the four kinds of patterns introduced, the family of all inventories generated by SL transaction schemas is exactly the set of all regular inventories of that kind. As a consequence, the soundness and completeness of SL transaction schemas with respect to inventories are decidable.

Theorem 4.9: Let $D$ be a database schema and $\Omega$ the set of role sets on $D$.

1. For any transaction schema $\mathbf{T}$:
   (a) $\mathcal{L}(\mathbf{T})$, $\mathcal{L}_{\mathrm{imm}}(\mathbf{T})$, $\mathcal{L}^{\mathrm{lazy}}(\mathbf{T})$, and $\mathcal{L}^{\mathrm{lazy}}_{\mathrm{imm}}(\mathbf{T})$ are all regular.
   (b) $\mathcal{L}(\mathbf{T}) = \omega_\emptyset^* \cdot \mathcal{L}_{\mathrm{imm}}(\mathbf{T})$.
2. For every regular set $L \subseteq \Omega_+^*$, there exists a transaction schema $\mathbf{T}$ such that $\mathcal{L}(\mathbf{T}) = \omega_\emptyset^* \cdot \mathrm{INIT}(L \cdot \omega_\emptyset^*)$.

Since inventories yielded by SL transaction schemas are regular sets and there exist algorithms to determine if one regular set contains another regular set [HU79], it immediately follows:

Corollary 4.10: Let $D$ be a weakly-connected semantic database schema, $L$ a (lazy and/or immediate-start) migration inventory, and $\mathbf{T}$ a transaction schema on $D$. The following are decidable: (1) $\mathbf{T}$ is sound (under lazy and/or immediate-start patterns) w.r.t. $L$; (2) $\mathbf{T}$ is complete (under lazy and/or immediate-start patterns) w.r.t. $L$. □

We define a function (remove empty initial) $f_{rei} : \Omega^* \to \Omega^*$ as: $f_{rei}(\omega_1 \cdots \omega_n) = \omega_k \cdots \omega_n$, where $k \geq 1$, $\omega_k \neq \omega_\emptyset$, and $\omega_i = \omega_\emptyset$ for $i \in [1..(k-1)]$. The following corollary reveals the relationship between the four kinds of inventories generated by a given SL transaction schema $\mathbf{T}$.

Corollary 4.11: The following are true:

1. $\mathcal{L}^{\mathrm{lazy}}(\mathbf{T}) = f_{rr}(\mathcal{L}(\mathbf{T}))$;
2. $\mathcal{L}^{\mathrm{lazy}}_{\mathrm{imm}}(\mathbf{T}) = f_{rr}(\mathcal{L}_{\mathrm{imm}}(\mathbf{T}))$;
3. $\mathcal{L}_{\mathrm{imm}}(\mathbf{T}) = f_{rei}(\mathcal{L}(\mathbf{T}))$; and
4. $\mathcal{L}^{\mathrm{lazy}}_{\mathrm{imm}}(\mathbf{T}) = f_{rei}(\mathcal{L}^{\mathrm{lazy}}(\mathbf{T}))$.
□

In other words, the diagram commutes:

$$
\begin{array}{ccc}
\mathcal{L}(\mathbf{T}) & \xrightarrow{\;f_{rr}\;} & \mathcal{L}^{\mathrm{lazy}}(\mathbf{T}) \\
{\scriptstyle f_{rei}}\downarrow & & \downarrow{\scriptstyle f_{rei}} \\
\mathcal{L}_{\mathrm{imm}}(\mathbf{T}) & \xrightarrow{\;f_{rr}\;} & \mathcal{L}^{\mathrm{lazy}}_{\mathrm{imm}}(\mathbf{T})
\end{array}
$$

The remainder of this chapter is devoted to the proofs of Theorem 4.9 and Corollary 4.11. The proof of Theorem 4.9 is accomplished through a series of lemmas. The proof of Corollary 4.11 easily follows from one of the lemmas.

We now define the notion of "migration graph" and present the first lemma showing that each regular set can be "realized" by a single SL transaction. Part (2) of Theorem 4.9 easily follows from the lemma.

Definition: Let $D$ be a weakly-connected semantic schema and $\Omega_+$ the set of nonempty role sets over $D$. A migration graph is a vertex-labelled graph $G = (V, E, Label)$, where

1. $V$ is a set of vertices containing two distinct vertices: a source $v_s \in V$ and a sink $v_t \in V$;
2. $E \subseteq (V - \{v_t\}) \times (V - \{v_s\})$ is a set of edges; and
3. $Label$ is a total mapping from $V - \{v_s, v_t\}$ to $\Omega_+$.

Lemma 4.12: Suppose $D$ is a weakly-connected semantic database schema with isa-root $R$, where $R$ has three attributes $A$, $B$, $C$. Let $e$ be a regular expression over $\Omega_+$. Then there is an SL transaction $T$ such that $\mathcal{L}(\{T\}) = \omega_\emptyset^* \cdot \mathrm{INIT}(L(e) \cdot \omega_\emptyset^*)$.

Proof: The proof is based on constructing such a transaction $T$. The construction consists of two parts: first we construct a migration graph $G_e$ from the expression $e$. Using $G_e$, $T$ is then obtained.

For any regular expression $e$ over $\Omega_+$, the migration graph $G_e$ of $e$ is constructed recursively as follows.

1. Suppose $e = \omega$, where $\omega \in \Omega_+$. In this case, $G_e = (V, E, Label)$ where $V = \{v_s, v, v_t\}$, $E = \{(v_s, v), (v, v_t)\}$, and $Label(v) = \omega$.

2. Suppose $e = (e_1 \cdot e_2)$. Let the migration graphs for $e_1$ and $e_2$ be $G_{e_1} = (V_1, E_1, Label_1)$ and $G_{e_2} = (V_2, E_2, Label_2)$. Without loss of generality, we may assume that $V_1 \cap V_2 = \{v_s, v_t\}$ (otherwise we rename the vertices). Then $G_e = (V, E, Label)$, where $V = V_1 \cup V_2$, $E = \{(u, v) \in E_1 \mid v \neq v_t\} \cup \{(u, v) \in E_2 \mid u \neq v_s\} \cup \{(u, v) \mid (u, v_t) \in E_1, (v_s, v) \in E_2\}$, and $Label = Label_1 \cup Label_2$.

3. Suppose $e = (e_1 \cup e_2)$.
Let the migration graphs for $e_1$ and $e_2$ be $G_{e_1} = (V_1, E_1, Label_1)$ and $G_{e_2} = (V_2, E_2, Label_2)$, where $V_1 \cap V_2 = \{v_s, v_t\}$. Let $G_e = (V_1 \cup V_2, E_1 \cup E_2, Label_1 \cup Label_2)$.

4. Suppose $e = (e_1^*)$. Let $G_{e_1} = (V_1, E_1, Label_1)$ and $G_e = (V_1, E, Label_1)$, where $E = E_1 \cup \{(u, v) \mid (u, v_t) \in E_1, (v_s, v) \in E_1\}$.

In fact, the construction of the migration graph $G_e$ of $e$ is similar to constructing a nondeterministic finite state automaton for $e$, except that the vertices are labelled.

Figure 4.4: The Migration Graph for $P(QQP)^*$ (vertices labelled P, Q, Q, P)

Now let $G_e = (V, E, Label)$ be the migration graph of the regular expression $e$. Let $h : (V - \{v_t\}) \to \mathcal{U}$ be a one-to-one total mapping from vertices in $V - \{v_t\}$ to constants. For each vertex $u \in V - \{v_t\}$, define an atomic condition $\gamma_u = (A = h(u))$.

Example 4.13: Consider the regular expression $P(QQP)^*$ from Example 4.8. The migration graph for $P(QQP)^*$ is shown in Figure 4.4. □

We assume that $\mathcal{U}$ contains the integers $N = \{0, 1, 2, \ldots\}$. For each vertex $u \in V - \{v_t\}$ which has at least one outgoing edge, define $V_u = \{v \mid (u, v) \in E\}$ to be the set of vertices reachable from $u$ by an edge in $E$. Let $g_u$ be a one-to-one total mapping from $V_u$ to an initial segment of $N - \{0\}$, i.e., $g_u : V_u \to \{1, \ldots, card(V_u)\}$, where $card(S)$ denotes the cardinality of a set $S$. For each vertex $v \in V_u$, let $\Gamma_u(v)$ be a condition defined as follows:

• If $card(V_u) = 1$ (i.e., $u$ has only one outgoing edge), then $\Gamma_u(v) = \emptyset$;
• Otherwise, $card(V_u) > 1$, i.e., $u$ has at least two outgoing edges. In this case $\Gamma_u(v)$ is defined by the following two cases:
  - $\Gamma_u(v) = \{B = g_u(v)\}$ if $g_u(v) \neq card(V_u)$,
  - $\Gamma_u(v) = \{B \neq i \mid 1 \leq i \leq card(V_u) - 1\}$ if $g_u(v) = card(V_u)$.

In view of Proposition 4.2, given two conditions $\Gamma$ and $\Gamma'$ such that $Att(\Gamma) \subseteq A(R)$ and $Att(\Gamma') \subseteq A(R)$, we can define a sequence of operations which migrate all objects satisfying the condition $\Gamma$ from a role set $\omega$ to another role set $\omega'$ and modify the attribute values according to $\Gamma'$. We denote this sequence as $mig(\omega, \omega', \Gamma, \Gamma')$.
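The recursive construction of $G_e$ (cases 1 to 4 above) can be sketched in code. The Python sketch below is illustrative only and makes several assumptions not found in the text: a regular expression is encoded as a nested tuple (`("sym", w)`, `("cat", e1, e2)`, `("alt", e1, e2)`, `("star", e1)`), inner vertices are fresh integers, the shared source and sink are the strings `"vs"` and `"vt"`, and the helper `walk_labels` is a hypothetical name. The edge sets follow the definitions literally.

```python
from itertools import count

VS, VT = "vs", "vt"  # shared source and sink of every migration graph

def migration_graph(e, fresh=None):
    """Return (edges, label) for a regular expression e over role-set labels."""
    fresh = fresh if fresh is not None else count()
    kind = e[0]
    if kind == "sym":                      # case 1: a single role set
        v = next(fresh)
        return {(VS, v), (v, VT)}, {v: e[1]}
    if kind == "star":                     # case 4: loop from ends back to starts
        e1, lab = migration_graph(e[1], fresh)
        ends = {u for (u, v) in e1 if v == VT}
        starts = {v for (u, v) in e1 if u == VS}
        return e1 | {(u, v) for u in ends for v in starts}, lab
    ea, la = migration_graph(e[1], fresh)
    eb, lb = migration_graph(e[2], fresh)
    if kind == "alt":                      # case 3: union of both graphs
        return ea | eb, {**la, **lb}
    if kind == "cat":                      # case 2: glue ends of e1 to starts of e2
        ends = {u for (u, v) in ea if v == VT}
        starts = {v for (u, v) in eb if u == VS}
        edges = ({(u, v) for (u, v) in ea if v != VT}
                 | {(u, v) for (u, v) in eb if u != VS}
                 | {(u, v) for u in ends for v in starts})
        return edges, {**la, **lb}
    raise ValueError(f"unknown expression kind: {kind}")

def walk_labels(edges, label, n):
    """Label words of all walks from VS that visit exactly n labelled vertices."""
    frontier = {(VS, ())}
    for _ in range(n):
        frontier = {(v, word + (label[v],))
                    for (u, word) in frontier
                    for (a, v) in edges if a == u and v != VT}
    return {word for (_, word) in frontier}

# P(QQP)* from Example 4.8
expr = ("cat", ("sym", "P"),
        ("star", ("cat", ("sym", "Q"), ("cat", ("sym", "Q"), ("sym", "P")))))
edges, label = migration_graph(expr)
```

For this expression the sketch produces four labelled vertices (P, Q, Q, P), as in Figure 4.4, with a back edge from the last P vertex to the first Q vertex.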
The transaction $T$ is constructed as

$T(x) = create(R, \{\gamma_{v_s}, B = x, C = 0\});\ T'(x);\ modify(R, \{C = 1\}, \{C = 0\})$

where $x$ is a variable and $T'$ is a sequence of the following operations. For each vertex $u \in V - \{v_t\}$ such that $u$ has at least one outgoing edge, $T'$ includes a sequence of operations $T_u$ defined as follows:

$T_u = modify(R, \{\gamma_u, C = 0\}, \{B = x, C = 2\});\ \theta_1; \ldots; \theta_k$

where $k = card(V_u)$ and, letting $V_u = \{v_1, \ldots, v_k\}$, for each $i \in [1..k]$,

$\theta_i = mig(Label(u), Label(v_i), \Gamma_u(v_i) \cup \{C = 2\}, \{\gamma_{v_i}, C = 1\})$ if $v_i \neq v_t$,
$\theta_i = delete(R, \Gamma_u(v_i) \cup \{C = 2\})$ if $v_i = v_t$.

Intuitively, the attribute $A$ is used to determine which vertex of the migration graph of $e$ the current object is at. If there are several outgoing edges at some vertex, then the attribute $B$ is used to choose some edge based on the value of the parameter $x$. Finally, the attribute $C$ is used to mark the objects which have been migrated so that they will not be migrated again without exiting the transaction. Note that the last operation of $T$ is to clear the marks.

Claim: Let $d$ be a database instance, $o \in O$ an object which does not occur in $d$, $n \geq 1$, and $\alpha_1, \ldots, \alpha_n$ assignments such that $o$ occurs in $[\![T[\alpha_1]]\!](d)$. Then there exist vertices $v_s = v_0, v_1, \ldots, v_n$ in the graph $G_e$ such that for $i \in [1..n]$, $Label(v_i) = RoleSet(o, d_i)$ and $(v_{i-1}, v_i)$ is an edge in $G_e$, where $d_0 = d$ and $d_i = [\![T[\alpha_i]]\!](d_{i-1})$ for $i \in [1..n]$. Conversely, for every walk (vertex repetition is allowed) with $n$ ($\geq 1$) edges in $G_e$ starting from $v_s$, there exist a database $d$, an object $o$, and $n$ assignments such that the above statement is true.

Proof of the claim: We perform an induction on the number $n$ of applications of the transaction $T$.

Basis: $n = 1$. In this case, $o$ is created by the first operation in $T$ and $o.A = h(v_s)$, $o.B = x$, $o.C = 0$ immediately after it is created. Since $o.A = h(v_s)$ and $o.C = 0$, it will only be updated by the transaction $T_{v_s}$. The first modify operation will change the $C$ value to 2.
In the case of $card(V_{v_s}) = 1$, $o$ will migrate along the only edge out of $v_s$. Otherwise, $card(V_{v_s}) \geq 2$. Based on $x = 1, \ldots, (k-1)$ or⁴ $x \neq 1, \ldots, (k-1)$, $o$ will migrate along some edge departing from $v_s$. Furthermore, each edge originating from $v_s$ will be traversed for some value of $x$.

Induction: Suppose the claim holds when $n = k$ and now consider $n = k + 1$. Suppose $d$ is the database after $k$ applications of $T$ and $RoleSet(o, d) = \omega$. Furthermore, suppose that $u$ is the vertex $o$ currently migrates to. By the induction hypothesis, $Label(u) = \omega$. Hence, $o.A = h(u)$ and $o.C = 0$ (due to the last operation of $T$). Now from the construction of $T$, $o$ will only be updated by $T_u$. Note that the modify operation of $T_u$ will assign the value of $x$ to $o.B$. Using an argument similar to the above case, it is easy to see that $o$ can migrate according to $G_e$ and each of the edges can be traversed.

The proof of the lemma easily follows from this and the fact that $o_i$ will be created into the database by the $i$-th application of the transaction. □

⁴ $x \neq 1, \ldots, (k-1)$ is shorthand for $x \neq 1 \wedge \cdots \wedge x \neq (k-1)$.

The proof of Part (1) of Theorem 4.9 involves constructing a migration graph for $\mathbf{T}$ such that each object migration pattern produced by $\mathbf{T}$ is a walk in the transition graph starting from $v_s$, and vice versa. In the following, we first show that objects in databases behave independently as far as SL transactions are concerned. (Using this, Corollary 4.11 easily follows.) We then present the main steps of constructing the transition graph.

Let $D = (C, isa, A)$ be a semantic database schema, $d = (\mathbf{o}, \mathbf{a}, o_i) \in inst(D)$, and $I \subseteq O$ be finite. The restriction of $d$ onto $I$, denoted by $d|_I$, is an instance of $D$: $d|_I = (\mathbf{o}', \mathbf{a}', o_i)$, where $\mathbf{o}'(P) = \mathbf{o}(P) \cap I$ for each $P \in C$, and $\mathbf{a}' = \{((P, o, A), a) \in \mathbf{a} \mid o \in I\}$. The following lemma can be easily shown using an induction on the length of transactions.
Lemma 4.14: If $d \in inst(D)$, $T$ is a ground transaction, and $I \subseteq O$ is such that every object in $I$ appears in $d$, then $[\![T]\!](d|_I) = ([\![T]\!](d))|_I$. □

Since each object behaves independently of the others, it is easy to see that if an object $o$ has a migration pattern $\omega_\emptyset^i u$ generated by a sequence of transactions $T_1, \ldots, T_n$ (with assignments $\alpha_1, \ldots, \alpha_n$) where $n \geq i$, then the sequence of transactions $T_{i+1}, \ldots, T_n$ will generate the migration pattern $u$ for some object $o'$. Therefore, Part 1(b) of Theorem 4.9 holds.

Furthermore, if $o$ and the $T_i$'s are as above, the subsequence of all transactions in $T_1, \ldots, T_n$ which changed the role set of $o$ generates the lazy migration pattern $f_{rr}(\omega_\emptyset^i u)$. Therefore, Corollary 4.11 follows.

Using the corollary, if $\mathcal{L}_{\mathrm{imm}}(\mathbf{T})$ is regular, then $\mathcal{L}(\mathbf{T})$ is regular. Since the family of regular sets is closed under homomorphism, $\mathcal{L}^{\mathrm{lazy}}(\mathbf{T})$ and $\mathcal{L}^{\mathrm{lazy}}_{\mathrm{imm}}(\mathbf{T})$ are also regular. We now show that $\mathcal{L}_{\mathrm{imm}}(\mathbf{T})$ is indeed regular.

The above lemma allows us to focus on each individual object when studying migration patterns. To construct the state transition graph, we use a technique generalizing the "hyperplanes" of [AV89] to partition the object space (with respect to role sets and attribute values) such that elements in the same subspace are not distinguishable by SL transactions.

To begin, suppose $S = \{A_1, \ldots, A_n\}$ is a set of $n$ attributes, and $C = \{a_1, \ldots, a_k\}$ is a set of constants. The family $\pi_C(S)$ of separators is obtained through the following procedure. (Intuitively, $\pi_C(S)$ partitions the object space according to the values under a set of attributes $S$.)

First, we use constants in $C$ to partition the value of each attribute in the set $S$ and further extend this partition on all attributes in $S$. Formally, a condition is a hyperplane on $S$ with respect to $C$ if it is of the form $\{\xi_1, \ldots, \xi_n\}$, where for each $i \in [1..n]$, $\xi_i$ is one of the conditions⁵ $\{A_i = a_1\}, \ldots, \{A_i = a_k\}, \{A_i \neq a_1, \ldots, a_k\}$.
Then $\pi_C(S)$ is obtained from hyperplanes on $S$ by attaching to a hyperplane an equality relation on those attributes which are not equated to any $a_i$'s in the hyperplane. Let $\Gamma$ be a hyperplane on $S$ w.r.t. $C$. We define $Att_{\neq}(\Gamma) = Att(\Gamma) - Att_{def}(\Gamma) = \{A \mid \{A \neq a_1, \ldots, a_k\} \in \Gamma\}$. Let the binary relation $E_\Gamma = \{(A, A') \mid A, A' \in Att_{\neq}(\Gamma)\}$. For each relation $r \subseteq E_\Gamma$, let $r^*$ denote the reflexive and transitive closure of $r$ (relative to $Att_{\neq}(\Gamma)$). The equivalence relation $\equiv_\Gamma$ on such binary relations $r$ is defined by: $r_1 \equiv_\Gamma r_2$ if $r_1^* = r_2^*$. In the case where $card(Att_{\neq}(\Gamma)) \geq 1$, $\equiv_\Gamma$ yields a partition $\{[r_1^*], \ldots, [r_k^*]\}$. Then the hyperplane $\Gamma$ is further partitioned into $\{(\Gamma, [r_j^*]) \mid 1 \leq j \leq k\}$, where $k$ is the number of equivalence classes. In case $Att_{\neq}(\Gamma) = \emptyset$, $\Gamma$ is not partitioned further. For technical convenience, we will denote this as $(\Gamma, \emptyset)$.

Now let $\pi_C(S) = \{(\Gamma, [r^*]) \mid \Gamma$ is a hyperplane on $S$ w.r.t. $C$, and $[r^*]$ is an equivalence class of relations $\subseteq E_\Gamma\} \cup \{(\Gamma, \emptyset) \mid \Gamma$ is a hyperplane on $S$ and $Att_{\neq}(\Gamma) = \emptyset\}$.

Let $\omega \in \Omega$ be a role set on $D$. The set of attributes defined on classes in $\omega$ is

$$Att(\omega) = \bigcup_{Q \in \omega} A^*(Q).$$

Suppose $Const(\mathbf{T})$ is the set of all constants occurring in $\mathbf{T}$. Let $V_{\mathbf{T}} = \{(\omega, p) \mid \omega \in \Omega_+,\ p \in \pi_{Const(\mathbf{T})}(Att(\omega))\}$, and let $v_s$, $v_t$ be two new symbols not in $V_{\mathbf{T}}$. We shall show that a migration graph $G = (V, E, Label)$ can be constructed, where $V = V_{\mathbf{T}} \cup \{v_s, v_t\}$. Intuitively, this is because $\mathbf{T}$ cannot distinguish objects "corresponding" to the same vertex.

⁵ Note that $A_i \neq a_1, \ldots, a_k$ is shorthand for $A_i$ not being equal to any of the $a_j$'s.

In the following, we first formally define the above notion of correspondence between objects and vertices and then present the construction of the edges for the migration graph.

Definition: Let $d$ be a database instance of $D$ and $o$ an object occurring in $d$. The object $o$ matches a vertex $(\omega, (\Gamma, [r^*])) \in V$ under $d$ if the following conditions are satisfied:

1. $RoleSet(o, d) = \omega$;
2. $o \models \Gamma$; and
3.
if $S_o = \{A \mid A \in Att(\omega),\ o.A \notin Const(\mathbf{T})\}$ and $r_o = \{(A, A') \mid o.A = o.A',\ A, A' \in S_o\}$, then $r_o \in [r^*]$, i.e., $r_o^* = r^*$.

From the construction, it is easy to verify that the vertices in $V_{\mathbf{T}}$ form a partition on the object space. Hence, the following lemma is straightforward:

Lemma 4.15: For each database instance $d$ of $D$ and each object $o$ occurring in $d$, there exists exactly one vertex $v \in V_{\mathbf{T}}$ which is matched by $o$. □

It is also obvious that if two objects $o$, $o'$ in some database instance $d$ match a vertex $(\omega, (\Gamma, [r^*]))$ in $V_{\mathbf{T}}$, then there exists a one-to-one mapping $\rho : \mathcal{U} \to \mathcal{U}$ which is the identity on $Const(\mathbf{T})$ such that $\rho(o.A) = o'.A$ for every attribute $A \in Att(\omega)$. The condition above is denoted as $\rho(o) = o'$.

Notation: Let $d$ be a database instance and $o$ be an object occurring in $d$. If $T$ is a ground transaction, then $[\![T]\!](o)$ denotes the object $o$ in the instance $[\![T]\!](d)$.

Lemma 4.16: Let $d$ be a database and $o$, $o'$ two objects such that $o$, $o'$ match some vertex $v \in V_{\mathbf{T}}$. If $\alpha$, $\alpha'$ are two assignments such that there exists some one-to-one mapping $\rho$ from $\mathcal{U}$ to $\mathcal{U}$ with $\rho(o) = o'$ and⁶ $\rho(\alpha) = \alpha'$, then for each transaction $T \in \mathbf{T}$, both $[\![T[\alpha]]\!](o)$ and $[\![T[\alpha']]\!](o')$ match some $v' \in V_{\mathbf{T}}$.

⁶ For each variable $x$, $\rho(\alpha)(x) = \rho(\alpha(x))$.

Proof: Observe that for each (ground or parameterized) condition $\Gamma$, $o \models \alpha(\Gamma)$ if and only if $\rho(o) \models \alpha'(\Gamma)$, since $\rho(\alpha) = \alpha'$. The lemma can be easily verified by induction. □

Lemma 4.17: Let $v_1 = (\omega_1, (\Gamma_1, [r_1^*]))$ and $v_2 = (\omega_2, (\Gamma_2, [r_2^*]))$ be two vertices in $V_{\mathbf{T}}$. It can be decided if there exist a database instance $d$ consisting of a single object $o$ matching $v_1$, an assignment $\alpha$, and a transaction $T \in \mathbf{T}$ such that $[\![T[\alpha]]\!](o)$ matches $v_2$.

Proof: Define $A \equiv_{v_1} A'$ if $(A, A') \in r_1^*$. Let $[A_1], \ldots, [A_l]$ denote the equivalence classes yielded by $\equiv_{v_1}$. Intuitively, $A$, $A'$ are in the same class if each object matching $v_1$ has the same value for both attributes. Let $\mu_1, \ldots, \mu_l$ be $l$ new values not in $\mathcal{U}$. We construct an instance $d_{v_1} = (\mathbf{o}, \mathbf{a}, o_2)$ (extended to include the values $\mu_1, \ldots, \mu_l$ in the natural fashion), where

$$\mathbf{o}(P) = \begin{cases} \{o_1\} & \text{if } P \in \omega_1 \\ \emptyset & \text{otherwise} \end{cases}$$

and

$$\mathbf{a}(P, o_1, A) = \begin{cases} a_i & \text{if } (A = a_i) \in \Gamma_1 \text{ and } a_i \in Const(\mathbf{T}) \\ \mu_j & \text{if } (A \neq a_1, \ldots, a_k) \in \Gamma_1 \text{ and } A \in [A_j]. \end{cases}$$

Suppose $T(x_1, \ldots, x_m) \in \mathbf{T}$ is a transaction with $m$ variables $x_1, \ldots, x_m$. We further assume that $\{\nu_1, \ldots, \nu_m\}$ is another set of $m$ new values which does not intersect with $\mathcal{U} \cup \{\mu_1, \ldots, \mu_l\}$.

Claim: There exist a database instance containing $o$ which matches $v_1$ and an assignment $\alpha$ such that $[\![T[\alpha]]\!](o)$ matches $v_2$ if and only if there exists an assignment

$$\alpha' : \{x_1, \ldots, x_m\} \to (Const(\mathbf{T}) \cup \{\mu_1, \ldots, \mu_l\} \cup \{\nu_1, \ldots, \nu_m\})$$

such that $[\![T[\alpha']]\!](o_1)$ matches $v_2$.

From the above claim, there are only finitely many assignments that need to be considered. The decidability result of the lemma easily follows. We now present the proof of the claim.

Proof of the claim: The if part is trivial. Now consider the only if part. Suppose there exist $d$ and $\alpha$ satisfying the above condition. Since both $o$ and $o_1$ match $v_1$, there exists some one-to-one mapping $\rho$ from $\mathcal{U} \cup \{\mu_1, \ldots, \mu_l\}$ to $\mathcal{U} \cup \{\mu_1, \ldots, \mu_l\}$ such that $\rho(o) = o_1$. Now we define $\alpha'$ by the following procedure:

for $i = 1$ step 1 to $m$ do:
  if $\alpha(x_i) = o.A$ for some $A$
    then $\alpha'(x_i) = \rho(\alpha(x_i)) = \rho(o.A)$
  else if $\exists j \in [1..(i-1)]$, $\alpha(x_i) = \alpha(x_j)$
    then $\alpha'(x_i) = \alpha'(x_j)$
  else $\alpha'(x_i) = \nu_i$.

We further define a mapping $\rho'$ on $\mathcal{U} \cup \{\mu_1, \ldots, \mu_l\} \cup \{\nu_1, \ldots, \nu_m\}$ as:

1. $\rho'(a) = \rho(a)$ if $a = o.A$ for some $A$,
2. $\rho'(\alpha(x)) = \alpha'(x)$ if $\forall A$, $o.A \neq \alpha(x)$. In this case $\alpha'(x) = \nu_i$ for some $i$ and we also let:
3. $\rho'(\nu_i) = \rho(\alpha(x))$,
4. $\rho'(a) = \rho(a)$ for the rest of $a \in (\mathcal{U} \cup \{\mu_1, \ldots, \mu_l\})$, and
5. $\rho'(a) = a$ for the rest of $a \in \{\nu_1, \ldots, \nu_m\}$.

It is easy to verify that $\rho'$ is one-to-one and $\rho'(o) = \rho(o) = o_1$. We now show that $\rho'(\alpha) = \alpha'$. If $\alpha(x) = o.A$ for some attribute $A$, then $\rho'(\alpha)(x) = \rho'(\alpha(x)) = \rho(\alpha(x)) = \alpha'(x)$ from the definition of $\alpha'$.
Otherwise, for every attribute $A$, $\alpha(x) \neq o.A$. By the definition of $\rho'$, $\rho'(\alpha)(x) = \rho'(\alpha(x)) = \alpha'(x)$.

Applying Lemma 4.16, we have that $[\![T[\alpha]]\!](o)$ and $[\![T[\alpha']]\!](o_1)$ match the same vertex $v_2$. □

Now let $E_1 = \{(u, v) \mid$ there exist a database $d$ containing an object $o$ which matches $u$, an assignment $\alpha$, and an SL transaction $T \in \mathbf{T}$, such that $[\![T[\alpha]]\!](o)$ matches $v\}$.

The edge set of the migration graph will include $E_1$, along with the edges corresponding to the following two situations:

1. object creations;
2. object deletions.

Since the constructions for these cases are similar to the case discussed, we briefly state how the edges are obtained.

Edges emanating from $v_s$ will correspond to objects created by transactions in $\mathbf{T}$. To determine edges departing from $v_s$, we consider, for each vertex $u \in V_{\mathbf{T}}$, whether the database $d = (\emptyset, \emptyset, o_1)$ will be updated to a database which includes an object matching $u$. Using a procedure similar to the one in the proof of Lemma 4.17, this can be shown decidable. We let $E_s = \{(v_s, u) \mid$ there exist an assignment $\alpha$ and a transaction $T$ in $\mathbf{T}$ such that there is an object in $[\![T[\alpha]]\!](d)$ which matches $u\}$.

For constructing edges into $v_t$, we also modify the procedure in Lemma 4.17 to check if there is a case when some object matching a vertex $u$ in $V_{\mathbf{T}}$ is deleted. We define $E_t = \{(u, v_t) \mid$ there exist a database $d$ containing an object $o$ matching $u$, an assignment $\alpha$, and a transaction $T \in \mathbf{T}$, such that $o$ is not in $[\![T[\alpha]]\!](d)\}$.

Now let $E = E_1 \cup E_s \cup E_t$ and let $Label$ be the mapping $V_{\mathbf{T}} \to \Omega_+$ defined as: $Label(u) = \omega$ if $u = (\omega, p) \in V_{\mathbf{T}}$. Finally let $G = (V, E, Label)$.

Lemma 4.18: Let $G = (V, E, Label)$ be constructed as above and $o \in O$ be an arbitrary abstract object. Then the migration pattern of $o$ is of the form $\omega_\emptyset^j \cdot p_s$, where $j \geq 0$ and $p_s$ is a walk⁷ in $G$ starting from $v_s$.

Proof: The proof is based on an induction on the number $n$ of applications of transactions in $\mathbf{T}$.

Basis: $n = 0$. This is trivial since the migration pattern for $o$ is an empty word.
Induction: Suppose the lemma holds when there are $n$ applications. Consider the situation when there are $n + 1$ applications. Let $d$ be the database instance obtained after $n$ applications of transactions in $\mathbf{T}$. Let $\alpha$ be the assignment and $T \in \mathbf{T}$ a transaction such that $T[\alpha]$ is the $(n+1)$-th application. There are two cases to be considered.

Case 1: $o$ does not occur in $d$. If $o$ does not occur in $[\![T[\alpha]]\!](d)$, then the result holds. Otherwise $o$ is in $[\![T[\alpha]]\!](d)$. Since the create operation does not depend on $d$, there is an object $o'$ in $[\![T[\alpha]]\!]((\emptyset, \emptyset, o_1))$ which is indistinguishable from $o$ (modulo databases). That is, both $o$ and $o'$ reside in the same set of classes and have the same values under all attributes. Suppose $o$ (and also $o'$) matches $u \in V_{\mathbf{T}}$. From the construction of $G$, $(v_s, u) \in E_s \subseteq E$.

Case 2: $o$ occurs in $d$. From Lemma 4.14, it is sufficient to consider $d|_{\{o\}}$. Suppose $o$ matches $u \in V_{\mathbf{T}}$ and $[\![T[\alpha]]\!](o)$ matches $v \in V_{\mathbf{T}}$. By reasoning similar to the proof of Lemma 4.17, it is easily seen that there is an edge $(u, v)$ in $E_1$.

Hence the lemma is proved. □

⁷ A walk of a graph $(V, E)$ is a sequence of vertices (not necessarily distinct) $v_1, \ldots, v_n$, where $\forall i \in [1..n]$, $v_i \in V$ and $\forall i \in [1..(n-1)]$, $(v_i, v_{i+1}) \in E$.

From the construction of $G$, the following result is easily verified.

Lemma 4.19: Suppose $v_s$ has at least one outgoing edge. Then for each walk $v_s = v_0, \ldots, v_n$ and each object $o$, there exist $m \geq 0$, $m + n$ assignments $\alpha'_1, \ldots, \alpha'_m, \alpha_1, \ldots, \alpha_n$, and $m + n$ transactions $T'_1, \ldots, T'_m, T_1, \ldots, T_n \in \mathbf{T}$ such that if

$$d = [\![T'_1[\alpha'_1]; \cdots; T'_m[\alpha'_m]]\!]((\emptyset, \emptyset, o_1))$$

then $o$ does not occur in $d$ and for each $i \in [1..n]$, $([\![T_1[\alpha_1]; \cdots; T_i[\alpha_i]]\!](d))|_{\{o\}}$ matches $v_i$. □

The transactions $T'_1, \ldots, T'_m$ are used to consume enough abstract objects to eventually use $o$ in $T_1$.

From the above lemma, it is straightforward that the set of patterns in $\mathcal{L}_{\mathrm{imm}}(\mathbf{T})$ is the set of all walks starting from $v_s$ in $G$. It remains to show that the set of all walks starting from $v_s$ is regular.
From $G$, it is easy to construct a regular grammar $N$ corresponding to all walks departing from $v_s$. Specifically, $N$ has the set of nonterminals $V_{\mathbf{T}} \cup \{v_s\}$, the set of terminals $\Omega_+$, the start symbol $v_s$, and the following set of productions:

1. for each edge $(u, v)$ in $G$ where $v \neq v_t$, there is a production: $u \to Label(u)\ v$
2. for each edge $(u, v_t)$ in $G$, there is a production: $u \to Label(u)$

It is easy to verify that $L(N)$ is the set of walks starting from $v_s$ in $G$.

Chapter 5
Extended Manipulation Languages

In this chapter, we introduce two extensions of the language SL and study their expressive power in terms of inventory constraints. The main addition is conditionals. Namely, we extend the atomic operators to include a "testing" condition, which allows checking certain conditions regarding the current database. The operation is executed if the condition is satisfied; otherwise, it is not executed. For example, it is now possible to delete objects in a class $P$ if there exist objects in class $Q$. Essentially, the testing conditions allowed for operations are conjunctions of positive and negative conditions, defined in Chapter 2. Two extensions are studied: CSL+, which allows only positive conditions, and CSL, which allows both positive and negative conditions.

The focus of our study is on the interrelationship between transaction schemas and inventory constraints. Our approach is to characterize migration inventories generated by both CSL and CSL+ transaction schemas. It is shown that the set of migration patterns yielded by each CSL or CSL+ transaction schema is recursively enumerable. On the other hand, for delay-start patterns, every recursively enumerable inventory can be generated by some CSL+ (hence CSL) transaction schema; for immediate-start patterns, every recursively enumerable inventory is the left quotient by a regular set of the set of migration patterns generated by some CSL+ (hence CSL) transaction schema.
That is, the inventory can be obtained by padding with a regular set. The problem of characterizing the family of inventories generated by CSL+ or CSL transaction schemas in the case where padding is not allowed remains open. As a consequence of the above results, there are CSL+ (CSL) transaction schemas whose sets of migration patterns are nonrecursive. This implies that soundness and completeness regarding inventories are not decidable properties for those languages. It is shown, however, that all context-free inventories can be obtained without padding.

The chapter is organized into the following two sections. In Section 5.1, the two languages CSL+ and CSL are defined. The main results on characterizations for transaction schemas in those languages are presented in Section 5.2.

5.1 CSL and CSL+

In order to extend SL to include testing conditions before updating the database, we define the notion of "literals" and then provide the definition of "conditional atomic update." Finally, we present the definitions of CSL and CSL+ transactions.

Let $D = (C, isa, A)$ be a database schema, $P$ a class name in $C$, and $\Gamma$ a condition such that $Att(\Gamma) \subseteq A^*(P)$. A positive literal on $P$ is an expression '$P(\Gamma)$' and a negative literal on $P$ an expression '$\neg P(\Gamma)$.' A literal on $P$ is either a positive literal or a negative literal on $P$. A set of literals is positive if it contains only positive literals.

Definition: Let $D$ be a semantic database schema. A conditional atomic update (expression) on $D$ is an expression $\xi$ of the form

$$\xi = \delta_1, \ldots, \delta_n \to \theta,$$

where $n \geq 1$, $\delta_i$ is a literal for each $i \in [1..n]$, and $\theta$ is an atomic update (one of create, delete, modify, generalize, and specialize) on $D$. The conditional atomic update is positive if the set of literals $\{\delta_i \mid 1 \leq i \leq n\}$ is positive. The conditional atomic update $\xi$ is ground if it contains no variables in the literals $\delta_i$ and none in the atomic update expression $\theta$.
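One possible way to represent literals and conditional atomic updates is sketched below in Python, under the reading that the update fires only when every literal holds in the current instance. Everything in the sketch is an assumption made for illustration: an instance is simplified to a dictionary from class names to lists of attribute dictionaries (no isa hierarchy, no object identifiers), a condition is a list of `(attribute, op, constant)` triples, an atomic update is any function from instances to instances, and the names `holds`, `satisfies`, `conditional_update`, and `guarded_create` are hypothetical.

```python
def holds(obj, cond):
    """obj: dict attribute -> value; cond: list of (attribute, op, constant)
    with op in {"=", "!="}. An empty condition holds for every object."""
    return all((obj.get(a) == c) if op == "=" else (obj.get(a) != c)
               for a, op, c in cond)

def satisfies(db, literal):
    """literal = (positive, class_name, cond).
    Positive: some object of the class satisfies cond.
    Negative: no object of the class satisfies cond."""
    positive, cls, cond = literal
    members = db.get(cls, [])
    if positive:
        return any(holds(o, cond) for o in members)
    return not any(holds(o, cond) for o in members)

def conditional_update(db, literals, update):
    """Apply `update` only if every literal is satisfied; otherwise return db."""
    return update(db) if all(satisfies(db, lit) for lit in literals) else db

def guarded_create(db, x, y):
    """Hypothetical guarded creation on a single class R:
    'no object in R has B = y'  ->  create an object with A = x, B = y."""
    create = lambda d: {**d, "R": d["R"] + [{"A": x, "B": y}]}
    return conditional_update(db, [(False, "R", [("B", "=", y)])], create)

db1 = guarded_create({"R": []}, 1, 7)   # guard holds: object created
db2 = guarded_create(db1, 2, 7)         # guard fails: instance unchanged
db3 = guarded_create(db1, 2, 8)         # guard holds: second object created
```

Passing the update as a plain function keeps the guard logic independent of which atomic operator (create, delete, modify, generalize, specialize) is being guarded.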
Intuitively, the atomic update is executed only when all literals are satisfied. Note that the condition literals defined here are very restricted since variables local to a conditional atomic update are not allowed. All variables in a conditional atomic update are global within a transaction. A transaction is defined as a sequence of conditional atomic updates.

Definition: A conditional (or CSL) transaction on a semantic database schema $D$ is a sequence $T = \xi_1; \ldots; \xi_n$ where

1. $n \geq 0$; and
2. for each $i \in [1..n]$, $\xi_i$ is either a conditional or an atomic update on $D$.

The conditional transaction $T$ is positive (or in CSL+) if, for each $i \in [1..n]$, $\xi_i$ being a conditional atomic update implies that $\xi_i$ is positive. $T$ is an empty transaction if $n = 0$. $T$ is ground if $\xi_i$ is ground for each $i \in [1..n]$. $T$ is parameterized if it is not ground. A CSL (or CSL+) transaction schema is a finite set of CSL (or CSL+) transactions.

The semantics for CSL (CSL+) transactions is defined in the natural manner. Speaking informally, the semantics of each ground conditional atomic update is a mapping from database instances to database instances such that if the database "satisfies" all the conditional literals, then the atomic update is performed (as defined in Chapter 2); otherwise, the database is left unchanged. The semantics of a ground CSL or CSL+ transaction is then defined as the composition of the mappings for each update. Each parameterized CSL or CSL+ transaction is a mapping from assignments to mappings from instances to instances.

We now formally define the semantics for CSL and CSL+ transactions. Let $D = (C, isa, A)$ be a semantic database schema and $d = (\mathbf{o}, \mathbf{a}, o_i)$ a database instance of $D$. In addition, suppose $P \in C$ is a class in $D$ and $\Gamma$ is a condition such that $Att(\Gamma) \subseteq A^*(P)$.
We say $d$ satisfies the positive literal $P(\Gamma)$, denoted $d \models P(\Gamma)$, if there exists an object $o \in o(P)$ such that $o \models \Gamma$. Also, we say $d$ satisfies the negative literal $\neg P(\Gamma)$, denoted $d \models \neg P(\Gamma)$, if for every object $o \in o(P)$ it holds that $o \not\models \Gamma$.

Definition: Let $D$ be a semantic database schema and $\xi = \delta_1, \ldots, \delta_n \to \theta$ a ground conditional atomic update on $D$. The semantics of $\xi$ is defined as a mapping $[\![\xi]\!]$ from $inst(D)$ to $inst(D)$ such that for each instance $d \in inst(D)$,
\[ [\![\xi]\!](d) = \begin{cases} [\![\theta]\!](d) & \text{if } \forall i \in [1..n],\ d \models \delta_i \\ d & \text{otherwise.} \end{cases} \]

Definition: Let $T = \xi_1; \ldots; \xi_n$ be a ground conditional transaction on a schema $D$. The semantics of $T$ is the mapping $[\![T]\!]$ from $inst(D)$ to $inst(D)$ defined by
\[ [\![T]\!] = [\![\xi_1]\!] \circ \cdots \circ [\![\xi_n]\!]. \]

Definition: For any parameterized conditional transaction $T$, the semantics of $T$, denoted $[\![T]\!]$, is a mapping from assignments ($V \to U$) to mappings ($inst(D) \to inst(D)$): if $\alpha$ is an assignment, $[\![T]\!](\alpha) = [\![T[\alpha]]\!]$.

In the languages CSL+ and CSL, isolated classes in a schema can be connected by testing literals. For example, suppose $P$ and $Q$ are two classes in a schema $D$ which are not weakly connected. The conditional atomic update
\[ P(\Gamma) \to \mathbf{modify}(Q, \Gamma', \Gamma'') \]
conditionally modifies the objects in class $Q$ according to whether there is an object in class $P$ which satisfies $\Gamma$. Clearly, the testing conditions add expressive power to the basic language SL. Furthermore, because of this "communication" between classes, we can no longer concentrate on a single weakly-connected component when studying transactions. The assumption of weak connectivity of a schema used in the previous chapter is therefore dropped in this chapter.

The property analogous to Lemma 4.14, that individual objects behave independently under SL transactions, does not hold for CSL or CSL+ transactions. This is easily shown by the following update on the class $R$, assumed to be an isa-root of some schema $D$:
\[ \neg R(\{B = y\}) \to \mathbf{create}(R, \{A = x, B = y\}). \]
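The semantics just defined, applied to the update on $R$ above, can be sketched in a few lines of Python. This is a simplified model under explicit assumptions, not the thesis's formalism: an instance is a dictionary from class names to lists of attribute dictionaries, the isa hierarchy and object identifiers are ignored, conditions contain only equalities, and updates in a transaction are applied in the order $\xi_1, \ldots, \xi_n$. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Obj = Dict[str, object]
DB = Dict[str, List[Obj]]
Cond = Dict[str, object]            # equality-only condition: attribute -> value
Lit = Tuple[str, Cond, bool]        # (class, condition, is_positive)
Update = Callable[[DB], DB]


def obj_satisfies(obj: Obj, cond: Cond) -> bool:
    return all(obj.get(attr) == value for attr, value in cond.items())


def literal_holds(db: DB, cls: str, cond: Cond, positive: bool) -> bool:
    # Positive literal: some object of the class satisfies the condition.
    # Negative literal: no object of the class satisfies it.
    exists = any(obj_satisfies(o, cond) for o in db.get(cls, []))
    return exists if positive else not exists


def create(cls: str, values: Obj) -> Update:
    def run(db: DB) -> DB:
        new_db = {c: [dict(o) for o in objs] for c, objs in db.items()}
        new_db.setdefault(cls, []).append(dict(values))
        return new_db
    return run


def conditional(literals: List[Lit], update: Update) -> Update:
    # Perform the atomic update if every literal holds, else leave db unchanged.
    def run(db: DB) -> DB:
        if all(literal_holds(db, c, g, pos) for (c, g, pos) in literals):
            return update(db)
        return db
    return run


def transaction(*updates: Update) -> Update:
    # Assumption: the updates are applied left to right.
    def run(db: DB) -> DB:
        for u in updates:
            db = u(db)
        return db
    return run


# Ground instance of the update on R with x = 1, y = 2.
xi = conditional([("R", {"B": 2}, False)], create("R", {"A": 1, "B": 2}))
first = xi({})          # no object has B = 2, so the object is created
second = xi(first)      # the negative literal now fails; nothing changes
```

Applying `xi` a second time returns the instance unchanged, which is exactly the kind of "null" application that the modified definition of migration patterns later excludes.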
This update creates an object only if no object already in $R$ has the value $y$ for the attribute $B$. (This, for instance, can be used to enforce the functional dependency $A \to B$.)

The focus of this chapter is to extend the study of object migration inventories initiated in the previous chapter. Since objects cannot migrate between classes that are not weakly connected, our investigation "focuses" on inventories with respect to some weakly-connected component. We now extend the relevant definitions provided in Chapter 4 to the current context.

Definition: Suppose $D = (C, \mathit{isa}, A)$ is a semantic database schema. A set of classes $\omega \subseteq C$ is a role set on $D$ if
1. for each class $P$ in $C$, $P \in \omega$ implies that $\{Q \in C \mid P\ \mathit{isa}^*\ Q\} \subseteq \omega$; and
2. for each pair of classes $P, Q \in \omega$, $P$ and $Q$ are weakly connected.

If $G$ is a weakly-connected component of $D$ and $G$ contains all classes in a role set $\omega$ on $D$, $\omega$ is also called a role set on $G$. The set of all nonempty role sets on $G$ is denoted by $\Omega_G$.

Since operations in CSL and CSL+ are performed according to whether certain (testing) conditions are satisfied by the database, it is possible that executed transactions do not change the database at all. In terms of migration patterns yielded by a transaction schema, this is not very interesting, because any role set can then repeat simply because some transaction with a "null" effect can be applied, especially when there is a transaction consisting only of conditional atomic updates whose testing conditions are all unsatisfiable. To distinguish these situations from the real impact of conditional transactions on migration inventories, we modify the definition of "a migration pattern yielded by a transaction schema" by adding the requirement that each application of a transaction must change the database.

Definition: Let $D$ be a schema, $\Omega$ the set of all role sets on $D$, and $T$ a transaction schema. A migration pattern of $T$ is an element $\omega_1 \cdots \omega_n$ in $\Omega^*$ such that
1. $n \geq 0$; and
2.
$\exists o \in O$, $T_1, \ldots, T_n \in T$, and assignments $\alpha_1, \ldots, \alpha_n$ such that $d_i \neq d_{i-1}$ and $RoleSet(o, d_i) = \omega_i$ for $i \in [1..n]$, where $d_0 = (\emptyset, \emptyset, ox)$ and $d_i = [\![T_i[\alpha_i]]\!](d_{i-1})$ for $i \in [1..n]$.

The migration inventory yielded by $T$, denoted $\mathcal{L}(T)$, is the set of all migration patterns of $T$. The lazy and/or immediate-start migration patterns (inventory) of $T$ are defined in the natural way. We now provide:

Definition: Let $D$ be a database schema and $G$ a weakly-connected component of $D$. For each transaction schema $T$, $\mathcal{L}(T, G)$ denotes $\mathcal{L}(T) \cap \Omega_G^*$.

Since the sets of objects for two classes that are not weakly connected are disjoint, and objects cannot migrate between such classes, the following fact states that migration patterns do not cross weakly-connected components.

Fact: For each database schema $D$ and transaction schema $T$ on $D$, if $G_1, \ldots, G_n$ are all the weakly-connected components of $D$, then
\[ \mathcal{L}(T) = \bigcup_{1 \leq i \leq n} \mathcal{L}(T, G_i). \qquad \Box \]

5.2 Migration Inventories

In this section, the main results concerning migration inventories are presented. Theorem 5.1 states that the inventory generated by each CSL or CSL+ transaction schema is recursively enumerable. Theorem 5.2 shows that for delay inventories, each recursively enumerable inventory can be generated by some CSL+ (or CSL) transaction schema. For immediate-start inventories, it is shown in Theorem 5.4 that each recursively enumerable inventory padded with some regular set can be generated by some CSL+ (or CSL) transaction schema. If padding is not allowed, Theorem 5.9 shows that each context-free inventory can be generated by some CSL+ (CSL) transaction schema.

Theorem 5.1: For each CSL+ (CSL) transaction schema $T$, $\mathcal{L}(T)$, $\mathcal{L}_{imm}(T)$, $\mathcal{L}^{lazy}(T)$, and $\mathcal{L}^{lazy}_{imm}(T)$ are all recursively enumerable sets.

Proof: We consider only $\mathcal{L}(T)$, the other cases being similar.
We need to show that for each transaction schema $T$, there is a Turing machine $M$ such that a migration pattern is generated by $T$ if and only if it is accepted by $M$. Notice that the set $inst(D)$ is recursively enumerable. Since the number of variables in $T$ is finite, for each $d \in inst(D)$ there are only finitely many assignments which are not isomorphic to each other. It is not difficult to see that a Turing machine $M$ can be constructed to check whether a pattern can be generated by $T$. □

We now show that the converse is also true, that is, each recursively enumerable inventory has a corresponding CSL+ (CSL) transaction schema.

Theorem 5.2: Let $D$ be a database schema containing at least two weakly-connected components $G$ and $S$, where $S$ has at least four attributes. Then for every recursively enumerable inventory $L \subseteq \Omega_G^*$ there is a CSL (CSL+) transaction schema $T$ such that
\[ \mathcal{L}(T, G) = \omega_\emptyset^* \cdot INIT(L \cdot \omega_\emptyset^*). \]

Proof: Suppose $D = (C, \mathit{isa}, A)$ is such that $D$ has two weakly-connected components $G$ and $S$. The component $G$ has the class $R$ as its isa-root. We may assume without loss of generality that the component $S$ contains only the class $S$ with four attributes, namely $A_1$, $A_2$, $A_3$, and $A_4$. In the following presentation, we assume the terminology for Turing machines (e.g., [HU79]), although the presentation is self-contained for readers familiar with Turing machines.

Since $L$ is recursively enumerable, there is a Turing machine $M$ (with a right-infinite tape) which accepts $L$. We assume that $M$ does not erase the input word. (If not, it is easy to construct another Turing machine $M'$ which duplicates the input word and then simulates $M$ on the right copy.) Without loss of generality, we assume that $\Sigma$ is the input alphabet of $M$ and $\Sigma = \Sigma_{\Omega,G} \cup \{\not b\}$, where $\Sigma_{\Omega,G}$ is the set of distinct symbols, each of which represents a role set $\omega \in \Omega_G$, and $\not b$ is the blank tape symbol. We further assume that the symbols in $\Sigma$ are also constants in $U$, i.e., $\Sigma \subseteq U$.
The proof essentially constructs a CSL+ transaction schema $T$ such that $T$ produces only migration patterns which are accepted by the Turing machine $M$ (as words). This is accomplished by simulating the computation of $M$ (on some input word). If $M$ halts, objects are created in the component $G$ and then migrated following the patterns specified by accepted words.

The class $S$ is used to store an encoding of a (partial) computation of $M$. More specifically, the four attributes of $S$ are used to store an encoding of a computation of $M$ in the following manner:
1. $A_1$, $A_2$ are used to hold a chain;
2. $A_3$ is used to hold the tape contents; and
3. $A_4$ is used to indicate the position of the head and the current state.

An example of a "pure" encoding is shown in Figure 5.1. In an actual simulation, the class $S$ may contain some "unusable" portions of encodings, created during the phase of generating an input word. This is due to the fact that in the language CSL+ it is impossible to test whether a variable has a value which does not appear in the database.

The transaction schema $T$ consists of a collection of transactions which correspond to three phases of simulation: (1) to randomly generate a word, (2) to simulate $M$, and (3) to produce the pattern from the word if it is accepted by $M$.

[Figure 5.1: A "Pure" Encoding of a Computation of M (a table of the objects in class S with columns oid, A_1, A_2, A_3, A_4)]

A flag will be used to indicate the current phase. The flag is represented by an object in the class $S$. The object will hold a value among $a_{word}$, $a_{comp}$, $a_{mig}$ ($\in U - \Sigma$) for attributes $A_i$ ($i \in [1..4]$), to indicate the three phases. Briefly, $T$ has the following transactions:
• $T_{init}$ clears the database, initializes the class $S$ for simulation, and sets a flag for generating an input word.
• $T_{expand}$ adds a symbol to the encoding of the input for $M$ if it is in the phase of generating a word.
Repeatedly applying $T_{expand}$ will randomly generate an input word for $M$.
• $T_{start\_M}$ sets the flag for starting the simulation of the computation of $M$.
• A transaction $T_{\delta(p,a)=(q,b,move)}$ for each transition $\delta(p, a) = (q, b, move)$, where $move$ indicates the movement of the head: R (right), L (left), or S (stay or no move). $T_{\delta(p,a)=(q,b,move)}$ is used to simulate one step of $M$ according to the transition.
• $T_{start\_mig}$ sets the flag for starting to migrate the object in the component $G$ using the pattern accepted by $M$.
• $T_{mig}$ generates the migration pattern.

We denote $\Gamma_{word} = \{A_i = a_{word} \mid 1 \leq i \leq 4\}$, $\Gamma_{comp} = \{A_i = a_{comp} \mid 1 \leq i \leq 4\}$, and $\Gamma_{mig} = \{A_i = a_{mig} \mid 1 \leq i \leq 4\}$. The three phases are then represented by the testing conditions $S(\Gamma_{word})$, $S(\Gamma_{comp})$, $S(\Gamma_{mig})$ (respectively). Suppose that $c_1, \ldots, c_n$ is an enumeration of $\Sigma_{\Omega,G}$, where for each $i \in [1..n]$, $c_i$ is a symbol representing a role set on the component $G$. The role set represented by $c_i$ is denoted by $\omega(c_i)$. Suppose $s$ ($h$) is the starting (halting) state for $M$ and $-$ is a constant in $U$. Finally, let $\#$ be a constant in $U$, but not a tape symbol in $\Sigma$.

We use a natural syntax for abbreviating conditions and literals over the class $S$, which is illustrated in the following example. The condition $\Gamma = \{A_1 = a, A_1 \neq b, A_2 = c, A_3 \neq d\}$ is abbreviated as $(a, \neq b; c; \neq d; *)$, where $*$ denotes that values in this column are irrelevant; and the literal $S(\Gamma)$ is abbreviated as $S(a, \neq b; c; \neq d; *)$.

The transactions in $T$ are presented as follows.
• $T_{init}(x)$ first deletes everything in the database ($T_{clear}$), and then creates two objects in $S$. One serves as the flag for phases, initialized to indicate the phase of generating a word. The other is the first element in the chain.
\[ \begin{aligned} T_{init}(x) ={} & T_{clear};\ \mathbf{create}(S, \Gamma_{word});\ \mathbf{create}(S, (¢; \$; \#; -)); \\ & \mathbf{modify}(S, \{A_3 = \#\}, \{A_3 = x, A_3 = c_1\}); \\ & \quad\vdots \\ & \mathbf{modify}(S, \{A_3 = \#\}, \{A_3 = x, A_3 = c_n\}) \end{aligned} \]
where $T_{clear} = \mathbf{delete}(R, \emptyset);\ \mathbf{delete}(S, \emptyset)$.
If the value of the parameter $x$ is a nonblank tape symbol (in $\Sigma$), then the value is stored into the chain by a modify operation. If $x$ holds a value other than a nonblank tape symbol, the symbol $\#$ is stored. (Note that at most one of the modify operations will be executed, and this occurs only if the value of $x$ is in $\Sigma_{\Omega,G}$.)
• $T_{expand}$ has three variables: $x$, $y$, and $z$, where $x$ is to be matched to the first element in the chain, $y$ is then added to expand the chain, and the value of $z$ decides the symbol on the tape: if $z$ holds a nonblank tape symbol, then it is simply stored; otherwise the symbol $\#$ is stored. $T_{expand}(x, y, z)$ consists of the following conditional operations:
\[ \begin{aligned} S(\Gamma_{word} \cup (\neq x, \neq y; *; *; *)),\ S(¢, \neq x, \neq y; x, \neq y; *; *) \to{} & \mathbf{delete}(S, (y; *; *; *)); \\ & \mathbf{delete}(S, (*; y; *; *)); \\ & \mathbf{create}(S, (¢; y; \#; -)); \\ & \mathbf{modify}(S, (¢; y; \#; *), (*; *; z, c_1; *)); \\ & \quad\vdots \\ & \mathbf{modify}(S, (¢; y; \#; *), (*; *; z, c_n; *)); \\ & \mathbf{modify}(S, (¢; x; *; *), (y; x; *; *)). \end{aligned} \]
The above syntax is an abbreviation for repeating the same testing condition literals for every update. Since only the last operation in the above sequence will change the testing condition, the abbreviation does not change the semantics. The condition $\Gamma_{word} \cup (\neq x, \neq y; *; *; *)$ ensures that $x$ and $y$ do not have the value $a_{word}$. The condition $S(¢, \neq x, \neq y; x, \neq y; *; *)$ is satisfied only if $x$ and $y$ neither have the same value nor have the value ¢, and $x$ matches the first element in the existing chain. Ideally, the value held by $y$ should not be used by any other object existing in $S$. This restriction can be expressed in CSL as: $\neg S(y; *; *; *),\ \neg S(*; y; *; *)$. In CSL+, however, this is impossible. But since the main purpose is to avoid forming cycles in the chain, we use two delete operations to delete any objects whose value for either attribute $A_1$ or attribute $A_2$ is the value held by $y$. This could shorten the chain and leave some unusable portions of the chain. The unusable portions will not, however, interfere with the simulation.
Also, repeatedly applying $T_{expand}$ may create more than one word in the encoding of the tape. However, only the first of these words will be the actual word used as the input for $M$. The symbol $\#$ is used as the delimiter between words.
• $T_{start\_M}$ simply sets the flag for the simulation phase and inserts the starting state and the blank symbol at the beginning of the chain, to serve as the left boundary for the tape:
\[ T_{start\_M} = \mathbf{modify}(S, \Gamma_{word}, \Gamma_{comp});\ \mathbf{modify}(S, \{A_1 = ¢\}, \{A_3 = \not b, A_4 = s\}). \]
• Now we construct a transaction $T_{\delta(p,a)=(q,b,move)}$ for each transition $\delta(p, a) = (q, b, move)$. It simulates steps under the transition. $T_{\delta(p,a)=(q,b,move)}$ has six variables $x_1$, $x_2$, $x_3$, $x_4$, $y$, and $z$, where $x_1$, $x_2$, $x_3$, and $x_4$ are matched to three consecutive positions (elements) in the chain such that the middle position has the head of $M$ and the current state, and the variables $y$ and $z$ are then matched to the tape symbols to the left and right of the head position. If $move = L$, then
\[ \begin{aligned} T_{\delta(p,a)=(q,b,L)}&(x_1, x_2, x_3, x_4, y, z) = \\ & S(\Gamma_{comp}),\ S(x_1; x_2; y; -),\ S(x_2; x_3; a; p),\ S(x_3; x_4; z; -) \to \mathbf{modify}(S, (x_1; x_2; y; -), (x_1; x_2; y; q)); \\ & S(\Gamma_{comp}),\ S(x_1; x_2; y; q),\ S(x_2; x_3; a; p),\ S(x_3; x_4; z; -) \to \mathbf{modify}(S, (x_2; x_3; a; p), (x_2; x_3; b; -)); \\ & T_p \end{aligned} \]
where $T_p$ is empty if $a \neq \not b$; otherwise $T_p$ also treats the symbol $\#$ as a blank symbol and propagates the $\#$ rightward:
\[ \begin{aligned} T_p ={} & S(\Gamma_{comp}),\ S(x_1; x_2; y; -),\ S(x_2; x_3; \#; p),\ S(x_3; x_4; z; -) \to \mathbf{modify}(S, (x_1; x_2; y; -), (x_1; x_2; y; q)); \\ & S(\Gamma_{comp}),\ S(x_1; x_2; y; q),\ S(x_2; x_3; \#; p),\ S(x_3; x_4; z; -) \to \mathbf{modify}(S, (x_3; x_4; z; -), (x_3; x_4; \#; -)); \\ & S(\Gamma_{comp}),\ S(x_1; x_2; y; q),\ S(x_2; x_3; \#; p),\ S(x_3; x_4; \#; -) \to \mathbf{modify}(S, (x_2; x_3; \#; p), (x_2; x_3; b; -)). \end{aligned} \]
The second conditional update in $T_p$ propagates the symbol $\#$. The transactions for $move = R$ or $S$ are defined analogously.
• When the halting state is reached, the transaction $T_{start\_mig}$ will be the only applicable transaction; it essentially creates an object in the component $G$ and migrates it to the role set specified by the first symbol of the accepted word. (Recall that it is assumed that $M$ does not erase the input word.) $T_{start\_mig}$ has four variables: $x_1$, $x_2$, $x_3$, which are matched to the first three elements in the chain, and $y$, which is matched to the symbol in the second tape square (the first contains the blank symbol).
\[ \begin{aligned} T_{start\_mig}&(x_1, x_2, x_3, y) = \\ & S(*; *; *; h),\ S(\Gamma_{comp}) \to \mathbf{modify}(S, \Gamma_{comp}, \Gamma_{mig}); \\ & S(\Gamma_{mig}),\ S(¢; x_1; \not b; *),\ S(x_1; x_2; c_1, y; *),\ S(x_2; x_3; *; *) \to \mathbf{create}(R, \emptyset);\ mig(\{R\}, \omega(c_1), \emptyset, \emptyset); \\ & \quad\vdots \\ & S(\Gamma_{mig}),\ S(¢; x_1; \not b; *),\ S(x_1; x_2; c_n, y; *),\ S(x_2; x_3; *; *) \to \mathbf{create}(R, \emptyset);\ mig(\{R\}, \omega(c_n), \emptyset, \emptyset); \\ & S(\Gamma_{mig}),\ S(¢; x_1; \not b; *),\ S(x_1; x_2; y; *),\ S(x_2; x_3; *; *) \to \mathbf{delete}(S, (*; *; *; ¢)); \\ & \qquad \mathbf{modify}(S, (¢; x_1; \not b; *), (*; *; *; ¢)); \\ & \qquad \mathbf{modify}(S, (x_1; x_2; y; *), (*; *; *; ¢)); \\ & \qquad \mathbf{modify}(S, (x_2; x_3; *; *), (¢; *; *; *)); \end{aligned} \]
where $mig$ is a sequence of atomic updates which migrate objects satisfying the condition specified by the third argument from the source role set (specified by the first argument) to the target role set (second argument), and finally perform a modification on the object according to the condition in the fourth argument. Since the testing condition literals do not contain literals on classes in the component $G$, the above abbreviation is possible and the semantics is well-defined. Note that if the accepted word is empty (has length 0), then no object will be created in $R$; if the word has length 1, the created object will be deleted when $T_{mig}$ (described below) is executed.
• Finally, the transaction $T_{mig}$ migrates objects in the component $G$ using the word encoded by the objects in $S$.
$T_{mig}$ has two variables $x_1$, $x_2$, which are to be matched to the first two elements in the chain. Depending on the value under attribute $A_3$ of the first object in the chain, the transaction migrates all objects in the component $G$ to the corresponding role set, or deletes them if the blank symbol is encountered. After doing this, $T_{mig}$ deletes the first object in the chain and attaches the ¢ symbol to the second object so that it becomes the first.
\[ \begin{aligned} T_{mig}&(x_1, x_2) = \\ & S(\Gamma_{mig}),\ S(¢; x_1; c_1; *),\ S(x_1; x_2; *; *) \to migto(\omega(c_1)); \\ & \quad\vdots \\ & S(\Gamma_{mig}),\ S(¢; x_1; c_n; *),\ S(x_1; x_2; *; *) \to migto(\omega(c_n)); \\ & S(\Gamma_{mig}),\ S(¢; x_1; \not b; *),\ S(x_1; x_2; *; *) \to \mathbf{delete}(R, \emptyset); \\ & S(\Gamma_{mig}),\ S(¢; x_1; *; *),\ S(x_1; x_2; *; *) \to \mathbf{modify}(S, (x_1; x_2; *; *), (¢; x_2; *; *)); \\ & S(\Gamma_{mig}),\ S(¢; x_1; *; *),\ S(¢; x_2; *; *) \to \mathbf{delete}(S, (¢; x_1, \neq x_2; *; *)). \end{aligned} \]
where $migto(\omega(c_i))$ ($i \in [1..n]$) is a sequence of updates which migrate all objects in the component $G$ to the role set $\omega(c_i)$, in a way similar to $mig$. Actually, it generalizes all objects to the isa-root $R$ and "invokes" the sequence $mig(\{R\}, \omega(c_i), \emptyset, \emptyset)$. The condition for the sequence $migto(\omega(c_i))$ is an abbreviation for having the same condition on each individual update in the sequence. Since no update in the sequence updates objects in $S$, the testing condition will not be changed, and the semantics for the abbreviation is well-defined.

From the construction, it is easy to verify that the only migration patterns generated by $T$ for objects in the component $G$ are words accepted by the Turing machine $M$; and for every word accepted by $M$, there is a sequence of applications of transactions in $T$ which generates a migration pattern consisting of the accepted word with a prefix $\omega_\emptyset^k$ for some $k$. □

Each lazy inventory can be the lazy inventory of some transaction schema $T$:

Corollary 5.3: Let $D$ be a database schema containing at least two weakly-connected components $G$ and $S$, where $S$ has at least four attributes.
Then for every recursively enumerable lazy inventory $L \subseteq \Omega_G^*$ there is a CSL (CSL+) transaction schema $T$ such that
\[ \mathcal{L}^{lazy}(T, G) = \omega_\emptyset \cdot INIT(L \cdot \omega_\emptyset). \qquad \Box \]

Definition: Let $\Sigma$ be an alphabet and $X$, $Y$ two languages over $\Sigma$. The left quotient of $Y$ by $X$, denoted by $X^{-1}Y$, is the language $\{z \mid \exists x \in X,\ xz \in Y\}$.

Theorem 5.4: Let $D$ be a database schema containing at least two weakly-connected components $G$ and $S$, where $G$ contains at least two classes and $S$ has at least four attributes. Then for every recursively enumerable (non-lazy) immediate-start inventory $L \subseteq \Omega_G^*$ there exists a CSL+ (CSL) transaction schema $T$ such that
\[ (\omega_1^* \omega_2)^{-1} \mathcal{L}_{imm}(T, G) = INIT(L \cdot \omega_\emptyset^*) \quad \text{for some } \omega_1, \omega_2 \in \Omega_G. \]

Proof: Let $\omega_1$, $\omega_2$ be two different role sets on $G$. The proof is similar to the proof of Theorem 5.2. The only difference is that an object is created by the transaction $T_{init}$ in the component $G$ and has the role set $\omega_1$. If the computation halts, the object is first migrated to $\omega_2$ and then follows the pattern specified by the accepted word. Hence, the migration patterns of $T$ are padded with $\omega_1^* \omega_2$. □

For lazy and immediate-start inventories, the following easily follows.

Corollary 5.5: Let $D$ be a database schema containing at least two weakly-connected components $G$, $S$, where $G$ contains at least two classes and $S$ has at least four attributes. Then for every recursively enumerable lazy immediate-start inventory $L \subseteq \Omega_G^*$, there exists a CSL+ (CSL) transaction schema $T$ such that
\[ (\omega_1 \omega_2)^{-1} \mathcal{L}^{lazy}_{imm}(T, G) = INIT(L \cdot \omega_\emptyset) \quad \text{for some } \omega_1, \omega_2 \in \Omega_G. \qquad \Box \]

It follows that:

Corollary 5.6: There exist nonrecursive migration inventories for CSL and CSL+ transaction schemas. □

The above results imply that it is undecidable to check soundness and completeness of CSL+ (CSL) transaction schemas with respect to inventories in nontrivial cases.

Corollary 5.7: The following are undecidable:
1. a CSL+ (CSL) transaction schema is sound with respect to a nonempty inventory;
2.
a CSL+ (CSL) transaction schema is complete with respect to an inventory which does not contain all possible migration patterns. □

In the case of immediate-start inventories, padding with some regular patterns is used while the computation of a Turing machine is simulated. Thus, each word in the recursively enumerable set is produced with a padding. What if padding is not allowed? The exact characterization of expressive power remains open at this time.

In the remainder of the section, we provide a partial answer to the characterization of immediate-start inventories without padding. We show that each context-free inventory can be generated by some CSL+ transaction schema without padding. We first illustrate this through an example and then generalize the argument to arbitrary context-free inventories.

Example 5.8: Let $L = \{a^i b^j \mid i, j > 0,\ i = j\}$ be a language. Then both $L$ and $INIT(L)$ are context free [HU79]. Consider $INIT(L)$ as an immediate-start migration inventory. Suppose the database schema has four classes $R$, $P_a$, $P_b$, $S$, where $P_a\ \mathit{isa}\ R$ and $P_b\ \mathit{isa}\ R$. The class $S$ has two attributes $A$, $B$, to hold a chain which serves as a counter. We now construct a CSL+ transaction schema $T$ which generates $INIT(L)$ as an immediate-start inventory. $T$ consists of the following transactions:
1. The transaction $T_{start}$ clears the database and creates an object in $P_a$:
\[ T_{start} = \mathbf{delete}(R, \emptyset);\ \mathbf{delete}(S, \emptyset);\ \mathbf{create}(S, \{A = ¢, B = \$\});\ \mathbf{create}(R, \emptyset);\ \mathbf{specialize}(R, P_a, \emptyset, \emptyset). \]
2. $T_{expand}$ will expand the chain represented by the attributes $A$ and $B$:
\[ T_{expand}(x, y) = P_a(\emptyset),\ S(¢, \neq x, \neq y; x, \neq y) \to \mathbf{create}(\ldots) \]
3. $T_{startb}$ will migrate objects from $P_a$ to $P_b$:
\[ T_{startb} = \mathbf{generalize}(P_a, R, \emptyset);\ \mathbf{specialize}(R, P_b, \emptyset, \emptyset). \]
4.
Finally, $T_{shrink}$ will reduce the chain while objects are staying in $P_b$:
\[ \begin{aligned} T_{shrink}(x, y) ={} & P_b(\emptyset),\ S(¢, \neq x, \neq y; x),\ S(x; y) \to \mathbf{modify}(S, (¢; x), (*; ¢)); \\ & P_b(\emptyset),\ S(¢, \neq y; ¢, \neq x, \neq y),\ S(x; y) \to \mathbf{modify}(S, (x; y), (¢; y)); \\ & P_b(\emptyset),\ S(¢, \neq y; ¢, \neq y),\ S(¢; y) \to \mathbf{delete}(S, (*; ¢)). \end{aligned} \]
Note that $T_{shrink}$ deletes an object every time it is executed and changes the database. Since the number of objects in $S$ is at most the number of times the object stays in $P_a$, it can be verified that $INIT(L)$ is generated. □

We now present the result on immediate-start migration inventories without padding.

Theorem 5.9: Let $D$ be a database schema containing at least two weakly-connected components $G$, $S$, where $S$ has at least three attributes. Then for each context-free set $L \subseteq \Omega_G^*$, there is a CSL+ transaction schema $T$ such that
\[ \mathcal{L}_{imm}(T, G) = INIT(L \cdot \omega_\emptyset^*). \]

Proof: Suppose $D = (C, \mathit{isa}, A)$ is such that $D$ consists of two weakly-connected components, $G$ and $S$, where $S$ is a single class with three attributes $A_1$, $A_2$, $A_3$. Since $L$ is context free, there is a context-free grammar $G_L$ for $L$ which is in Greibach normal form [HU79]. That is, every production rule has the form $N \to a\alpha$, where $a$ is a terminal and $\alpha$ is a string of nonterminals. It is assumed that the terminals and nonterminals of $G_L$ are all constants in $U$. The transaction schema consists of the following two kinds of transactions.
1. For every production rule of the form $N_0 \to cN_1 \cdots N_k$, we construct a transaction $T_{N_0 \to cN_1 \cdots N_k}$, which has $k$ variables $x_i$ ($i \in [1..k]$) representing a chain of length $k$ (to be inserted into the database):
\[ S(\ldots; N_0) \to migto(\omega(c));\ T_{increment} \]
where $T_{increment}$ substitutes the top element $N_0$ in the chain by a chain of length $k$ with index elements $x_1, \ldots, x_k$ and having $N_1, \ldots, N_k$ as the values for attribute $A_3$. Also, $T_{increment}$ deletes elements in the original chain which would cause cycle(s) due to the insertions. The deletions may shorten the chain. The "simulation" still works, since it still generates a prefix of $L$. The above update can be accomplished in a way similar to the construction used in the proof of Theorem 5.2.
2. Suppose $N_s$ is the start symbol for $G_L$. Then for each production rule $N_s \to c\alpha$, there is a transaction $T_{N_s \to c\alpha}$ to initialize the database:
\[ \mathbf{delete}(R, \emptyset);\ \mathbf{delete}(S, \emptyset);\ \mathbf{create}(R, \emptyset);\ \mathbf{create}(S, (¢; \$; N_s)); \]

It is then straightforward to verify that the transaction schema $T$ generates the initial words of $L$. □

Chapter 6 Applications

In this chapter, we illustrate through two examples how the techniques and results obtained above on object migration can be applied to many practical problems. The two examples are motivated by the transaction design methodology of the INSYDE model by King and McLeod [KM85] and the notion of "scripts" in the transaction language for the TAXIS data model developed at the University of Toronto [MBW80, NCL+87]. The essential ingredient in both frameworks is the introduction of an ordering on updates to the database.

In the spirit of INSYDE, we define a transaction schema with "inflow" (information flow) to be a pair $(T, E)$ where $T$ is a finite set of transactions and $E$ defines an order in which transactions can be applied.

Definition: Let $D$ be a semantic database schema. An inflow schema is a pair $T^{inflow} = (T, E)$ where
1. $T$ is a transaction schema on $D$; and
2. $E \subseteq T \times T$.

If $T$ consists of only SL (CSL+, CSL) transactions, $T^{inflow}$ is also called an SL (CSL+, CSL) inflow schema.

Suppose $T^{inflow} = (T, E)$ is an inflow schema on a semantic database schema $D$. A sequence of transactions $T_1, \ldots, T_n$ is well-formed under the inflow schema $T^{inflow}$ if $(T_i, T_{i+1}) \in E$ for each $i \in [1..(n-1)]$.
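The well-formedness condition is a simple check on consecutive pairs. The following sketch assumes, purely for illustration, that transactions are identified by name strings and that $E$ is given as a set of ordered pairs; `is_well_formed` and the names `T1`, `T2`, `T3` are hypothetical.

```python
from typing import Sequence, Set, Tuple


def is_well_formed(seq: Sequence[str], order: Set[Tuple[str, str]]) -> bool:
    """Return True if every consecutive pair (T_i, T_{i+1}) of seq lies in E.

    Sequences of length 0 or 1 have no consecutive pairs, so they are
    well-formed vacuously.
    """
    return all((a, b) in order for a, b in zip(seq, seq[1:]))


# A hypothetical ordering E: T1 may be followed by T2, and T2 by T3 or by itself.
E = {("T1", "T2"), ("T2", "T3"), ("T2", "T2")}

ok = is_well_formed(["T1", "T2", "T2", "T3"], E)
bad = is_well_formed(["T1", "T3"], E)
```

Here `ok` is true and `bad` is false, since the pair `("T1", "T3")` is not in `E`. Note that $E$ is an arbitrary relation on $T$; the check does not assume transitivity.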
Within this framework, it is interesting to answer questions such as "will a student currently majoring in history work in the business office with salary > 35K in the future?" and "will an airplane which belongs to United Airlines be in the repair depot at Los Angeles International Airport?" In some situations such information can be used to detect mistakes in the data to be added to a database.

Example 6.1: Consider a database system used by an office of the Immigration Service in country X. According to the immigration law, before a person who entered the country with a type C visa can be allowed to immigrate, she has to go back to her own country (defined as the country she was a citizen of just before she entered country X) and stay for at least 3 years. The transactions designed for this application have to guarantee that no one can directly change his or her status from visa type C to immigrant. □

In the formal setting, we provide the following definition.

Definition: Let $D = (C, \mathit{isa}, A)$ be a semantic schema and $P \in C$ a class in $D$. An atomic property over $P$ is an expression of one of the following two forms: $A = a$ or $A = B$, where $A, B \in A^*(P)$ are attributes defined on $P$ and $a \in U$ is a constant. A property is a conjunction of a set of atomic properties.

Suppose $P$ is a class in a semantic database schema $D$ and $g$ is an atomic property over $P$. Let $d = (o, a, of)$ be an instance of $D$. For an object $o \in o(P)$, $o$ satisfies the property $g$, denoted $o \models g$, if $g$ is $A = a$ and $a(o, A) = a$, or $g$ is $A = B$ and $a(o, A) = a(o, B)$. Let $p$ be a property over $P$. An object $o \in o(P)$ satisfies $p$ if it satisfies all atomic properties in $p$.

The questions discussed above can now be formulated as the following "reachability" problem.

Reachability Problem: Let $P$, $Q$ be two classes in a semantic database schema $D$, where $P$ and $Q$ are weakly connected, and let $d$ be an instance of $D$. Further suppose $p_p$, $p_q$ are two properties on $P$ and $Q$ (respectively).
For a given inflow schema $T^{inflow}$, can an object $o$ in class $P$ in the instance $d$ satisfying the property $p_p$ be updated by a well-formed sequence of transactions under $T^{inflow}$ such that $o$ is in class $Q$ and satisfies the property $p_q$ after the execution?

We now show that the reachability problem is decidable for SL inflow schemas but undecidable for CSL or CSL+ inflow schemas.

Theorem 6.2: The following are true:
(1) The reachability problem is decidable for SL inflow schemas.
(2) The reachability problem is undecidable for CSL or CSL+ inflow schemas.

Proof: The proof is based on extending the proofs used in Theorem 4.9 (for the language SL) and Theorem 5.2 (for the languages CSL and CSL+). Let $P$, $Q$ be two classes in a semantic database schema $D$ and $p_p$, $p_q$ two properties over the classes $P$, $Q$ (respectively).

(1) To show that the reachability problem is decidable for any SL inflow schema $T^{inflow} = (T, E)$, we present a construction which is obtained from the construction of the migration graph for the SL transaction schema $T$, with a slight modification to incorporate the ordering $E$ on the transactions. Specifically, if the two classes $P$ and $Q$ are not weakly connected, neither property is reachable from the other. Now suppose $P$, $Q$ are in the same weakly-connected component $G$ of $D$. Let $C$ be the set of all constants occurring in $T$, $p_p$, and $p_q$. We construct the set of vertices
\[ V_T = \{(\omega, \rho) \mid \omega \in \Omega_G,\ \rho \in tc_C(Att(\omega))\} \]
in the same way as in the proof of Theorem 4.9, except that the set $C$ is used (instead of $Const(T)$ used in that proof). We also use the algorithm described there to determine the edges between vertices and, in addition, the name of the transaction contributing to each edge, if one exists. Note that for each vertex in the graph and each of the two properties $p_p$, $p_q$, either all objects matching the vertex satisfy the property or none satisfy it, since the constants in the properties are used in constructing the vertices.
It is now straightforward to construct a cross product of this graph and the graph $(T, E)$ and check whether there is a path from a vertex satisfying $p_p$ to a vertex satisfying $p_q$.

(2) To show that the reachability problem is undecidable for CSL or CSL+ inflow schemas, a reduction is performed from the halting problem (of Turing machines) to this problem. The reduction is obtained by modifying the construction used in the proof of Theorem 5.2. Let $M$ be any Turing machine such that the alphabet of $M$ is (or corresponds to) the role sets of some weakly-connected component $G$ in $D$. Suppose further that $G$ has at least two different nonempty role sets $\omega_1$, $\omega_2$. (In the case of $G$ having only one nonempty role set, two different attribute values are used and the proof is modified accordingly.) The inventory $L = \{\omega_1\} \cup \{\omega_1^* \omega_2 \mid M \text{ halts on the empty input}\}$ is now considered. Obviously $L$ is recursively enumerable. Now let the CSL+ transaction schema $T$ be the schema constructed in the proof of Theorem 5.2. Consider the CSL+ inflow schema $T^{inflow} = (T, \{(T, T') \mid T, T' \in T\})$. Since $\omega_1 \neq \omega_2$, there is a class $Q \in \omega_2 - \omega_1$ (or $\omega_1 - \omega_2$, in which case the role sets are switched). Let $P$ be a class in $\omega_1$ and define $p_p = p_q = \emptyset$. It is straightforward to see that $p_q$ is reachable from $p_p$ if and only if $M$ halts on the empty input. □

Although an inflow schema allows the specification of a precedence relationship between transactions, it is not desirable to relate updates on one object to updates on a different object. In the following, a refined model is considered, which permits the specification of relationships among the updates on each object. The model is motivated by the construct of script in TAXIS. Here the focus is still on the reachability problem.

Syntactically, a script schema is exactly the same as an inflow schema: a set of transactions with an ordering. The difference lies in their semantics. For inflow schemas, the ordering is interpreted globally.
For script schemas, however, it is interpreted at the level of objects.

Definition: Let D be a semantic database schema. A script schema is a pair T^script = (T, E) where
1. T is a transaction schema on D; and
2. E ⊆ T × T.
If T consists of only SL (CSL+, CSL) transactions, T^script is also called an SL (CSL+, CSL) script schema.

Suppose T^script = (T, E) is a script schema on a semantic database schema D and d is an instance of D. If T is a ground transaction, T properly updates an object o ∈ O if either o occurs in exactly one of d and [[T]](d), or RoleSet(o, d) ≠ RoleSet(o, [[T]](d)), or some attribute value of o in d is changed in [[T]](d). Now let o ∈ O be an abstract object. A sequence of transactions T_1, ..., T_n obeys the script schema T^script for o (under d) if there is a subsequence 1 ≤ i_1 < ··· < i_j ≤ n such that (T_{i_k}, T_{i_{k+1}}) ∈ E for k ∈ [1..(j − 1)], and for any assignments α_1, ..., α_n, o is not properly updated by any ground transaction T_p[α_p] for p ∈ {1, ..., n} − {i_k | 1 ≤ k ≤ j}. Finally, a sequence of transactions obeys a script schema T^script (under d) if, for each object o in O, it obeys the script schema for o (under d).

By reasoning similar to that used for the previous theorem, we can show the following results on reachability with respect to script schemas. A more refined characterization of scripts is given in [DS90].

Theorem 6.3: The following are true:
(1) The reachability problem is decidable for SL script schemas.
(2) The reachability problem is undecidable for CSL or CSL+ script schemas. □

Reference List

[AH87a] S. Abiteboul and R. Hull. IFO: A formal semantic database model. ACM Trans. on Database Systems, 12(4):525-565, 1987.

[AH87b] T. Andrews and C. Harris. Combining language and database advances in an object-oriented development environment. In Proc. Conf. on OOPSLA, pages 430-440, 1987.

[AV88a] S. Abiteboul and V. Vianu. The connection of static constraints with determinism and boundness of dynamic specifications. In C. Beeri, J.W. Schmidt, and U. Dayal, editors, Proc. 3rd Int. Conf. on Data and Knowledge Bases, pages 324-334, Jerusalem, Israel, June 1988.

[AV88b] S. Abiteboul and V. Vianu. Equivalence and optimization of relational transactions. Journal of the ACM, 35(1):70-120, January 1988.

[AV89] S. Abiteboul and V. Vianu. A transaction-based approach to relational database specification. Journal of the ACM, 36(4):758-789, October 1989.

[Bee88] D. Beech. A foundation for evolution from relational to object databases. In J.W. Schmidt, S. Ceri, and M. Missikoff, editors, Advances in Database Technology - EDBT'88, volume 303 of Lecture Notes in Computer Science, pages 251-270. Springer-Verlag, 1988.

[BMSW89] A. Borgida, J. Mylopoulos, J.W. Schmidt, and I. Wetzel. Support for data-intensive applications: Conceptual design and software development. In R. Hull, R. Morrison, and D. Stemple, editors, Proc. 2nd Int. Workshop on Database Programming Languages. Morgan Kaufmann Publishers, 1989.

[BR84] M.L. Brodie and D. Ridjanovic. On the design and specification of database transactions. In M.L. Brodie, J. Mylopoulos, and J.W. Schmidt, editors, On Conceptual Modelling, pages 277-306. Springer-Verlag, 1984.

[Bro81] M.L. Brodie. On modelling behavioural semantics of databases. In Proc. Int. Conf. on Very Large Data Bases, pages 32-42, 1981.

[CFP84] M.A. Casanova, R. Fagin, and C.H. Papadimitriou. Inclusion dependencies and their interaction with functional dependencies. Journal of Computer and System Sciences, 28:29-59, 1984.

[CH74] R.H. Campbell and A.N. Habermann. The specification of process synchronization by path expression. In E. Gelenbe and C. Kaiser, editors, Proc. of Intl. Symp. on Operating Systems, volume 16 of Lecture Notes in Computer Science, pages 89-102. Springer-Verlag, 1974.

[CM84] G. Copeland and D. Maier. Making Smalltalk a database system. In Proc. ACM SIGMOD Int. Conf. on the Management of Data, 1984.

[Cod70] E.F. Codd. A relational model of data for large shared data banks. Communications of the ACM, 13(6):377-387, June 1970.

[DL90] G. Dong and Q. Li. Object migration in object-oriented databases. Manuscript, 1990.

[DS90] G. Dong and J. Su. Object behaviors and scripts. Manuscript, 1990.

[GJ79] M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, New York, 1979.

[HJ90] R. Hull and D. Jacobs. On the semantics of rules in database programming languages. Technical report, USC Computer Science Dept., 1990.

[HK87] R. Hull and R. King. Semantic data modeling: Survey, applications, and research issues. ACM Computing Surveys, 19(3):201-260, 1987.

[HM81] M. Hammer and D. McLeod. Database description with SDM: A semantic database model. ACM Trans. on Database Systems, 6(3):351-386, 1981.

[HU79] J.E. Hopcroft and J.D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison Wesley, 1979.

[KM85] R. King and D. McLeod. A database design methodology and tool for information systems. ACM Trans. on Office Information Systems, 3(1):2-21, January 1985.

[LR89] C. Lecluse and P. Richard. Modeling complex structures in object-oriented databases. In Proc. ACM Symp. on Principles of Database Systems, pages 360-368, 1989.

[LRV88] C. Lecluse, P. Richard, and F. Velez. O2: An object-oriented formal data model. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 424-433, Chicago, June 1988.

[Mai83] D. Maier. The Theory of Relational Databases. Computer Science Press, Potomac, Maryland, 1983.

[MBW80] J. Mylopoulos, P.A. Bernstein, and H.K. Wong. A language facility for designing database-intensive applications. ACM Trans. on Database Systems, 5(2):185-207, June 1980.

[NCL+87] B.A. Nixon, K.L. Chung, D. Lauzon, A. Borgida, J. Mylopoulos, and M. Stanley. Design of a compiler for a semantic data model. Technical Report CSRI-44, Dept. of Computer Science, Univ. of Toronto, 1987.

[Per90] B. Pernici. Objects with roles. In Proc. of Conf. on Office Information Systems, pages 205-215, 1990.

[Ull82] J.D. Ullman. Principles of Database Systems (2nd edition). Computer Science Press, Potomac, Maryland, 1982.

[Via87] V. Vianu. Dynamic functional dependencies and database aging. Journal of the ACM, 34(1):28-59, January 1987.

[Via88] V. Vianu. Database survivability under dynamic constraints. Acta Informatica, 25:55-84, 1988.
Asset Metadata
Core Title
00001.tif
Tag
OAI-PMH Harvest
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC11255785
Unique identifier
UC11255785
Legacy Identifier
DP22835