PATTERN MATCHING BY RS-OPERATIONS: TOWARDS A UNIFIED APPROACH TO QUERYING SEQUENCED DATA

by Xiaoyang Wang

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Computer Science)

May 1992

Copyright 1992 Xiaoyang Wang

UMI Number: DP 22856. All rights reserved. Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346.

UNIVERSITY OF SOUTHERN CALIFORNIA, THE GRADUATE SCHOOL, UNIVERSITY PARK, LOS ANGELES, CALIFORNIA 90007

This dissertation, written by Xiaoyang Wang under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

To my dear wife Rong-Rong and to our little son William

Acknowledgments

Over the years of my study at USC, my advisor, Professor Seymour Ginsburg, encouraged me, guided me and, most importantly, allowed me to share his profound philosophical insights into scientific studies. His ever-running Theoretical Database Seminar and its long-time ardent participants, Professor Richard Hull, Dr. Guozhu Dong, Dr. Jianwen Su, Dr. Dan Tian, Dr. Stephen Kurtzman and Yongkyun Cho, helped me in numerous aspects.

Contents
Acknowledgments
List of Figures
Abstract
1 Introduction
2 Regular Sequence Operations
2.1 Sequence Mergers and Extractors
2.2 Generic a-Transducers and Rs-Operations
2.3 Equivalence of Generic a-Transducers
3 Composition and Decomposition of Rs-Operations
3.1 Composition of Rs-Operations
3.2 Decomposition of Mergers
3.3 Decomposition of Extractors
4 Rs-Operations on an Extended Relational Data Model
4.1 An Extended Relational Data Model
4.2 S-algebra: an Algebraic Query Language
4.3 A Sequence Logic: SL
4.4 S-Calculus: a Calculus-like Query Language
4.5 Safe s-Calculus
5 Two Additional Applications of Rs-Operations
5.1 Rs-Operations in an Extended Datalog
5.2 Rs-Operations and Nested Sequences
6 Conclusion
Reference List
Symbol Index

List of Figures
2.1 The merging of $abd$ and $ce$ according to $\alpha_1\alpha_1\alpha_2\alpha_1\alpha_2$
2.2 Operations performed by a merger
4.1 Tour schedules
5.1 Quick sort in S-datalog
5.2 A structure for books

Abstract

A simple “pattern matching” mechanism is introduced to unify the specification of sequence operations in database queries. Using this mechanism, a family of sequence operations, called “rs-operations,” is defined and its properties are examined. In particular, it is shown that (i) the family includes most of the “natural” sequence operations appearing in database queries, (ii) the operations in the family are characterized by a mechanical device called a “generic a-transducer,” and (iii) the “expressive” power of the family exceeds that of each of its finite subsets.

The applicability of the family to database queries is illustrated through four query languages. The first is an algebraic query language on an extended relational data model. (The relations of this model are tables, and each entry is occupied by a sequence of basic elements.) The second is a calculus-like query language on the same data model. (This calculus-like query language is based on standard first-order logic plus special predicates “converted” from the rs-operations.) The third and fourth are an extended DATALOG and an algebraic query language on a nested-sequence data model.

Chapter 1 Introduction

It is generally accepted that sequences (or lists) are useful in many database applications [ABD+89, SSU90, The90]. Because of this, “new-generation” database systems usually support “sequenced data.” Examples include EXODUS [CDV88], Galileo [ACO85], O2 [BCD89, Deu91] and Vbase [Ont87]. In order to query the sequenced data, these systems require appropriate sequence operations. However, the sequence operations in these systems are usually chosen in an ad hoc manner.
Also, the essential properties of the selected operations are not well understood. These properties include “expressiveness,” “completeness” and “independence.” A major underlying cause of this situation may be the lack of a unifying theoretical mechanism for defining and studying most, if not all, of the desired operations. The purpose of this thesis is to show that “pattern matching” of a simple kind, namely “rs-operations,” can be used to specify and investigate many of the “natural” sequence operations in database query languages.

Sequence operations in database query languages tend to be “high-level” in nature, in contrast to those in programming languages such as C, Lisp and Prolog. For example, the EXCESS algebra of EXODUS [VD91] employs the operations HEAD and SUBARRAY to obtain the head and a subinterval of a sequence, respectively. Such operations specify the results of “processes” rather than the processes themselves. The specification of the results of such high-level sequence operations can be viewed as a type of “pattern matching.”

For instance, let $u$ be a sequence of length at least 1. Clearly, one of the sequences in the regular set $\alpha_1\alpha_2^*$ [HU69], say $\alpha_1\alpha_2^k$, has the same length as $u$. Therefore, $\alpha_1\alpha_2^k$ “matches” $u$, and $\alpha_1$ “matches” the first element of $u$. Thus, $\alpha_1\alpha_2^k$ can be “used” to retrieve the first element of $u$ by “pointing at” $\alpha_1$. This is exactly the HEAD operation on $u$.

The above pattern matching mechanism can be found in the text editors “vi” and “emacs” (usually on UNIX systems). Both editors use regular expressions in their search-and-substitute commands. In these commands, a pair of special symbols is used to retrieve a portion of the matched sequence. The rs-operations introduced in this thesis are similar to, but more powerful than, this “retrieving” mechanism.
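The retrieving mechanism just described can be imitated with Python's standard `re` module. The sketch below is illustrative only. It assumes that sequences are Python strings of single-character elements. The helper names `head_by_pattern` and `subinterval` are hypothetical and are not the EXCESS operations HEAD and SUBARRAY. In each pattern, a capturing group marks the position that is “pointed at,” much as $\alpha_1$ does in the pattern $\alpha_1\alpha_2^*$.

```python
import re
from typing import Optional


def head_by_pattern(u: str) -> Optional[str]:
    """HEAD via a pattern: group 1 is the single position named alpha_1,
    and '.*' plays the role of alpha_2^*."""
    m = re.fullmatch(r"(.)(.*)", u, flags=re.DOTALL)
    return m.group(1) if m else None


def subinterval(u: str, start: int, length: int) -> Optional[str]:
    """Hypothetical SUBARRAY-like helper (0-based start): the pattern splits u
    into three groups, and only the middle group is retrieved."""
    m = re.fullmatch(rf"(.{{{start}}})(.{{{length}}})(.*)", u, flags=re.DOTALL)
    return m.group(2) if m else None


print(head_by_pattern("abcd"))     # a
print(subinterval("abcde", 1, 2))  # bc
print(head_by_pattern(""))         # None
```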
This thesis initiates a formal treatment of sequence operations based on pattern matching. It is intended as a unified approach towards specifying and studying operations on sequenced data. As a first step, a family of operations (“rs-operations”), consisting of “mergers” and “extractors,” is defined. The definition uses a simple pattern matching mechanism in which “regular sets” (in the sense of formal language theory) serve as “pattern languages.” This family (i) includes most of the “natural” sequence operations, (ii) is easy to extend, and (iii) is readily applicable to database query languages.

Pattern matching has been used extensively in text processing, e.g., [AHU74], and, in a different manner, in information retrieval systems, e.g., [MH89]. One paper employing pattern matching in database queries is [PT86]. There, regular patterns serve as “maskings” in an extended NF2 query language to deal with sequences (as well as sets). In these systems, pattern matching plays the role of a condition. The present investigation extends the use of pattern matching into the domain of sequence operations.

The rest of this thesis is divided into five chapters. Chapter 2 introduces the central concept of the study, namely rs-operations. It also defines a type of mechanical device, called a “generic a-transducer,” to characterize the operations. Chapter 3 concerns the composition and decomposition of the rs-operations. In particular, it is shown that no finite subset of rs-operations yields all rs-operations under composition. In other words, an infinite number of rs-operations are needed to obtain the “expressive power” of the rs-operations. Chapter 4 and Chapter 5 exhibit the use of the rs-operations in database query languages. Specifically, Chapter 4 presents an extended relational data model and two query languages over that data model, called “s-algebra” and “s-calculus.”
A “safe” subclass of s-calculus is shown to be “equivalent” to s-algebra. Chapter 5 briefly illustrates the application (and extension) of the rs-operations in an extended Datalog and in a “nested sequence” data model. The last chapter, Chapter 6, presents some concluding remarks.

Chapter 2 Regular Sequence Operations

In this chapter, we first introduce some basic terminology and the notion of pattern matching. Using regular sets as pattern languages, we then define two classes of sequence operations, namely “mergers” and “extractors.” Finally, we characterize these operations in terms of a type of a-transducer and study the equivalence problem for a-transducers, mergers and extractors.

2.1 Sequence Mergers and Extractors

In this section, we define two families of sequence operations: sequence “mergers” and “extractors.” These operations will play an essential role in our approach to querying databases involving sequences. Before presenting the sequence operations, we recall some elementary concepts related to sequences. We also describe a pattern matching mechanism called “simple sequence merger.”

A sequence $v$ of length $n \ge 0$ over a nonempty set $A$ of elements is a mapping from $\{1, \ldots, n\}$ to $A$. It is usually written as $a_1 \cdots a_n$, where $a_i = v(i)$ for each $1 \le i \le n$. The symbol $\varepsilon$ represents the sequence of length 0 (i.e., the empty sequence).

A subsequence of $v$ is either the empty sequence $\varepsilon$ or a sequence of the form $a_{i_1} \cdots a_{i_k}$ such that $1 \le i_1 < i_2 < \cdots < i_k \le n$. A suffix (prefix, respectively) of $v$ is either the empty sequence $\varepsilon$ or a subsequence of $v$ of the form $a_k \cdots a_n$ ($a_1 \cdots a_k$, respectively) for some $1 \le k \le n$.

Sequences are denoted by $u$, $v$, $w$, etc., possibly with subscripts. Given a sequence $v$, $\mathit{len}(v)$ denotes the length of $v$. Let $u$ and $v$ be two sequences of length $m$ and $n$, respectively. Then $uv$, called the concatenation of $u$ and $v$, is the sequence $u(1) \cdots u(m)v(1) \cdots v(n)$.
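The sequence notions just defined are easy to state operationally. The following Python sketch is for illustration only. It assumes that a sequence over single-character elements is a Python string, with the empty string playing the role of $\varepsilon$. Under this encoding, concatenation $uv$ is string concatenation and $\mathit{len}$ is `len`.

```python
from itertools import combinations


def prefixes(v: str) -> set:
    """epsilon together with every a_1...a_k, 1 <= k <= len(v)."""
    return {v[:k] for k in range(len(v) + 1)}


def suffixes(v: str) -> set:
    """epsilon together with every a_k...a_n, 1 <= k <= len(v)."""
    return {v[k:] for k in range(len(v) + 1)}


def subsequences(v: str) -> set:
    """epsilon together with every a_{i_1}...a_{i_k} with i_1 < ... < i_k."""
    return {"".join(v[i] for i in idx)
            for k in range(len(v) + 1)
            for idx in combinations(range(len(v)), k)}


u, v = "ab", "cde"
print(u + v, len(u + v))          # abcde 5
print(sorted(prefixes("abc")))    # ['', 'a', 'ab', 'abc']
print(sorted(suffixes("abc")))    # ['', 'abc', 'bc', 'c']
```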
For two sets $U$ and $V$ of sequences, let $UV = \{uv \mid u \text{ in } U \text{ and } v \text{ in } V\}$. Also, let $U^0 = \{\varepsilon\}$, $U^k = U^{k-1}U$ for each $k > 0$, and $U^* = \bigcup_{k \ge 0} U^k$.

For each set $A$ of elements, each $a$ in $A$ can be viewed as the sequence $a$ of length 1. Therefore, $A^*$ is the set of all finite-length sequences over $A$, i.e., $A^* = \{v \mid \mathit{len}(v) \ge 0 \text{ and } v(i) \in A \text{ for all } i,\ 1 \le i \le \mathit{len}(v)\}$.

Let $A_1$ and $A_2$ be sets of elements and $h$ a mapping from $A_1^*$ to $A_2^*$ such that $h(uv) = h(u)h(v)$ for all $u$ and $v$ in $A_1^*$. Such a mapping $h$ is called a homomorphism (from $A_1^*$ to $A_2^*$). Let $f$ be a mapping from $A_1$ to $A_2 \cup \{\varepsilon\}$. Then $f$ can be extended to the homomorphism from $A_1^*$ to $A_2^*$ such that $f(\varepsilon) = \varepsilon$ and $f(a_1 \cdots a_n) = f(a_1) \cdots f(a_n)$ for each $a_1 \cdots a_n$ in $A_1^*$. Henceforth, a homomorphism $h$ from $A_1^*$ to $A_2^*$ is defined by stating its value on each $a$ in $A_1$.

Turning to the “simple sequence mergers,” we first informally describe a merging process. Suppose we have a sequence $w = \alpha_1\alpha_1\alpha_2\alpha_1\alpha_2$ (called a pattern) of special symbols $\alpha_1$ and $\alpha_2$. Intuitively, this sequence gives the name “$\alpha_1$” to positions 1, 2 and 4, and the name “$\alpha_2$” to positions 3 and 5. Now let $u_1$ and $u_2$ be two sequences. A sequence $u$ is a “merging” of $u_1$ and $u_2$ according to $w$ if, for $i = 1, 2$, the subsequence of $u$ formed by the elements at the $\alpha_i$-positions is $u_i$. Thus, $u = abcde$ is a merging of $abd$ and $ce$ according to $w$. Indeed, the elements of $u$ at the $\alpha_1$-positions (positions 1, 2 and 4) form $abd$, and the elements at the $\alpha_2$-positions (positions 3 and 5) form $ce$. Figure 2.1 below illustrates this merging.

[Figure 2.1: The merging of $abd$ and $ce$ according to $\alpha_1\alpha_1\alpha_2\alpha_1\alpha_2$. The pattern $w = \alpha_1\,\alpha_1\,\alpha_2\,\alpha_1\,\alpha_2$ is aligned with the merging result $a\,b\,c\,d\,e$. The elements from the $\alpha_1$-positions, $a\,b\,d$, equal $u_1$, and those from the $\alpha_2$-positions, $c\,e$, equal $u_2$.]

To formally define the above merging process, we first present two primitive types of mappings on sequences.

Notation Let $A_1, \ldots, A_k$ be $k > 0$ nonempty sets of elements and $A = A_1 \times \cdots \times A_k$.
Then:

(i) For each $1 \le i \le k$, let $\pi_i$ be the homomorphism from $A^*$ to $A_i^*$ defined by $\pi_i(e) = a_i$ for each $e = (a_1, \ldots, a_k)$ in $A$. For each subset $L$ of $A^*$, let $\pi_i(L) = \{\pi_i(u) \mid u \in L\}$.

(ii) For each $1 \le i \le k$ and $a$ in $A_i$, let $\sigma_{\$i=a}$ be the homomorphism from $A^*$ to $A^*$ defined, for each $e$ in $A$, by $\sigma_{\$i=a}(e) = e$ if $\pi_i(e) = a$, and $\sigma_{\$i=a}(e) = \varepsilon$ otherwise. For each subset $L$ of $A^*$, let $\sigma_{\$i=a}(L) = \{\sigma_{\$i=a}(u) \mid u \in L\}$.

Intuitively, $\pi_i(u)$ “picks up” the $i$-th component in each element of $u$. Similarly, $\sigma_{\$i=a}(u)$ is the “largest” subsequence of $u$ in which the $i$-th component of each element is $a$. For example, let $u = (a, b)(c, d)(a, e)$. Then $\pi_2(u) = bde$ and $\sigma_{\$1=a}(u) = (a, b)(a, e)$. Notice that the mappings $\pi$ and $\sigma$ defined above are similar to the operations “projection” and “selection” in the relational algebra ([Ull88]).

For each sequence $u_1$ in $A_1^*$ and $u_2$ in $A_2^*$ such that $\mathit{len}(u_1) = \mathit{len}(u_2)$, let $u_1 \otimes u_2$ be the sequence $v$ over $A_1 \times A_2$ defined by $v(i) = (u_1(i), u_2(i))$ for each $1 \le i \le \mathit{len}(u_1)$.

Henceforth, we assume a fixed infinite alphabet $\Sigma_\infty$, whose elements are denoted by $a$, $b$, etc., possibly subscripted. We also assume a fixed, countably infinite set of special symbols $V_\infty = \{\alpha_i \mid i \ge 1\}$. For each $n \ge 1$, $V_n$ denotes the set consisting of the first $n$ elements of $V_\infty$, i.e., $V_n = \{\alpha_1, \ldots, \alpha_n\}$. We are now able to formally define the notion of a simple merger.

Definition Let $n \ge 1$ and let $w$ be in $V_n^*$. Then $\llbracket w \rrbracket_n$ is called a simple merger. For sequences $u_1, \ldots, u_n$ in $\Sigma_\infty^*$, let
$$\llbracket w \rrbracket_n(u_1, \ldots, u_n) = \{u \in \Sigma_\infty^* \mid u_i = \pi_2\sigma_{\$1=\alpha_i}(w \otimes u) \text{ for all } 1 \le i \le n\}.$$
Such a mapping from $\Sigma_\infty^* \times \cdots \times \Sigma_\infty^*$ ($n$ times) to $2^{\Sigma_\infty^*}$ is called a simple merger mapping.

For example, $\llbracket \alpha_1\alpha_1\alpha_2\alpha_1\alpha_2 \rrbracket_2(abd, ce) = \{abcde\}$ (cf. Figure 2.1). Notice that the result of a merger can be empty. For instance, $\llbracket \alpha_1\alpha_2\alpha_1\alpha_2 \rrbracket_2(a, bc) = \emptyset$.

It is easy to see that the set $\llbracket w \rrbracket_n(u_1, \ldots, u_n)$ contains at most one sequence, for all $w$ and $u_i$ ($1 \le i \le n$).
[Indeed, let $v_1$ and $v_2$ both be in $\llbracket w \rrbracket_n(u_1, \ldots, u_n)$, with $v_1(k) \ne v_2(k)$ for some $k$. Suppose $w(k) = \alpha_i$ and $\alpha_i$ appears $l$ times in $w$ before the $k$-th position, i.e., the set $\{j < k \mid w(j) = \alpha_i\}$ has $l$ elements. Since $u_i = \pi_2\sigma_{\$1=\alpha_i}(w \otimes v_1)$, it follows that $u_i(l+1) = v_1(k)$. Similarly, $u_i(l+1) = v_2(k)$. Hence, $v_1(k) = u_i(l+1) = v_2(k)$, which is a contradiction.]

Also, there are cases where $\llbracket w \rrbracket_n(u_1, \ldots, u_n) = \emptyset$. (For example, $\llbracket \alpha_1\alpha_2\alpha_1\alpha_2 \rrbracket_2(a, bc) = \emptyset$.) If $u$ is in $\llbracket w \rrbracket_n(u_1, \ldots, u_n)$, then $u$ is said to be a merging of $u_1, \ldots, u_n$ according to $w$. A merging of $u_1, \ldots, u_n$ according to $w$ exists if and only if $\mathit{len}(u_i) = \mathit{len}(\sigma_{\$1=\alpha_i}(w))$ for each $1 \le i \le n$. That is, each $u_i$ must have exactly as many elements as $w$ has $\alpha_i$-positions.

Simple mergers perform very simple tasks, namely, merging fixed-length sequences. This is partly because a simple merger uses only a single pattern. In the following, the single patterns of simple mergers are expanded to sets of patterns. As will be seen, such expansions yield powerful sequence operations.

Definition Let $n \ge 1$ be an integer and $W$ a subset of $V_n^*$. Then $\llbracket W \rrbracket_n$ is called an n-ary (sequence) merger. For subsets $L_1, \ldots, L_n$ of $\Sigma_\infty^*$, let
$$\llbracket W \rrbracket_n(L_1, \ldots, L_n) = \bigcup_{w \in W} \bigcup_{u_1 \in L_1} \cdots \bigcup_{u_n \in L_n} \llbracket w \rrbracket_n(u_1, \ldots, u_n).$$
Such a mapping from $2^{\Sigma_\infty^*} \times \cdots \times 2^{\Sigma_\infty^*}$ ($n$ times) to $2^{\Sigma_\infty^*}$ is called an (n-ary) merger mapping.

Thus, an n-ary merger defines a mapping from $n$ sets of sequences (over $\Sigma_\infty$) to a single set of sequences (over $\Sigma_\infty$). Intuitively, $u$ is in $\llbracket W \rrbracket_n(L_1, \ldots, L_n)$ if $u$ is the merging of some $u_i$ in $L_i$ ($1 \le i \le n$) according to some $w$ in $W$. Figure 2.2 illustrates how a merger works.

[Figure 2.2: Operations performed by a merger. Sequences $u_1, \ldots, u_n$, with each $u_i$ taken from $L_i$, are merged according to a pattern $w$ in $W$, and the merging is output.]

As an example, let $W = \bigcup_{k \ge 0}\{\alpha_1\alpha_2 \cdots \alpha_1\alpha_2 \mid \alpha_1\alpha_2 \text{ appears } k \text{ times}\}$, $L_1 = \{ab, abcd\}$ and $L_2 = \{cd, ef, cdef\}$. Then $u = acbd$ is in $\llbracket W \rrbracket_2(L_1, L_2)$, because $ab$ and $cd$ are in $L_1$ and $L_2$, respectively, and $u$ is the merging of $ab$ and $cd$ according to $\alpha_1\alpha_2\alpha_1\alpha_2$ in $W$.
Similarly, it is easy to see that $aebf$ and $acbdcedf$ are also in $\llbracket W \rrbracket_2(L_1, L_2)$. Therefore, $\llbracket W \rrbracket_2(L_1, L_2) = \{acbd, aebf, acbdcedf\}$. This is the set of “perfect” shuffles of equal-length sequences from $L_1$ and $L_2$.

Consider now the “inverse” of a merger. For each merger $\llbracket W \rrbracket_n$ and subset $L$ of $\Sigma_\infty^*$, let
$$\llbracket W \rrbracket_n^{-1}(L) = \{(u_1, \ldots, u_n) \mid \text{there exists } u \text{ in } L \text{ such that } u \text{ is in } \llbracket W \rrbracket_n(u_1, \ldots, u_n)\}.$$
In other words, given a set $L$ of sequences, $(u_1, \ldots, u_n)$ is in $\llbracket W \rrbracket_n^{-1}(L)$ if some $u$ in $L$ is a merging of $u_1, \ldots, u_n$ according to some pattern in $W$. The inverse of an n-ary merging operation thus maps a set of sequences to a set of n-tuples of sequences. To retrieve the $i$-th components of these tuples for some $1 \le i \le n$, the mapping $\pi_i$ is used. Specifically, we have:

Definition Let $\llbracket W \rrbracket_n$ be an n-ary merger and $1 \le i \le n$. Then $\pi_i\llbracket W \rrbracket_n^{-1}$ is called a (sequence) extractor. For each subset $L$ of $\Sigma_\infty^*$, let $\pi_i\llbracket W \rrbracket_n^{-1}(L) = \pi_i(\llbracket W \rrbracket_n^{-1}(L))$. Such a mapping on $2^{\Sigma_\infty^*}$ is called an extractor mapping.

To illustrate, let $W = \bigcup_{k \ge 0}\{\alpha_1\alpha_2 \cdots \alpha_2 \mid \alpha_2 \text{ appears } k \text{ times}\}$ and $L = \{abc, defgh\}$. Then $\pi_1\llbracket W \rrbracket_2^{-1}(L) = \{a, d\}$, i.e., the set consisting of the first elements of the given sequences.

Clearly, each $u$ in $\pi_i\llbracket W \rrbracket_n^{-1}(L)$ is a subsequence of some sequence in $L$. Also, $u$ is in $\pi_i\llbracket W \rrbracket_n^{-1}(L)$ if there exist $v$ in $L$ and $w$ in $W$, with $\mathit{len}(w) = \mathit{len}(v)$, such that $u = \pi_2\sigma_{\$1=\alpha_i}(w \otimes v)$.

Each sequence merger and extractor defined above is built from an arbitrary set $W$ of patterns. Consequently, the mergers and extractors defined this way are very powerful, and may be hard to compute and/or represent. For example, let $W = \{\alpha_1 \cdots \alpha_1 \mid \alpha_1 \text{ appears } n \text{ times, where } x^n + y^n = z^n \text{ has a solution in positive integers}\}$. Then $u$ is in $\llbracket W \rrbracket_1(\Sigma_\infty^*)$ if and only if there are positive integers $a$, $b$ and $c$ such that $a^{\mathit{len}(u)} + b^{\mathit{len}(u)} = c^{\mathit{len}(u)}$. For practical purposes, the set $W$ should be tractable. In this study, $W$ will be restricted to the “regular sets” (as defined in formal language theory).
There are two major reasons for this. First, regular sets describe most of the natural patterns encountered in practice. Second, one of their representations, namely “regular expressions,” is easy to use in query languages. Traditionally, the regular sets are defined by means of the well-known “finite state acceptors” (or “finite automata” [HU69]).

Definition A nondeterministic finite state acceptor (fsa) is a 5-tuple $(K, \Sigma, \delta, p_0, F)$, where $K$ is a set of states with $p_0$ in $K$ and $F \subseteq K$, $\Sigma$ is a finite set of elements, and $\delta$ is a subset of $K \times \Sigma \times K$.

Let $A = (K, \Sigma, \delta, p_0, F)$ be an fsa. Consider a sequence $\rho = (p_0, a_0, p_1) \cdots (p_{k-1}, a_{k-1}, p_k)$ for some $k \ge 0$ (note that $\rho = \varepsilon$ if $k = 0$). Such a $\rho$ is called a run of $A$ if $(p_j, a_j, p_{j+1})$ is in $\delta$ for each $0 \le j < k$ and $p_k$ is in $F$. The sequence $\rho$ is abbreviated as $\prod_{0 \le j < k}(p_j, a_j, p_{j+1})$. [In general, $\prod_{0 \le j < l} b_j$ abbreviates the sequence $b_0 \cdots b_{l-1}$ for $l > 0$, and the empty sequence $\varepsilon$ for $l = 0$.]

A sequence $w$ in $\Sigma^*$ is said to be accepted by $A$ if there is a run $\rho$ of $A$ such that $\pi_2(\rho) = w$. The set of sequences accepted by $A$, denoted by $T(A)$, is called a regular set. For each $n \ge 1$, each set accepted by an fsa of the form $(K, V_n, \delta, p_0, F)$ is called a regular set over $V_n$.

By Kleene's Theorem (see Theorem 3.10 of [HU69]), the collection of the regular sets over $V_n$ is the smallest family containing $\{\alpha_i\}$, where $1 \le i \le n$, and closed under union, concatenation and $*$. Therefore, each regular set over $V_n$ can be represented by an algebraic expression (i.e., a regular expression). Such an expression is composed of the sets $\{\alpha_i\}$, where $1 \le i \le n$, the operations $\cup$, $\cdot$ and $*$, and parentheses for grouping. (When no confusion arises, $\alpha_i$ is used instead of $\{\alpha_i\}$.) For example, $(\alpha_1\alpha_2)^*$ is the regular expression representing the regular set $\bigcup_{k \ge 0}\{\alpha_1\alpha_2 \cdots \alpha_1\alpha_2 \mid \alpha_1\alpha_2 \text{ appears } k \text{ times}\}$. We now formally define the central notion of this study.
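Definition A regular sequence operation, or rs-operation, is either a merger $\llbracket W \rrbracket_n$ or an extractor $\pi_i\llbracket W \rrbracket_n^{-1}$, where $W$ is regular. Henceforth, $W$ is always regular in each merger $\llbracket W \rrbracket_n$ and each extractor $\pi_i\llbracket W \rrbracket_n^{-1}$.

We now present several examples of rs-operations and their compositions.

Examples In the following, $u$ and $v$ are sequences over $\Sigma_\infty$, and $L_1$ and $L_2$ are subsets of $\Sigma_\infty^*$. A single sequence $u$ used as an argument stands for the set $\{u\}$.

(1) $\llbracket \alpha_1^*\alpha_2^* \rrbracket_2(u, v) = \{uv\}$.

(2) $\pi_1\llbracket \alpha_1^k\alpha_2^* \rrbracket_2^{-1}(u)$ is the prefix of $u$ of length $k$, and $\pi_2\llbracket \alpha_1^*\alpha_2^k \rrbracket_2^{-1}(u)$ is the suffix of $u$ of length $k$ (in both cases provided $\mathit{len}(u) \ge k$). Also, $\pi_1\llbracket \alpha_1^*\alpha_2^* \rrbracket_2^{-1}(u)$ is the set of all prefixes of $u$.

(3) $\pi_1\llbracket (\alpha_1 \cup \alpha_2)^* \rrbracket_2^{-1}(u)$ is the set of all subsequences of $u$.

(4) Let $\mathit{SL}(u, v) = \pi_1\llbracket (\alpha_1\alpha_2)^* \rrbracket_2^{-1}(\llbracket (\alpha_1\alpha_2)^* \rrbracket_2(u, v))$. Clearly, $\mathit{SL}(u, v) = \{u\}$ if $\mathit{len}(u) = \mathit{len}(v)$, and $\mathit{SL}(u, v) = \emptyset$ otherwise. For all subsets $L_1$ and $L_2$ of $\Sigma_\infty^*$, let
$$\mathit{SL}(L_1, L_2) = \bigcup_{u \in L_1} \bigcup_{v \in L_2} \mathit{SL}(u, v).$$
Then $u$ is in $\mathit{SL}(L_1, L_2)$ if and only if $u$ is in $L_1$ and there is a $v$ in $L_2$ such that $\mathit{len}(u) = \mathit{len}(v)$.

(5) Let $\mathit{Half}(u) = \mathit{SL}(\mathit{Prefix}(u), \pi_1\llbracket (\alpha_1\alpha_2)^* \rrbracket_2^{-1}(u))$, where $\mathit{Prefix}(u) = \pi_1\llbracket \alpha_1^*\alpha_2^* \rrbracket_2^{-1}(u)$. If $\mathit{len}(u)$ is even, then $\pi_1\llbracket (\alpha_1\alpha_2)^* \rrbracket_2^{-1}(u)$ consists of a single sequence of length $\mathit{len}(u)/2$; otherwise it is empty. Hence, $\mathit{Half}(u)$ returns the first half of $u$ if $u$ has even length, and $\mathit{Half}(u) = \emptyset$ if $u$ has odd length. For example, $\mathit{Half}(abcd) = \{ab\}$.

Using the above examples, it is easy to see that all the sequence operations defined in [GT87] can be simulated by our rs-operations.

The rs-operations are constructed using regular sets, and these regular sets (of patterns) can be “declared” by regular expressions as seen above. Furthermore, in sequence mergers and extractors, the patterns determine the manner in which the sequences are merged or extracted. Thus, the regular expressions in rs-operations can be viewed as “declaring” how to merge or extract sequences. For this reason, the rs-operations are also called “declarative sequence operations.”

Since regular sets are defined by “mechanical devices” (i.e., fsa), it is natural to seek similar devices to describe the rs-operations.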
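The examples above can be checked mechanically on small finite inputs. The Python sketch below is a brute-force reading of the definitions and is not meant as an efficient algorithm. Its encoding is an assumption:

- A pattern over $V_n$ (for $n \le 9$) is a string of digits, in which the digit `i` stands for $\alpha_i$.
- A regular set $W$ is written as a Python regular expression over those digits, so $\cup$ becomes `|`.
- Sequences are strings.

The function names `merger`, `extractor`, `SL` and `half` are hypothetical counterparts of $\llbracket W \rrbracket_n$, $\pi_i\llbracket W \rrbracket_n^{-1}$, $\mathit{SL}$ and $\mathit{Half}$.

```python
import re
from itertools import product


def simple_merge(w, us):
    """[w]_n(u_1, ..., u_n): the unique merging of us according to w, or None."""
    n = len(us)
    # A merging exists iff u_i has exactly as many elements as w has alpha_i-positions.
    if any(w.count(str(i + 1)) != len(us[i]) for i in range(n)):
        return None
    pos, out = [0] * n, []
    for sym in w:
        i = int(sym) - 1                  # this position is named alpha_{i+1}
        out.append(us[i][pos[i]])
        pos[i] += 1
    return "".join(out)


def patterns(W, n, length):
    """All w in V_n^length matching the regular expression W."""
    pat = re.compile(W)
    alphabet = "".join(str(k + 1) for k in range(n))
    return [w for w in map("".join, product(alphabet, repeat=length))
            if pat.fullmatch(w)]


def merger(W, n, *Ls):
    """[W]_n(L_1, ..., L_n) for finite sets L_i, by brute force."""
    result = set()
    for us in product(*Ls):
        for w in patterns(W, n, sum(len(u) for u in us)):
            u = simple_merge(w, us)
            if u is not None:
                result.add(u)
    return result


def extractor(W, n, i, L):
    """pi_i [W]_n^{-1}(L): keep the elements of v at the alpha_i-positions of w."""
    return {"".join(c for c, s in zip(v, w) if s == str(i))
            for v in L for w in patterns(W, n, len(v))}


def SL(L1, L2):
    return extractor("(12)*", 2, 1, merger("(12)*", 2, L1, L2))


def half(u):
    prefixes = extractor("1*2*", 2, 1, {u})
    return SL(prefixes, extractor("(12)*", 2, 1, {u}))


print(sorted(merger("(12)*", 2, {"ab", "abcd"}, {"cd", "ef", "cdef"})))  # ['acbd', 'acbdcedf', 'aebf']
print(sorted(extractor("12*", 2, 1, {"abc", "defgh"})))                 # ['a', 'd']
print(half("abcd"), half("abc"))                                        # {'ab'} set()
```

Enumerating all of $V_n^m$ costs $n^m$ candidate patterns. This sketch is therefore only suitable for small examples such as the perfect-shuffle merger and $\mathit{Half}(abcd) = \{ab\}$.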
Such a device, called a “generic a-transducer,” is presented in the next section.

2.2 Generic a-Transducers and Rs-Operations

In this section, we introduce a new type of mechanical device and then apply it to characterize the rs-operations. The characterizations developed here will play an important role in later sections.

The mechanical devices are “transducers” with multiple input tapes and a single output tape. They differ from the traditional transducers ([Gin75]) in that they have no predefined input and output alphabets. (In this respect, they are analogous to the “generic procedures” used in programming languages [Hor84].) Furthermore, they cannot change symbols: the elements in the output, if any, are the same as in the input. In other words, these transducers can only “erase” some symbols from the input sequences and then “shuffle” the resulting sequences into a single one. We now define the device.

Definition A generic n-tape sequential transducer with accepting states, abbreviated generic (n-tape) a-transducer, is a 6-tuple $M = (n, K, \chi, H, p_0, F)$, where
(1) $n$ is a positive integer.
(2) $K$ is a finite set (of states).
(3) $\chi$ is a symbol (the variable).
(4) $H$ is a subset of $K \times \{\chi\} \times \{1, \ldots, n\} \times K \times \{\chi, \varepsilon\}$ (the transition rules); thus, for each $(p_1, \chi, i, p_2, \bar{\chi})$ in $H$, either $\bar{\chi} = \chi$ or $\bar{\chi} = \varepsilon$.
(5) $p_0$ is in $K$ (the start state).
(6) $F \subseteq K$ (the set of accepting states).
If $\bar{\chi} = \chi$ for each $(p, \chi, i, q, \bar{\chi})$ in $H$, then $M$ is called $\varepsilon$-free.

The third component of a generic a-transducer, $\chi$, acts as a placeholder. Whenever the device “reads” an input symbol $a$, it uses a transition rule with $\chi$ replaced by $a$. Throughout this thesis, the symbol $\chi$ will always, and only, be used as the third component of each a-transducer. Also, for each symbol $e$, $\bar{e}$ denotes either $e$ or $\varepsilon$. (For example, $\bar{\chi}$ represents $\chi$ or $\varepsilon$, and $\bar{a}$ represents $a$ or $\varepsilon$.)
To formally define the behavior of generic a-transducers, we need the following notation.

Notation Let $M = (n, K, \chi, H, p_0, F)$ be a generic n-tape a-transducer and $A$ a set. For each $h = (p_1, \chi, i, p_2, \bar{\chi})$ in $H$ and $a$ in $A$, let $h[a]$ be the tuple $(p_1, a, i, p_2, \bar{a})$, where $\bar{a} = \varepsilon$ if and only if $\bar{\chi} = \varepsilon$. Let $H[A] = \{h[a] \mid h \text{ in } H \text{ and } a \text{ in } A\}$.

The relation describing the “moves” of a generic a-transducer is now defined.

Notation For each generic n-tape a-transducer $M = (n, K, \chi, H, p_0, F)$, let $\vdash_M$ (or $\vdash$ when $M$ is understood) be the following relation on $K \times \Sigma_\infty^* \times \cdots \times \Sigma_\infty^*$, where $\Sigma_\infty^*$ appears $n+1$ times. For all $u_1, \ldots, u_n$ and $v$ in $\Sigma_\infty^*$, $p_1$ and $p_2$ in $K$, $1 \le t \le n$, and $a$ in $\Sigma_\infty$,
$$(p_1, u_1, \ldots, a u_t, \ldots, u_n, v) \vdash (p_2, u_1, \ldots, u_t, \ldots, u_n, v\bar{a})$$
if $(p_1, a, t, p_2, \bar{a})$ is in $H[\Sigma_\infty]$. Let $\vdash^*$ be the reflexive, transitive closure of $\vdash$.

We are now able to define the mapping realized by a generic a-transducer.

Definition Let $M = (n, K, \chi, H, p_0, F)$ be a generic n-tape a-transducer and $L_i \subseteq \Sigma_\infty^*$ for each $1 \le i \le n$. Then
$$M(L_1, \ldots, L_n) = \{u \mid (p_0, u_1, \ldots, u_n, \varepsilon) \vdash^* (p, \varepsilon, \ldots, \varepsilon, u) \text{ for some } p \text{ in } F \text{ and } u_i \text{ in } L_i,\ 1 \le i \le n\}.$$
Such a mapping is called an (n-ary) generic a-transducer mapping. If $M$ is $\varepsilon$-free, the mapping is called an $\varepsilon$-free (n-ary) generic a-transducer mapping.

Equivalently, let $M = (n, K, \chi, H, p_0, F)$ and $L_i \subseteq \Sigma_\infty^*$ for each $1 \le i \le n$. Then $u$ is in $M(L_1, \ldots, L_n)$ if and only if there exists $\rho = \prod_{0 \le j < k}(p_j, a_j, t_j, p_{j+1}, \bar{a}_j)$ for some $k \ge 0$ such that:
- $p_k$ is in $F$;
- $\pi_5(\rho) = u$;
- $\pi_2\sigma_{\$3=i}(\rho)$ is in $L_i$ for each $1 \le i \le n$; and
- $(p_j, a_j, t_j, p_{j+1}, \bar{a}_j)$ is in $H[\Sigma_\infty]$ for each $0 \le j < k$.

Such a $\rho$ is called a run of $M$. Note that $\rho = \varepsilon$ if $k = 0$, and that $\varepsilon$ is a run of $M$ if $p_0$ is in $F$.

For convenience, each mapping $f$ from $2^{\Sigma_\infty^*} \times \cdots \times 2^{\Sigma_\infty^*}$ ($n$ times, for some positive integer $n$) to $2^{\Sigma_\infty^*}$ will be called an (n-ary) sequence operation, or simply an (n-ary) operation. The integer $n$ is called the arity of $f$, denoted $\mathit{arity}(f) = n$.
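The move relation $\vdash$ and the mapping $M(L_1, \ldots, L_n)$ translate directly into a search over configurations. The Python sketch below uses a hypothetical encoding. Since $\chi$ is only a placeholder, a rule $(p_1, \chi, i, p_2, \bar{\chi})$ is stored as `(p1, i, p2, keep)`, where `keep` is true when $\bar{\chi} = \chi$ and false when $\bar{\chi} = \varepsilon$. Inputs are finite sets of strings. Every move consumes one input symbol, so the search always terminates.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class GenericATransducer:
    """M = (n, K, chi, H, p0, F) with chi left implicit in the rules."""
    n: int
    states: frozenset
    rules: frozenset          # tuples (p1, i, p2, keep)
    start: object
    accepting: frozenset

    def is_epsilon_free(self) -> bool:
        return all(keep for (_, _, _, keep) in self.rules)

    def outputs(self, us):
        """All u with (p0, u_1, ..., u_n, eps) |-* (p, eps, ..., eps, u), p in F."""
        found, seen = set(), set()
        stack = [(self.start, (0,) * self.n, "")]   # (state, read positions, output)
        while stack:
            config = stack.pop()
            if config in seen:
                continue
            seen.add(config)
            p, pos, out = config
            if p in self.accepting and all(pos[t] == len(us[t]) for t in range(self.n)):
                found.add(out)
            for (p1, i, p2, keep) in self.rules:
                t = i - 1
                if p1 == p and pos[t] < len(us[t]):
                    a = us[t][pos[t]]               # the rule instantiated with chi := a
                    nxt = pos[:t] + (pos[t] + 1,) + pos[t + 1:]
                    stack.append((p2, nxt, out + a if keep else out))
        return found

    def __call__(self, *Ls):
        """The generic a-transducer mapping M(L_1, ..., L_n) for finite L_i."""
        return set().union(*(self.outputs(us) for us in product(*Ls)))


shuffle = GenericATransducer(2, frozenset({0, 1}),
                             frozenset({(0, 1, 1, True), (1, 2, 0, True)}),
                             0, frozenset({0}))
head = GenericATransducer(1, frozenset({0, 1}),
                          frozenset({(0, 1, 1, True), (1, 1, 1, False)}),
                          0, frozenset({1}))
print(sorted(shuffle({"ab", "abcd"}, {"cd", "ef", "cdef"})))  # ['acbd', 'acbdcedf', 'aebf']
print(sorted(head({"abc", "defgh"})), head.is_epsilon_free())  # ['a', 'd'] False
```

The $\varepsilon$-free transducer `shuffle` alternates between its two tapes. On the sets $L_1$ and $L_2$ used earlier, it produces the same result as the merger for $(\alpha_1\alpha_2)^*$. The non-$\varepsilon$-free transducer `head` keeps the first symbol and erases the rest.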
Clearly, each n-ary merger mapping and each ($\varepsilon$-free) n-ary generic a-transducer mapping is an n-ary operation, and each extractor mapping is a unary operation. Two n-ary operations $f_1$ and $f_2$ are equivalent if $f_1(L_1, \ldots, L_n) = f_2(L_1, \ldots, L_n)$ for all subsets $L_1, \ldots, L_n$ of $\Sigma_\infty^*$. By abuse of language, two “structures,” each of which is a merger, an extractor or a generic a-transducer, are said to be equivalent if their associated mappings are equal. For example, in this terminology the merger $\llbracket \alpha_1^* \rrbracket_1$ and the extractor $\pi_1\llbracket \alpha_1^* \rrbracket_1^{-1}$ are equivalent, since both are the identity mapping.

In the remainder of this section, we characterize each of the two types of rs-operations by a type of generic a-transducer. Before presenting the characterization result for mergers, we need the following lemma.

Lemma 2.1 Let $M_n$ be the $\varepsilon$-free generic a-transducer $(n, K, \chi, H, p_0, F)$ and $A$ the fsa $(K, V_n, \delta, p_0, F)$ such that $(p, \alpha_i, q)$ is in $\delta$ if and only if $(p, \chi, i, q, \chi)$ is in $H$. Then $\llbracket T(A) \rrbracket_n$ and $M_n$ are equivalent.

Proof Let $W = T(A)$ and let $L_1, \ldots, L_n$ be subsets of $\Sigma_\infty^*$. To establish the lemma, it suffices to show that (a) $M_n(L_1, \ldots, L_n) \subseteq \llbracket W \rrbracket_n(L_1, \ldots, L_n)$ and (b) $\llbracket W \rrbracket_n(L_1, \ldots, L_n) \subseteq M_n(L_1, \ldots, L_n)$.

First consider (a). Suppose $u$ is in $M_n(L_1, \ldots, L_n)$. Then there exists a run $\rho = \prod_{0 \le j < k}(p_j, a_j, t_j, p_{j+1}, \bar{a}_j)$ of $M_n$, for some $k \ge 0$, such that $\pi_5(\rho) = u$ and

(1) $\pi_2\sigma_{\$3=i}(\rho)$ is in $L_i$ for each $1 \le i \le n$.

Let $\rho_1 = \prod_{0 \le j < k}(p_j, a_j, t_j, p_{j+1}, \bar{a}_j, \alpha_{t_j})$ and

(2) $w = \pi_6(\rho_1)$.

To see that $u$ is in $\llbracket W \rrbracket_n(L_1, \ldots, L_n)$, it is enough to show two things: $\pi_2\sigma_{\$1=\alpha_i}(w \otimes u)$ is in $L_i$ for each $1 \le i \le n$, and $w$ is in $W$. Since $M_n$ is $\varepsilon$-free, $\bar{a}_j = a_j$ for each $0 \le j < k$. Thus, $\pi_5(\rho) = \pi_2(\rho)$, and hence $u = \pi_5(\rho) = \pi_2(\rho) = \pi_2(\rho_1)$. By this and (2), $w \otimes u = \pi_6(\rho_1) \otimes \pi_2(\rho_1)$.
Hence, for each $1 \le i \le n$,
$$\pi_2\sigma_{\$1=\alpha_i}(w \otimes u) = \pi_2\sigma_{\$1=\alpha_i}(\pi_6(\rho_1) \otimes \pi_2(\rho_1)) = \pi_2\sigma_{\$6=\alpha_i}(\rho_1) = \pi_2\sigma_{\$3=i}(\rho_1) = \pi_2\sigma_{\$3=i}(\rho).$$
By (1), it follows that $\pi_2\sigma_{\$1=\alpha_i}(w \otimes u)$ is in $L_i$ for each $1 \le i \le n$.

It remains to establish that $w$ is in $W$. Let $\rho_2 = \prod_{0 \le j < k}(p_j, \alpha_{t_j}, p_{j+1})$. Since $\rho$ is a run of $M_n$, $p_0$ is the start state and $p_k$ is in $F$. Also, $(p_j, \alpha_{t_j}, p_{j+1})$ is in $\delta$ for each $0 \le j < k$. [Indeed, let $0 \le j < k$. Since $\rho$ is a run of $M_n$, $(p_j, a_j, t_j, p_{j+1}, \bar{a}_j)$ is in $H[\Sigma_\infty]$. By definition, $(p_j, \chi, t_j, p_{j+1}, \bar{\chi})$ is in $H$. Since $M_n$ is $\varepsilon$-free, $\bar{\chi} = \chi$. By hypothesis, $(p_j, \alpha_{t_j}, p_{j+1})$ is in $\delta$.] Thus, $\rho_2$ is a run of $A$. Clearly, $\pi_2(\rho_2) = \pi_6(\rho_1)$. By this and (2), $\pi_2(\rho_2) = \pi_6(\rho_1) = w$. Therefore, $w$ is in $W$, as desired.

Now consider (b). Suppose $u$ is in $\llbracket W \rrbracket_n(L_1, \ldots, L_n)$. By definition, there exists $w$ in $W$, with $\mathit{len}(w) = \mathit{len}(u)$, such that

(3) $\pi_2\sigma_{\$1=\alpha_i}(w \otimes u)$ is in $L_i$ for each $1 \le i \le n$.

Since $w$ is in $W = T(A)$, there exists a run $\rho = \prod_{0 \le j < k}(p_j, \alpha_{t_j}, p_{j+1})$ of $A$, for some $k \ge 0$, such that $\pi_2(\rho) = w$. Since $\mathit{len}(u) = \mathit{len}(w) = \mathit{len}(\rho)$, there exists $\rho_1 = \prod_{0 \le j < k}(p_j, a_j, t_j, p_{j+1}, a_j)$ such that $u = \pi_2(\rho_1)$. To show that $u$ is in $M_n(L_1, \ldots, L_n)$, it suffices to establish two things: $\pi_2\sigma_{\$3=i}(\rho_1)$ is in $L_i$ for each $1 \le i \le n$, and $\rho_1$ is a run of $M_n$.

Let $\rho_2 = \prod_{0 \le j < k}(p_j, a_j, t_j, p_{j+1}, a_j, \alpha_{t_j})$. Clearly, for each $1 \le i \le n$,

(4) $\pi_2\sigma_{\$3=i}(\rho_1) = \pi_2\sigma_{\$6=\alpha_i}(\rho_2) = \pi_2\sigma_{\$1=\alpha_i}(\pi_6(\rho_2) \otimes \pi_2(\rho_2))$.

It is also clear that

(5) $\pi_6(\rho_2) = \pi_2(\rho) = w$ and $\pi_2(\rho_2) = \pi_2(\rho_1) = u$.

By (4) and (5), $\pi_2\sigma_{\$3=i}(\rho_1) = \pi_2\sigma_{\$1=\alpha_i}(\pi_6(\rho_2) \otimes \pi_2(\rho_2)) = \pi_2\sigma_{\$1=\alpha_i}(w \otimes u)$. This and (3) imply that $\pi_2\sigma_{\$3=i}(\rho_1)$ is in $L_i$ for each $1 \le i \le n$.

It remains to establish that $\rho_1$ is a run of $M_n$. Since $\rho$ is a run of $A$, $p_0$ is the start state and $p_k$ is in $F$. Clearly, $(p_j, a_j, t_j, p_{j+1}, a_j)$ is in $H[\Sigma_\infty]$ for each $0 \le j < k$. [Indeed, let $0 \le j < k$. Since $\rho$ is a run of $A$, $(p_j, \alpha_{t_j}, p_{j+1})$ is in $\delta$.
By hypothesis, $(p_j, \chi, t_j, p_{j+1}, \chi)$ is in $H$. Hence, $(p_j, a_j, t_j, p_{j+1}, a_j)$ is in $H[\Sigma_\infty]$.] Thus, $\rho_1$ is a run of $M_n$, as desired. □

Using the preceding lemma, we can now characterize the merger operations in terms of a special kind of generic a-transducer.

Theorem 2.1 Each merger is equivalent to some $\varepsilon$-free generic a-transducer and, conversely, each $\varepsilon$-free generic a-transducer is equivalent to some merger. Put differently, an n-ary operation is equivalent to a merger mapping if and only if it is equivalent to an $\varepsilon$-free n-ary generic a-transducer mapping.

Proof Let $\llbracket W \rrbracket_n$ be a merger. Since $W$ is regular, there exists an fsa $A = (K, V_n, \delta, p_0, F)$ such that $W = T(A)$. Let $M_n$ be the $\varepsilon$-free generic a-transducer $(n, K, \chi, H, p_0, F)$, where $H = \{(p, \chi, i, q, \chi) \mid (p, \alpha_i, q) \text{ in } \delta\}$. By Lemma 2.1, $M_n$ is equivalent to $\llbracket W \rrbracket_n$, as desired.

Conversely, suppose $M_n = (n, K, \chi, H, p_0, F)$ is an $\varepsilon$-free generic a-transducer. Let $A$ be the fsa $(K, V_n, \delta, p_0, F)$, where $\delta = \{(p, \alpha_i, q) \mid (p, \chi, i, q, \chi) \text{ in } H\}$. By Lemma 2.1, $\llbracket T(A) \rrbracket_n$ is equivalent to $M_n$, as desired. □

Notice that the generic a-transducer $M_n$ in the above proof is effectively constructible from the merger $\llbracket W \rrbracket_n$.

Obviously, a necessary condition for an n-ary operation $f$ to be equivalent to an $\varepsilon$-free a-transducer is the following: $\mathit{len}(u) = \mathit{len}(u_1) + \cdots + \mathit{len}(u_n)$ whenever $u$ is in $f(u_1, \ldots, u_n)$. It is easy to see that for each $n \ge 1$, some generic n-tape a-transducer does not satisfy this necessary condition. For example, let $M_n = (n, K, \chi, H, p_0, F)$, where $\bar{\chi} = \varepsilon$ for each $(p, \chi, i, q, \bar{\chi})$ in $H$ and $p_0$ is in $F$. Then $M_n(L_1, \ldots, L_n) = \{\varepsilon\}$ for all subsets $L_1, \ldots, L_n$ of $\Sigma_\infty^*$. Therefore, the above theorem leads to the following result.

Corollary For each $n \ge 1$, there exist n-ary generic a-transducers which have no equivalent mergers.

We now present the characterization result for extractors. First, we have:
First, we have:

Lemma 2.2 Let $n\ge 1$, $1\le i\le n$, $A$ be the fsa $(K, V_n, \delta, p_0, F)$ and $M$ the generic 1-tape a-transducer $(1, K, \chi, H, p_0, F)$ such that (i) $(p,\alpha_i,q)$ is in $\delta$ if and only if $(p,\chi,1,q,\chi)$ is in $H$ and (ii) $(p,\alpha_j,q)$ is in $\delta$ for some $j\ne i$ if and only if $(p,\chi,1,q,\varepsilon)$ is in $H$. Then $\pi_i[T(A)]^{-1}$ and $M$ are equivalent.

Proof Let $W = T(A)$ and $L$ be a subset of $\Sigma_\infty^*$. In order to establish the lemma, it suffices to prove that (a) $\pi_i[W]^{-1}(L)\subseteq M(L)$ and (b) $M(L)\subseteq\pi_i[W]^{-1}(L)$.

First consider (a). Suppose $u_i$ is in $\pi_i[W]^{-1}(L)$. By definition, there exist $w$ in $W$ and $u$ in $L$, with $len(w) = len(u)$, such that

(1) $u_i = \pi_2\sigma_{\$1=\alpha_i}(w\otimes u)$.

Since $w$ is in $W$, there exists a run $\rho = \prod_{0\le j<k}(p_j,\alpha_{t_j},p_{j+1})$ of $A$ for some $k\ge 0$ such that $\pi_2(\rho) = w$. Let $f$ be the mapping from $\Sigma_\infty\times V_n$ to $\Sigma_\infty\cup\{\varepsilon\}$ such that $f(b,\alpha) = b$ if $\alpha = \alpha_i$ and $f(b,\alpha) = \varepsilon$ otherwise, for each $b$ in $\Sigma_\infty$ and $\alpha$ in $V_n$. Since $len(u) = len(w) = len(\rho)$, there exists $\rho_1 = \prod_{0\le j<k}(p_j,a_j,1,p_{j+1},f(a_j,\alpha_{t_j}))$ such that $\pi_2(\rho_1) = u$. In order to show that $u_i$ is in $M(L)$, it is enough to show that $u_i = \pi_5(\rho_1)$ and that $\rho_1$ is a run of $M$. Let $\bar a_j = f(a_j,\alpha_{t_j})$ for each $0\le j<k$ and $\rho_2 = \prod_{0\le j<k}(p_j,a_j,1,p_{j+1},\bar a_j,\alpha_{t_j})$. By the definition of $\bar a_j$, it is easily seen that

(2) $\pi_2\sigma_{\$6=\alpha_i}(\rho_2) = \pi_5(\rho_2)$.

Clearly,

(3) $w = \pi_2(\rho) = \pi_6(\rho_2)$, and
(4) $u = \pi_2(\rho_1) = \pi_2(\rho_2)$.

Thus,
$$\begin{aligned} u_i &= \pi_2\sigma_{\$1=\alpha_i}(w\otimes u) && \text{by (1)}\\ &= \pi_2\sigma_{\$1=\alpha_i}(\pi_6(\rho_2)\otimes\pi_2(\rho_2)) && \text{by (3) and (4)}\\ &= \pi_2\sigma_{\$6=\alpha_i}(\rho_2)\\ &= \pi_5(\rho_2) && \text{by (2)}\\ &= \pi_5(\rho_1),\end{aligned}$$
i.e., $u_i = \pi_5(\rho_1)$ as desired. To complete the proof, it remains to show that $\rho_1$ is a run of $M$. Let $0\le j<k$. Since $\rho$ is a run of $A$, $(p_j,\alpha_{t_j},p_{j+1})$ is in $\delta$. Two cases arise.

(i) $\alpha_{t_j} = \alpha_i$. Then $(p_j,\alpha_i,p_{j+1})$ is in $\delta$. By hypothesis, $(p_j,\chi,1,p_{j+1},\chi)$ is in $H$. Hence, $(p_j,a_j,1,p_{j+1},a_j)$ is in $H[\Sigma_\infty]$. By the definition of $\bar a_j$, $(p_j,a_j,1,p_{j+1},\bar a_j)$ is in $H[\Sigma_\infty]$.

(ii) $\alpha_{t_j}\ne\alpha_i$. Then $(p_j,\alpha_l,p_{j+1})$ is in $\delta$ for some $l\ne i$.
By hypothesis, $(p_j,\chi,1,p_{j+1},\varepsilon)$ is in $H$. Hence, $(p_j,a_j,1,p_{j+1},\varepsilon)$ is in $H[\Sigma_\infty]$. By the definition of $\bar a_j$, $(p_j,a_j,1,p_{j+1},\bar a_j)$ is in $H[\Sigma_\infty]$.

Combining the two cases, $(p_j,a_j,1,p_{j+1},\bar a_j)$ is in $H[\Sigma_\infty]$ for each $0\le j<k$. Also, it is clear that $p_0$ is the start state and $p_k$ is in $F$. Therefore, $\rho_1$ is a run of $M$, which completes the proof of (a).

Now consider (b). Suppose $u_i$ is in $M(L)$. Then there exists a run $\rho = \prod_{0\le j<k}(p_j,a_j,1,p_{j+1},\bar a_j)$ of $M$ such that $u_i = \pi_5(\rho)$ and $\pi_2(\rho)$ is in $L$. Let

(5) $u = \pi_2(\rho)$.

Let $0\le j<k$. Since $\rho$ is a run of $M$, $(p_j,a_j,1,p_{j+1},\bar a_j)$ is in $H[\Sigma_\infty]$. Hence, $(p_j,\chi,1,p_{j+1},x)$ is in $H$ for some $x$. By hypothesis, there exists $\alpha_{t_j}$ such that $(p_j,\alpha_{t_j},p_{j+1})$ is in $\delta$. Let $f$ be the mapping from $\{0,\ldots,k-1\}$ to $V_n$ such that $f(j) = \alpha_i$ if $\bar a_j = a_j$ and $f(j) = \alpha_{t_j}$ otherwise. Let $w = f(0)\cdots f(k-1)$. In order to show that $u_i$ is in $\pi_i[W]^{-1}(L)$, it is enough to prove that $u_i = \pi_2\sigma_{\$1=\alpha_i}(w\otimes u)$ and that $w$ is in $W$. Let $\rho_1 = \prod_{0\le j<k}(p_j,a_j,1,p_{j+1},\bar a_j,f(j))$. Clearly,

(6) $w = \pi_6(\rho_1)$, and
(7) $\pi_2(\rho) = \pi_2(\rho_1)$.

For each $0\le j<k$, by the definition of $f(j)$, $\bar a_j = a_j$ if and only if $f(j) = \alpha_i$. Thus,

(8) $\pi_2\sigma_{\$6=\alpha_i}(\rho_1) = \pi_5(\rho_1) = u_i$.

Then
$$\begin{aligned}\pi_2\sigma_{\$1=\alpha_i}(w\otimes u) &= \pi_2\sigma_{\$1=\alpha_i}(\pi_6(\rho_1)\otimes\pi_2(\rho)) && \text{by (5) and (6)}\\ &= \pi_2\sigma_{\$1=\alpha_i}(\pi_6(\rho_1)\otimes\pi_2(\rho_1)) && \text{by (7)}\\ &= \pi_2\sigma_{\$6=\alpha_i}(\rho_1) = u_i && \text{by (8)},\end{aligned}$$
i.e., $u_i = \pi_2\sigma_{\$1=\alpha_i}(w\otimes u)$ as desired.

To complete the proof, it remains to show that $w$ is in $W$. Let $\rho_2 = \prod_{0\le j<k}(p_j,f(j),p_{j+1})$. Let $0\le j<k$. Suppose $f(j)\ne\alpha_i$. Then $(p_j,f(j),p_{j+1})$ is in $\delta$ by the definition of $f(j)$. On the other hand, suppose $f(j) = \alpha_i$. Then $\bar a_j = a_j$, i.e., $(p_j,a_j,1,p_{j+1},a_j)$ is in $H[\Sigma_\infty]$. Thus, $(p_j,\chi,1,p_{j+1},\chi)$ is in $H$. By hypothesis, $(p_j,\alpha_i,p_{j+1})$ is in $\delta$. Therefore, $(p_j,f(j),p_{j+1})$ is in $\delta$ for each $0\le j<k$. Clearly, $p_0$ is the start state of $A$ and $p_k$ is in $F$. Thus, $\rho_2$ is a run of $A$. Then it is easily seen that $w = \pi_6(\rho_1) = \pi_2(\rho_2)$ is in $W = T(A)$, thereby completing the proof.
□

We are now ready to characterize the extractors by the generic 1-tape a-transducers.

Theorem 2.2 Each extractor is equivalent to some generic 1-tape a-transducer and, conversely, each generic 1-tape a-transducer is equivalent to some extractor. Expressed otherwise, a unary operation is equivalent to an extractor mapping if and only if it is equivalent to a unary generic a-transducer mapping.

Proof Let $\pi_i[W]^{-1}$ be an extractor. Since $W$ is regular, there exists an fsa $A = (K, V_n, \delta, p_0, F)$ such that $T(A) = W$. Let $M$ be the generic 1-tape a-transducer $(1, K, \chi, H, p_0, F)$, where
$$H = \{(p,\chi,1,q,\chi) \mid (p,\alpha_i,q) \text{ in } \delta\}\cup\{(p,\chi,1,q,\varepsilon) \mid (p,\alpha_j,q) \text{ in } \delta \text{ and } j\ne i\}.$$
By Lemma 2.2, $M$ is equivalent to $\pi_i[W]^{-1}$ as desired. Conversely, let $M = (1, K, \chi, H, p_0, F)$ be a generic 1-tape a-transducer. Let $A$ be the fsa $(K, V_2, \delta, p_0, F)$, where
$$\delta = \{(p,\alpha_1,q) \mid (p,\chi,1,q,\chi) \text{ in } H\}\cup\{(p,\alpha_2,q) \mid (p,\chi,1,q,\varepsilon) \text{ in } H\}.$$
By Lemma 2.2, $\pi_1[T(A)]^{-1}$ is equivalent to $M$ as desired. □

Notice that the generic a-transducer $M$ in the above proof is effectively constructible from the extractor $\pi_i[W]^{-1}$.

2.3 Equivalence of Generic a-Transducers

In this section, we first establish a necessary and sufficient condition for two a-transducers to be equivalent (Proposition 2.1). Using this result, we then show that it is decidable whether two mergers (extractors, resp.) are equivalent.

In order to study the equivalence problem for generic a-transducers, we associate a finite state acceptor, or fsa, with each generic a-transducer.

Definition For each generic n-tape a-transducer $M = (n, K, \chi, H, p_0, F)$, the fsa $(K, X_n, \delta, p_0, F)$, where
$$X_n = \{(\chi,i,\chi),(\chi,i,\varepsilon) \mid 1\le i\le n\}\quad\text{and}\quad\delta = \{(p,(\chi,i,x),q) \mid (p,\chi,i,q,x) \text{ in } H\},$$
is called the associated fsa of $M$ and denoted by $A[M]$.

It is easy to see that two generic n-tape a-transducers $M_1$ and $M_2$ are equivalent if $T(A[M_1]) = T(A[M_2])$. However, the converse is not necessarily true.
For example, let $M_1 = (2, K, \chi, H_1, p, F)$ and $M_2 = (2, K, \chi, H_2, p, F)$, where $K = \{p, q, q'\}$, $F = \{q'\}$, $H_1 = \{(p,\chi,1,q,\chi),(q,\chi,2,q',\varepsilon)\}$ and $H_2 = \{(p,\chi,2,q,\varepsilon),(q,\chi,1,q',\chi)\}$. Clearly, $M_1$ and $M_2$ both perform the same tasks, namely (1) erase the single-element input from the second tape, and (2) output the single-element input on the first tape. However, $T(A[M_1]) = \{(\chi,1,\chi)(\chi,2,\varepsilon)\}$ and $T(A[M_2]) = \{(\chi,2,\varepsilon)(\chi,1,\chi)\}$. Thus, $M_1$ and $M_2$ are equivalent but $T(A[M_1])\ne T(A[M_2])$.

To obtain a necessary and sufficient condition for two generic a-transducers to be equivalent, we introduce "ε-move closures." This concept is essential for our necessary and sufficient condition. First, though, we define some auxiliary notions.

Notation For each $i\ge 1$, let $W_i = \{(\chi,i,\chi),(\chi,i,\varepsilon)\}$. For each $n\ge 1$, let $X_n = \bigcup_{1\le i\le n} W_i$ and $W_{0n} = \{(\chi,i,\chi) \mid 1\le i\le n\}$. When $n$ is understood, $W_0$ will be used to denote $W_{0n}$.

We also need a special relation on $X_n^*$.

Notation For each $n\ge 1$, let $\Leftrightarrow_n$ (or $\Leftrightarrow$ when $n$ is understood) be the relation on $X_n^*$ defined as follows: for each $a_1\cdots a_k$ in $X_n^*$ and $1\le l<k$, let
$$a_1\cdots a_l a_{l+1}\cdots a_k \Leftrightarrow_n a_1\cdots a_{l+1} a_l\cdots a_k$$
if there is no $0\le i\le n$ such that $\{a_l, a_{l+1}\}$ is a subset of $W_i$. In other words, two adjacent moves may be interchanged exactly when they read different tapes and do not both write to the output. Let $\Leftrightarrow_n^*$ (or $\Leftrightarrow^*$ when $n$ is understood) be the reflexive, transitive closure of $\Leftrightarrow_n$ for each $n\ge 1$. Although not needed for our purposes, we note that both $\Leftrightarrow$ and $\Leftrightarrow^*$ are symmetric.

We are now ready for the ε-move closure.

Definition The ε-move closure of a subset $L$ of $X_n^*$, denoted $em(L)$, is the set $\{w \mid w\Leftrightarrow^* w' \text{ for some } w' \text{ in } L\}$.

Consider the example above involving the two equivalent a-transducers $M_1$ and $M_2$. It is easily seen that
$$em(T(A[M_1])) = \{(\chi,1,\chi)(\chi,2,\varepsilon),\ (\chi,2,\varepsilon)(\chi,1,\chi)\} = em(T(A[M_2])).$$
This is no accident.
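The associated fsa and the ε-move closure can be explored with a short Python sketch. It is an illustration only, under stated assumptions: a symbol $(\chi,i,x)$ of $X_n$ is encoded as the tuple `("chi", i, x)` with `x` either `"chi"` or `"eps"`, transitions of $H$ are 5-tuples `(p, "chi", i, q, x)`, and the function names are hypothetical. Because a symbol lies in $W_i$ ($i\ge 1$) exactly when it reads tape $i$, and lies in $W_0$ exactly when it outputs, `swappable` tests the side condition of $\Leftrightarrow_n$ directly; for a finite $L$, `em` then closes $L$ under adjacent interchanges with a worklist. The example rebuilds $M_1$ and $M_2$ from the text.

```python
def associated_language(H, start, finals, max_len):
    """Words of T(A[M]) of length <= max_len. A[M] has a transition
    (p, (chi, i, x), q) for each (p, chi, i, q, x) in H."""
    words, frontier = set(), {(start, ())}
    for _ in range(max_len + 1):
        words |= {w for (p, w) in frontier if p in finals}
        frontier = {(q, w + (("chi", i, x),)) for (p, w) in frontier
                    for (p2, _, i, q, x) in H if p2 == p}
    return words

def swappable(a, b):
    """True iff no W_i (0 <= i <= n) contains both a and b."""
    same_tape = a[1] == b[1]                       # both in W_i for some i >= 1
    both_output = a[2] == "chi" and b[2] == "chi"  # both in W_0
    return not same_tape and not both_output

def em(L):
    """epsilon-move closure of a finite set L of words over X_n."""
    closure, todo = set(L), list(L)
    while todo:
        w = todo.pop()
        for l in range(len(w) - 1):
            if swappable(w[l], w[l + 1]):
                w2 = w[:l] + (w[l + 1], w[l]) + w[l + 2:]
                if w2 not in closure:
                    closure.add(w2)
                    todo.append(w2)
    return closure

H1 = {("p", "chi", 1, "q", "chi"), ("q", "chi", 2, "q'", "eps")}
H2 = {("p", "chi", 2, "q", "eps"), ("q", "chi", 1, "q'", "chi")}
T1 = associated_language(H1, "p", {"q'"}, 2)
T2 = associated_language(H2, "p", {"q'"}, 2)
print(T1 == T2)          # False
print(em(T1) == em(T2))  # True
```

Since interchanges preserve length and the multiset of symbols, the closure of a finite set is finite and the worklist always terminates.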
Indeed, in Proposition 2.1 below, $em(T(A[M_1])) = em(T(A[M_2]))$ is shown to be a necessary and sufficient condition for two generic a-transducers $M_1$ and $M_2$ to be equivalent.

Before establishing the necessary and sufficient condition, we first prove four technical results. In the statements and proofs of these lemmas and in the proof of Proposition 2.1, we use $\sigma_{\$1\in W_i}$ as the homomorphism on $(V_\infty\times\Sigma_\infty)^*$ such that $\sigma_{\$1\in W_i}((\alpha,b)) = (\alpha,b)$ if $\alpha$ is in $W_i$ and $\sigma_{\$1\in W_i}((\alpha,b)) = \varepsilon$ otherwise, for each $(\alpha,b)$ in $V_\infty\times\Sigma_\infty$. Also, we use $(w,v)_{in}$ as a shorthand for $\pi_2\sigma_{\$1\in W_i}(w\otimes v)$. The first two technical results concern the interactions between $\Leftrightarrow^*$ and $(w,v)_{in}$.

Lemma 2.3 Let $n\ge 1$, $w$ and $w'$ be in $X_n^*$ such that $w'\Leftrightarrow^* w$, and $v$ in $\Sigma_\infty^{len(w)}$. Then there exists $v'$ in $\Sigma_\infty^{len(w)}$ such that $(w,v)_{in} = (w',v')_{in}$ for each $0\le i\le n$.

Proof Let $t = len(w)$. If $t$ is 0 or 1, then the lemma is obviously true. Suppose $t\ge 2$. Since $w'\Leftrightarrow^* w$, there exists an integer $k\ge 0$ such that $w'\Leftrightarrow^k w$. If $k = 0$, then the lemma trivially holds. Using induction, assume the lemma is true for each $0\le k<m$. Suppose now $k = m$. By definition, there exists $w''$ such that $w''\Leftrightarrow^{m-1} w$ and $w'\Leftrightarrow w''$. By the induction assumption, there exists $v''$ such that $(w,v)_{in} = (w'',v'')_{in}$ for each $0\le i\le n$. To establish the lemma, it thus suffices to show that there exists $v'$ such that $(w'',v'')_{in} = (w',v')_{in}$ for each $0\le i\le n$.

By the definition of $w'\Leftrightarrow w''$, there exists $1\le l<t$ such that
$$w'(1)\cdots w'(l+1)\,w'(l)\cdots w'(t) = w''$$
and

(1) $\{w'(l),w'(l+1)\}\not\subseteq W_i$ for each $0\le i\le n$.

Let $0\le i\le n$. It follows from (1) that for each $a$ in $\Sigma_\infty$,

(2) either $\sigma_{\$1\in W_i}((w'(l),a)) = \varepsilon$ or $\sigma_{\$1\in W_i}((w'(l+1),a)) = \varepsilon$.

Let $v' = v''(1)\cdots v''(l+1)\,v''(l)\cdots v''(t)$ and, for $1\le j\le t$, let $a_j = (w'(j),v'(j))$ and $b_j = \sigma_{\$1\in W_i}(a_j)$. By (2), either $b_l = \varepsilon$ or $b_{l+1} = \varepsilon$.
Thus,
$$\begin{aligned}\sigma_{\$1\in W_i}(w'\otimes v') &= \sigma_{\$1\in W_i}(a_1\cdots a_l a_{l+1}\cdots a_t)\\ &= b_1\cdots b_l b_{l+1}\cdots b_t\\ &= b_1\cdots b_{l+1} b_l\cdots b_t && \text{since either } b_l = \varepsilon \text{ or } b_{l+1} = \varepsilon\\ &= \sigma_{\$1\in W_i}(w''\otimes v'').\end{aligned}$$
By definition, $(w',v')_{in} = (w'',v'')_{in}$ as desired. □

Lemma 2.4 Let $n\ge 1$, $w$ and $w'$ be in $X_n^*$, and $v$ in $\Sigma_\infty^{len(w)}$ such that no element appears in $v$ twice. Suppose there exists $v'$ in $\Sigma_\infty^{len(w')}$ such that $(w,v)_{in} = (w',v')_{in}$ for each $0\le i\le n$. Then $w\Leftrightarrow^* w'$.

Proof Let $u_i = (w,v)_{in}$ and $u'_i = (w',v')_{in}$ for each $0\le i\le n$. By hypothesis, $u_i = u'_i$ for each $0\le i\le n$. Let $k = len(w)$. If $k = 0$, then the lemma holds trivially. Using induction, assume the lemma holds for each $0\le k<m$. Now suppose $k = m$. Two cases arise.

(i) $w(1) = w'(1)$. Let $w_1 = w(2)\cdots w(m)$, $v_1 = v(2)\cdots v(m)$, $w'_1 = w'(2)\cdots w'(m)$ and $v'_1 = v'(2)\cdots v'(m)$. Clearly, $(w_1,v_1)_{in} = (w'_1,v'_1)_{in}$ for each $0\le i\le n$. It is also clear that no element appears in $v_1$ twice. By the induction assumption, $w_1\Leftrightarrow^* w'_1$. Hence, $w\Leftrightarrow^* w'$.

(ii) $w(1)\ne w'(1)$. Clearly, there exists $1\le i\le n$ such that $w(1)$ is in $W_i$. Thus, $u_i$ is not empty. Since $u'_i = u_i$, $u'_i$ is also not empty. Thus, there exists an integer $j$ such that $w'(j)$ is in $W_i$ by the definition of $u'_i$. Therefore, $\{w'(j),w(1)\}\subseteq W_i$ and the set
$$J = \{1\le j\le m \mid \{w'(j),w(1)\}\subseteq W_i \text{ for some } 0\le i\le n\}$$
is not empty. Let $l = \min(J)$ and $a = v(1)$. It is readily seen that $v'(l) = a$. [Indeed, since $a$ occurs in $v$, it appears in $u_i$ for some $1\le i\le n$. Thus, $a$ appears in $u'_i$ for some $i$ and hence in $v'$. Let $j$ be an integer such that $v'(j) = a$. Suppose $j\ne l$. By the definition of $l$, there exists $0\le i\le n$ such that $\{w(1),w'(l)\}\subseteq W_i$ and $\{w(1),w'(l')\}\not\subseteq W_i$ for each $l'<l$. Thus, $w'(l)$ is in $W_i$ and $w'(l')$ is not in $W_i$ for each $1\le l'<l$. Hence, $u'_i$ starts with $v'(l)$ by definition.
Since $v'(j) = a$, $j\ne l$ and no element appears in $v'$ twice, $v'(l)\ne a$. Thus, $u'_i$ does not start with $a$. However, $u_i$ starts with $a$ since $w(1)$ is in $W_i$. This contradicts the fact that $u_i = u'_i$. Therefore, $j = l$. Hence, $v'(l) = a$ since $v'(j) = a$.]

Suppose $w'(l) = w(1)$. Let $1\le j<l$ and $0\le i\le n$. By the definition of $l$, $\{w(1),w'(j)\}$ is not a subset of $W_i$. Then $\{w'(l),w'(j)\}$ is not a subset of $W_i$ by the assumption that $w'(l) = w(1)$. Let $w'' = w'(l)\,w'(1)\cdots w'(l-1)\,w'(l+1)\cdots w'(m)$. By definition, it follows that $w''\Leftrightarrow^* w'$. By Lemma 2.3, there exists $v''$ such that $(w',v')_{in} = (w'',v'')_{in}$ for each $0\le i\le n$. Thus, $(w,v)_{in} = (w'',v'')_{in}$ for each $0\le i\le n$. Since $w''(1) = w(1)$, $w\Leftrightarrow^* w''$ by case (i). Hence, $w\Leftrightarrow^* w'$ as desired.

To complete the proof, it remains to show that $w'(l) = w(1)$. Let $0\le i\le n$. Suppose $w(1)$ is in $W_i$. Then $u_i$, and thus $u'_i$, starts with $a$. Since $v'(l) = a$ and $v'(j)\ne a$ for all $j\ne l$, $w'(l)$ is also in $W_i$. On the other hand, suppose $w(1)$ is not in $W_i$. Then $u_i$ does not contain $a$. Since $u'_i = u_i$, $u'_i$ does not contain $a$. Thus, $w'(l)$ is not in $W_i$. Therefore, for each $0\le i\le n$,

(1) either $\{w(1),w'(l)\}\subseteq W_i$ or $\{w(1),w'(l)\}\cap W_i = \emptyset$.

Suppose $1\le i\le n$ is the integer such that $w(1)$ is in $W_i$. By (1), $\{w(1),w'(l)\}\subseteq W_i$. Two cases arise.

(i) $w(1)$ is in $W_0$. By (1), $\{w(1),w'(l)\}\subseteq W_0$. Thus, $\{w(1),w'(l)\}\subseteq W_i\cap W_0$. Since $W_i\cap W_0 = \{(\chi,i,\chi)\}$, $w'(l) = w(1) = (\chi,i,\chi)$.

(ii) $w(1)$ is not in $W_0$. By (1), $\{w(1),w'(l)\}\cap W_0 = \emptyset$. Then $\{w(1),w'(l)\}\subseteq X_n - W_0$ since $\{w(1),w'(l)\}\subseteq X_n$. Thus, $\{w(1),w'(l)\}\subseteq W_i\cap(X_n - W_0)$. Since $W_i\cap(X_n - W_0) = \{(\chi,i,\varepsilon)\}$, $w'(l) = w(1) = (\chi,i,\varepsilon)$.

Thus, $w'(l) = w(1)$ in each case. □

Lemmas 2.3 and 2.4 are used in the proofs of the next two technical results (Lemmas 2.5 and 2.6), which establish some connections between the set $em(T(A[M]))$ and the mapping accomplished by $M$.

Lemma 2.5 Let $M$ be a generic n-tape a-transducer and $w$ a sequence in $X_n^*$.
Then $w$ is in $em(T(A[M]))$ if and only if $(w,v)_{0n}$ is in $M((w,v)_{1n},\ldots,(w,v)_{nn})$ for all $v$ in $\Sigma_\infty^{len(w)}$.

Proof (Only if) Suppose $w$ is in $em(T(A[M]))$. Then there exists $w'$ in $T(A[M])$ such that $w'\Leftrightarrow^* w$. Let $v$ be a sequence in $\Sigma_\infty^{len(w)}$. By Lemma 2.3, there exists $v'$ such that $len(v') = len(w')$ and $(w,v)_{in} = (w',v')_{in}$ for each $0\le i\le n$. Let $u_i = (w,v)_{in}$ for each $0\le i\le n$. Then

(1) $u_i = (w',v')_{in}$ for each $0\le i\le n$.

Since $w'$ is in $T(A[M])$, there exists a run $\rho = \prod_{0\le j<k}(p_j,(\chi,t_j,x_j),p_{j+1})$ of $A[M]$ such that $w' = \pi_2(\rho)$. Let $f$ be the mapping from $\Sigma_\infty\times X_n$ to $\Sigma_\infty\cup\{\varepsilon\}$ such that (for each $a$ in $\Sigma_\infty$ and $(\chi,i,x)$ in $X_n$) $f(a,(\chi,i,x)) = a$ if $x = \chi$ and $f(a,(\chi,i,x)) = \varepsilon$ if $x = \varepsilon$. Since $len(v') = len(w') = len(\rho)$, there exists $\rho_1 = \prod_{0\le j<k}(p_j,a_j,t_j,p_{j+1},f(a_j,(\chi,t_j,x_j)))$ such that $\pi_2(\rho_1) = v'$. It is easy to see that $\rho_1$ is a run of $M$. To complete the proof, it suffices to show that $u_0 = \pi_5(\rho_1)$ and $u_i = \pi_2\sigma_{\$3=i}(\rho_1)$ for each $1\le i\le n$.

Let $\rho_2 = \prod_{0\le j<k}(p_j,a_j,t_j,p_{j+1},f(a_j,(\chi,t_j,x_j)),(\chi,t_j,x_j))$. Clearly,

(2) $w' = \pi_2(\rho) = \pi_6(\rho_2)$, and
(3) $v' = \pi_2(\rho_1) = \pi_2(\rho_2)$.

Thus, for each $0\le i\le n$,
$$\begin{aligned}u_i &= (w',v')_{in} && \text{by (1)}\\ &= \pi_2\sigma_{\$1\in W_i}(w'\otimes v') && \text{by definition}\\ &= \pi_2\sigma_{\$1\in W_i}(\pi_6(\rho_2)\otimes\pi_2(\rho_2)) && \text{by (2) and (3)}\\ &= \pi_2\sigma_{\$6\in W_i}(\rho_2).\end{aligned}$$
Let $\bar a_j = f(a_j,(\chi,t_j,x_j))$ for each $0\le j<k$. For each $0\le j<k$, it follows from the definitions of $\bar a_j$ and $W_0$ that $\bar a_j = a_j$ if and only if $(\chi,t_j,x_j)$ is in $W_0$. Hence, $\pi_5(\rho_1) = \pi_5(\rho_2) = \pi_2\sigma_{\$6\in W_0}(\rho_2) = u_0$ as desired. Now let $1\le i\le n$. Clearly, $(\chi,t_j,x_j)$ is in $W_i$ if and only if $t_j = i$, for each $0\le j<k$. Then $\pi_2\sigma_{\$3=i}(\rho_1) = \pi_2\sigma_{\$6\in W_i}(\rho_2) = u_i$ for each $1\le i\le n$ as desired.

(If) Suppose $w$ in $X_n^*$ satisfies the condition of the lemma. Let $v$ be a sequence in $\Sigma_\infty^{len(w)}$ such that no element appears in $v$ twice. For each $0\le i\le n$, let $u_i = (w,v)_{in}$. By hypothesis, $u_0$ is in $M(u_1,\ldots,u_n)$. Thus, there exists a run $\rho = \prod_{0\le j<k}(p_j,a_j,t_j,p_{j+1},\bar a_j)$ of $M$ such that

(4) $u_i = \pi_2\sigma_{\$3=i}(\rho)$ for each $1\le i\le n$, and
(5) $u_0 = \pi_5(\rho)$.
Let $\rho_1 = \prod_{0\le j<k}(p_j,(\chi,t_j,x_j),p_{j+1},a_j,\bar a_j)$, where

(6) $(\chi,t_j,x_j) = (\chi,t_j,\chi)$ if and only if $\bar a_j = a_j$, for each $0\le j<k$.

Also, let $\rho_2 = \prod_{0\le j<k}(p_j,(\chi,t_j,x_j),p_{j+1})$. Clearly, $\rho_2$ is a run of $A[M]$. Then $w' = \pi_2(\rho_2) = \pi_2(\rho_1)$ is in $T(A[M])$. Let $v' = \pi_2(\rho)$. To establish the "if" part of the lemma, we need to show that $w$ is in $em(T(A[M]))$. Since $w'$ is in $T(A[M])$, by Lemma 2.4 it suffices to show that $(w,v)_{in} = (w',v')_{in}$ for each $0\le i\le n$. Clearly, $v' = \pi_2(\rho) = \pi_4(\rho_1)$. Thus, for each $0\le i\le n$,

(7) $\pi_4\sigma_{\$2\in W_i}(\rho_1) = \pi_2\sigma_{\$1\in W_i}(\pi_2(\rho_1)\otimes\pi_4(\rho_1)) = \pi_2\sigma_{\$1\in W_i}(w'\otimes v') = (w',v')_{in}$.

By (6) and the definition of $W_0$, $\bar a_j = a_j$ if and only if $(\chi,t_j,x_j)$ is in $W_0$, for each $0\le j<k$. Thus, $\pi_5(\rho_1) = \pi_4\sigma_{\$2\in W_0}(\rho_1)$ and
$$\begin{aligned}(w,v)_{0n} &= u_0 && \text{by definition}\\ &= \pi_5(\rho) && \text{by (5)}\\ &= \pi_5(\rho_1)\\ &= \pi_4\sigma_{\$2\in W_0}(\rho_1)\\ &= (w',v')_{0n} && \text{by (7)}.\end{aligned}$$
Now let $1\le i\le n$. Since $t_j = i$ if and only if $(\chi,t_j,x_j)$ is in $W_i$, for each $0\le j<k$, we have $\pi_2\sigma_{\$3=i}(\rho) = \pi_4\sigma_{\$2\in W_i}(\rho_1)$. Then
$$\begin{aligned}(w,v)_{in} &= u_i && \text{by definition}\\ &= \pi_2\sigma_{\$3=i}(\rho) && \text{by (4)}\\ &= \pi_4\sigma_{\$2\in W_i}(\rho_1)\\ &= (w',v')_{in} && \text{by (7)}.\end{aligned}$$
Thus, $(w,v)_{in} = (w',v')_{in}$ for each $0\le i\le n$ as desired. □

Lemma 2.6 Let $M$ be a generic n-tape a-transducer. For $u_0,\ldots,u_n$ in $\Sigma_\infty^*$, $u_0$ is in $M(u_1,\ldots,u_n)$ if and only if there exist $w$ in $em(T(A[M]))$ and $v$ in $\Sigma_\infty^{len(w)}$ such that $u_i = (w,v)_{in}$ for each $0\le i\le n$.

Proof (Only if) Suppose $u_0$ is in $M(u_1,\ldots,u_n)$. Then there exists a run $\rho = \prod_{0\le j<k}(p_j,a_j,t_j,p_{j+1},\bar a_j)$ of $M$ such that $\pi_5(\rho) = u_0$ and $\pi_2\sigma_{\$3=i}(\rho) = u_i$ for each $1\le i\le n$. Let $\rho_1 = \prod_{0\le j<k}(p_j,(\chi,t_j,x_j),p_{j+1})$ and $\rho_2 = \prod_{0\le j<k}(p_j,(\chi,t_j,x_j),p_{j+1},a_j,\bar a_j)$, where

(1) $(\chi,t_j,x_j) = (\chi,t_j,\varepsilon)$ if and only if $\bar a_j = \varepsilon$, for each $0\le j<k$.

By definition, $\rho_1$ is a run of $A[M]$. Let $w = \pi_2(\rho_1)$. Then $w$ is in $T(A[M])$ and hence in $em(T(A[M]))$. Obviously, $\pi_2(\rho_2) = \pi_2(\rho_1)$. Thus, $w = \pi_2(\rho_2)$. Let $v = \pi_4(\rho_2)$. Clearly, $len(w) = len(v)$. To complete the proof, it remains to show that $u_i = (w,v)_{in}$ for each $0\le i\le n$.
Obviously, for each $0\le i\le n$,

(2) $\pi_2\sigma_{\$1\in W_i}(w\otimes v) = \pi_2\sigma_{\$1\in W_i}(\pi_2(\rho_2)\otimes\pi_4(\rho_2)) = \pi_4\sigma_{\$2\in W_i}(\rho_2)$.

For each $0\le j<k$, it follows from (1) and the definition of $W_0$ that $\bar a_j = \varepsilon$ if and only if $(\chi,t_j,x_j)$ is not in $W_0$. Thus, $\pi_4\sigma_{\$2\in W_0}(\rho_2) = \pi_5(\rho_2)$ and
$$\begin{aligned}(w,v)_{0n} &= \pi_2\sigma_{\$1\in W_0}(w\otimes v) && \text{by definition}\\ &= \pi_4\sigma_{\$2\in W_0}(\rho_2) && \text{by (2)}\\ &= \pi_5(\rho_2) = \pi_5(\rho).\end{aligned}$$
This and the definition of $\rho$ imply that

(3) $(w,v)_{0n} = u_0$.

Similarly, let $1\le i\le n$. By the definition of $W_i$, $(\chi,t_j,x_j)$ is in $W_i$ if and only if $t_j = i$, for each $0\le j<k$. It follows that $\pi_4\sigma_{\$2\in W_i}(\rho_2) = \pi_2\sigma_{\$3=i}(\rho)$. Thus,
$$\begin{aligned}(w,v)_{in} &= \pi_2\sigma_{\$1\in W_i}(w\otimes v) && \text{by definition}\\ &= \pi_4\sigma_{\$2\in W_i}(\rho_2) && \text{by (2)}\\ &= \pi_2\sigma_{\$3=i}(\rho).\end{aligned}$$
This and the definition of $\rho$ imply that

(4) $(w,v)_{in} = u_i$ for each $1\le i\le n$.

Combining (3) and (4), $(w,v)_{in} = u_i$ for each $0\le i\le n$ as desired.

(If) Let $w$ be in $em(T(A[M]))$, $v$ in $\Sigma_\infty^{len(w)}$ and $u_i = (w,v)_{in}$ for each $0\le i\le n$. By definition, there exists $w'$ in $T(A[M])$ such that $w\Leftrightarrow^* w'$. By Lemma 2.3, there exists $v'$ in $\Sigma_\infty^{len(w')}$ such that $(w',v')_{in} = (w,v)_{in}$ for each $0\le i\le n$. Thus, $u_i = (w',v')_{in}$ for each $0\le i\le n$. Since $w'$ is in $T(A[M])$, there exists a run $\rho = \prod_{0\le j<k}(p_j,(\chi,t_j,x_j),p_{j+1})$ of $A[M]$ such that $w' = \pi_2(\rho)$. Let $f$ be the mapping from $\Sigma_\infty\times X_n$ to $\Sigma_\infty\cup\{\varepsilon\}$ such that (for each $a$ in $\Sigma_\infty$ and $(\chi,i,x)$ in $X_n$) $f(a,(\chi,i,x)) = a$ if $x = \chi$ and $f(a,(\chi,i,x)) = \varepsilon$ if $x = \varepsilon$. Since $len(v') = len(w')$, there exists $\rho_1 = \prod_{0\le j<k}(p_j,a_j,t_j,p_{j+1},f(a_j,(\chi,t_j,x_j)))$ such that $\pi_2(\rho_1) = v'$. By definition, it is easy to see that $\rho_1$ is a run of $M$. To show that $u_0$ is in $M(u_1,\ldots,u_n)$, it suffices to show that $\pi_5(\rho_1) = u_0$ and $\pi_2\sigma_{\$3=i}(\rho_1) = u_i$ for each $1\le i\le n$. Let $\bar a_j = f(a_j,(\chi,t_j,x_j))$ for each $0\le j<k$ and $\rho_2 = \prod_{0\le j<k}(p_j,a_j,t_j,p_{j+1},\bar a_j,(\chi,t_j,x_j))$.
Clearly, $\pi_6(\rho_2) = \pi_2(\rho) = w'$ and, for each $0\le i\le n$,
$$\pi_2\sigma_{\$6\in W_i}(\rho_2) = \pi_2\sigma_{\$1\in W_i}(\pi_6(\rho_2)\otimes\pi_2(\rho_2)) = \pi_2\sigma_{\$1\in W_i}(w'\otimes v') = (w',v')_{in},$$
i.e.,

(5) $\pi_2\sigma_{\$6\in W_i}(\rho_2) = u_i$.

For each $0\le j<k$, by the definitions of $\bar a_j$ and $W_0$, $\bar a_j = a_j$ if and only if $(\chi,t_j,x_j)$ is in $W_0$. Therefore, $\pi_5(\rho_2) = \pi_2\sigma_{\$6\in W_0}(\rho_2)$. Hence, $\pi_5(\rho_1) = \pi_5(\rho_2) = \pi_2\sigma_{\$6\in W_0}(\rho_2)$. It then follows from (5) that $\pi_5(\rho_1) = u_0$ as desired.

Now let $1\le i\le n$. Since (for each $0\le j<k$) $t_j = i$ if and only if $(\chi,t_j,x_j)$ is in $W_i$, we have $\pi_2\sigma_{\$3=i}(\rho_2) = \pi_2\sigma_{\$6\in W_i}(\rho_2)$. Hence, $\pi_2\sigma_{\$3=i}(\rho_1) = \pi_2\sigma_{\$3=i}(\rho_2) = \pi_2\sigma_{\$6\in W_i}(\rho_2)$. It then follows from (5) that $\pi_2\sigma_{\$3=i}(\rho_1) = u_i$ for each $1\le i\le n$, thereby completing the proof. □

We are now ready for our characterization of when two generic a-transducers are equivalent.

Proposition 2.1 Let $M_1$ and $M_2$ be two generic n-tape a-transducers. Then $M_1$ is equivalent to $M_2$ if and only if $em(T(A[M_1])) = em(T(A[M_2]))$.

Proof (Only if) Suppose $M_1$ and $M_2$ are equivalent. By symmetry, it suffices to show that $em(T(A[M_1]))\subseteq em(T(A[M_2]))$. Suppose $w$ is in $em(T(A[M_1]))$. Let $v$ be an arbitrary sequence in $\Sigma_\infty^{len(w)}$ and $u_i = (w,v)_{in}$ for each $0\le i\le n$. By Lemma 2.5, $u_0$ is in $M_1(u_1,\ldots,u_n)$. Since $M_1$ and $M_2$ are equivalent, $u_0$ is in $M_2(u_1,\ldots,u_n)$. Since $v$ was arbitrary, Lemma 2.5 applied again shows that $w$ is in $em(T(A[M_2]))$ as desired.

(If) Suppose $em(T(A[M_1])) = em(T(A[M_2]))$. Let $L_1,\ldots,L_n$ be subsets of $\Sigma_\infty^*$, and $u_0$ in $M_1(L_1,\ldots,L_n)$. By symmetry, it suffices to show that $u_0$ is also in $M_2(L_1,\ldots,L_n)$. Since $u_0$ is in $M_1(L_1,\ldots,L_n)$, there exists $u_i$ in $L_i$ (for each $1\le i\le n$) such that $u_0$ is in $M_1(u_1,\ldots,u_n)$. By Lemma 2.6, there exist $w$ in $em(T(A[M_1]))$ and $v$ in $\Sigma_\infty^{len(w)}$ such that $u_i = (w,v)_{in}$ for each $0\le i\le n$. Since $em(T(A[M_1])) = em(T(A[M_2]))$, $w$ is in $em(T(A[M_2]))$. By Lemma 2.6 again, $u_0$ is in $M_2(u_1,\ldots,u_n)$. Thus, $u_0$ is in $M_2(L_1,\ldots,L_n)$ as desired.
□

Using the above proposition, we are able to establish the decidability of the equivalence of mergers (extractors, resp.).

Theorem 2.3 It is decidable whether two mergers (extractors, resp.) are equivalent.

Proof Let $M$ be an ε-free generic n-tape a-transducer. Since $M$ is ε-free, $T(A[M])$ is a subset of $W_0^*$. Any two symbols of $W_0$ lie together in $W_0$, so no interchange is possible and $em(T(A[M])) = T(A[M])$ by definition. Hence, by Proposition 2.1, two ε-free generic a-transducers $M_1$ and $M_2$ are equivalent if and only if $T(A[M_1]) = T(A[M_2])$. Since $A[M]$ can be effectively constructed from $M$ by definition and it is decidable whether two regular sets are equal ([HU69]), it is decidable whether two ε-free generic a-transducers are equivalent. Let $[W_j]_n$, $1\le j\le 2$, be two mergers. By Theorem 2.1, there exists an ε-free generic n-tape a-transducer $M_j$ equivalent to $[W_j]_n$ for each $1\le j\le 2$. By the comment after Theorem 2.1, $M_j$ is effectively constructible from $[W_j]_n$ for each $1\le j\le 2$. Thus, it is decidable whether two mergers are equivalent.

Now let $M$ be a generic 1-tape a-transducer. Then $T(A[M])$ is a subset of $W_1^*$. Again, any two symbols of $W_1$ lie together in $W_1$, so $em(T(A[M])) = T(A[M])$ by definition. Thus, it is decidable whether two generic 1-tape a-transducers are equivalent. Let $\pi_{i_j}[W_j]^{-1}$, $1\le j\le 2$, be two extractors. By Theorem 2.2, there exists a generic 1-tape a-transducer $M_j$ equivalent to $\pi_{i_j}[W_j]^{-1}$ for each $1\le j\le 2$. By the comment after Theorem 2.2, $M_j$ is effectively constructible from $\pi_{i_j}[W_j]^{-1}$ for $1\le j\le 2$. Thus, it is decidable whether two extractors are equivalent. □

Chapter 3

Composition and Decomposition of Rs-Operations

In this chapter, we study "composition" and "decomposition" of mergers and extractors. We first prove that the set of all merger (extractor, resp.) mappings is "closed" under composition, that is, the composition of mergers (extractors, resp.) is still a merger (extractor, resp.).
This raises the question as to whether there exists a finite set of mergers (extractors, resp.) which yields all mergers (extractors, resp.) under composition. By studying the decomposition of rs-operations, we show that the answer is "no."

3.1 Composition of Rs-Operations

In this section, we study the "composition" of rs-operations. Our main results are (1) each composition of mergers is a merger and (2) each composition of extractors is an extractor.

We start with the notion of composition.

Definition Let $\mathcal{F}$ be a set of operations. Then an n-ary operation $f$ is said to be a composition of operations in $\mathcal{F}$ if either (1) $f$ is an n-ary operation in $\mathcal{F}$, or (2) there exist compositions $f_1$ and $f_2$ of operations in $\mathcal{F}$, with $arity(f_1) + arity(f_2) - 1 = n$, such that (*)
$$f(L_1,\ldots,L_n) = f_2(L_1,\ldots,L_{i-1},f_1(L_i,\ldots,L_{i+arity(f_1)-1}),L_{i+arity(f_1)},\ldots,L_n)$$
for some $1\le i\le n - arity(f_1) + 1$ and all subsets $L_1,\ldots,L_n$ of $\Sigma_\infty^*$.

An n-ary operation $f$ is a composition of generic a-transducers (mergers, extractors, resp.) if $f$ is a composition of operations in $\mathcal{F}$, where $\mathcal{F}$ consists of all generic a-transducer (merger, extractor, resp.) mappings.

In order to present our main results, we need two lemmas and a proposition (of interest in its own right). The first of our two lemmas deals with a special case of composition of two generic a-transducers.

Lemma 3.1 Let $M_1$ and $M_2$ be generic $n_1$- and $n_2$-tape a-transducers, respectively. Then there exists a generic n-tape a-transducer $M$, where $n = n_1 + n_2 - 1$, such that
$$M(L_1,\ldots,L_n) = M_2(M_1(L_1,\ldots,L_{n_1}),L_{n_1+1},\ldots,L_n)$$
for all subsets $L_1,\ldots,L_n$ of $\Sigma_\infty^*$. Furthermore, $M$ is ε-free if both $M_1$ and $M_2$ are.

Proof Suppose $M_1 = (n_1, K_1, \chi, H_1, q_0, F_1)$ and $M_2 = (n_2, K_2, \chi, H_2, p_0, F_2)$.
Let $K = \{(p,q) \mid p \text{ in } K_2 \text{ and } q \text{ in } K_1\}$, $F = \{(p,q) \mid p \text{ in } F_2 \text{ and } q \text{ in } F_1\}$ and
$$\begin{aligned}H = {} & \{((p,q),\chi,n_1+i-1,(p',q),x) \mid (p,\chi,i,p',x) \text{ in } H_2,\ i>1 \text{ and } q \text{ in } K_1\}\\ \cup {} & \{((p,q),\chi,i,(p',q'),x) \mid (p,\chi,1,p',x) \text{ in } H_2 \text{ and } (q,\chi,i,q',\chi) \text{ in } H_1\}\\ \cup {} & \{((p,q),\chi,i,(p,q'),\varepsilon) \mid (q,\chi,i,q',\varepsilon) \text{ in } H_1 \text{ and } p \text{ in } K_2\}.\end{aligned}$$
Let $n = n_1 + n_2 - 1$ and $M$ be the generic a-transducer $(n, K, \chi, H, (p_0,q_0), F)$. Clearly, $M$ is ε-free if both $M_1$ and $M_2$ are.

* It is understood that if $i = 1$ ($i + arity(f_1) - 1 = n$, resp.), then there is no $L_1,\ldots,L_{i-1}$ preceding (no $L_{i+arity(f_1)},\ldots,L_n$ following, resp.) $f_1(L_i,\ldots,L_{i+arity(f_1)-1})$.

Let $L_1,\ldots,L_n$ be subsets of $\Sigma_\infty^*$. Assume that the following assertion is true.

(1) Let $v$ be in $\Sigma_\infty^*$, $p$ in $K_2$, $q$ in $K_1$, and $w_i$ and $w'_i$ in $\Sigma_\infty^*$ for $1\le i\le n$. Then there exists $k\ge 0$ such that $((p_0,q_0),w_1,\ldots,w_n,\varepsilon)\vdash_M^k((p,q),w'_1,\ldots,w'_n,v)$ if and only if there exist $w'$, $k_1\ge 0$ and $k_2\ge 0$ such that $(q_0,w_1,\ldots,w_{n_1},\varepsilon)\vdash_{M_1}^{k_1}(q,w'_1,\ldots,w'_{n_1},w')$ and $(p_0,w',w_{n_1+1},\ldots,w_n,\varepsilon)\vdash_{M_2}^{k_2}(p,\varepsilon,w'_{n_1+1},\ldots,w'_n,v)$.

Let $L = M(L_1,\ldots,L_n)$. By definition, $u$ is in $L$ if and only if there exist $w_i$ in $L_i$ for $1\le i\le n$ and $(p_f,q_f)$ in $F$ such that $((p_0,q_0),w_1,\ldots,w_n,\varepsilon)\vdash_M^*((p_f,q_f),\varepsilon,\ldots,\varepsilon,u)$. By (1), $u$ is in $L$ if and only if there exists $w'$ such that $(q_0,w_1,\ldots,w_{n_1},\varepsilon)\vdash_{M_1}^*(q_f,\varepsilon,\ldots,\varepsilon,w')$ and $(p_0,w',w_{n_1+1},\ldots,w_n,\varepsilon)\vdash_{M_2}^*(p_f,\varepsilon,\ldots,\varepsilon,u)$. Thus, $u$ is in $L$ if and only if $u$ is in $M_2(M_1(L_1,\ldots,L_{n_1}),L_{n_1+1},\ldots,L_n)$. In order to establish the lemma, it therefore suffices to show that (1) is true.

Consider the "only if" part of (1). Obviously, the "only if" part is true if $k = 0$. Continuing by induction, suppose the "only if" part is true for each $0\le k<m$. Assume $k = m$. Since $k = m>0$, there exists $((p',q'),a,i,(p,q),\bar a)$ in $H[\Sigma_\infty]$ such that
$$((p_0,q_0),w_1,\ldots,w_n,\varepsilon)\vdash_M^{m-1}((p',q'),w'_1,\ldots,aw'_i,\ldots,w'_n,v')\vdash_M((p,q),w'_1,\ldots,w'_i,\ldots,w'_n,v'\bar a),$$
where $v'\bar a = v$. Two cases arise: (a) $1\le i\le n_1$ and (b) $n_1<i\le n$.

Consider (a).
By induction, there exists $w'$ such that

(2) $(p_0,w',w_{n_1+1},\ldots,w_n,\varepsilon)\vdash_{M_2}^*(p',\varepsilon,w'_{n_1+1},\ldots,w'_n,v')$ and
(3) $(q_0,w_1,\ldots,w_{n_1},\varepsilon)\vdash_{M_1}^*(q',w'_1,\ldots,aw'_i,\ldots,w'_{n_1},w')$.

Since $((p',q'),a,i,(p,q),\bar a)$ is in $H[\Sigma_\infty]$ and $1\le i\le n_1$, two possibilities arise by the definition of $H$:

(i) $(q',a,i,q,a)$ is in $H_1[\Sigma_\infty]$ and $(p',a,1,p,\bar a)$ is in $H_2[\Sigma_\infty]$. Therefore, $(p',a,w'_{n_1+1},\ldots,w'_n,v')\vdash_{M_2}(p,\varepsilon,w'_{n_1+1},\ldots,w'_n,v'\bar a)$ and $(q',w'_1,\ldots,aw'_i,\ldots,w'_{n_1},w')\vdash_{M_1}(q,w'_1,\ldots,w'_i,\ldots,w'_{n_1},w'a)$. Then
$$\begin{aligned}(p_0,w'a,w_{n_1+1},\ldots,w_n,\varepsilon) &\vdash_{M_2}^*(p',a,w'_{n_1+1},\ldots,w'_n,v') && \text{by (2)}\\ &\vdash_{M_2}(p,\varepsilon,w'_{n_1+1},\ldots,w'_n,v'\bar a)\end{aligned}$$
and
$$\begin{aligned}(q_0,w_1,\ldots,w_{n_1},\varepsilon) &\vdash_{M_1}^*(q',w'_1,\ldots,aw'_i,\ldots,w'_{n_1},w') && \text{by (3)}\\ &\vdash_{M_1}(q,w'_1,\ldots,w'_i,\ldots,w'_{n_1},w'a).\end{aligned}$$

(ii) $p = p'$ and $(q',a,i,q,\varepsilon)$ is in $H_1[\Sigma_\infty]$. Since $p = p'$, by (2), $(p_0,w',w_{n_1+1},\ldots,w_n,\varepsilon)\vdash_{M_2}^*(p,\varepsilon,w'_{n_1+1},\ldots,w'_n,v')$. It is clear that $\bar a = \varepsilon$ in this case. Hence, $v' = v$. Therefore, $(q',w'_1,\ldots,aw'_i,\ldots,w'_{n_1},w')\vdash_{M_1}(q,w'_1,\ldots,w'_i,\ldots,w'_{n_1},w')$. Then
$$\begin{aligned}(q_0,w_1,\ldots,w_{n_1},\varepsilon) &\vdash_{M_1}^*(q',w'_1,\ldots,aw'_i,\ldots,w'_{n_1},w') && \text{by (3)}\\ &\vdash_{M_1}(q,w'_1,\ldots,w'_i,\ldots,w'_{n_1},w').\end{aligned}$$

Thus, the "only if" part of (1) is true in either case.

Consider (b). By induction, there exists $w'$ such that

(4) $(p_0,w',w_{n_1+1},\ldots,w_n,\varepsilon)\vdash_{M_2}^*(p',\varepsilon,w'_{n_1+1},\ldots,aw'_i,\ldots,w'_n,v')$ and
(5) $(q_0,w_1,\ldots,w_{n_1},\varepsilon)\vdash_{M_1}^*(q',w'_1,\ldots,w'_{n_1},w')$.

Since $n_1<i\le n$ and $((p',q'),a,i,(p,q),\bar a)$ is in $H[\Sigma_\infty]$, it follows from the definition of $H$ that $q' = q$ and $(p',a,i-n_1+1,p,\bar a)$ is in $H_2[\Sigma_\infty]$. Then
$$\begin{aligned}(p_0,w',w_{n_1+1},\ldots,w_n,\varepsilon) &\vdash_{M_2}^*(p',\varepsilon,w'_{n_1+1},\ldots,aw'_i,\ldots,w'_n,v') && \text{by (4)}\\ &\vdash_{M_2}(p,\varepsilon,w'_{n_1+1},\ldots,w'_i,\ldots,w'_n,v'\bar a).\end{aligned}$$
Since $q' = q$, it follows from (5) that $(q_0,w_1,\ldots,w_{n_1},\varepsilon)\vdash_{M_1}^*(q,w'_1,\ldots,w'_{n_1},w')$. Thus, the "only if" part of (1) is true.

We now show the "if" part of (1). Clearly, the "if" part is true if either $k_1 = 0$ or $k_2 = 0$. Continuing by induction on $k_1 + k_2$, suppose the "if" part is true if $0\le k_1 + k_2<m$. Assume $k_1 + k_2 = m$. If $k_1 = 0$, then the "if" part is true as just mentioned. Suppose $k_1>0$.
Since $k_1>0$, there exist $1\le i\le n_1$, $(q',a,i,q,\bar a)$ in $H_1[\Sigma_\infty]$ and $0\le k_3<k_1$ such that
$$(q_0,w_1,\ldots,w_{n_1},\varepsilon)\vdash_{M_1}^{k_3}(q',w'_1,\ldots,aw'_i,\ldots,w'_{n_1},w'')\vdash_{M_1}(q,w'_1,\ldots,w'_i,\ldots,w'_{n_1},w''\bar a),$$
where $w''\bar a = w'$. Thus,

(6) $(q',a,i,q,\bar a)$ is in $H_1[\Sigma_\infty]$.

Two cases arise: (c) $\bar a = \varepsilon$ and (d) $\bar a = a$.

Consider (c). Since $\bar a = \varepsilon$, $w'' = w'$. By induction,

(7) $((p_0,q_0),w_1,\ldots,w_n,\varepsilon)\vdash_M^*((p,q'),w'_1,\ldots,aw'_i,\ldots,w'_n,v)$.

By (6) and the definition of $H$,

(8) $((p,q'),a,i,(p,q),\varepsilon)$ is in $H[\Sigma_\infty]$.

Then
$$\begin{aligned}((p_0,q_0),w_1,\ldots,w_n,\varepsilon) &\vdash_M^*((p,q'),w'_1,\ldots,aw'_i,\ldots,w'_n,v) && \text{by (7)}\\ &\vdash_M((p,q),w'_1,\ldots,w'_n,v) && \text{by (8)}\end{aligned}$$
as desired.

Consider (d). By the hypothesis of the "if" part of (1), $(p_0,w',w_{n_1+1},\ldots,w_n,\varepsilon)\vdash_{M_2}^{k_2}(p,\varepsilon,w'_{n_1+1},\ldots,w'_n,v)$. Then there exist $w''_j$ ($n_1<j\le n$), $0\le k_4<k_2$, $p'$, $p''$, $v''$ and $b$ in $\{a,\varepsilon\}$ such that
$$(p_0,w''a,w_{n_1+1},\ldots,w_n,\varepsilon)\vdash_{M_2}^{k_4}(p',a,w''_{n_1+1},\ldots,w''_n,v'')\vdash_{M_2}(p'',\varepsilon,w''_{n_1+1},\ldots,w''_n,v''b)\vdash_{M_2}^*(p,\varepsilon,w'_{n_1+1},\ldots,w'_n,v).$$
It is easy to see that

(9) $(p',a,1,p'',b)$ is in $H_2[\Sigma_\infty]$ and
(10) $((p'',q),w'_1,\ldots,w'_{n_1},w''_{n_1+1},\ldots,w''_n,v''b)\vdash_M^*((p,q),w'_1,\ldots,w'_n,v)$ for each $q$.

Since $k_3<k_1$ and $k_4<k_2$, it is clear that $k_3 + k_4<m$. By induction,

(11) $((p_0,q_0),w_1,\ldots,w_n,\varepsilon)\vdash_M^*((p',q'),w'_1,\ldots,aw'_i,\ldots,w'_{n_1},w''_{n_1+1},\ldots,w''_n,v'')$.

By (6) and (9), it follows from the definition of $H$ that $((p',q'),a,i,(p'',q),b)$ is in $H[\Sigma_\infty]$. Then by (11),

(12) $((p_0,q_0),w_1,\ldots,w_n,\varepsilon)\vdash_M^*((p'',q),w'_1,\ldots,w'_i,\ldots,w'_{n_1},w''_{n_1+1},\ldots,w''_n,v''b)$.

By (10) and (12), $((p_0,q_0),w_1,\ldots,w_n,\varepsilon)\vdash_M^*((p,q),w'_1,\ldots,w'_n,v)$ as desired. □

The second lemma deals with an interchange of two specific tapes of a generic a-transducer.

Lemma 3.2 Let $M$ be a generic n-tape a-transducer and $1\le i<j\le n$.
Then there exists a generic n-tape a-transducer $M'$ such that
$$M'(L_1,\ldots,L_j,\ldots,L_i,\ldots,L_n) = M(L_1,\ldots,L_i,\ldots,L_j,\ldots,L_n)$$
for all subsets $L_1,\ldots,L_n$ of $\Sigma_\infty^*$. Furthermore, $M'$ is ε-free if $M$ is.

Proof Suppose $M = (n, K, \chi, H, p_0, F)$. Let
$$\begin{aligned}H' = {} & \{(p,\chi,j,q,x) \mid (p,\chi,i,q,x) \text{ in } H\}\cup\{(p,\chi,i,q,x) \mid (p,\chi,j,q,x) \text{ in } H\}\\ \cup {} & \{(p,\chi,t,q,x) \mid (p,\chi,t,q,x) \text{ in } H,\ t\ne i \text{ and } t\ne j\},\end{aligned}$$
and $M' = (n, K, \chi, H', p_0, F)$. Clearly, $M'$ is ε-free if $M$ is. It is easy to see that $M'(\ldots,L_i,\ldots,L_j,\ldots) = M(\ldots,L_j,\ldots,L_i,\ldots)$ for all subsets $L_1,\ldots,L_n$ of $\Sigma_\infty^*$. □

By a repeated use of Lemmas 3.1 and 3.2, we get the following:

Proposition 3.1 Each composition of (ε-free) generic a-transducers is equivalent to an (ε-free) generic a-transducer.

By Theorem 2.1, each merger is equivalent to an ε-free generic a-transducer. By Proposition 3.1, each composition of ε-free generic a-transducers is equivalent to an ε-free generic a-transducer. Thus, each composition of mergers is equivalent to an ε-free generic a-transducer and hence (by Theorem 2.1) to a merger. Similarly, by Theorem 2.2 and Proposition 3.1, it is easy to see that each composition of extractors is equivalent to an extractor. Therefore, we have

Theorem 3.1 Each composition of mergers (extractors, resp.) is equivalent to a merger (extractor, resp.).

To illustrate, let $Concat_2(L_1,L_2) = [\alpha_1^*\alpha_2^*]_2(L_1,L_2)$ for all subsets $L_1$ and $L_2$ of $\Sigma_\infty^*$. Clearly, $u$ is in $Concat_2(L_1,L_2)$ if and only if $u$ is a concatenation of a sequence in $L_1$ and a sequence in $L_2$. By the above theorem, there exists a merger $Concat_3$ such that $Concat_3(L_1,L_2,L_3) = Concat_2(Concat_2(L_1,L_2),L_3)$ for all subsets $L_1$, $L_2$ and $L_3$ of $\Sigma_\infty^*$.
Indeed, it is easy to see that $[\![\alpha_1^*\alpha_2^*\alpha_3^*]\!]_3(L_1, L_2, L_3) = \mathrm{Concat}_2(\mathrm{Concat}_2(L_1, L_2), L_3)$ for all subsets $L_1$, $L_2$ and $L_3$ of $\Sigma_\infty^*$. As another example, it is easy to see that $\pi_1[\![(\alpha_1\alpha_2\alpha_2)^*]\!]_2^{-1}(L) = \pi_1[\![(\alpha_1\alpha_2)^*]\!]_2^{-1}(\pi_1[\![(\alpha_1\alpha_1\alpha_2)^*]\!]_2^{-1}(L))$ for each subset $L$ of $\Sigma_\infty^*$. We note in passing that there exists a composition of mergers and extractors which is not equivalent to any single extractor or any single merger.

3.2 Decomposition of Mergers

The main result of the last section states that a composition of mergers (extractors, resp.) is still a merger (extractor, resp.). The question arises as to whether there exists a finite set $\mathcal{F}$ of mergers (extractors, resp.) such that each merger (extractor, resp.) is a composition of mergers (extractors, resp.) in $\mathcal{F}$. The purpose of this section is to answer this question for mergers. In particular, the notion of "decomposition" is presented. Through the development of two theorems concerning the decomposition of mergers, a negative answer is given. (The analogue for extractors is examined in the next section.)

We start with the notion of decomposition. In this section only, we use $\mathcal{M}$ (possibly subscripted) to denote mergers and $\mathcal{M}^n$ (possibly subscripted) to denote $n$-ary mergers. When $n$ is understood, we usually write $\mathcal{M}$ instead of $\mathcal{M}^n$.

Intuitively, a merger is said to be "decomposable" if it is equivalent to a composition of at least two mergers. However, all mergers are decomposable according to this notion. Indeed, it is easy to see that $[\![\alpha_1^*]\!]_1$ is the unary identity mapping, i.e., $[\![\alpha_1^*]\!]_1(L) = L$ for each subset $L$ of $\Sigma_\infty^*$. Thus, $\mathcal{M}(L_1, \ldots, L_n) = \mathcal{M}([\![\alpha_1^*]\!]_1(L_1), L_2, \ldots, L_n)$ for each $n$-ary merger $\mathcal{M}$ and all subsets $L_1, \ldots, L_n$ of $\Sigma_\infty^*$. In the following, a merger is called an identity if it is equivalent to $[\![\alpha_1^*]\!]_1$. Evidently, identities should be excluded from our definition of decomposition.
By Theorem 3.1, a merger is equivalent to a composition of two mergers if it is equivalent to a composition of three or more mergers. It is easily seen that no composition of non-identity mergers is an identity. [Indeed, suppose a merger $\mathcal{M}$ is a composition of non-identity mergers $\mathcal{M}_1, \ldots, \mathcal{M}_k$. Clearly, if $\mathcal{M}$ is an identity, then each $\mathcal{M}_i$ is unary. It is easy to see that if a unary merger $\mathcal{M}_i$ is not an identity, then $\mathcal{M}_i(\{u\}) = \emptyset$ for some $u$. Thus, $\mathcal{M}(\{u\}) = \emptyset$ for some $u$. Hence, $\mathcal{M}$ is not an identity.] Therefore, our definition of decomposition need only consider the case in which a merger is decomposed into two mergers.

Without loss of generality, the decomposition of a merger $\mathcal{M}$ into mergers $\mathcal{M}_1$ and $\mathcal{M}_2$ may be assumed to have the "canonical" form $\mathcal{M}_2(\mathcal{M}_1(L_1, \ldots, L_{n_1}), L_{n_1+1}, \ldots, L_n)$. Indeed, let $\mathcal{M} = [\![W]\!]_n$ be a merger. It is easy to see that $\mathcal{M}$ is a composition of two (not necessarily distinct) mergers if and only if there exists a 1-1 homomorphism $h$ from $V_n$ onto $V_n$ such that $[\![h(W)]\!]_n$ is decomposable into this canonical form. [For example, let $\mathcal{M} = [\![\alpha_1(\alpha_2\alpha_3)^*]\!]_3$, $\mathcal{M}_1 = [\![(\alpha_1\alpha_2)^*]\!]_2$ and $\mathcal{M}_2 = [\![\alpha_1\alpha_2^*]\!]_2$. Clearly, $\mathcal{M}(L_1, L_2, L_3) = \mathcal{M}_2(L_1, \mathcal{M}_1(L_2, L_3))$ for all subsets $L_1$, $L_2$ and $L_3$ of $\Sigma_\infty^*$. Let $h$ be the homomorphism from $V_3$ to $V_3$ such that $h(\alpha_1) = \alpha_3$, $h(\alpha_2) = \alpha_1$ and $h(\alpha_3) = \alpha_2$. Then $h(\alpha_1(\alpha_2\alpha_3)^*) = \alpha_3(\alpha_1\alpha_2)^*$. It is obvious that $[\![\alpha_3(\alpha_1\alpha_2)^*]\!]_3(L_1, L_2, L_3) = [\![\alpha_2\alpha_1^*]\!]_2(\mathcal{M}_1(L_1, L_2), L_3)$ for all subsets $L_1$, $L_2$ and $L_3$ of $\Sigma_\infty^*$.]

In view of the comments in the above three paragraphs, we restrict ourselves to the following notion of decomposition.

Definition A decomposition of an $n$-ary merger $\mathcal{M}$ is a list $\mathcal{M}_1^{n_1}, \mathcal{M}_2^{n_2}$, where $n = n_1 + n_2 - 1$, such that $\mathcal{M}(L_1, \ldots, L_n) = \mathcal{M}_2(\mathcal{M}_1(L_1, \ldots, L_{n_1}), L_{n_1+1}, \ldots, L_n)$ for all subsets $L_1, \ldots, L_n$ of $\Sigma_\infty^*$. Such a merger $\mathcal{M}$ is said to be decomposable into $\mathcal{M}_1, \mathcal{M}_2$, or simply decomposable.
A decomposition $\mathcal{M}_1, \mathcal{M}_2$ is said to be trivial if either $\mathcal{M}_1$ or $\mathcal{M}_2$ is an identity, and non-trivial otherwise.

To illustrate, let $\mathcal{M} = [\![(\alpha_1\alpha_2\alpha_3)^*]\!]_3$, $\mathcal{M}_1 = [\![(\alpha_1\alpha_2)^*]\!]_2$ and $\mathcal{M}_2 = [\![(\alpha_1\alpha_1\alpha_2)^*]\!]_2$. Clearly, $\mathcal{M}_1, \mathcal{M}_2$ is a non-trivial decomposition of $\mathcal{M}$. Indeed, neither $\mathcal{M}_1$ nor $\mathcal{M}_2$ is an identity, and $[\![(\alpha_1\alpha_2\alpha_3)^*]\!]_3(L_1, L_2, L_3) = [\![(\alpha_1\alpha_1\alpha_2)^*]\!]_2([\![(\alpha_1\alpha_2)^*]\!]_2(L_1, L_2), L_3)$ for all subsets $L_1$, $L_2$ and $L_3$ of $\Sigma_\infty^*$. This example has some interesting practical implications. For instance, suppose $L_1$ and $L_2$ are stored at a remote site (thus, $L_1$ and $L_2$ are finite sets) and the size² of $\mathcal{M}_1(L_1, L_2)$ is much smaller than the size of $L_1$ plus the size of $L_2$. Then computing $\mathcal{M}_2(\mathcal{M}_1(L_1, L_2), L_3)$ is obviously an efficient way of processing $\mathcal{M}(L_1, L_2, L_3)$.

²The size of a finite set of sequences is the sum of the lengths of the sequences in the set.

The above definition includes two extreme cases, namely when $n_1$ is $1$ or $n$. The next result gives a characterization of these extreme cases. In order to present the characterization, the following notions are needed.

Notation For each set $A$ of elements, $A' \subseteq A$, $b$ in $A$, $w$ in $A^*$ and $W \subseteq A^*$, let
(i) $w|A' = h_1(w)$, where $h_1$ is the homomorphism from $A^*$ to $A^*$ defined by $h_1(a) = a$ if $a$ is in $A'$ and $h_1(a) = \varepsilon$ otherwise;
(ii) $w[A'/b] = h_2(w)$, where $h_2$ is the homomorphism from $A^*$ to $A^*$ defined by $h_2(a) = b$ if $a$ is in $A'$ and $h_2(a) = a$ otherwise; and
(iii) $W|A' = \{w|A' \mid w \in W\}$ and $W[A'/b] = \{w[A'/b] \mid w \in W\}$.

Intuitively, $w[A'/b]$ is the sequence obtained by changing all symbols in $A'$ to $b$, and $w|A'$ is the sequence obtained by retaining only the symbols in $A'$. For example, $abc[\{a, b\}/c] = ccc$ and $(\alpha_1\alpha_2)^*|\{\alpha_1\} = \alpha_1^*$.

We are now ready for the following trivial cases of decomposition of mergers:

Proposition 3.2 Let $\mathcal{M} = [\![W]\!]_n$ be a non-identity merger.
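The restriction $w|A'$ and renaming $w[A'/b]$ operations are ordinary letter-to-letter homomorphisms, so they are one-liners on strings. The sketch below assumes, for illustration only, that symbols are single characters (with $\alpha_i$ written as the digit `str(i)`) and that $W$ is a finite Python set; the names `restrict`, `rename`, `restrict_set` and `rename_set` are hypothetical. The set versions apply the word versions elementwise, as in item (iii).

```python
def restrict(w, keep):
    """w|A': keep only the symbols of w that belong to A' (here the set `keep`)."""
    return "".join(a for a in w if a in keep)


def rename(w, targets, b):
    """w[A'/b]: change every symbol of w that belongs to A' (here `targets`) into b."""
    return "".join(b if a in targets else a for a in w)


def restrict_set(W, keep):
    """W|A' for a finite set W of words."""
    return {restrict(w, keep) for w in W}


def rename_set(W, targets, b):
    """W[A'/b] for a finite set W of words."""
    return {rename(w, targets, b) for w in W}


print(rename("abc", {"a", "b"}, "c"))                    # ccc
print(sorted(restrict_set({"", "12", "1212"}, {"1"})))   # ['', '1', '11']
```

The second call restricts a finite sample of $(\alpha_1\alpha_2)^*$ to $\{\alpha_1\}$ and yields the corresponding finite sample of $\alpha_1^*$.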
Then
(a) there exists a non-trivial decomposition $\mathcal{M}_1^1, \mathcal{M}_2^n$ of $\mathcal{M}$ if and only if $W|\{\alpha_1\} \ne \alpha_1^*$; and
(b) there exists a non-trivial decomposition $\mathcal{M}_1^n, \mathcal{M}_2^1$ of $\mathcal{M}$ if and only if $W[V_n/\alpha_1] \ne \alpha_1^*$.

Proof Consider (a). Suppose $W|\{\alpha_1\} \ne \alpha_1^*$, i.e., there exists $k \ge 0$ such that no sequence in $W$ contains exactly $k$ occurrences of $\alpha_1$. Let $W_1 = W|\{\alpha_1\}$. Obviously, $[\![W]\!]_n(L_1, \ldots, L_n) = [\![W]\!]_n([\![W_1]\!]_1(L_1), L_2, \ldots, L_n)$ for all subsets $L_1, \ldots, L_n$ of $\Sigma_\infty^*$. Since $W_1 = W|\{\alpha_1\} \ne \alpha_1^*$ and $[\![W]\!]_n$ is not an identity, it follows that $[\![W_1]\!]_1, [\![W]\!]_n$ is a non-trivial decomposition of $[\![W]\!]_n$.

Conversely, suppose $[\![W]\!]_n(L_1, \ldots, L_n) = [\![W_2]\!]_n([\![W_1]\!]_1(L_1), L_2, \ldots, L_n)$ for all subsets $L_1, \ldots, L_n$ of $\Sigma_\infty^*$, and $W_1 \ne \alpha_1^*$. If $W|\{\alpha_1\} = \emptyset$, then $W|\{\alpha_1\} \ne \alpha_1^*$ as desired. Suppose now $W|\{\alpha_1\} \ne \emptyset$, say $w$ is in $W|\{\alpha_1\}$. Then there exists $w'$ in $W$ such that $w = w'|\{\alpha_1\}$. For each $1 \le i \le n$, let $L_i = \{u_i\}$ for some $u_i$ in $\Sigma_\infty^{\mathrm{len}(w'|\{\alpha_i\})}$. By definition, $[\![w']\!]_n(L_1, \ldots, L_n) \ne \emptyset$. Thus,
$$[\![W_2]\!]_n([\![W_1]\!]_1(L_1), L_2, \ldots, L_n) = [\![W]\!]_n(L_1, \ldots, L_n) \supseteq [\![w']\!]_n(L_1, \ldots, L_n) \ne \emptyset.$$
Hence, $[\![W_1]\!]_1(L_1) \ne \emptyset$. Then there exists $w''$ in $W_1$ of length $\mathrm{len}(u_1)$. Clearly, $\mathrm{len}(w'') = \mathrm{len}(u_1) = \mathrm{len}(w'|\{\alpha_1\}) = \mathrm{len}(w)$. Since $w$ and $w''$ are both in $\alpha_1^*$, it follows that $w = w''$ is in $W_1$. Therefore, $W|\{\alpha_1\} \subseteq W_1$. Since $W_1 \subseteq \alpha_1^*$ and $W_1 \ne \alpha_1^*$, we get $W|\{\alpha_1\} \ne \alpha_1^*$ as desired.

Consider (b). Suppose $W[V_n/\alpha_1] \ne \alpha_1^*$. Let $W_2 = W[V_n/\alpha_1]$. Obviously, $[\![W]\!]_n(L_1, \ldots, L_n) = [\![W_2]\!]_1([\![W]\!]_n(L_1, \ldots, L_n))$ for all subsets $L_1, \ldots, L_n$ of $\Sigma_\infty^*$. Since $W_2 = W[V_n/\alpha_1] \ne \alpha_1^*$ and $[\![W]\!]_n$ is not an identity, it follows that $[\![W]\!]_n, [\![W_2]\!]_1$ is a non-trivial decomposition of $[\![W]\!]_n$.

Conversely, suppose $[\![W]\!]_n(L_1, \ldots, L_n) = [\![W_2]\!]_1([\![W_1]\!]_n(L_1, \ldots, L_n))$ for all subsets $L_1, \ldots, L_n$ of $\Sigma_\infty^*$, and $W_2 \ne \alpha_1^*$. If $W = \emptyset$, then $W[V_n/\alpha_1] = \emptyset \ne \alpha_1^*$ as desired. Suppose now $W \ne \emptyset$, say $w$ is in $W$. Let $L_i = \Sigma_\infty^*$ for each $1 \le i \le n$. Clearly, $[\![w]\!]_n(L_1, \ldots, L_n) \ne \emptyset$. Let $v$ be in $[\![w]\!]_n(L_1, \ldots, L_n)$. Then $\mathrm{len}(v) = \mathrm{len}(w)$ by definition. Since $[\![w]\!]_n(L_1, \ldots, L_n) \subseteq [\![W]\!]_n(L_1, \ldots, L_n) = [\![W_2]\!]_1([\![W_1]\!]_n(L_1, \ldots, L_n))$, $v$ is in $[\![W_2]\!]_1([\![W_1]\!]_n(L_1, \ldots, L_n))$. Thus, there exists $w'$ in $W_2$ such that $\mathrm{len}(w') = \mathrm{len}(v)$. Since $W_2 \subseteq \alpha_1^*$, $w' = \alpha_1^{\mathrm{len}(w')} = \alpha_1^{\mathrm{len}(v)} = \alpha_1^{\mathrm{len}(w)} = w[V_n/\alpha_1]$. Since $w'$ is in $W_2$, $w[V_n/\alpha_1]$ is in $W_2$, whence $W[V_n/\alpha_1] \subseteq W_2$. Since $W_2 \ne \alpha_1^*$, $W[V_n/\alpha_1] \ne \alpha_1^*$ as desired. □

As seen from the above proposition, the two extreme cases are rather trivial and thus not considered further. In what follows, a merger $\mathcal{M}^n$ is said to be restricted decomposable if $n \ge 3$ and there exist mergers $\mathcal{M}_1^{n_1}$ and $\mathcal{M}_2^{n_2}$, with $1 < n_1 < n$ and $n = n_1 + n_2 - 1$, such that $\mathcal{M}_1, \mathcal{M}_2$ is a decomposition of $\mathcal{M}$. (Notice that this kind of decomposition is always non-trivial.) It will be seen that this notion of decomposition is adequate for our purpose, i.e., by using it, it will be shown in Theorem 3.3 that no finite set of mergers gives all the mergers by composition. In order to establish Theorem 3.3, a characterization is presented of when a merger is restricted decomposable. The following notion of substitution plays an important role in the characterization.

Notation Let $V$ be a subset of $V_\infty$, and $w_1$ and $w_2$ in $V_\infty^*$. If $w_1 = \alpha_{11} \cdots \alpha_{1k}$ and $w_2 = w_{20}\alpha_{21}w_{21} \cdots \alpha_{2k}w_{2k}$, where³ $w_{2i}$ is in $(V_\infty - V)^*$ for each $0 \le i \le k$ and $\alpha_{2i}$ is in $V$ for each $1 \le i \le k$, then let $\mathrm{Sub}(w_2, w_1, V) = \{w_{20}\alpha_{11}w_{21} \cdots \alpha_{1k}w_{2k}\}$. Otherwise, let $\mathrm{Sub}(w_2, w_1, V) = \emptyset$. For subsets $L_1$ and $L_2$ of $V_\infty^*$ and subset $V$ of $V_\infty$, let $\mathrm{Sub}(L_2, L_1, V) = \bigcup_{w_2 \in L_2,\, w_1 \in L_1} \mathrm{Sub}(w_2, w_1, V)$.

Clearly, $\mathrm{Sub}(w_2, w_1, V) \ne \emptyset$ if and only if $\mathrm{len}(w_2|V) = \mathrm{len}(w_1)$. Then $\mathrm{Sub}(w_2, w_1, V)$ consists of the sequence (if it exists) obtained by first changing all occurrences of elements of $V$ in $w_2$ to one special symbol and then replacing, for each $i$, the $i$-th occurrence of the special symbol in $w_2$ by the $i$-th element of $w_1$. (For example, $\mathrm{Sub}(abcdedb, hihi, \{b, d\}) = \{ahciehi\}$.)
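Because $\mathrm{Sub}(w_2, w_1, V)$ just overwrites, in order, the positions of $w_2$ that carry a symbol of $V$, it is straightforward to compute for finite sets. The sketch below is a hypothetical helper, not taken from the dissertation: symbols are assumed to be single characters, a word-level result is returned as a Python set with at most one element (empty on a length mismatch), and `sub_sets` forms the union over finite $L_2$ and $L_1$.

```python
def sub(w2, w1, V):
    """Sub(w2, w1, V): the i-th occurrence in w2 of a symbol from V is replaced by the
    i-th symbol of w1. Returns a set holding at most one word."""
    slots = [pos for pos, a in enumerate(w2) if a in V]
    if len(slots) != len(w1):          # Sub is nonempty iff len(w2|V) == len(w1)
        return set()
    out = list(w2)
    for pos, a in zip(slots, w1):
        out[pos] = a
    return {"".join(out)}


def sub_sets(L2, L1, V):
    """Sub(L2, L1, V) for finite L1 and L2: the union of Sub(w2, w1, V)."""
    result = set()
    for w2 in L2:
        for w1 in L1:
            result |= sub(w2, w1, V)
    return result


print(sub("abcdedb", "hihi", {"b", "d"}))   # {'ahciehi'}
print(sub("abc", "xy", {"b"}))             # set()
```

The first call reproduces the example in the text; the second shows the empty result when $\mathrm{len}(w_2|V) \ne \mathrm{len}(w_1)$.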
Also, $L_1 \subseteq \mathrm{Sub}(L_1, L_1|V, V)$ and $\mathrm{Sub}(L_1[V/\alpha], L_2, \{\alpha\}) = \mathrm{Sub}(L_1, L_2, V)$ for each subset $V$ of $V_\infty$, each $\alpha$ in $V$, and all subsets $L_1$ and $L_2$ of $V_\infty^*$.

Using the concept of substitution, the notion of "$k$-separable" is now defined. As will be seen, the characterization in Theorem 3.2 is expressed in terms of $k$-separability.

Definition Let $n \ge 1$. A subset $L$ of $V_n^*$ is said to be $k$-separable for some $1 \le k \le n$ if $\mathrm{Sub}(L, L|V_k, V_k) = L$.

We first establish that the $k$-separability of $W$ is a sufficient condition for a merger $[\![W]\!]_n$ to be restricted decomposable. For this, we need a technical result.

³If $k = 0$, then $w_2 = w_{20}$ and $w_1 = \varepsilon$.

(In the remainder of this section, we use $\sigma_{\$1 \in V}(w \otimes u)$ as shorthand for $(w \otimes u)|(V \times \Sigma_\infty)$.)

Lemma 3.3 Let $w$ be in $V_n^*$ for some $n \ge 1$ and $u$ in $\Sigma_\infty^{\mathrm{len}(w)}$. Then for each subset $V$ of $V_n$ and each $\alpha$ in $V$, $\pi_2\sigma_{\$1=\alpha}(w \otimes u) = \pi_2\sigma_{\$1=\alpha}(w_1 \otimes u_1)$, where $w_1 = w|V$ and $u_1 = \pi_2\sigma_{\$1 \in V}(w \otimes u)$.

Proof Suppose $u = a_1 \cdots a_m$ and $w = \beta_1 \cdots \beta_m$, where $a_i$ is in $\Sigma_\infty$ and $\beta_i$ is in $V_n$ for each $1 \le i \le m$. Suppose $w_1 = \beta_{i_1} \cdots \beta_{i_k}$. Since $u_1 = \pi_2\sigma_{\$1 \in V}(w \otimes u)$ and $w_1 = w|V$, $u_1 = a_{i_1} \cdots a_{i_k}$ by definition. Let $\alpha$ be in $V$ and $w_1|\{\alpha\} = \beta_{i_{j_1}} \cdots \beta_{i_{j_t}}$. Then $\pi_2\sigma_{\$1=\alpha}(w_1 \otimes u_1) = a_{i_{j_1}} \cdots a_{i_{j_t}}$. Clearly, $w|\{\alpha\} = \beta_{i_{j_1}} \cdots \beta_{i_{j_t}} = w_1|\{\alpha\}$. Hence, $\pi_2\sigma_{\$1=\alpha}(w \otimes u) = a_{i_{j_1}} \cdots a_{i_{j_t}} = \pi_2\sigma_{\$1=\alpha}(w_1 \otimes u_1)$. □

We are now ready to establish the sufficient condition for a merger to be restricted decomposable. (In what follows, we write $w[a/b]$ instead of $w[A'/b]$ if $A' = \{a\}$, and $w[a_1/b_1, \ldots, a_k/b_k]$ instead of $w[a_1/b_1] \cdots [a_k/b_k]$.)

Lemma 3.4 A merger $[\![W]\!]_n$ is restricted decomposable if $W$ is $k$-separable for some $1 < k < n$.

Proof Suppose $W$ is a $k$-separable regular subset of $V_n^*$ for some $1 < k < n$. Then $\mathrm{Sub}(W, W|V_k, V_k) = W$. Let $W_1 = W|V_k$ and $W_2 = W[V_k/\alpha_1][\alpha_{k+1}/\alpha_2, \ldots, \alpha_n/\alpha_{n_2}]$, where $n_2 = n - k + 1$.
(Intuitively, $W_1$ is obtained by restricting $W$ to the symbols in $V_k$, and $W_2$ by replacing the symbols in $V_k$ by $\alpha_1$ and shifting the other symbols to $\alpha_2, \ldots, \alpha_{n_2}$.) Clearly, $W_1$ is regular. [Indeed, $W_1$ can be obtained from $W$ by a generalized sequential machine (gsm) mapping [HU69]. The gsm simply outputs $a$ if it reads a symbol $a$ in $V_k$ and outputs $\varepsilon$ if it reads a symbol not in $V_k$. By Theorem 9.10 of [HU69], $W_1$ is regular.] Similarly, $W_2$ is also regular. Let $L_1, \ldots, L_n$ be arbitrary subsets of $\Sigma_\infty^*$. It suffices to show that
(a) $[\![W]\!]_n(L_1, \ldots, L_n) \subseteq [\![W_2]\!]_{n_2}([\![W_1]\!]_k(L_1, \ldots, L_k), L_{k+1}, \ldots, L_n)$, and
(b) $[\![W_2]\!]_{n_2}([\![W_1]\!]_k(L_1, \ldots, L_k), L_{k+1}, \ldots, L_n) \subseteq [\![W]\!]_n(L_1, \ldots, L_n)$.

Consider (a). Suppose $u$ is in $[\![W]\!]_n(L_1, \ldots, L_n)$. By definition, there exists $w$ in $W$, with $\mathrm{len}(w) = \mathrm{len}(u)$, such that

(1) $\pi_2\sigma_{\$1=\alpha_i}(w \otimes u)$ is in $L_i$ for each $1 \le i \le n$.

Let $w_1 = w|V_k$ and $w_2 = w[V_k/\alpha_1][\alpha_{k+1}/\alpha_2, \ldots, \alpha_n/\alpha_{n_2}]$. Clearly, $w_1$ is in $W_1$ and $w_2$ is in $W_2$. It is easy to see that $\pi_2\sigma_{\$1=\alpha_{i+k-1}}(w \otimes u) = \pi_2\sigma_{\$1=\alpha_i}(w_2 \otimes u)$ for each $2 \le i \le n_2$. Then by (1), $\pi_2\sigma_{\$1=\alpha_i}(w_2 \otimes u)$ is in $L_{i+k-1}$ for each $2 \le i \le n_2$. Let $v = \pi_2\sigma_{\$1=\alpha_1}(w_2 \otimes u)$. In order to show that $u$ is in $[\![W_2]\!]_{n_2}([\![W_1]\!]_k(L_1, \ldots, L_k), L_{k+1}, \ldots, L_n)$, it suffices to prove that $v$ is in $[\![W_1]\!]_k(L_1, \ldots, L_k)$. Clearly, $\pi_2\sigma_{\$1 \in V_k}(w \otimes u) = \pi_2\sigma_{\$1=\alpha_1}(w_2 \otimes u)$. Thus, $v = \pi_2\sigma_{\$1 \in V_k}(w \otimes u)$. Also, $w_1 = w|V_k$ by the definition of $w_1$. By Lemma 3.3, $\pi_2\sigma_{\$1=\alpha}(w_1 \otimes v) = \pi_2\sigma_{\$1=\alpha}(w \otimes u)$ for each $\alpha$ in $V_k$. Then by (1), $\pi_2\sigma_{\$1=\alpha_i}(w_1 \otimes v)$ is in $L_i$ for each $1 \le i \le k$. Since $w_1$ is in $W_1$, $v$ is in $[\![W_1]\!]_k(L_1, \ldots, L_k)$ by definition.

Now consider (b). Suppose $u$ is in $[\![W_2]\!]_{n_2}([\![W_1]\!]_k(L_1, \ldots, L_k), L_{k+1}, \ldots, L_n)$. By definition, there exist $v$ in $[\![W_1]\!]_k(L_1, \ldots, L_k)$ and $w_2$ in $W_2$, with $\mathrm{len}(w_2) = \mathrm{len}(u)$, such that

(2) $v = \pi_2\sigma_{\$1=\alpha_1}(w_2 \otimes u)$, and

(3) $\pi_2\sigma_{\$1=\alpha_i}(w_2 \otimes u)$ is in $L_{i+k-1}$ for each $2 \le i \le n_2$.
By (2), $\mathrm{len}(v) = \mathrm{len}(w_2|\{\alpha_1\})$. Since $v$ is in $[\![W_1]\!]_k(L_1, \ldots, L_k)$, there exists $w_1$ in $W_1$, with $\mathrm{len}(w_1) = \mathrm{len}(v)$, such that

(4) $\pi_2\sigma_{\$1=\alpha_i}(w_1 \otimes v)$ is in $L_i$ for each $1 \le i \le k$.

Let $w_3 = w_2[\alpha_2/\alpha_{k+1}, \ldots, \alpha_{n_2}/\alpha_n]$. Obviously, $w_3|\{\alpha_1\} = w_2|\{\alpha_1\}$. Since $\mathrm{len}(w_1) = \mathrm{len}(v) = \mathrm{len}(w_2|\{\alpha_1\}) = \mathrm{len}(w_3|\{\alpha_1\})$, $\mathrm{Sub}(w_3, w_1, \{\alpha_1\}) \ne \emptyset$. Let

(5) $w$ be in $\mathrm{Sub}(w_3, w_1, \{\alpha_1\})$.

It is easy to see that $w$ is in $W$. [Indeed, since $w_1$ is in $W_1$ and $w_2$ in $W_2$, it follows that $w_1$ is in $W|V_k$ and $w_2$ in $W[V_k/\alpha_1][\alpha_{k+1}/\alpha_2, \ldots, \alpha_n/\alpha_{n_2}]$. Thus, $w_3$ is in $W[V_k/\alpha_1]$. Hence, $\mathrm{Sub}(w_3, w_1, \{\alpha_1\}) \subseteq \mathrm{Sub}(W[V_k/\alpha_1], W|V_k, \{\alpha_1\}) \subseteq \mathrm{Sub}(W, W|V_k, V_k)$. Therefore, $w$ is in $\mathrm{Sub}(W, W|V_k, V_k)$. Since $W$ is $k$-separable, $w$ is in $W$.] To show that $u$ is in $[\![W]\!]_n(L_1, \ldots, L_n)$, it thus suffices to prove that $\pi_2\sigma_{\$1=\alpha_i}(w \otimes u)$ is in $L_i$ for each $1 \le i \le n$. Let $1 \le i \le n$. Two cases arise.

(i) $1 \le i \le k$. Clearly, $\pi_2\sigma_{\$1=\alpha_1}(w_2 \otimes u) = \pi_2\sigma_{\$1=\alpha_1}(w_3 \otimes u)$ by the definition of $w_3$. By (5), the definition of $w_3$ and the fact that $w_1$ is in $V_k^*$, it is easy to see that $\pi_2\sigma_{\$1=\alpha_1}(w_3 \otimes u) = \pi_2\sigma_{\$1 \in V_k}(w \otimes u)$ and $w_1 = w|V_k$. Therefore, $\pi_2\sigma_{\$1=\alpha_1}(w_2 \otimes u) = \pi_2\sigma_{\$1 \in V_k}(w \otimes u)$. Then by (2), $v = \pi_2\sigma_{\$1 \in V_k}(w \otimes u)$. By Lemma 3.3, $\pi_2\sigma_{\$1=\alpha_i}(w \otimes u) = \pi_2\sigma_{\$1=\alpha_i}(w_1 \otimes v)$. Hence, $\pi_2\sigma_{\$1=\alpha_i}(w \otimes u)$ is in $L_i$ by (4).

(ii) $k < i \le n$. Clearly, $\pi_2\sigma_{\$1=\alpha_i}(w \otimes u) = \pi_2\sigma_{\$1=\alpha_i}(w_3 \otimes u)$ by the definition of $w$ and the fact that $w_1$ is in $V_k^*$. By the definition of $w_3$, it is readily seen that $\pi_2\sigma_{\$1=\alpha_i}(w_3 \otimes u) = \pi_2\sigma_{\$1=\alpha_{i-k+1}}(w_2 \otimes u)$. Therefore, $\pi_2\sigma_{\$1=\alpha_i}(w \otimes u) = \pi_2\sigma_{\$1=\alpha_{i-k+1}}(w_2 \otimes u)$. Thus, by (3), $\pi_2\sigma_{\$1=\alpha_i}(w \otimes u)$ is in $L_{(i-k+1)+k-1} = L_i$.

Combining the above two cases, $\pi_2\sigma_{\$1=\alpha_i}(w \otimes u)$ is in $L_i$ for each $1 \le i \le n$, as desired. □

We next show that the $k$-separability of $W$ is also a necessary condition for a merger $[\![W]\!]_n$ to be restricted decomposable. To do so, we need three technical results.
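Lemma 3.3 is the bookkeeping step used repeatedly in the proofs of Lemmas 3.4 and 3.5: selecting the $\alpha$-positions of $w \otimes u$ gives the same sequence whether one works on the full pair or first cuts both components down to the positions labelled by $V$. The sketch below checks this on concrete words. It assumes the same hypothetical encoding as elsewhere ($\alpha_i$ as `str(i)`, elements of $\Sigma_\infty$ as single characters), and `select_project` is an illustrative name for $\pi_2\sigma_{\$1 \in V}(w \otimes u)$, not notation from the text.

```python
def restrict(w, keep):
    """w|V: the symbols of w that lie in V."""
    return "".join(a for a in w if a in keep)


def select_project(w, u, V):
    """pi_2 sigma_{$1 in V}(w (x) u): pair w with u position by position, keep the pairs
    whose first component lies in V, and return their second components."""
    if len(w) != len(u):
        raise ValueError("w and u must have the same length")
    return "".join(b for a, b in zip(w, u) if a in V)


def lemma_3_3_holds(w, u, V):
    """Check pi_2 sigma_{$1=alpha}(w (x) u) == pi_2 sigma_{$1=alpha}(w1 (x) u1) for all alpha in V,
    where w1 = w|V and u1 = pi_2 sigma_{$1 in V}(w (x) u)."""
    w1 = restrict(w, V)
    u1 = select_project(w, u, V)
    return all(select_project(w, u, {a}) == select_project(w1, u1, {a}) for a in V)


print(select_project("12312", "abcde", {"2"}))        # be
print(lemma_3_3_holds("12312", "abcde", {"1", "2"}))  # True
```

A check of this kind is no substitute for the proof, but it is a convenient way to catch index slips when reconstructing the projections in the surrounding arguments.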
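The first technical result gives a special decomposition of a merger.

Lemma 3.5 Let $\mathcal{M}_1 = [\![W_1]\!]_{n_1}$ and $\mathcal{M}_2 = [\![W_2]\!]_{n_2}$ be mergers, $n = n_1 + n_2 - 1$ and $W = \mathrm{Sub}(W_2[\alpha_2/\alpha_{n_1+1}, \ldots, \alpha_{n_2}/\alpha_n], W_1, \{\alpha_1\})$. Then $\mathcal{M}_1, \mathcal{M}_2$ is a decomposition of $[\![W]\!]_n$.

Proof Let $L_1, \ldots, L_n$ be subsets of $\Sigma_\infty^*$. It suffices to show that
(a) $[\![W]\!]_n(L_1, \ldots, L_n) \subseteq [\![W_2]\!]_{n_2}([\![W_1]\!]_{n_1}(L_1, \ldots, L_{n_1}), L_{n_1+1}, \ldots, L_n)$, and
(b) $[\![W_2]\!]_{n_2}([\![W_1]\!]_{n_1}(L_1, \ldots, L_{n_1}), L_{n_1+1}, \ldots, L_n) \subseteq [\![W]\!]_n(L_1, \ldots, L_n)$.

Consider (a). Suppose $u$ is in $[\![W]\!]_n(L_1, \ldots, L_n)$. By definition, there exists $w$ in $W$, with $\mathrm{len}(w) = \mathrm{len}(u)$, such that

(1) $\pi_2\sigma_{\$1=\alpha_i}(w \otimes u)$ is in $L_i$ for each $1 \le i \le n$.

Let $W'_2 = W_2[\alpha_2/\alpha_{n_1+1}, \ldots, \alpha_{n_2}/\alpha_n]$. Since $w$ is in $W = \mathrm{Sub}(W'_2, W_1, \{\alpha_1\})$, there exist $w'_2$ in $W'_2$ and $w_1$ in $W_1$ such that

(2) $w$ is in $\mathrm{Sub}(w'_2, w_1, \{\alpha_1\})$.

Let $w_2 = w'_2[\alpha_{n_1+1}/\alpha_2, \ldots, \alpha_n/\alpha_{n_2}]$. Clearly, $w_2$ is in $W_2$ and $w_1 = w|V_{n_1}$. Let

(3) $v = \pi_2\sigma_{\$1 \in V_{n_1}}(w \otimes u)$.

By Lemma 3.3, $\pi_2\sigma_{\$1=\alpha_i}(w_1 \otimes v) = \pi_2\sigma_{\$1=\alpha_i}(w \otimes u)$ for each $1 \le i \le n_1$. Then by (1), $\pi_2\sigma_{\$1=\alpha_i}(w_1 \otimes v)$ is in $L_i$ for each $1 \le i \le n_1$. Hence, $v$ is in $[\![W_1]\!]_{n_1}(L_1, \ldots, L_{n_1})$.

In order to show that $u$ is in $[\![W_2]\!]_{n_2}([\![W_1]\!]_{n_1}(L_1, \ldots, L_{n_1}), L_{n_1+1}, \ldots, L_n)$, it suffices to show that $\pi_2\sigma_{\$1=\alpha_1}(w_2 \otimes u) = v$ and that $\pi_2\sigma_{\$1=\alpha_i}(w_2 \otimes u)$ is in $L_{n_1+i-1}$ for each $2 \le i \le n_2$.

Since $w'_2$ does not contain $\alpha_i$ ($2 \le i \le n_1$) and $w_1$ is in $V_{n_1}^*$, it follows from (2) that

(4) $\pi_2\sigma_{\$1 \in V_{n_1}}(w \otimes u) = \pi_2\sigma_{\$1=\alpha_1}(w'_2 \otimes u)$.

By the definition of $w_2$, it is easy to see that $\pi_2\sigma_{\$1=\alpha_1}(w_2 \otimes u) = \pi_2\sigma_{\$1=\alpha_1}(w'_2 \otimes u)$. Then $\pi_2\sigma_{\$1=\alpha_1}(w_2 \otimes u) = v$ by (3) and (4). Since $w_1$ is in $W_1 \subseteq V_{n_1}^*$, it follows from (2) that

(5) $\pi_2\sigma_{\$1=\alpha_i}(w'_2 \otimes u) = \pi_2\sigma_{\$1=\alpha_i}(w \otimes u)$ for each $n_1 < i \le n$.

Now let $2 \le i \le n_2$. By the definition of $w_2$, it is clear that $\pi_2\sigma_{\$1=\alpha_i}(w_2 \otimes u) = \pi_2\sigma_{\$1=\alpha_{n_1+i-1}}(w'_2 \otimes u)$. Then $\pi_2\sigma_{\$1=\alpha_i}(w_2 \otimes u) = \pi_2\sigma_{\$1=\alpha_{n_1+i-1}}(w \otimes u)$ by (5). Thus, $\pi_2\sigma_{\$1=\alpha_i}(w_2 \otimes u)$ is in $L_{n_1+i-1}$ by (1).

Now consider (b). Suppose $u$ is in $[\![W_2]\!]_{n_2}([\![W_1]\!]_{n_1}(L_1, \ldots, L_{n_1}), L_{n_1+1}, \ldots, L_n)$.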
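Then there exist $v$ in $[\![W_1]\!]_{n_1}(L_1, \ldots, L_{n_1})$ and $w_2$ in $W_2$, with $\mathrm{len}(w_2) = \mathrm{len}(u)$, such that

(6) $\pi_2\sigma_{\$1=\alpha_i}(w_2 \otimes u)$ is in $L_{n_1+i-1}$ for each $2 \le i \le n_2$, and

(7) $v = \pi_2\sigma_{\$1=\alpha_1}(w_2 \otimes u)$.

Since $v$ is in $[\![W_1]\!]_{n_1}(L_1, \ldots, L_{n_1})$, there exists $w_1$ in $W_1$, with $\mathrm{len}(w_1) = \mathrm{len}(v)$, such that

(8) $\pi_2\sigma_{\$1=\alpha_i}(w_1 \otimes v)$ is in $L_i$ for each $1 \le i \le n_1$.

By (7), $\mathrm{len}(w_2|\{\alpha_1\}) = \mathrm{len}(v)$. Thus, $\mathrm{len}(w_2|\{\alpha_1\}) = \mathrm{len}(v) = \mathrm{len}(w_1)$, whence $\mathrm{Sub}(w_2, w_1, \{\alpha_1\}) \ne \emptyset$. Let $w'_2 = w_2[\alpha_2/\alpha_{n_1+1}, \ldots, \alpha_{n_2}/\alpha_n]$. Clearly, $w'_2|\{\alpha_1\} = w_2|\{\alpha_1\}$, so $\mathrm{Sub}(w'_2, w_1, \{\alpha_1\}) \ne \emptyset$ as well.

Let $w$ be in $\mathrm{Sub}(w'_2, w_1, \{\alpha_1\})$. By hypothesis, $W = \mathrm{Sub}(W_2[\alpha_2/\alpha_{n_1+1}, \ldots, \alpha_{n_2}/\alpha_n], W_1, \{\alpha_1\})$. Thus, $w$ is in $\mathrm{Sub}(w'_2, w_1, \{\alpha_1\}) \subseteq \mathrm{Sub}(W_2[\alpha_2/\alpha_{n_1+1}, \ldots, \alpha_{n_2}/\alpha_n], W_1, \{\alpha_1\}) = W$. To show that $u$ is in $[\![W]\!]_n(L_1, \ldots, L_n)$, it now suffices to show that $\pi_2\sigma_{\$1=\alpha_i}(w \otimes u)$ is in $L_i$ for each $1 \le i \le n$. By the definitions of $w$ and $w'_2$, and the fact that $w_1$ is in $V_{n_1}^*$, it is easy to see that $\pi_2\sigma_{\$1=\alpha_1}(w'_2 \otimes u) = \pi_2\sigma_{\$1 \in V_{n_1}}(w \otimes u)$ and $\pi_2\sigma_{\$1=\alpha_1}(w_2 \otimes u) = \pi_2\sigma_{\$1=\alpha_1}(w'_2 \otimes u)$. Thus, $\pi_2\sigma_{\$1=\alpha_1}(w_2 \otimes u) = \pi_2\sigma_{\$1 \in V_{n_1}}(w \otimes u)$. By (7), it follows that $v = \pi_2\sigma_{\$1 \in V_{n_1}}(w \otimes u)$. Clearly, $w_1 = w|V_{n_1}$. Then by Lemma 3.3, $\pi_2\sigma_{\$1=\alpha_i}(w \otimes u) = \pi_2\sigma_{\$1=\alpha_i}(w_1 \otimes v)$ for each $\alpha_i$ in $V_{n_1}$. Thus, by (8), $\pi_2\sigma_{\$1=\alpha_i}(w \otimes u)$ is in $L_i$ for each $1 \le i \le n_1$. Now let $2 \le i \le n_2$. It is easy to see that $\pi_2\sigma_{\$1=\alpha_{n_1+i-1}}(w \otimes u) = \pi_2\sigma_{\$1=\alpha_{n_1+i-1}}(w'_2 \otimes u) = \pi_2\sigma_{\$1=\alpha_i}(w_2 \otimes u)$. By (6), $\pi_2\sigma_{\$1=\alpha_{n_1+i-1}}(w \otimes u)$ is in $L_{n_1+i-1}$. Therefore, $\pi_2\sigma_{\$1=\alpha_j}(w \otimes u)$ is in $L_j$ for each $n_1 < j \le n$. □

The second technical result characterizes when two mergers are equivalent.

Lemma 3.6 Mergers $[\![W_1]\!]_n$ and $[\![W_2]\!]_n$ are equivalent if and only if $W_1 = W_2$.

Proof The "if" part is obvious. Now suppose $[\![W_1]\!]_n$ and $[\![W_2]\!]_n$ are equivalent mergers. Let $a_1, \ldots, a_n$ be distinct elements of $\Sigma_\infty$ and $h$ the homomorphism from $V_n^*$ to $\Sigma_\infty^*$ defined by $h(\alpha_i) = a_i$ for each $1 \le i \le n$. Let $L_i = a_i^*$ for each $1 \le i \le n$. Clearly, $[\![W]\!]_n(L_1, \ldots, L_n) = h(W)$ for each subset $W$ of $V_n^*$.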
[Indeed, let $W$ be a subset of $V_n^*$ and $u$ in $[\![W]\!]_n(L_1, \ldots, L_n)$. By definition, there exists $w$ in $W$, with $\mathrm{len}(w) = \mathrm{len}(u)$, such that $\pi_2\sigma_{\$1=\alpha_i}(w \otimes u)$ is in $L_i = a_i^*$ for each $1 \le i \le n$. It is easy to see that, for each $1 \le j \le \mathrm{len}(w)$, $u(j) = a_i$ if $w(j) = \alpha_i$. Then $u(j) = h(w(j))$ for each $1 \le j \le \mathrm{len}(w)$, whence $u = h(w)$. Since $w$ is in $W$, $u$ is in $h(W)$. Conversely, let $u$ be in $h(W)$. Then there exists $w$ in $W$ such that $h(w) = u$. Clearly, $\pi_2\sigma_{\$1=\alpha_i}(w \otimes u)$ is in $(h(\alpha_i))^* = L_i$ for each $1 \le i \le n$. Thus, $u$ is in $[\![W]\!]_n(L_1, \ldots, L_n)$.] Since $[\![W_1]\!]_n$ and $[\![W_2]\!]_n$ are equivalent, $[\![W_1]\!]_n(L_1, \ldots, L_n) = [\![W_2]\!]_n(L_1, \ldots, L_n)$. Then $h(W_1) = [\![W_1]\!]_n(L_1, \ldots, L_n) = [\![W_2]\!]_n(L_1, \ldots, L_n) = h(W_2)$. Since $h$ is one-to-one, $W_1 = h^{-1}(h(W_1)) = h^{-1}(h(W_2)) = W_2$ as desired. □

The third technical result involves relationships among sets defined by using $\mathrm{Sub}$.

Lemma 3.7 Let $V$, $V'$ and $V''$ be subsets of $V_\infty$ such that $(V'' - V) \cap V' = \emptyset$. Then for each subset $L_1$ of $V'^*$ and each subset $L_2$ of $V''^*$,
(a) $\mathrm{Sub}(L_2, L_1, V)|V' \subseteq L_1$, and
(b) $\mathrm{Sub}(\mathrm{Sub}(L_2, L_1, V), L_1, V') \subseteq \mathrm{Sub}(L_2, L_1, V)$.

Proof Consider (a). Suppose $w$ is in $\mathrm{Sub}(L_2, L_1, V)|V'$. Then there exists $w'$ in $\mathrm{Sub}(L_2, L_1, V)$ such that $w'|V' = w$. Since $w'$ is in $\mathrm{Sub}(L_2, L_1, V)$, there exist $w_2$ in $L_2$ and $w_1$ in $L_1$ such that $w'$ is in $\mathrm{Sub}(w_2, w_1, V)$. Then $w_2 = w_{20}\alpha_{21}w_{21} \cdots \alpha_{2k}w_{2k}$ for some $k \ge 0$, where $w_{2i}$ is in $(V'' - V)^*$ for each $0 \le i \le k$ and $\alpha_{2i}$ is in $V$ for each $1 \le i \le k$. Since $\mathrm{Sub}(w_2, w_1, V) \ne \emptyset$, $\mathrm{len}(w_2|V) = \mathrm{len}(w_1)$. Since $\mathrm{len}(w_1) = \mathrm{len}(w_2|V) = k$, there exist $\alpha_{11}, \ldots, \alpha_{1k}$ in $V'$ such that⁴ $w_1 = \alpha_{11} \cdots \alpha_{1k}$. By definition, $w' = w_{20}\alpha_{11}w_{21} \cdots \alpha_{1k}w_{2k}$. Since $w_{2i}$ is in $(V'' - V)^*$ for each $0 \le i \le k$, $\alpha_{1i}$ is in $V'$ for each $1 \le i \le k$ and $(V'' - V) \cap V' = \emptyset$, it follows that $w'|V' = \alpha_{11} \cdots \alpha_{1k}$. Thus, $w'|V' = w_1$. Therefore, $w = w'|V' = w_1$ is in $L_1$ as desired.

Now consider (b). Suppose $w$ is in $\mathrm{Sub}(\mathrm{Sub}(L_2, L_1, V), L_1, V')$.
Then there exist $w_1$ and $w'_1$ in $L_1$, $w_2$ in $L_2$ and $w'_2$ such that

(1) $w'_2$ is in $\mathrm{Sub}(w_2, w_1, V)$, and

(2) $w$ is in $\mathrm{Sub}(w'_2, w'_1, V')$.

⁴Recall that $w_1 = \varepsilon$ if $k = 0$.

Let $w_2 = w_{20}\alpha_{21}w_{21} \cdots \alpha_{2k}w_{2k}$ for some $k \ge 0$, where $w_{2i}$ is in $(V'' - V)^*$ for each $0 \le i \le k$ and $\alpha_{2i}$ is in $V$ for each $1 \le i \le k$. Since $\mathrm{Sub}(w_2, w_1, V) \ne \emptyset$, $\mathrm{len}(w_1) = \mathrm{len}(w_2|V)$. Clearly, $\mathrm{len}(w_1) = \mathrm{len}(w_2|V) = k$. Therefore, there exist $\alpha_{11}, \ldots, \alpha_{1k}$ in $V'$ such that $w_1 = \alpha_{11} \cdots \alpha_{1k}$. By (1), $w'_2 = w_{20}\alpha_{11}w_{21} \cdots \alpha_{1k}w_{2k}$. Since $w_{2i}$ is in $(V'' - V)^*$ for each $0 \le i \le k$, $(V'' - V) \cap V' = \emptyset$ and $\alpha_{1i}$ is in $V'$ for each $1 \le i \le k$, it follows that $w'_2|V' = \alpha_{11} \cdots \alpha_{1k}$. Since $\mathrm{Sub}(w'_2, w'_1, V') \ne \emptyset$, $\mathrm{len}(w'_1) = \mathrm{len}(w'_2|V') = k$. Thus, there exist $\alpha_{31}, \ldots, \alpha_{3k}$ in $V'$ such that $w'_1 = \alpha_{31} \cdots \alpha_{3k}$. Clearly, $w_{20}\alpha_{31}w_{21} \cdots \alpha_{3k}w_{2k}$ is in $\mathrm{Sub}(w'_2, w'_1, V')$. By (2) and the fact that $\mathrm{Sub}(w'_2, w'_1, V')$ contains at most one sequence, $w = w_{20}\alpha_{31}w_{21} \cdots \alpha_{3k}w_{2k}$. It is also clear that $w_{20}\alpha_{31}w_{21} \cdots \alpha_{3k}w_{2k}$ is in $\mathrm{Sub}(w_2, w'_1, V)$. Hence, $w$ is in $\mathrm{Sub}(w_2, w'_1, V)$. Since $w_2$ is in $L_2$ and $w'_1$ is in $L_1$, $\mathrm{Sub}(w_2, w'_1, V) \subseteq \mathrm{Sub}(L_2, L_1, V)$. Thus, $w$ is in $\mathrm{Sub}(L_2, L_1, V)$ as desired. □

We are now ready for the necessary condition for a merger to be restricted decomposable.

Lemma 3.8 If $[\![W]\!]_n$ is restricted decomposable, then $W$ is $k$-separable for some $1 < k < n$.

Proof Suppose $[\![W]\!]_n$ is a restricted decomposable merger. Then there exist mergers $[\![W_1]\!]_k$ and $[\![W_2]\!]_{n_2}$, where $k + n_2 - 1 = n$ and $1 < k < n$, such that $[\![W]\!]_n(L_1, \ldots, L_n) = [\![W_2]\!]_{n_2}([\![W_1]\!]_k(L_1, \ldots, L_k), L_{k+1}, \ldots, L_n)$ for all subsets $L_1, \ldots, L_n$ of $\Sigma_\infty^*$. Write $\widehat{W}_2 = W_2[\alpha_2/\alpha_{k+1}, \ldots, \alpha_{n_2}/\alpha_n]$ and let $W' = \mathrm{Sub}(\widehat{W}_2, W_1, \{\alpha_1\})$. By Lemma 3.5, $[\![W']\!]_n(L_1, \ldots, L_n) = [\![W_2]\!]_{n_2}([\![W_1]\!]_k(L_1, \ldots, L_k), L_{k+1}, \ldots, L_n)$, whence $[\![W']\!]_n(L_1, \ldots, L_n) = [\![W]\!]_n(L_1, \ldots, L_n)$ for all subsets $L_1, \ldots, L_n$ of $\Sigma_\infty^*$. By Lemma 3.6, $W = W'$. It therefore suffices to show that $W'$ is $k$-separable. Clearly, $W' \subseteq \mathrm{Sub}(W', W'|V_k, V_k)$. Conversely, applying Lemma 3.7 with $V = \{\alpha_1\}$, $V' = V_k$ and $V'' = \{\alpha_1, \alpha_{k+1}, \ldots, \alpha_n\}$ (so that $\widehat{W}_2 \subseteq V''^*$, $W_1 \subseteq V_k^*$ and $(V'' - V) \cap V_k = \emptyset$), and using the fact that $\mathrm{Sub}$ is monotone in its second argument,
$$\begin{aligned}
\mathrm{Sub}(W', W'|V_k, V_k) &= \mathrm{Sub}(\mathrm{Sub}(\widehat{W}_2, W_1, \{\alpha_1\}), \mathrm{Sub}(\widehat{W}_2, W_1, \{\alpha_1\})|V_k, V_k) && \text{by the definition of } W'\\
&\subseteq \mathrm{Sub}(\mathrm{Sub}(\widehat{W}_2, W_1, \{\alpha_1\}), W_1, V_k) && \text{by Lemma 3.7(a)}\\
&\subseteq \mathrm{Sub}(\widehat{W}_2, W_1, \{\alpha_1\}) && \text{by Lemma 3.7(b)}\\
&= W' && \text{by the definition of } W'.
\end{aligned}$$
Therefore, $W' = \mathrm{Sub}(W', W'|V_k, V_k)$ as desired. □

Combining Lemmas 3.4 and 3.8, we obtain a characterization of when a merger is restricted decomposable.

Theorem 3.2 A merger $[\![W]\!]_n$ is restricted decomposable if and only if $W$ is $k$-separable for some $1 < k < n$.

Using Theorem 3.2, we are now able to establish that the set of mergers has no finite "generators."

Theorem 3.3 There is no finite set of mergers which yields all mergers by composition.

Proof For each $n \ge 3$, let $W_n = (\alpha_1\alpha_2 \cup \alpha_2\alpha_3 \cup \cdots \cup \alpha_{n-1}\alpha_n)^*$ and $\omega_n = [\![W_n]\!]_n$. We shall show that there is no integer $n \ge 3$ such that $\omega_n$ is restricted decomposable.

Let $n \ge 3$ and $1 < k < n$. Obviously, $W_n|V_k = (\alpha_1\alpha_2 \cup \cdots \cup \alpha_{k-1}\alpha_k \cup \alpha_k)^*$. Hence, $\alpha_k^* \subseteq W_n|V_k$. Thus, $\mathrm{Sub}(W_n, \alpha_k^*, V_k) \subseteq \mathrm{Sub}(W_n, W_n|V_k, V_k)$. Since $k > 1$, $\{\alpha_1, \alpha_2\} \subseteq V_k$. Clearly, $(\alpha_1\alpha_2)^* \subseteq W_n$. Thus, $\mathrm{Sub}((\alpha_1\alpha_2)^*, \alpha_k^*, V_k) \subseteq \mathrm{Sub}(W_n, W_n|V_k, V_k)$. Also, it is easy to see from the definitions of $\mathrm{Sub}$ and $W_n$ that $\mathrm{Sub}((\alpha_1\alpha_2)^*, \alpha_k^*, V_k) = (\alpha_k\alpha_k)^*$ and that $(\alpha_k\alpha_k)^*$ is not a subset of $W_n$. Therefore, $\mathrm{Sub}(W_n, W_n|V_k, V_k)$ is not a subset of $W_n$. Hence, $W_n$ is not $k$-separable and $\omega_n$ is not restricted decomposable.

Let $h$ be a 1-1 homomorphism from $V_n$ onto $V_n$. Using a similar argument, it is easy to see that $[\![h(W_n)]\!]_n$ is not restricted decomposable.
Suppose there exist mergers $[\![W']\!]_{n_1}$ and $[\![W'']\!]_{n_2}$, with $n_1 < n$ and $n_2 < n$, such that $\omega_n = [\![W_n]\!]_n$ is decomposable, not necessarily in canonical form, into $[\![W']\!]_{n_1}, [\![W'']\!]_{n_2}$. (Note that $1 < n_1 < n$, since $n_1 + n_2 - 1 = n$ and $n_2 < n$.) Obviously, there exist a 1-1 homomorphism $h_1$ from $V_n$ onto $V_n$ and a 1-1 homomorphism $h_2$ from $V_{n_2}$ onto $V_{n_2}$ such that
$$[\![h_1(W_n)]\!]_n(L_1, \ldots, L_n) = [\![h_2(W'')]\!]_{n_2}([\![W']\!]_{n_1}(L_1, \ldots, L_{n_1}), L_{n_1+1}, \ldots, L_n)$$
for all subsets $L_1, \ldots, L_n$ of $\Sigma_\infty^*$. Thus, $[\![h_1(W_n)]\!]_n$ is restricted decomposable, which is a contradiction. Therefore, there are no mergers $\mathcal{M}_1^{n_1}$ and $\mathcal{M}_2^{n_2}$, with $n_1 < n$ and $n_2 < n$, such that $\omega_n$ is decomposable into $\mathcal{M}_1, \mathcal{M}_2$.

Now suppose there exists a finite set $\mathcal{F}$ of mergers which yields all mergers by composition. Since $\mathcal{F}$ is finite, there exists $n \ge 3$ such that each merger in $\mathcal{F}$ is of arity less than $n$. Then $\omega_n$ is not in $\mathcal{F}$. Thus, there exist $\mathcal{M}_1^{n_1}, \ldots, \mathcal{M}_k^{n_k}$ ($k > 1$) in $\mathcal{F}$ such that $\omega_n$ is equivalent to a composition of $\mathcal{M}_1, \ldots, \mathcal{M}_k$. Clearly, $n_1 + \cdots + n_k - k + 1 = n$. Then there exists $i$ such that $n_i > 1$. Since $\mathcal{M}_i$ is in $\mathcal{F}$, $1 < n_i < n$. By Theorem 3.1, it is easily seen that there exist mergers $\mathcal{M}_{11}^{n_{11}}$ and $\mathcal{M}_{12}^{n_{12}}$ such that $\omega_n$ is decomposable into $\mathcal{M}_{11}, \mathcal{M}_{12}$ with $n_{11} < n$ and $n_{12} < n$. This is a contradiction. Thus, such a finite set $\mathcal{F}$ of mergers does not exist. □

Finally, we note that it is decidable whether a merger is nontrivially decomposable. Indeed, by Theorem 3.2, to see whether $[\![W]\!]_n$ is restricted decomposable, it suffices to test whether $W = \mathrm{Sub}(W, W|V_k, V_k)$ for some $1 < k < n$. Since $W$ is a regular set, $\mathrm{Sub}(W, W|V_k, V_k)$ is regular and effectively constructible from $W$ (proofs omitted). This in turn implies that it is decidable whether $W = \mathrm{Sub}(W, W|V_k, V_k)$, given $W$ and $V_k$. Furthermore, it follows from Proposition 3.2 that it is decidable whether a merger is nontrivially decomposable in the two extreme cases.
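The failure of $k$-separability in the proof of Theorem 3.3 comes down to one finite witness: $\alpha_k\alpha_{k+1}\alpha_k\alpha_{k+1}$ lies in $W_n$, its restriction to $V_k$ is $\alpha_k\alpha_k$, and substituting that into $\alpha_1\alpha_2 \in W_n$ gives $\alpha_k\alpha_k \notin W_n$. The sketch below constructs this witness and tests membership. It assumes a hypothetical single-digit encoding ($\alpha_i$ as `str(i)`, hence $3 \le n \le 9$) and uses Python regular expressions for $W_n$. It only exhibits the counterexample and is not the general decision procedure mentioned above, whose constructions the text leaves unproved.

```python
import re


def restrict(w, keep):
    """w|V: the symbols of w that lie in V."""
    return "".join(a for a in w if a in keep)


def sub(w2, w1, V):
    """Sub(w2, w1, V) as a set holding at most one word."""
    slots = [pos for pos, a in enumerate(w2) if a in V]
    if len(slots) != len(w1):
        return set()
    out = list(w2)
    for pos, a in zip(slots, w1):
        out[pos] = a
    return {"".join(out)}


def W_n(n):
    """Regex for W_n = (a1a2 | a2a3 | ... | a_{n-1}a_n)*, with a_i encoded as str(i)."""
    if not 3 <= n <= 9:
        raise ValueError("this single-digit encoding supports 3 <= n <= 9")
    return re.compile("(?:" + "|".join(f"{i}{i + 1}" for i in range(1, n)) + ")*")


def separability_counterexample(n, k):
    """Return a word of Sub(W_n, W_n|V_k, V_k) that is not in W_n (requires 1 < k < n)."""
    Wn, Vk = W_n(n), {str(i) for i in range(1, k + 1)}
    w = f"{k}{k + 1}" * 2            # a_k a_{k+1} a_k a_{k+1}, a word of W_n
    w1 = restrict(w, Vk)             # a_k a_k, a word of W_n|V_k
    w2 = "12"                        # a_1 a_2, a word of W_n whose symbols both lie in V_k
    if not (Wn.fullmatch(w) and Wn.fullmatch(w2)):
        raise AssertionError("witness words should belong to W_n")
    (candidate,) = sub(w2, w1, Vk)
    return candidate


cand = separability_counterexample(5, 3)
print(cand, bool(W_n(5).fullmatch(cand)))   # 33 False
```

For every admissible pair $(n, k)$ the candidate is $\alpha_k\alpha_k$, matching the set $(\alpha_k\alpha_k)^*$ used in the proof.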
3.3 Decomposition of Extractors

The purpose of this section is to establish that there is no finite set of extractors which yields all extractors by composition. As in the last section, we do this through a study of the decomposition of extractors.

We start with the notion of decomposition of extractors.

Definition A decomposition of an extractor $\pi_i[\![W]\!]_n^{-1}$ is a list of $m \ge 2$ extractors $\pi_{i_1}[\![W_1]\!]_{n_1}^{-1}, \ldots, \pi_{i_m}[\![W_m]\!]_{n_m}^{-1}$ such that $\pi_i[\![W]\!]_n^{-1}(L) = \pi_{i_m}[\![W_m]\!]_{n_m}^{-1}(\cdots \pi_{i_1}[\![W_1]\!]_{n_1}^{-1}(L) \cdots)$ for each subset $L$ of $\Sigma_\infty^*$. An extractor is said to be decomposable if there exists a decomposition of it.

Note that each extractor $\pi_i[\![W]\!]_n^{-1}$ is equivalent to an extractor in "canonical" form. Specifically, an extractor $\pi_i[\![W]\!]_n^{-1}$ is said to be in canonical form if $n = 2$, $W \subseteq V_2^*$ and $i = 1$. For each extractor $\pi_i[\![W]\!]_n^{-1}$, let $h_i$ be the homomorphism from $V_n^*$ to $V_2^*$ defined by $h_i(\alpha_i) = \alpha_1$ and $h_i(\alpha_j) = \alpha_2$ for each $j \ne i$. Clearly, $\pi_1[\![h_i(W)]\!]_2^{-1}$ is equivalent to $\pi_i[\![W]\!]_n^{-1}$. In this section only, each extractor discussed is assumed to be in canonical form. Canonical-form extractors are denoted by $\mathcal{E}$, possibly with subscripts.

Similar to mergers, we have the following result about extractors.

Lemma 3.9 Let $\mathcal{E} = \pi_1[\![W]\!]_2^{-1}$, $\mathcal{E}_1 = \pi_1[\![W_1]\!]_2^{-1}$ and $\mathcal{E}_2 = \pi_1[\![W_2]\!]_2^{-1}$ be extractors. Then $\mathcal{E}_1, \mathcal{E}_2$ is a decomposition of $\mathcal{E}$ if and only if $W = \mathrm{Sub}(W_1, W_2, \{\alpha_1\})$.

Proof It is easily seen that the following assertion is true:

(1) Let $w_1$, $w_2$ and $w$ be in $V_2^*$, with $\mathrm{len}(w_2) = \mathrm{len}(w_1|\{\alpha_1\})$ and $\mathrm{len}(w) = \mathrm{len}(w_1)$. Then $\pi_2\sigma_{\$1=\alpha_1}(w_2 \otimes \pi_2\sigma_{\$1=\alpha_1}(w_1 \otimes u)) = \pi_2\sigma_{\$1=\alpha_1}(w \otimes u)$ for all $u$ in $\Sigma_\infty^{\mathrm{len}(w)}$ if and only if $w$ is in $\mathrm{Sub}(w_1, w_2, \{\alpha_1\})$.

Consider the "only-if" part of the lemma, i.e., suppose $\mathcal{E}_1, \mathcal{E}_2$ is a decomposition of $\mathcal{E}$. Suppose further that $w$ is in $W$. Let $u$ be in $\Sigma_\infty^{\mathrm{len}(w)}$ such that no element appears in $u$ twice, and let $v = \pi_2\sigma_{\$1=\alpha_1}(w \otimes u)$. By definition, $v$ is in $\mathcal{E}(\{u\})$.
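Since $\mathcal{E}(\{u\}) = \mathcal{E}_2(\mathcal{E}_1(\{u\}))$, $v$ is in $\mathcal{E}_2(\mathcal{E}_1(\{u\}))$. Thus, there exist $w_1$ in $W_1$ with $\mathrm{len}(w_1) = \mathrm{len}(u)$ and $w_2$ in $W_2$ with $\mathrm{len}(w_2) = \mathrm{len}(\pi_2\sigma_{\$1=\alpha_1}(w_1 \otimes u))$ such that $v = \pi_2\sigma_{\$1=\alpha_1}(w_2 \otimes \pi_2\sigma_{\$1=\alpha_1}(w_1 \otimes u))$. Since no element appears in $u$ twice, this equality determines the selected positions, so it also holds with $u$ replaced by any sequence of the same length. Then $w$ is in $\mathrm{Sub}(w_1, w_2, \{\alpha_1\})$ by (1) above and hence in $\mathrm{Sub}(W_1, W_2, \{\alpha_1\})$. Therefore, $W \subseteq \mathrm{Sub}(W_1, W_2, \{\alpha_1\})$.

Conversely, suppose $w$ is in $\mathrm{Sub}(W_1, W_2, \{\alpha_1\})$. Then there exist $w_1$ in $W_1$ and $w_2$ in $W_2$ such that $w$ is in $\mathrm{Sub}(w_1, w_2, \{\alpha_1\})$. Let $u$ be in $\Sigma_\infty^*$. Two cases arise.

(i) $\mathrm{len}(u) \ne \mathrm{len}(w)$. By definition, $\pi_1[\![w]\!]_2^{-1}(\{u\}) = \emptyset \subseteq \pi_1[\![W]\!]_2^{-1}(\{u\})$.

(ii) $\mathrm{len}(u) = \mathrm{len}(w)$. Let $v = \pi_2\sigma_{\$1=\alpha_1}(w \otimes u)$. By definition, $v$ is in $\pi_1[\![w]\!]_2^{-1}(\{u\})$. Since $w$ is in $\mathrm{Sub}(w_1, w_2, \{\alpha_1\})$, $v = \pi_2\sigma_{\$1=\alpha_1}(w_2 \otimes \pi_2\sigma_{\$1=\alpha_1}(w_1 \otimes u))$ by (1) above. Thus, $v$ is in $\mathcal{E}_2(\mathcal{E}_1(\{u\})) = \mathcal{E}(\{u\})$. Hence, $\pi_1[\![w]\!]_2^{-1}(\{u\}) \subseteq \mathcal{E}(\{u\})$.

Combining the above two cases, it follows that $\pi_1[\![w]\!]_2^{-1}(L) \subseteq \pi_1[\![W]\!]_2^{-1}(L)$ for each subset $L$ of $\Sigma_\infty^*$. Let $u$ be in $\Sigma_\infty^{\mathrm{len}(w)}$ such that no element appears in $u$ twice. Obviously,

(2) $\pi_2\sigma_{\$1=\alpha_1}(w \otimes u) \ne \pi_2\sigma_{\$1=\alpha_1}(w' \otimes u)$ for each $w'$ in $V_2^{\mathrm{len}(w)}$ with $w' \ne w$.

Let $v = \pi_2\sigma_{\$1=\alpha_1}(w \otimes u)$. Then $v$ is in $\pi_1[\![w]\!]_2^{-1}(\{u\})$ and hence in $\pi_1[\![W]\!]_2^{-1}(\{u\})$. By definition, there exists $w'$ in $W$ such that $v = \pi_2\sigma_{\$1=\alpha_1}(w' \otimes u)$. Thus, $w' = w$ by (2). Hence, $w$ is in $W$, whence $\mathrm{Sub}(W_1, W_2, \{\alpha_1\}) \subseteq W$.

Consider the "if" part of the lemma, i.e., suppose $W = \mathrm{Sub}(W_1, W_2, \{\alpha_1\})$. Suppose further that $L$ is a subset of $\Sigma_\infty^*$ and $v$ is in $\mathcal{E}(L)$. By definition, there exist $u$ in $L$ and $w$ in $W$, with $\mathrm{len}(w) = \mathrm{len}(u)$, such that $v = \pi_2\sigma_{\$1=\alpha_1}(w \otimes u)$. Since $W = \mathrm{Sub}(W_1, W_2, \{\alpha_1\})$, there exist $w_1$ in $W_1$ and $w_2$ in $W_2$ such that $w$ is in $\mathrm{Sub}(w_1, w_2, \{\alpha_1\})$. By (1) above, $v = \pi_2\sigma_{\$1=\alpha_1}(w_2 \otimes \pi_2\sigma_{\$1=\alpha_1}(w_1 \otimes u))$. Thus, $v$ is in $\mathcal{E}_2(\mathcal{E}_1(L))$, whence $\mathcal{E}(L) \subseteq \mathcal{E}_2(\mathcal{E}_1(L))$. Conversely, suppose $v$ is in $\mathcal{E}_2(\mathcal{E}_1(L))$. Then there exist $u$ in $L$, $w_1$ in $W_1$ and $w_2$ in $W_2$, with $\mathrm{len}(w_1) = \mathrm{len}(u)$ and $\mathrm{len}(w_2) = \mathrm{len}(\sigma_{\$1=\alpha_1}(w_1 \otimes u))$, such that $v = \pi_2\sigma_{\$1=\alpha_1}(w_2 \otimes \pi_2\sigma_{\$1=\alpha_1}(w_1 \otimes u))$.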
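Since $W = \mathrm{Sub}(W_1, W_2, \{\alpha_1\})$, there exists $w$ in $W$ such that $w$ is in $\mathrm{Sub}(w_1, w_2, \{\alpha_1\})$. Clearly, $\mathrm{len}(w) = \mathrm{len}(w_1) = \mathrm{len}(u)$ and, by (1) above, $v = \pi_2\sigma_{\$1=\alpha_1}(w \otimes u)$. By definition, $v$ is in $\mathcal{E}(L)$, whence $\mathcal{E}_2(\mathcal{E}_1(L)) \subseteq \mathcal{E}(L)$ as desired. □

Let $\mathcal{E}_0 = \pi_1[\![\alpha_1^*]\!]_2^{-1}$. Clearly, $\mathcal{E}_0(L) = L$ for each subset $L$ of $\Sigma_\infty^*$. Because of this, each extractor equivalent to $\mathcal{E}_0$ is called an identity extractor, or simply an identity. Clearly, $\mathcal{E}_0, \mathcal{E}$ is a decomposition of each extractor $\mathcal{E}$. Therefore, all extractors are decomposable according to the above notion of decomposition. To have a "meaningful" definition, identities should be excluded from decompositions of extractors. It turns out, however, that almost all extractors are decomposable even when identities are excluded. In order to show this, the following notation is needed.

Notation For each subset $W$ of $V_\infty^*$ and $k \ge 1$, let $W_{<k} = \{w \in W \mid \mathrm{len}(w) < k\}$ and $W_{\ge k} = \{w \in W \mid \mathrm{len}(w) \ge k\}$.

We are now ready for the following.

Proposition 3.3 Let $m \ge 1$ and $\mathcal{E} = \pi_1[\![W]\!]_2^{-1}$ be an extractor such that $\emptyset \ne W_{<m} \ne (\alpha_1^*)_{<m}$ and $\emptyset \ne W_{\ge m} \ne (\alpha_1^*)_{\ge m}$. Then there exist non-identity extractors $\mathcal{E}_1$ and $\mathcal{E}_2$ such that $\mathcal{E}_1, \mathcal{E}_2$ is a decomposition of $\mathcal{E}$.

Proof Let $W_1 = W_{<m} \cup \{\alpha_1^j \mid j \ge m\}$ and $W_2 = W_{\ge m} \cup \{\alpha_1^j \mid 0 \le j \le m - 1\}$. Thus, $\pi_1[\![W_1]\!]_2^{-1}$ acts as $\mathcal{E}$ on sequences of length less than $m$ and as the identity on the others, while $\pi_1[\![W_2]\!]_2^{-1}$ acts as the identity on sequences of length less than $m$ and as $\mathcal{E}$ on the others. Since an extractor never lengthens a sequence, it is clear that $\pi_1[\![W]\!]_2^{-1}(L) = \pi_1[\![W_2]\!]_2^{-1}(\pi_1[\![W_1]\!]_2^{-1}(L))$ for each subset $L$ of $\Sigma_\infty^*$. Obviously, neither $\pi_1[\![W_1]\!]_2^{-1}$ nor $\pi_1[\![W_2]\!]_2^{-1}$ is an identity. □

The above proposition shows that each such extractor can be decomposed into two "parts", the first dealing with "short" sequences and the second with "long" sequences. To obtain a definition which avoids this kind of "trivial" decomposition, the following relation is introduced on the set of extractors:

Notation Let $\Sigma_{\ge i}$ denote the set $(\Sigma_\infty^*)_{\ge i}$ for each $i \ge 0$. For each $k \ge 0$, let $\sim_k$ be the relation on the set of all extractors defined by $\mathcal{E}_1 \sim_k \mathcal{E}_2$ if $\mathcal{E}_1(L) \cap \Sigma_{\ge k} = \mathcal{E}_2(L) \cap \Sigma_{\ge k}$ for each subset $L$ of $\Sigma_\infty^*$. For extractors $\mathcal{E}_1$ and $\mathcal{E}_2$, let $\mathcal{E}_1 \sim \mathcal{E}_2$ if $\mathcal{E}_1 \sim_k \mathcal{E}_2$ for some $k \ge 0$.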
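Lemma 3.9 can be sanity-checked on the example given after Theorem 3.1: with $W_1 = (\alpha_1\alpha_1\alpha_2)^*$ and $W_2 = (\alpha_1\alpha_2)^*$, one gets $\mathrm{Sub}(W_1, W_2, \{\alpha_1\}) = (\alpha_1\alpha_2\alpha_2)^*$. The sketch below evaluates canonical-form extractors on finite input sets by brute force. It assumes a hypothetical encoding in which `'1'` stands for $\alpha_1$, `'2'` for $\alpha_2$, elements of $\Sigma_\infty$ are single characters, and $W$ is a Python regular expression. The function name `extractor` is illustrative only, and the $2^{\mathrm{len}(u)}$ enumeration is intended solely for tiny examples.

```python
import re
from itertools import product


def extractor(pattern, L):
    """pi_1[[W]]_2^{-1}(L) for a finite L, with W a regex over '1' (alpha_1) and '2' (alpha_2).

    For every u in L and every w in W with len(w) == len(u), keep the symbols of u
    at the positions where w carries '1'.
    """
    w_re = re.compile(pattern)
    result = set()
    for u in L:
        for letters in product("12", repeat=len(u)):   # brute force: 2**len(u) candidates
            w = "".join(letters)
            if w_re.fullmatch(w):
                result.add("".join(b for a, b in zip(w, u) if a == "1"))
    return result


L = {"", "ab", "xyz", "abcdef"}
e = extractor("(?:122)*", L)                                   # W  = (a1 a2 a2)*
e2_after_e1 = extractor("(?:12)*", extractor("(?:112)*", L))   # W2 = (a1 a2)*, W1 = (a1 a1 a2)*
print(sorted(e), e == e2_after_e1)     # ['', 'ad', 'x'] True
print(extractor("1*", L) == L)         # True: E_0 = pi_1[[a1*]]_2^{-1} acts as the identity
```

Agreement on a few finite sets is, of course, only evidence, not a proof of equivalence.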
It is easily seen that if $\mathcal{E}_1 \sim_k \mathcal{E}_2$, then $\mathcal{E}_1 \sim_{k'} \mathcal{E}_2$ for each $k' \geq k$. Also, $\sim$ is a reflexive, symmetric and transitive relation, i.e., an equivalence relation. For each extractor $\mathcal{E}$, let $\{\mathcal{E}\}_\sim$ denote the set $\{\mathcal{E}' \mid \mathcal{E}' \sim \mathcal{E}\}$. By definition, for each extractor $\mathcal{E}$ in $\{\mathcal{E}_0\}_\sim$, there exists $k \geq 0$ such that $\mathcal{E}(L) \cap \Sigma_{\geq k} = \mathcal{E}_0(L) \cap \Sigma_{\geq k} = L \cap \Sigma_{\geq k}$ for each subset $L$ of $\Sigma_\infty^*$. Each extractor in the set $\{\mathcal{E}_0\}_\sim$ is called a pseudo-identity. We are now ready for our notion of trivial decomposition.

Definition  Let $\mathcal{E}$ be an extractor. A decomposition $\mathcal{E}_1, \ldots, \mathcal{E}_m$ of $\mathcal{E}$ is said to be trivial if, for some $1 \leq i \leq m$, either $\mathcal{E}_i$ is a pseudo-identity or $\mathcal{E}_i \sim \mathcal{E}$.

We now present some extractors having only trivial decompositions. Using these extractors, we shall show in Theorem 3.4 that there is no finite set of extractors which yields all extractors by composition. First, we need two preliminary results.

Lemma 3.10  Let $k \geq 0$, $W_{\geq k} = \{w \in V_2^* \mid \mathit{len}(w|\alpha_1) \geq k\}$, $\mathcal{E}_1 = \pi_1[\![W']\!]_2^{-1}$ and $\mathcal{E}_2 = \pi_1[\![W'']\!]_2^{-1}$. Then $\mathcal{E}_1 \sim_k \mathcal{E}_2$ if and only if $W' \cap W_{\geq k} = W'' \cap W_{\geq k}$.

Proof  (Only if) Suppose $\mathcal{E}_1 \sim_k \mathcal{E}_2$. Let $w$ be in $W' \cap W_{\geq k}$ and $l = \mathit{len}(w)$. Since $w$ is in $W_{\geq k}$, $\mathit{len}(w|\alpha_1) \geq k$. Clearly, $l = \mathit{len}(w) \geq \mathit{len}(w|\alpha_1) \geq k$. Let $u = a_1 \cdots a_l$, where $a_1, \ldots, a_l$ are distinct elements in $\Sigma_\infty$. It is easy to see that $\pi_1[\![w]\!]_2^{-1}(\{u\})$ is not empty. Let $u'$ be in $\pi_1[\![w]\!]_2^{-1}(\{u\})$. Since $a_1, \ldots, a_l$ are all distinct, it follows from the definition of extractors that

(1) for each $1 \leq i \leq l$, $w(i) = \alpha_1$ if $a_i$ is in $u'$ and $w(i) = \alpha_2$ otherwise.

Since $w$ is in $W'$, $u'$ is in $\pi_1[\![W']\!]_2^{-1}(\{u\}) = \mathcal{E}_1(\{u\})$. Clearly, $\mathit{len}(u') = \mathit{len}(w|\alpha_1) \geq k$. Then $u'$ is in $\mathcal{E}_1(\{u\}) \cap \Sigma_{\geq k}$. By hypothesis, $\mathcal{E}_1 \sim_k \mathcal{E}_2$, i.e., $\mathcal{E}_1(\{u\}) \cap \Sigma_{\geq k} = \mathcal{E}_2(\{u\}) \cap \Sigma_{\geq k}$. Then $u'$ is in $\mathcal{E}_2(\{u\}) \cap \Sigma_{\geq k}$ and hence in $\mathcal{E}_2(\{u\})$. Therefore, there exists $w'$ in $W''$ such that $u'$ is in $\pi_1[\![w']\!]_2^{-1}(\{u\})$. Since the elements in $u$ are all distinct, it follows from the definition of extractors that for each $1 \leq i \leq l$, $w'(i) = \alpha_1$ if $a_i$ is in $u'$ and $w'(i) = \alpha_2$ otherwise.
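Comparing this with (1), it is clear that $w' = w$. Thus, $w$ is in $W''$ since $w'$ is. Hence, $W' \cap W_{\geq k} \subseteq W'' \cap W_{\geq k}$. By symmetry, $W'' \cap W_{\geq k} \subseteq W' \cap W_{\geq k}$, whence $W' \cap W_{\geq k} = W'' \cap W_{\geq k}$.

(If) Suppose $W' \cap W_{\geq k} = W'' \cap W_{\geq k}$. Let $L$ be a subset of $\Sigma_\infty^*$ and $u'$ in $\mathcal{E}_1(L) \cap \Sigma_{\geq k}$. Then there exist $u$ in $L$ and $w$ in $W'$ such that $u'$ is in $\pi_1[\![w]\!]_2^{-1}(\{u\}) \cap \Sigma_{\geq k}$. By definition, $\mathit{len}(u') = \mathit{len}(w|\alpha_1)$. Since $u'$ is in $\Sigma_{\geq k}$, $\mathit{len}(u') \geq k$. Hence, $\mathit{len}(w|\alpha_1) \geq k$. Thus, $w$ is in $W' \cap W_{\geq k}$. By hypothesis, $w$ is in $W'' \cap W_{\geq k}$ and hence in $W''$. Therefore, $u'$ is in $\pi_1[\![w]\!]_2^{-1}(\{u\}) \subseteq \pi_1[\![W'']\!]_2^{-1}(L)$. Since $\mathit{len}(u') \geq k$, $u'$ is in $\pi_1[\![W'']\!]_2^{-1}(L) \cap \Sigma_{\geq k}$, and hence $\mathcal{E}_1(L) \cap \Sigma_{\geq k} \subseteq \pi_1[\![W'']\!]_2^{-1}(L) \cap \Sigma_{\geq k}$. By symmetry, it follows that $\pi_1[\![W']\!]_2^{-1}(L) \cap \Sigma_{\geq k} = \pi_1[\![W'']\!]_2^{-1}(L) \cap \Sigma_{\geq k}$. Then $\mathcal{E}_1 \sim_k \mathcal{E}_2$ by definition. □

Lemma 3.11  Let $\mathcal{E}^{(i)} = \pi_1[\![\alpha_1^i\alpha_2\alpha_1^*]\!]_2^{-1}$ for each $i \geq 0$. If $\mathcal{E}$ is in $\{\mathcal{E}^{(k)}\}_\sim$ for some $k \geq 0$ and $\mathcal{E}_1, \mathcal{E}_2$ is a decomposition of $\mathcal{E}$, then either $\mathcal{E}_2 \sim \mathcal{E}$ or $\mathcal{E}_2$ is a pseudo-identity.

Proof  Let $k$, $\mathcal{E}$, $\mathcal{E}_1$ and $\mathcal{E}_2$ be as in the hypothesis. Assume $\mathcal{E} = \pi_1[\![W]\!]_2^{-1}$, $\mathcal{E}_1 = \pi_1[\![W']\!]_2^{-1}$ and $\mathcal{E}_2 = \pi_1[\![W'']\!]_2^{-1}$. By Lemma 3.9, $W = \mathrm{Sub}(W', W'', \{\alpha_1\})$. Since $\mathcal{E} \sim \mathcal{E}^{(k)}$, there exists $m \geq 0$ such that $\mathcal{E} \sim_m \mathcal{E}^{(k)}$. Clearly, $\mathcal{E} \sim_{m'} \mathcal{E}^{(k)}$ for each $m' \geq m$. Let $N = \max\{m, k\} + 1$ and

(1) $N_k = N - k$.

For each $i \geq 0$, let $W_{\geq i} = \{w \in V_2^* \mid \mathit{len}(w|\alpha_1) \geq i\}$. By Lemma 3.10, $W \cap W_{\geq N} = (\alpha_1^k\alpha_2\alpha_1^*) \cap W_{\geq N}$ since $\mathcal{E} \sim_N \mathcal{E}^{(k)}$. Clearly, $(\alpha_1^k\alpha_2\alpha_1^*) \cap W_{\geq N} = (\alpha_1^k\alpha_2\alpha_1^*)_{\geq N+1} = (\alpha_1^k\alpha_2\alpha_1^*)_{\geq N_k+k+1} = \alpha_1^k\alpha_2\alpha_1^{N_k}\alpha_1^*$, i.e.,

(2) $W \cap W_{\geq N} = \alpha_1^k\alpha_2\alpha_1^{N_k}\alpha_1^*$.

Let $n \geq N_k$ be an arbitrary integer and $w_n = \alpha_1^k\alpha_2\alpha_1^n$. Since $n \geq N_k$, $\mathit{len}(w_n|\alpha_1) \geq k + N_k = N$ and $w_n$ is in $\alpha_1^k\alpha_2\alpha_1^{N_k}\alpha_1^*$. By (2), $w_n$ is in $W = \mathrm{Sub}(W', W'', \{\alpha_1\})$. Thus, there exist $w_{1n}$ in $W'$ and $w_{2n}$ in $W''$ such that $w_n$ is in $\mathrm{Sub}(w_{1n}, w_{2n}, \{\alpha_1\})$. It is easily seen that either

(*) $w_{1n} = \alpha_1^{k+1+n}$ and $w_{2n} = \alpha_1^k\alpha_2\alpha_1^n$, or

(**) $w_{1n} = \alpha_1^k\alpha_2\alpha_1^n$ and $w_{2n} = \alpha_1^{k+n}$.

[Indeed, suppose $w_{1n}$ does not contain $\alpha_2$. Then $w_{1n}$ is in $\alpha_1^*$. Hence, $w_{2n} = w_n = \alpha_1^k\alpha_2\alpha_1^n$ and $w_{1n} = \alpha_1^{k+1+n}$, i.e., (*) holds.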
Now suppose $w_{1n}$ contains $\alpha_2$. Then $\alpha_2$ does not appear in $w_{2n}$ since otherwise $w_n$ contains at least two occurrences of $\alpha_2$, a contradiction. Therefore, $w_{2n}$ is in $\alpha_1^*$. Hence, $w_{1n} = w_n = \alpha_1^k\alpha_2\alpha_1^n$ and $w_{2n} = \alpha_1^{k+n}$, i.e., (**) holds.] Two cases arise.

(i) There exists $K \geq N_k$ such that (*) holds for $n = K$, i.e., $w_{1K} = \alpha_1^{k+1+K}$ and $w_{2K} = \alpha_1^k\alpha_2\alpha_1^K$. Suppose (**) holds for $K+1$, i.e., $w_{1(K+1)} = \alpha_1^k\alpha_2\alpha_1^{K+1}$. Let $w = \alpha_1^k\alpha_2\alpha_2\alpha_1^K$. Clearly, $w$ is in $\mathrm{Sub}(w_{1(K+1)}, w_{2K}, \{\alpha_1\}) \subseteq \mathrm{Sub}(W', W'', \{\alpha_1\}) = W$. Since $\mathit{len}(w|\alpha_1) = k + K \geq k + N_k$, $\mathit{len}(w|\alpha_1) \geq N$ by (1). Then $w$ is in $W_{\geq N}$. By (2), $w$ is in $\alpha_1^k\alpha_2\alpha_1^{N_k}\alpha_1^*$, a contradiction. Thus, (**) cannot hold for $K+1$, whence (*) holds for $K+1$. By induction,

(3) (*) holds for each $i \geq K$.

Let $K' = K + k + 1$. Since $K' \geq K$ and (*) holds for each $i \geq K$, (*) holds for each $i \geq K'$. Let $w$ be in $W \cap W_{\geq K'}$. By (2) and the fact that $K' = K + k + 1 \geq N_k + k + 1 = N + 1$, there exists $t \geq 0$ such that $w = \alpha_1^k\alpha_2\alpha_1^{N_k+t}$. Since $w$ is in $W_{\geq K'}$, $k + N_k + t \geq K'$. Hence, $N_k + t \geq K' - k = K + 1$. By (3), (*) holds for $N_k + t$, i.e., $w_{2(N_k+t)} = \alpha_1^k\alpha_2\alpha_1^{N_k+t} = w$. Thus, $w$ is in $W''$. Since $w$ is also in $W_{\geq K'}$, $w$ is in $W'' \cap W_{\geq K'}$. Hence

(4) $W \cap W_{\geq K'} \subseteq W'' \cap W_{\geq K'}$.

Suppose there exists $w''$ in $(W'' \cap W_{\geq K'}) - (W \cap W_{\geq K'})$. Let $l = \mathit{len}(w'')$. Since $w''$ is in $W'' \cap W_{\geq K'}$, $l \geq \mathit{len}(w''|\alpha_1) \geq K'$. Since $l - k - 1 \geq K' - k - 1 = K + k + 1 - k - 1 = K$, it follows from (3) that (*) holds for $l - k - 1$, i.e., $w_{1(l-k-1)} = \alpha_1^{k+1+l-k-1} = \alpha_1^l$. Then $\alpha_1^l$ is in $W'$ since $w_{1(l-k-1)}$ is in $W'$. Thus, $\mathrm{Sub}(\alpha_1^l, w'', \{\alpha_1\}) \subseteq \mathrm{Sub}(W', W'', \{\alpha_1\}) = W$. Clearly, $w''$ is in $\mathrm{Sub}(\alpha_1^l, w'', \{\alpha_1\})$. Hence, $w''$ is in $W$. Since $w''$ is in $W_{\geq K'}$, it then follows that $w''$ is in $W \cap W_{\geq K'}$, a contradiction. Therefore, $(W'' \cap W_{\geq K'}) - (W \cap W_{\geq K'}) = \emptyset$, i.e., $W'' \cap W_{\geq K'} \subseteq W \cap W_{\geq K'}$. Combining this and (4), $W'' \cap W_{\geq K'} = W \cap W_{\geq K'}$. Then $\mathcal{E}_2 \sim \mathcal{E}$ by Lemma 3.10.

(ii) For each $K \geq N_k$, (*) does not hold at $K$. Thus, (**) holds for each $K \geq N_k$, i.e., $w_{2K} = \alpha_1^{k+K}$ for each $K \geq N_k$.
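Since $w_{2K}$ is in $W''$ for each $K$, it follows that $\alpha_1^{k+K}$ is in $W''$ for each $K \geq N_k$, i.e., $\alpha_1^{k+N_k}\alpha_1^i$ is in $W''$ for each $i \geq 0$. Therefore, $\alpha_1^{k+N_k}\alpha_1^* \subseteq W''$. By (1), $N = N_k + k$. Then $\alpha_1^N\alpha_1^* \subseteq W''$. Hence, $\alpha_1^N\alpha_1^* = \alpha_1^* \cap W_{\geq N} \subseteq W'' \cap W_{\geq N}$. Let $W_0 = \alpha_1^N\alpha_1^*$. Then

(5) $W_0 \subseteq W'' \cap W_{\geq N}$.

Suppose there exists $w''$ in $(W'' \cap W_{\geq N}) - W_0$. Since $w''$ is in $W_{\geq N}$, $\mathit{len}(w''|\alpha_1) \geq N$. Then $\mathit{len}(w'') \geq N$. Hence, there exists $l \geq 0$ such that $\mathit{len}(w'') = l + N$. Let $l' = l + N - k$. Then $\mathit{len}(w'') = l' + k$. Clearly, $l' \geq N - k$. Then by (1), $l' \geq N_k$. By hypothesis, (**) holds for $l'$, i.e., $w_{1l'} = \alpha_1^k\alpha_2\alpha_1^{l'}$ is in $W'$. Thus, $\mathrm{Sub}(w_{1l'}, w'', \{\alpha_1\}) \neq \emptyset$. Let $w$ be in $\mathrm{Sub}(w_{1l'}, w'', \{\alpha_1\})$. Since $w_{1l'}$ is in $W'$ and $w''$ is in $W''$, it follows that $w$ is in $\mathrm{Sub}(W', W'', \{\alpha_1\}) = W$. Since $w''$ is not in $W_0 = \alpha_1^N\alpha_1^*$ and $\mathit{len}(w'') \geq N$, it is easily seen that $\alpha_2$ occurs in $w''$. Hence, $\alpha_2$ occurs in $w$ at least twice since $\alpha_2$ appears in $w_{1l'}$. On the other hand, it is easy to see that $\mathit{len}(w|\alpha_1) \geq \mathit{len}(w''|\alpha_1)$. Then $\mathit{len}(w|\alpha_1) \geq N$, i.e., $w$ is in $W_{\geq N}$. Since $w$ is also in $W$, $w$ is in $\alpha_1^k\alpha_2\alpha_1^{N_k}\alpha_1^*$ by (2). This contradicts the fact that $\alpha_2$ occurs in $w$ at least twice. Therefore, $(W'' \cap W_{\geq N}) - W_0 = \emptyset$, i.e., $W'' \cap W_{\geq N} \subseteq W_0 = \alpha_1^* \cap W_{\geq N}$. Combining this and (5), $W'' \cap W_{\geq N} = \alpha_1^* \cap W_{\geq N}$. By Lemma 3.10, $\pi_1[\![W'']\!]_2^{-1} \sim \pi_1[\![\alpha_1^*]\!]_2^{-1}$, i.e., $\mathcal{E}_2$ is a pseudo-identity. □

Let $i \geq 0$ and $\mathcal{E}^{(i)}$ be the extractor defined in the previous lemma. It is easily seen that each decomposition of each extractor in $\{\mathcal{E}^{(i)}\}_\sim$ is trivial. Indeed, let $\mathcal{E}$ be in $\{\mathcal{E}^{(i)}\}_\sim$ and $\mathcal{E}_1, \ldots, \mathcal{E}_m$ a decomposition of $\mathcal{E}$. By Theorem 3.1, there exists an extractor $\mathcal{E}'$ such that $\mathcal{E}'(L) = \mathcal{E}_{m-1}(\cdots\mathcal{E}_1(L))$ for each subset $L$ of $\Sigma_\infty^*$. Thus, $\mathcal{E}(L) = \mathcal{E}_m(\mathcal{E}'(L))$ for each subset $L$ of $\Sigma_\infty^*$, i.e., $\mathcal{E}', \mathcal{E}_m$ is a decomposition of $\mathcal{E}$. By the above lemma, either $\mathcal{E}_m$ is a pseudo-identity or $\mathcal{E}_m \sim \mathcal{E}$. Thus, $\mathcal{E}_1, \ldots, \mathcal{E}_m$ is a trivial decomposition of $\mathcal{E}$. Hence, each decomposition of $\mathcal{E}$ is trivial as desired.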
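The extractor $\mathcal{E}^{(i)} = \pi_1[\![\alpha_1^i\alpha_2\alpha_1^*]\!]_2^{-1}$ deletes the $(i+1)$-th letter of a sequence of length greater than $i$ and yields nothing on shorter sequences. The Python sketch below is a hypothetical illustration, not from the thesis: it assumes single-character atoms and writes pattern languages as regular expressions over '1' ($\alpha_1$) and '2' ($\alpha_2$). Besides checking the deletion behaviour, `separated` replays the argument used for Theorem 3.4: on a sequence $u$ of $l+1$ distinct letters with $l \geq \max\{i, j\} + 1$, the outputs of $\mathcal{E}^{(i)}$ and $\mathcal{E}^{(j)}$ of length at least $l$ differ whenever $i \neq j$, so $\mathcal{E}^{(i)} \not\sim_l \mathcal{E}^{(j)}$.

```python
import re
from itertools import product

def extract(pattern, u):
    """pi_1[[W]]_2^{-1}({u}): W is a regex over '1' (alpha_1) and '2' (alpha_2)."""
    return {"".join(a for a, c in zip(u, w) if c == "1")
            for w in map("".join, product("12", repeat=len(u)))
            if re.fullmatch(pattern, w)}

def pattern_of_E(i):
    """Regex for alpha_1^i alpha_2 alpha_1^*, the pattern language of E^(i)."""
    return "1" * i + "21*"

def separated(i, j, l):
    """True if E^(i)({u}) and E^(j)({u}) differ on outputs of length >= l."""
    assert l >= max(i, j) + 1, "the argument needs l >= max{i, j} + 1"
    u = "".join(chr(ord("a") + t) for t in range(l + 1))   # l+1 distinct atoms
    long_i = {v for v in extract(pattern_of_E(i), u) if len(v) >= l}
    long_j = {v for v in extract(pattern_of_E(j), u) if len(v) >= l}
    return long_i != long_j

print(extract(pattern_of_E(2), "abcdef"))   # {'abdef'}
```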
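Before presenting the main result of this section, we have the following:

Lemma 3.12  Let $\mathcal{E}_1$ and $\mathcal{E}_2$ be extractors and $\mathcal{E}'$ a pseudo-identity such that $\mathcal{E}_1(L) = \mathcal{E}'(\mathcal{E}_2(L))$ for each subset $L$ of $\Sigma_\infty^*$. Then $\mathcal{E}_1 \sim \mathcal{E}_2$.

Proof  Since $\mathcal{E}'$ is a pseudo-identity, there exists $k \geq 0$ such that, for each subset $L$ of $\Sigma_\infty^*$, $\mathcal{E}'(\mathcal{E}_2(L)) \cap \Sigma_{\geq k} = \mathcal{E}_0(\mathcal{E}_2(L)) \cap \Sigma_{\geq k} = \mathcal{E}_2(L) \cap \Sigma_{\geq k}$. Since $\mathcal{E}_1(L) = \mathcal{E}'(\mathcal{E}_2(L))$, it then follows that $\mathcal{E}_1(L) \cap \Sigma_{\geq k} = \mathcal{E}'(\mathcal{E}_2(L)) \cap \Sigma_{\geq k} = \mathcal{E}_2(L) \cap \Sigma_{\geq k}$. By definition, $\mathcal{E}_1 \sim \mathcal{E}_2$. □

We are now ready to show that the set of extractors has no finite "generators."

Theorem 3.4  There is no finite set of extractors which yields all extractors by composition.

Proof  Suppose there exists a finite set $\mathcal{F} = \{\mathcal{E}_1, \ldots, \mathcal{E}_m\}$ of extractors such that each extractor is a composition of some extractors in $\mathcal{F}$. For each $i \geq 0$, let $\mathcal{E}^{(i)}$ be the extractor defined in Lemma 3.11. It is readily seen that there exists $k \geq 0$ such that $\{\mathcal{E}^{(k)}\}_\sim \cap \mathcal{F} = \emptyset$. To see this, suppose $\{\mathcal{E}^{(k)}\}_\sim \cap \mathcal{F} \neq \emptyset$ for all $k \geq 0$. For each $k \geq 0$, let $\mathcal{E}_{1k}$ be an extractor in $\{\mathcal{E}^{(k)}\}_\sim \cap \mathcal{F}$. It is easy to see that $\{\mathcal{E}^{(i)}\}_\sim \cap \{\mathcal{E}^{(j)}\}_\sim = \emptyset$ if $i \neq j$. [Indeed, let $i$ and $j$ be two distinct integers. Suppose $\{\mathcal{E}^{(i)}\}_\sim \cap \{\mathcal{E}^{(j)}\}_\sim \neq \emptyset$. Then there exists an extractor $\mathcal{E}$ such that $\mathcal{E} \sim \mathcal{E}^{(i)}$ and $\mathcal{E} \sim \mathcal{E}^{(j)}$. Since $\sim$ is an equivalence relation, $\mathcal{E}^{(i)} \sim \mathcal{E}^{(j)}$. Thus, there exists $l \geq \max\{i, j\} + 1$ such that $\mathcal{E}^{(i)} \sim_l \mathcal{E}^{(j)}$, i.e., $\mathcal{E}^{(i)}(L) \cap \Sigma_{\geq l} = \mathcal{E}^{(j)}(L) \cap \Sigma_{\geq l}$ for each subset $L$ of $\Sigma_\infty^*$. Now let $a_1, \ldots, a_{l+1}$ be distinct elements of $\Sigma_\infty$, $u = a_1 \cdots a_{l+1}$, $u_{-i} = a_1 \cdots a_i a_{i+2} \cdots a_{l+1}$ and $u_{-j} = a_1 \cdots a_j a_{j+2} \cdots a_{l+1}$. Then $\{u_{-i}\} = \mathcal{E}^{(i)}(\{u\})$ and $\{u_{-j}\} = \mathcal{E}^{(j)}(\{u\})$. Since $\mathit{len}(u_{-i}) = \mathit{len}(u_{-j}) = l$, $\{u_{-i}\} = \mathcal{E}^{(i)}(\{u\}) \cap \Sigma_{\geq l}$ and $\{u_{-j}\} = \mathcal{E}^{(j)}(\{u\}) \cap \Sigma_{\geq l}$. Since $\mathcal{E}^{(i)} \sim_l \mathcal{E}^{(j)}$, it follows that $\{u_{-i}\} = \{u_{-j}\}$, which is impossible since $i \neq j$. Therefore, $\{\mathcal{E}^{(i)}\}_\sim \cap \{\mathcal{E}^{(j)}\}_\sim = \emptyset$.] It follows that the extractors $\mathcal{E}_{1k}$, $k \geq 0$, are pairwise distinct, so $\{\mathcal{E}_{1k} \mid k \geq 0\}$ is an infinite set. Furthermore, $\{\mathcal{E}_{1k} \mid k \geq 0\}$ is a subset of $\mathcal{F}$ since $\mathcal{E}_{1k}$ is in $\mathcal{F}$ for each $k \geq 0$. Thus, $\{\mathcal{E}_{1k} \mid k \geq 0\} \subseteq \{\mathcal{E}_1, \ldots, \mathcal{E}_m\}$, and there exist $i \neq j$ such that both $\mathcal{E}_{1i} = \mathcal{E}_t$ and $\mathcal{E}_{1j} = \mathcal{E}_t$ for some $1 \leq t \leq m$. Therefore, $\mathcal{E}_{1i} = \mathcal{E}_{1j}$. Since $\mathcal{E}_{1i} \sim \mathcal{E}^{(i)}$ and $\mathcal{E}_{1j} \sim \mathcal{E}^{(j)}$, it follows that $\mathcal{E}^{(i)} \sim \mathcal{E}^{(j)}$, a contradiction. Thus, there exists an integer, say $k$, such that $\{\mathcal{E}^{(k)}\}_\sim \cap \mathcal{F} = \emptyset$.

For each $n \geq 1$, let $\mathcal{P}_n$ be the following statement: There exist $\mathcal{E}_{21}, \ldots, \mathcal{E}_{2n}$ in $\mathcal{F}$ and $\mathcal{E}$ in $\{\mathcal{E}^{(k)}\}_\sim$ such that $\mathcal{E}(L) = \mathcal{E}_{2n}(\cdots\mathcal{E}_{21}(L))$ for each subset $L$ of $\Sigma_\infty^*$. By induction on $n$, it can be shown that $\mathcal{P}_n$ is not true for each $n \geq 1$. Indeed, suppose $\mathcal{P}_1$ is true. Then $\mathcal{E}$ is equivalent to $\mathcal{E}_{21}$, whence $\mathcal{E}_{21}$ is in $\{\mathcal{E}^{(k)}\}_\sim \cap \mathcal{F}$. This contradicts the fact that $\{\mathcal{E}^{(k)}\}_\sim \cap \mathcal{F} = \emptyset$. Hence, $\mathcal{P}_1$ is not true. Suppose $\mathcal{P}_n$ is not true for each $1 \leq n < N$, and assume that $\mathcal{P}_N$ is true. By Theorem 3.1, there exists an extractor $\mathcal{E}'$ such that $\mathcal{E}'(L) = \mathcal{E}_{2(N-1)}(\cdots\mathcal{E}_{21}(L))$ for each subset $L$ of $\Sigma_\infty^*$. Therefore, $\mathcal{E}(L) = \mathcal{E}_{2N}(\mathcal{E}'(L))$ for each subset $L$ of $\Sigma_\infty^*$. By Lemma 3.11, two cases arise.

(i) $\mathcal{E}_{2N}$ is a pseudo-identity. By Lemma 3.12, $\mathcal{E} \sim \mathcal{E}'$. Thus, $\mathcal{E}'$ is in $\{\mathcal{E}^{(k)}\}_\sim$. Since $\mathcal{E}'(L) = \mathcal{E}_{2(N-1)}(\cdots\mathcal{E}_{21}(L))$ for each subset $L$ of $\Sigma_\infty^*$, it follows that $\mathcal{P}_{N-1}$ is true. This contradicts the induction assumption.

(ii) $\mathcal{E}_{2N} \sim \mathcal{E}$. Thus, $\mathcal{E}_{2N} \sim \mathcal{E} \sim \mathcal{E}^{(k)}$, i.e., $\mathcal{E}_{2N}$ is in $\{\mathcal{E}^{(k)}\}_\sim$. Since $\mathcal{E}_{2N}$ is in $\mathcal{F}$, $\mathcal{E}_{2N}$ is then in $\{\mathcal{E}^{(k)}\}_\sim \cap \mathcal{F}$. This contradicts the fact that $\{\mathcal{E}^{(k)}\}_\sim \cap \mathcal{F} = \emptyset$.

Since both cases yield a contradiction, $\mathcal{P}_N$ is not true. Therefore, $\mathcal{P}_n$ is not true for each $n \geq 1$, i.e., there is no $\mathcal{E}$ in $\{\mathcal{E}^{(k)}\}_\sim$ such that $\mathcal{E}$ is a composition of the extractors in $\mathcal{F}$. Hence, there is no finite set $\mathcal{F}$ of extractors which yields all extractors by composition. □

In passing, we note the following two related open problems: (i) Characterize those extractors whose decompositions are always trivial. (ii) Is it decidable whether an extractor has only trivial decompositions?

Chapter 4

Rs-Operations on an Extended Relational Data Model

In this chapter, we extend the relational data model [Cod70] to include sequences.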
We then examine the use of rs-operations in queries over the extended data model. In particular, we construct an algebraic query language, "s-algebra," and a calculus-like query language, "s-calculus." (These two languages are extensions of the relational algebra and relational calculus, respectively, with rs-operations added to handle sequences.) Finally, we show that a "safe" subclass of the s-calculus is "equivalent" to the s-algebra.

4.1 An Extended Relational Data Model

In this section, we extend the "traditional" relational data model [Cod70] to include sequences. In later sections of this chapter, we define two query languages over this extended data model.

To motivate the discussion, consider the following:

Example  Figure 4.1 describes the tour schedules of a travel agency. The numbers in column Tour_No are the identifications of the tours and the numbers in column COST the corresponding prices. For each tour, the list in column CITY specifies the cities to be visited, and the lists in columns ARRIVAL and DEPARTURE show the arrival and departure dates for these cities. Note that the order in which the cities are to be visited is significant.

Tour_No  City            Arrival   Departure  Cost
356      New York        3/14/90   3/16/90    1004
         Atlanta         3/16/90   3/18/90
         Miami           3/18/90   3/20/90
456      Los Angeles     3/18/90   3/20/90    1409
         Santa Barbara   3/20/90   3/22/90
         San Jose        3/22/90   3/24/90
         San Francisco   3/24/90   3/27/90
         Portland        3/27/90   3/28/90
         Vancouver       3/28/90   3/30/90
556      San Francisco   3/21/90   3/23/90    699
         San Jose        3/23/90   3/24/90
         Los Angeles     3/24/90   3/29/90
         San Diego       3/29/90   3/31/90

Figure 4.1: Tour schedules.

It is easily seen that the only difference between the table in Figure 4.1 and a "typical" relation (of the relational data model [Cod70]) is that sequences of values appear in the cells of that table while only "atomic" values are allowed in the cells of a relation.
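To make the data model concrete, the following Python sketch (an illustration, not part of the thesis) encodes Figure 4.1 as a set of 5-tuples whose components are tuples of atoms, so each cell holds a sequence rather than an atomic value. The names `TOURS`, `arity` and `is_flat` are hypothetical. Because `arity` reads the arity off the tuples themselves, an empty set of tuples has no arity at all; this is one reason the formal definition that follows pairs a set of tuples with its arity $n$.

```python
# Figure 4.1 as a 5-ary s-instance: every cell is a sequence (tuple) of atoms,
# so a whole list of cities or dates is a single component of a tuple.
TOUR, CITY, ARRIVAL, DEPARTURE, COST = range(5)   # 0-based column positions

TOURS = {
    (("356",),
     ("New York", "Atlanta", "Miami"),
     ("3/14/90", "3/16/90", "3/18/90"),
     ("3/16/90", "3/18/90", "3/20/90"),
     ("1004",)),
    (("456",),
     ("Los Angeles", "Santa Barbara", "San Jose", "San Francisco",
      "Portland", "Vancouver"),
     ("3/18/90", "3/20/90", "3/22/90", "3/24/90", "3/27/90", "3/28/90"),
     ("3/20/90", "3/22/90", "3/24/90", "3/27/90", "3/28/90", "3/30/90"),
     ("1409",)),
    (("556",),
     ("San Francisco", "San Jose", "Los Angeles", "San Diego"),
     ("3/21/90", "3/23/90", "3/24/90", "3/29/90"),
     ("3/23/90", "3/24/90", "3/29/90", "3/31/90"),
     ("699",)),
}

def arity(instance):
    """Common length of the tuples of a nonempty set of sequence tuples."""
    arities = {len(t) for t in instance}
    if len(arities) != 1:
        raise ValueError(f"expected exactly one arity, found {sorted(arities)}")
    return arities.pop()

def is_flat(instance):
    """True if every cell holds a sequence of length 1 (an ordinary relation)."""
    return all(len(cell) == 1 for t in instance for cell in t)

print(arity(TOURS), is_flat(TOURS))
for t in sorted(TOURS):
    print(t[TOUR][0], "visits", len(t[CITY]), "cities")
```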
In the following, an extension to the relational data model is presented, which allows us to describe tables such as that in Figure 4.1. Traditionally, a tuple in the relational data model is defined as a mapping from a finite set of "attribute names" to a set of "values" such that each attribute name is mapped into a pre-assigned "domain" for it [Cod70, Mai83, Ull88]. However, in order to simplify our discussion of the extended data model and its two query languages, it is assumed here without loss of generality that (1) a tuple is a fixed list, i.e., a mapping from $\{1, \ldots, n\}$ for some $n$ to a set of (sequence) values; and (2) there is a "universal" domain from which all data values are drawn.

Formally, let $U$ be a nonempty set of elements called atoms (sometimes called atomic values). Atoms are usually denoted by $a$ and $b$ etc., possibly subscripted. For each $n \geq 0$, a mapping $t$ from $\{1, \ldots, n\}$ to $U^*$ is called an ($n$-ary) sequence tuple, abbreviated tuple, and is customarily written in the form $(u_1, \ldots, u_n)$, where $u_i = t(i)$ for each $1 \leq i \leq n$. (It is understood that if $n = 0$, then $\{1, \ldots, n\}$ denotes the empty set and $(u_1, \ldots, u_n)$ denotes the 0-ary sequence tuple $()$.) Thus, $(ab, cbbc, bcab)$ is a 3-ary tuple. For each $n \geq 0$ and finite set $I$ of $n$-ary sequence tuples, $(n, I)$ is called an ($n$-ary) s-instance and is abbreviated $I$ when $n$ is understood. For example, $(2, \{(a, b)\})$ and $\{(ab, cbbc), (bba, abca)\}$ are both binary s-instances.

Example  Consider the table in Figure 4.1 of the Introduction. One of the tours described is as follows:

356      New York   3/14/90   3/16/90    1004
         Atlanta    3/16/90   3/18/90
         Miami      3/18/90   3/20/90

It is understood that "356", "New York, Atlanta, Miami", "3/14/90, 3/16/90, 3/18/90", "3/16/90, 3/18/90, 3/20/90" and "1004" are all sequences.
For example, "356" is a sequence of length 1 and "New York, Atlanta, Miami" is a sequence of length 3. For aesthetic reasons, the sequences in the above table (and in Figure 4.1) are written vertically instead of horizontally. Thus, the above table represents the following 5-ary tuple (if the above sequences are assumed to be in $U^*$): ("356", "New York, Atlanta, Miami", "3/14/90, 3/16/90, 3/18/90", "3/16/90, 3/18/90, 3/20/90", "1004"). Also, the table in Figure 4.1 (ignoring the column names) is clearly a 5-ary s-instance.

Finally, let $\mathcal{R}$ be a nonempty set of elements called s-relation names. S-relation names are usually denoted by $R$, possibly subscripted. Let arity be a mapping from $\mathcal{R}$ to the positive integers. For each $R$ in $\mathcal{R}$, the integer arity($R$) is called the arity of $R$. Each finite subset of $\mathcal{R}$ is called an s-database scheme and is usually denoted by $D$, possibly subscripted. Each mapping $I_D$ from an s-database scheme $D$ to the set of all s-instances, where $I_D(R)$ is an arity($R$)-ary s-instance for each $R$ in $D$, is called an s-database instance (of $D$).

The data model defined above is a simple and natural extension of the relational data model [Cod70]. [Indeed, if the lengths of all sequences in an s-instance are restricted to 1, then the s-instance can be viewed as an instance (or relation) of the traditional relational data model.] Obviously, the introduction of sequences into databases demands new "constructs" in their query languages. In the following sections, the rs-operations will be incorporated into query languages to handle these sequences.

4.2 S-algebra: an Algebraic Query Language

In this section, we define an algebraic query language, called "s-algebra," over s-database instances. S-algebra is essentially the relational algebra [Cod70, Ull88] with rs-operations added to deal with sequences.
We start by constructing two types of operations, called "merger reconstructions" and "extractor reconstructions," which use rs-operations to manipulate sequences in s-instances.

Turning to the merger reconstructions, let $I$ be an $n$-ary s-instance. For each list $\xi = i_1, \ldots, i_k$ of $k \geq 1$ numbers in $\{1, \ldots, n\}$ and each $k$-ary merger $[\![W]\!]_k$, let $[\![W]\!]^\xi(I)$, called a merger reconstruction (of $I$), be the $(n+1)$-ary s-instance

$\bigcup_{t \in I} \{(u_1, \ldots, u_n, u_{n+1}) \mid u_i = t(i) \text{ for } 1 \leq i \leq n \text{ and } u_{n+1} \text{ is in } [\![W]\!]_k(\{u_{i_1}\}, \ldots, \{u_{i_k}\})\}$.

For example, $[\![\alpha_1^*\alpha_2 \cup \alpha_2\alpha_1^*]\!]^{2,3}(\{(ab, bcd, a)\}) = \{(ab, bcd, a, bcda), (ab, bcd, a, abcd)\}$ (since $bcda$ and $abcd$ are in $[\![\alpha_1^*\alpha_2 \cup \alpha_2\alpha_1^*]\!]_2(\{bcd\}, \{a\})$) and $[\![\alpha_1\alpha_2]\!]^{3,3}(\{(a, b, c)\}) = \{(a, b, c, cc)\}$. Notice that the list $\xi$ of numbers can have repetitions.

The "extractor reconstructions" of s-instances are defined similarly. Specifically, let $I$ be an $n$-ary s-instance. For each integer $k$, $1 \leq k \leq n$, and extractor $\pi_1[\![W]\!]_2^{-1}$, let $[\![W]\!]_{-k}(I)$, called an extractor reconstruction (of $I$), be the $(n+1)$-ary s-instance

$\bigcup_{t \in I} \{(u_1, \ldots, u_n, u_{n+1}) \mid u_i = t(i) \text{ for } 1 \leq i \leq n \text{ and } u_{n+1} \text{ is in } \pi_1[\![W]\!]_2^{-1}(\{u_k\})\}$.

For example, $[\![\alpha_1^*\alpha_2^*]\!]_{-2}(\{(abc, bc, a)\}) = \{(abc, bc, a, \epsilon), (abc, bc, a, b), (abc, bc, a, bc)\}$. Notice that only extractors in canonical form are used. Also, the negation sign ($-$) in an extractor reconstruction $[\![W]\!]_{-k}(I)$ suffices to distinguish the extractor reconstruction from a merger reconstruction.

We now consider the operations "union," "intersection," "difference," "cross product," "projection" and "selection" over s-instances. We will see that the first five operations are used in the same way as in the relational algebra. The only difference between the "selection" (our last operation) of this chapter and the selection in the relational algebra is that the "selection conditions" here are defined in terms of sequences of atoms rather than atoms. The first three operations are union, intersection and difference.
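Both reconstructions can be sketched directly from their definitions. The Python below is an illustration under stated assumptions, not the thesis's implementation: atoms are single characters and sequences are strings; a merger's pattern language over $\alpha_1, \ldots, \alpha_k$ is written as a regular expression over the digits '1' to 'k' (so $k \leq 9$); positions in $\xi$ and the column $k$ are 1-based as in the text; and all function names are hypothetical. Mergers are computed by enumerating every interleaving, which is exponential and only suitable for tiny examples. It reproduces the examples $[\![\alpha_1^*\alpha_2 \cup \alpha_2\alpha_1^*]\!]^{2,3}$ and $[\![\alpha_1^*\alpha_2^*]\!]_{-2}$ above, and also shows a list $\xi = 3, 3$ with a repeated position.

```python
import re
from itertools import product

def merge(pattern, seqs):
    """[[W]]_k({u_1}, ..., {u_k}) with k = len(seqs) and W a regex over '1'..'k'."""
    k = len(seqs)
    out = set()
    for letters in product("123456789"[:k], repeat=sum(map(len, seqs))):
        w = "".join(letters)
        # w must use alpha_i exactly len(u_i) times and belong to W.
        if all(w.count(str(i + 1)) == len(s) for i, s in enumerate(seqs)) \
                and re.fullmatch(pattern, w):
            its = [iter(s) for s in seqs]
            # Replace each alpha_i in w by the next unused letter of u_i.
            out.add("".join(next(its[int(c) - 1]) for c in w))
    return out

def extract(pattern, u):
    """pi_1[[W]]_2^{-1}({u}) with W a regex over '1' (alpha_1) and '2' (alpha_2)."""
    return {"".join(a for a, c in zip(u, w) if c == "1")
            for w in map("".join, product("12", repeat=len(u)))
            if re.fullmatch(pattern, w)}

def merger_reconstruction(pattern, xi, instance):
    """[[W]]^xi(I): append every merge of the columns listed (1-based) in xi."""
    return {t + (v,) for t in instance
            for v in merge(pattern, [t[i - 1] for i in xi])}

def extractor_reconstruction(pattern, k, instance):
    """[[W]]_{-k}(I): append every sequence extracted from column k (1-based)."""
    return {t + (v,) for t in instance for v in extract(pattern, t[k - 1])}

print(sorted(merger_reconstruction("1*2|21*", [2, 3], {("ab", "bcd", "a")})))
print(sorted(extractor_reconstruction("1*2*", 2, {("abc", "bc", "a")})))
```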
For s-instances $(n, I_1)$ and $(n, I_2)$ of the same arity, let

(1) $(n, I_1) \cup (n, I_2)$, called the union (of $I_1$ and $I_2$), be the s-instance $(n, I_1 \cup I_2)$;
(2) $(n, I_1) \cap (n, I_2)$, called the intersection (of $I_1$ and $I_2$), be the s-instance $(n, I_1 \cap I_2)$; and
(3) $(n, I_1) - (n, I_2)$, called the difference (of $I_1$ and $I_2$), be the s-instance $(n, I_1 - I_2)$.

Notice that intersection can be defined by difference since $I \cap J = I - (I - J)$ for all s-instances $I$ and $J$ with the same arity.

The next operation is the cross product. Formally, for $m$-ary tuple $t_1 = (u_1, \ldots, u_m)$ and $n$-ary tuple $t_2 = (v_1, \ldots, v_n)$, let $t_1 \times t_2$ be the $(m+n)$-ary tuple $(u_1, \ldots, u_m, v_1, \ldots, v_n)$. For example, $(ab, cd) \times (d, fe, h) = (ab, cd, d, fe, h)$. For $m$-ary s-instance $I_1$ and $n$-ary s-instance $I_2$, let $I_1 \times I_2$, called the cross product (of $I_1$ and $I_2$), be the $(m+n)$-ary s-instance $\{t_1 \times t_2 \mid t_1 \text{ in } I_1 \text{ and } t_2 \text{ in } I_2\}$. To illustrate, $\{(a, bb)\} \times \{(a), (b)\}$ is the 3-ary s-instance $\{(a, bb, a), (a, bb, b)\}$.

Now consider the projection operation. For each $n$-ary s-instance $I$ and list $\xi = i_1, \ldots, i_k$ of $k \geq 0$ distinct numbers in $\{1, \ldots, n\}$, let $\pi_\xi(I)$, called the projection (of $I$ to $\xi$), be the $k$-ary s-instance $\{(t(i_1), \ldots, t(i_k)) \mid t \text{ in } I\}$. (If $k = 0$, then $\xi$ is the empty list, denoted by $()$.) For example, $\pi_{3,1}(\{(ab, abc, de)\}) = \{(de, ab)\}$, $\pi_{()}(\{(a, b)\}) = \{()\}$ and $\pi_{()}(\emptyset) = \emptyset$. Notice that $\xi$ here cannot have repetitions.

In order to define the last operation, namely selection, the notion of ($n$-ary) selection conditions for each $n \geq 0$ is needed and is defined inductively as follows:

(1) $\gamma_1 = \gamma_2$ is an $n$-ary selection condition if each $\gamma_i$ ($i = 1, 2$) is in $U^*$ or is of the form $\$j$ for some $1 \leq j \leq n$;
(2) $(\neg C)$ is an $n$-ary selection condition if $C$ is; and
(3) $(C_1 \wedge C_2)$ and $(C_1 \vee C_2)$ are both $n$-ary selection conditions if both $C_1$ and $C_2$ are.
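The set operations, cross product and projection translate almost literally into Python. In the sketch below (hypothetical names, not from the thesis) an s-instance is a pair `(n, tuples)`, mirroring the pair $(n, I)$ of the definition, so that an empty result such as $\pi_{()}(\emptyset)$ still records its arity; intersection is derived from difference via $I \cap J = I - (I - J)$, and projection rejects a list $\xi$ with repetitions, as the text requires.

```python
def _check_same_arity(a, b):
    if a[0] != b[0]:
        raise ValueError(f"arities differ: {a[0]} vs {b[0]}")

def union(a, b):
    _check_same_arity(a, b)
    return (a[0], a[1] | b[1])

def difference(a, b):
    _check_same_arity(a, b)
    return (a[0], a[1] - b[1])

def intersection(a, b):
    # Derived operation: I ∩ J = I - (I - J).
    return difference(a, difference(a, b))

def cross(a, b):
    """(m, I1) x (n, I2): concatenate every pair of tuples."""
    (m, I1), (n, I2) = a, b
    return (m + n, {t1 + t2 for t1 in I1 for t2 in I2})

def project(xi, a):
    """pi_xi(I) for a list xi of distinct 1-based positions (possibly empty)."""
    n, I = a
    if len(set(xi)) != len(xi) or not all(1 <= i <= n for i in xi):
        raise ValueError("xi must list distinct positions in 1..n")
    return (len(xi), {tuple(t[i - 1] for i in xi) for t in I})

print(cross((2, {("a", "bb")}), (1, {("a",), ("b",)})))
print(project([3, 1], (3, {("ab", "abc", "de")})))
print(project([], (2, {("a", "b")})), project([], (2, set())))
```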
Since the parentheses in selection conditions are for grouping purposes only, they may be omitted if no confusion arises. For each $n$-ary selection condition $C$ and each $n$-ary sequence tuple $t$, let $C(t) = T$ if either

(1) $C$ is of the form $\gamma_1 = \gamma_2$ and $u_{\gamma_1} = u_{\gamma_2}$, where (for $i = 1, 2$) (a) $u_{\gamma_i} = t(j)$ if $\gamma_i = \$j$ and (b) $u_{\gamma_i} = u$ if $u$ is in $U^*$ and $\gamma_i = u$;
(2) $C$ is of the form $(\neg C_1)$ and $C_1(t) = F$;
(3) $C$ is of the form $(C_1 \wedge C_2)$, $C_1(t) = T$ and $C_2(t) = T$; or
(4) $C$ is of the form $(C_1 \vee C_2)$ and either $C_1(t)$ or $C_2(t)$ is $T$;

and $C(t) = F$ otherwise. For instance, let $C$ be the selection condition $\$1 = aa$. Then $C((aa, b)) = T$ and $C((a, b)) = F$.

We are now ready for the selection operation. For each $n$-ary s-instance $I$ and $n$-ary selection condition $C$, let $\sigma_C(I)$, called a selection (of $I$ by $C$), be the $n$-ary s-instance $\{t \in I \mid C(t) = T\}$. For example, $\sigma_{\$1=aa}(\{(aa, b), (a, c), (aa, c)\}) = \{(aa, b), (aa, c)\}$.

Using the above eight operations (i.e., merger reconstruction, extractor reconstruction, union, intersection, difference, cross product, projection and selection), we now define "s-algebra expressions." For each s-database scheme $D$ and integer $n \geq 0$, the ($n$-ary) s-algebra expressions (over $D$) are defined inductively as follows:

(1) $\{(u_1, \ldots, u_n)\}$ is an $n$-ary s-algebra expression if $u_i$ is in $U^*$ for each $1 \leq i \leq n$, and $\{()\}$ is a 0-ary s-algebra expression;
(2) $R$ is an $n$-ary s-algebra expression if $R$ is in $D$ and arity($R$) $= n$;
(3) $(E_1 \cup E_2)$, $(E_1 \cap E_2)$ and $(E_1 - E_2)$ are all $n$-ary s-algebra expressions if both $E_1$ and $E_2$ are $n$-ary s-algebra expressions;
(4) $(E_1 \times E_2)$ is an $n$-ary s-algebra expression if $E_1$ is an $n_1$-ary s-algebra expression, $E_2$ is an $n_2$-ary s-algebra expression and $n_1 + n_2 = n$;
(5) $(\pi_\xi E)$ is an $n$-ary s-algebra expression if $E$ is a $k$-ary s-algebra expression and $\xi$ is a list of $n$ distinct numbers in $\{1, \ldots, k\}$;
(6) $(\sigma_C E)$ is an $n$-ary s-algebra expression if $E$ is an $n$-ary s-algebra expression and $C$ is an $n$-ary selection condition;
(7) $([\![W]\!]^\xi E)$ is an $n$-ary s-algebra expression if $n \geq 2$, $E$ is an $(n-1)$-ary s-algebra expression, $\xi$ a list of $k \geq 1$ numbers in $\{1, \ldots, n-1\}$ and $[\![W]\!]_k$ a $k$-ary merger; and
(8) $([\![W]\!]_{-k} E)$ is an $n$-ary s-algebra expression if $n \geq 2$, $1 \leq k < n$, $E$ is an $(n-1)$-ary s-algebra expression and $\pi_1[\![W]\!]_2^{-1}$ an extractor.

Each $n$-ary s-algebra expression over $D$ defines a mapping from each s-database instance over $D$ to an $n$-ary s-instance. Specifically, let $I_D$ be an s-database instance of the s-database scheme $D$ and $E$ an expression over $D$. Then the value of $E$ over $I_D$, denoted $E[I_D]$, is

(1) $\{(u_1, \ldots, u_n)\}$ if $E = \{(u_1, \ldots, u_n)\}$;
(2) $I_D(R)$ if $E = R$;
(3) $E_1[I_D] \cup E_2[I_D]$ if $E = (E_1 \cup E_2)$;
(4) $E_1[I_D] - E_2[I_D]$ if $E = (E_1 - E_2)$;
(5) $E_1[I_D] \cap E_2[I_D]$ if $E = (E_1 \cap E_2)$;
(6) $E_1[I_D] \times E_2[I_D]$ if $E = (E_1 \times E_2)$;
(7) $\pi_\xi(E_1[I_D])$ if $E = (\pi_\xi E_1)$;
(8) $\sigma_C(E_1[I_D])$ if $E = (\sigma_C E_1)$;
(9) $[\![W]\!]^\xi(E_1[I_D])$ if $E = ([\![W]\!]^\xi E_1)$; and
(10) $[\![W]\!]_{-k}(E_1[I_D])$ if $E = ([\![W]\!]_{-k} E_1)$.

Thus, each s-algebra expression, when used as a mapping over s-database instances, defines a "query." The collection of such queries is called the s-algebra.

We now formulate some specific queries over the s-instance in Figure 4.1.

Examples  Suppose $R$ is a 5-ary relation name whose first column corresponds to Tour_No, second column CITY, third column ARRIVAL, fourth column DEPARTURE and fifth column COST (cf. Figure 4.1). The following are some queries addressed to $R$. (The answer given in each example is the value of the query over the s-instance in Figure 4.1.)

(1) "Print the numbers of those tours whose second city is Atlanta." This is expressed in s-algebra by:

$\pi_1\sigma_{\$6=\text{“Atlanta”}}[\![\alpha_2\alpha_1\alpha_2^*]\!]_{-2}(R)$.

The answer is the set $\{(\text{“356”})\}$.
(2) "Give the numbers and costs of those tours which visit Los Angeles and later San Francisco." This is expressed in s-algebra by:

$\pi_{1,5}\sigma_{\$6=\text{“Los Angeles, San Francisco”}}[\![(\alpha_1 \cup \alpha_2)^*]\!]_{-2}(R)$.

The answer to this query is the set $\{(\text{“456”}, \text{“1409”})\}$.

(3) "Return the pairs of tour numbers such that the first tour ends on the day when the second tour starts." This is expressed in s-algebra by:

$\pi_{1,7}\sigma_{\$6=\$12}([\![\alpha_2^*\alpha_1]\!]_{-4}(R) \times [\![\alpha_1\alpha_2^*]\!]_{-4}(R))$.

The answer is the set $\{(\text{“356”}, \text{“456”})\}$.

4.3 A Sequence Logic: SL

In this section, we introduce a logic about sequences called "sequence logic" (SL). In the next section, we use SL to construct a calculus-like query language over s-databases.

Following the customary method of building a logic system, we first define the languages of SL. An SL language $L$ is a 4-tuple $(C, V, \mathcal{P}, L_c)$, where

(1) $C$ is a set of elements called constants.
(2) $V$ is a set of elements called variables.
(3) $\mathcal{P}$ is a set of elements called predicates. For each $P$ in $\mathcal{P}$, there exists a positive integer arity($P$) called its arity. The set $\mathcal{P}$ contains $[\![W]\!]_n$ for each $n \geq 1$ and regular subset $W$ of $V_n^*$. The arity of the predicate $[\![W]\!]_n$ is $n + 1$. (It will become clear later why the arity of $[\![W]\!]_n$ is defined as $n + 1$.)
(4) $L_c$ is the set of the logic connectives "$\vee$", "$\neg$" and "$\exists$", and the grouping symbols "(" and ")".

In an SL language, constants are usually denoted by $a$ and $b$ etc., possibly subscripted, variables by $x$ and $y$ etc., possibly subscripted, and predicates by $P$, possibly subscripted. Predicates of the form $[\![W]\!]_n$ are called sequence predicates, and non-sequence predicates are called logic predicates. In the remainder of this thesis, $V$ is assumed to be an infinite set of variables such that $V \cap U = \emptyset$. In this section only, $L$ is always assumed to be a fixed SL language $(C, V, \mathcal{P}, L_c)$, where $C \cap V = \emptyset$. Each sequence in $C^*$ and each variable in $V$ is called a term of $L$ and is denoted by $t$, possibly subscripted.
For example, suppose $a$ is in $C$ and $x$ is in $V$. Then $aa$, $a$ and $x$ are all terms of $L$. Using the terms, the "formulas" of $L$ are defined below. For each $n$-ary predicate $P$ and terms $t_1, \ldots, t_n$, $P(t_1, \ldots, t_n)$ is called an atomic formula of $L$. Accordingly, $[\![W]\!]_n(t_1, \ldots, t_n, t_{n+1})$ is an atomic formula for each merger $[\![W]\!]_n$ and all terms $t_1, \ldots, t_{n+1}$. In what follows, $[\![W]\!]_n(t_1, \ldots, t_n, t_{n+1})$ will be written as $(t_{n+1} \in [\![W]\!]_n(t_1, \ldots, t_n))$ and $[\![\alpha_1^*]\!]_1(t_1, t_2)$ as $(t_1 = t_2)$. The formulas (of $L$) are defined inductively as follows:

(1) Each atomic formula of $L$ is a formula of $L$; and
(2) $(F_1 \vee F_2)$, $(\neg F_1)$ and $(\exists x F_1)$ are all formulas of $L$ if $F_1$ and $F_2$ are formulas of $L$ and $x$ is a variable in $V$.

The parentheses in a formula are for grouping purposes. Following custom, parentheses are usually dropped if no confusion arises. For example, $(\exists x(\exists y((x = y) \vee P(x, y))))$ will be written as $\exists x\exists y(x = y \vee P(x, y))$. Also, formulas $F_1 \wedge F_2$, $F_1 \rightarrow F_2$ and $\forall x F$ will be used as abbreviations of $\neg(\neg F_1 \vee \neg F_2)$, $(\neg F_1) \vee F_2$ and $\neg\exists x(\neg F)$, respectively.

A variable $x$ is free in a formula $F$ if either (1) $F$ is of the form $P(t_1, \ldots, t_n)$ and $t_i = x$ for some $1 \leq i \leq n$; (2) $F$ is of the form $\exists y F_1$, where $y \neq x$ and $x$ is free in $F_1$; (3) $F$ is of the form $\neg F_1$ and $x$ is free in $F_1$; or (4) $F$ is of the form $F_1 \vee F_2$ and $x$ is free in $F_1$ or in $F_2$. For example, the formula $ab \in [\![\alpha_1\alpha_2]\!]_2(a, b)$ has no free variables, and $x$ is free in $x \in [\![\alpha_1]\!]_1(a)$ and not free in $\exists x P(x, y)$. In the remainder of this chapter, $x$ is assumed to be free in $F$ when a formula of the form $\exists x F$ is used.

Turning to the semantic aspect of SL, the "structures" for SL languages are now defined. A structure (for $L$) is a triple $S = (A, \varphi, \psi)$, where

(1) $A$ is a nonempty set (the universe),
(2) $\varphi$ is a mapping from $C$ to $A$ such that, for each constant $c$ in $C$, $\varphi(c)$ is an element in $A$, and
(3) $\psi$ is a (partial) mapping from $\mathcal{P}$ to the finite sets of tuples over $A^*$ such that $\psi(P)$ is a subset of $A^* \times \cdots \times A^*$ ($n$ times) for each $n$-ary logic predicate symbol $P$ in $\mathcal{P}$.

In other words, a structure assigns an element in the universe to each constant and a set of $n$-tuples of sequences to each $n$-ary logic predicate, and thus gives a "meaning" to each constant and each logic predicate. (In contrast, the "meaning" of each sequence predicate is predefined. This will become clear when "truth values" of formulas are defined below.)

For each structure $S = (A, \varphi, \psi)$ for $L$, each (partial) mapping from $V$ to $A^*$ is called a variable assignment (in $S$). The variable assignment which (only) maps $x$ to $u$ is written as $[x/u]$, i.e., $[x/u](x) = u$ and $[x/u](y)$ is undefined for each $y \neq x$. Let $\theta_1$ and $\theta_2$ be variable assignments. Then $\theta_1\theta_2$ is the variable assignment $\theta_2$ followed by $\theta_1$, i.e., $\theta_1\theta_2(x) = u$ if $\theta_2(x) = u$, or if $\theta_2(x)$ is undefined and $\theta_1(x) = u$. If $\theta_1(x)$ and $\theta_2(x)$ are both undefined, then $\theta_1\theta_2(x)$ is undefined. As an example, let $\theta = [x_1/u_1][x_2/u_2]$. Then $\theta(x_1) = u_1$, $\theta(x_2) = u_2$ and $\theta(y)$ is undefined for each $y$ such that $y \neq x_1$ and $y \neq x_2$. As usual, the mapping $[x_1/u_1] \cdots [x_k/u_k]$ is written as $[x_1/u_1, \ldots, x_k/u_k]$. For each variable assignment $\theta$ in $S = (A, \varphi, \psi)$ for $L$, let $\theta_\varphi$ be the mapping from the terms of $L$ to $A^*$ such that (1) $\theta_\varphi(a_1 \cdots a_m) = \varphi(a_1) \cdots \varphi(a_m)$ for each $a_1 \cdots a_m$ in $C^*$, and (2) $\theta_\varphi(x) = \theta(x)$ for each $x$ in $V$.

We are now ready to define the "truth values" for formulas. Let $S = (A, \varphi, \psi)$ be a structure for $L$, $F$ a formula of $L$ and $\theta$ a variable assignment such that $\theta$ is defined for each free variable of $F$. Then $F$ is said to be true (in $S$) under $\theta$, denoted $S \models F\theta$, if either

(1) $F$ is of the form $P(t_1, \ldots, t_n)$, where $P$ is a logic predicate, and $(\theta_\varphi(t_1), \ldots, \theta_\varphi(t_n))$ is in $\psi(P)$;
(2) $F$ is of the form $t \in [\![W]\!]_n(t_1, \ldots, t_n)$ and $\theta_\varphi(t)$ is in $\mathcal{M}(\{\theta_\varphi(t_1)\}, \ldots, \{\theta_\varphi(t_n)\})$, where $\mathcal{M}$ is the merger mapping defined by the merger $[\![W]\!]_n$;
(3) $F$ is of the form $F_1 \vee F_2$ and either $S \models F_1\theta$ or $S \models F_2\theta$;
(4) $F$ is of the form $\neg F_1$ and $S \models F_1\theta$ is not true; or
(5) $F$ is of the form $\exists x F_1$ and $S \models F_1\theta[x/u]$ for some variable assignment $[x/u]$.

Otherwise, $F$ is said to be false (in $S$) under $\theta$, denoted $S \not\models F\theta$.

Suppose $F$ is a formula of $L$ with no free variables. Then $F$ is said to be true in $S$, denoted $S \models F$, if there exists $\theta$ such that $S \models F\theta$. If $F$ is true in all structures, then $F$ is said to be true, denoted $\models F$. For example, $\forall x(x = x)$ is true in all structures. Thus, $\models \forall x(x = x)$.

Sequence logic formulas are used to specify or declare properties of sequences or sets of sequences. The following are some examples. (Throughout this chapter, $P(s_1, \ldots, s_n)$ is used as an abbreviation of the formula $\exists x_{i_1} \cdots \exists x_{i_k} P(t_1, \ldots, t_n)$, where $s_i$ is a term of $L$ or the symbol "$-$" for each $1 \leq i \leq n$, $P$ an $n$-ary predicate, $\{i_1, \ldots, i_k\} = \{1 \leq j \leq n \mid s_j = -\}$ and, for each $1 \leq i \leq n$, $t_i = x_i$ if $s_i = -$ and $t_i = s_i$ otherwise. For example, $P(-, y)$ is an abbreviation of $\exists x P(x, y)$.)

Examples

(1) A sequence can be viewed as a multiset¹. Let $x \sqsubseteq y$ denote the formula

$\forall z(x \in [\![(\alpha_1 \cup \alpha_2)^*]\!]_2(-, z) \wedge EQ(z) \rightarrow y \in [\![(\alpha_1 \cup \alpha_2)^*]\!]_2(-, z))$,

where $EQ(z) = \forall y_1\forall y_2(z \in [\![\alpha_3^*\alpha_1\alpha_2\alpha_3^*]\!]_3(y_1, y_2, -) \rightarrow (y_1 = y_2))$. It is easily seen that (1) $EQ(u)$ is true if and only if $u$ is in $a^*$ for some $a$, and (2) $u_1 \sqsubseteq u_2$ is true if and only if $u_1$ is a subset of $u_2$ when viewed as multisets. Expressed formally², let $L$ be an SL language and $S = (A, \varphi, \psi)$ a structure for $L$. Then $S \models EQ(x)\theta$ if and only if $\theta(x)$ is in $a^*$ for some $a$ in $A$, and $S \models (x \sqsubseteq y)\theta$ if and only if $\theta(x)$ is a subset of $\theta(y)$ (in the multiset sense). Now suppose $P$ is a total order relation on constants (e.g., "$<$" on the integers).
Let $\mathrm{Sorted}(z,P) = (\forall x,y)(z \in [\![\alpha_3^*\alpha_1\alpha_2\alpha_3^*]\!]_3(x,y,-) \rightarrow P(x,y))$. Clearly, $\mathrm{Sorted}(u,P)$ is true if and only if $u$ is sorted according to $P$. Now let $\mathrm{Sort}(x,y,P)$ be the formula $(x \sqsubseteq y) \wedge (y \sqsubseteq x) \wedge \mathrm{Sorted}(y,P)$. Intuitively, $\mathrm{Sort}(u_1,u_2,P)$ is true if and only if $u_2$ is a result of sorting $u_1$ according to $P$. Expressed formally, let $L$ be an SL language containing $P$ as a binary logic predicate and $S = (A,\varphi,\psi)$ a structure for $L$ such that $\psi(P)$ is a total order relation. Then $S \models \mathrm{Sort}(x,y,P)\theta$ if and only if $\theta(y)$ is the result of sorting $\theta(x)$ according to $\psi(P)$.

¹ A multiset is a set having possible duplicate elements. For example, $\{a,a,b\}$ and $\{a,a,b,b\}$ are both multisets. Let $s_1$ and $s_2$ be multisets. Then $s_1$ is a subset of $s_2$ (in the multiset sense) if, for each $a$, $a$ occurs in $s_2$ at least as often as it occurs in $s_1$. For instance, $\{a,a,b\}$ is a subset of $\{a,a,b,b\}$.
² In examples (2) and (3) below, only the intuitive meanings of the formulas will be described.

(2) In the study of object histories [GT86], "local constraints" play an important role. Each constraint can be viewed as a unary predicate. Let $R$ be a constraint, i.e., a unary predicate. $R$ is said to be $k$-local if, for each sequence $u$, $u$ is in $R$ whenever all subintervals of $u$ of length $k$ are in $R$. Let $\mathrm{Local}_k(R)$ be the formula $\forall x(\forall y((\mathrm{Length}_k(y) \wedge x \in [\![\alpha_1^*\alpha_2^*\alpha_1^*]\!]_2(-,y)) \rightarrow R(y)) \rightarrow R(x))$, where $\mathrm{Length}_k(x) = x \in [\![\alpha_1^k]\!]_1(-)$. It is easily seen that $\mathrm{Local}_k(R)$ is true if and only if $R$ is $k$-local.

(3) At the end of Section 2.1, the operation Half was expressed using rs-operations. Here, an SL formula is used to describe the property that one sequence is the first half of another. Indeed, let $\mathrm{Samelength}(x,y)$ be the formula $- \in [\![(\alpha_1\alpha_2)^*]\!]_2(x,y)$ and $\mathrm{Prefix}(x,y)$ the formula $y \in [\![\alpha_1^*\alpha_2^*]\!]_2(x,-)$. Clearly, $\mathrm{Samelength}(u_1,u_2)$ is true if and only if $u_1$ and $u_2$ are of the same length, and $\mathrm{Prefix}(u_1,u_2)$ is true if and only if $u_1$ is a prefix of $u_2$.
Now let $\mathrm{Half}(x,y)$ be the formula $\mathrm{Prefix}(x,y) \wedge \exists z(y \in [\![(\alpha_1\alpha_2)^*]\!]_2(-,z) \wedge \mathrm{Samelength}(x,z))$. It is easily seen that $\mathrm{Half}(u_1,u_2)$ is true if and only if $u_2$ is of even length and $u_1$ is the first half of $u_2$.

4.4 S-Calculus: a Calculus-like Query Language

Using SL formulas, a calculus-like query language on s-databases, called "s-calculus," will now be constructed. As will be seen, s-calculus queries can express the Post Correspondence Problem; thus some s-calculus queries are not computable, which makes the full language unsuitable for practical use. In the next section, a subset of s-calculus, namely "safe s-calculus," will be defined and shown to be equivalent to s-algebra. Thus, each safe s-calculus query is computable.

Recall that each s-database scheme is a finite set of s-relation names and an s-database instance associates each $n$-ary s-relation name with a finite set of $n$-tuples over $\mathcal{U}^*$. (As mentioned earlier, $\mathcal{U}$ is the set of constants, i.e., the universe.) The correspondence of s-database schemes to SL languages and of s-database instances to SL structures is apparent. This observation leads to the use of SL formulas as queries over s-databases.

For each s-database scheme $D$, the associated SL language of $D$, denoted $L_D$, is the SL language $(\mathcal{U}, V, D \cup Sp, L_c)$, where $Sp$ is the set of all sequence predicates. (Note that $\mathcal{U}$ is an infinite set of constants and $V$ an infinite set of variables.) An s-calculus query is a construct of the form $\{(x_1,\ldots,x_n) \mid F\}$, where $F$ is a formula of $L_D$, $n \ge 0$ and $x_1,\ldots,x_n$ are the free variables³ in $F$. The collection of all s-calculus queries over all s-database schemes is called s-calculus.

Let $I_D$ be an s-database instance over $D$. Then the associated SL structure of $I_D$, denoted $S_{I_D}$, is the structure $(\mathcal{U},\varphi,\psi)$ for $L_D$, where $\varphi(a) = a$ for each $a$ in $\mathcal{U}$, and $\psi(R) = I_D(R)$ for each $R$ in $D$.
(By abuse of language, $I_D$ will be used instead of $S_{I_D}$ if no confusion arises.) Let $T$ be an s-calculus query over $D$. Then the answer set of $T$ over $I_D$, denoted $T[I_D]$, is the set⁴
$\{(u_1,\ldots,u_n) \in \mathcal{U}^* \times \cdots \times \mathcal{U}^* \ (n \text{ times}) \mid I_D \models F[x_1/u_1,\ldots,x_n/u_n]\}$.

³ It is understood that if $n = 0$, then there are no free variables in $F$.
⁴ If $n = 0$, then $\mathcal{U}^* \times \cdots \times \mathcal{U}^*$ ($n$ times) denotes the set $\{()\}$.

To illustrate, let $D$ be the s-database scheme $\{R_1, R_2\}$, where the arities of $R_1$ and $R_2$ are 1 and 2, respectively. Then $L_D = (\mathcal{U}, V, \{R_1,R_2\} \cup Sp, L_c)$, where $Sp$ is the set of all sequence predicates. Clearly, $T = \{(x) \mid \exists y(R_1(y) \wedge R_2(y,x))\}$ is a query over $D$. Now let $I_D$ be the s-database instance over $D$ such that $I_D(R_1) = \{(aaa)\}$ and $I_D(R_2) = \{(aaa,bbb),(a,c)\}$. It is easy to see that $\{(bbb)\}$ is the answer set of $T$ over $I_D$.

Examples Recall the examples at the end of Section 4.2, where three queries pertinent to a 5-ary relation $R$ of tour schedules were expressed in s-algebra. In the following, these queries are formulated in s-calculus. The answer given in each example is the answer set of the s-calculus query over the s-instance in Figure 4.1.

(1) "Print the numbers of those tours whose second city is Atlanta."
$\{(x) \mid \exists y(R(x,y,-,-,-) \wedge y \in [\![\alpha_1\alpha_2\alpha_1^*]\!]_2(-,\text{"Atlanta"}))\}$.
The answer is the set $\{(\text{"356"})\}$.

(2) "Give the numbers and costs of those tours which visit Los Angeles and later San Francisco."
$\{(x) \mid \exists y(R(x,y,-,-,-) \wedge y \in [\![(\alpha_1 \cup \alpha_2)^*]\!]_2(-,\text{"Los Angeles, San Francisco"}))\}$.
The answer to this query is the set $\{(\text{"456"},\text{"1409"})\}$.

(3) "Return the pairs of tour numbers such that the first tour ends on the day when the second tour starts."
$\{(x_1,x_2) \mid \exists y_1\exists y_2(R(x_1,-,y_1,-,-) \wedge R(x_2,-,-,y_2,-) \wedge \exists z(y_1 \in [\![\alpha_1^*\alpha_2]\!]_2(-,z) \wedge y_2 \in [\![\alpha_1\alpha_2^*]\!]_2(z,-)))\}$.
The answer is the set $\{(\text{"356"},\text{"456"})\}$.

S-calculus is a very powerful language, perhaps too powerful.
Indeed, we now show that there exists an s-calculus query $T$ over an s-database scheme $D$ such that $T[I_D]$ is not computable. We do this by simulating the Post Correspondence Problem (see Section 14.2 of [HU69]) with an s-calculus query.

Theorem 4.1 There exists an s-calculus query $T$ over an s-database scheme $D$ such that it is undecidable to determine, for an arbitrary s-instance $I_D$, whether $T[I_D]$ is empty.

Proof Let $A$ be a finite set of elements, and let $s_1 = u_1,\ldots,u_n$ and $s_2 = v_1,\ldots,v_n$ be lists of nonempty sequences over $A$ for some $n \ge 1$. Recall that the Post Correspondence Problem (PCP) for $s_1$ and $s_2$ is to find a list $i_1,\ldots,i_k$ of $k \ge 1$ numbers in $\{1,\ldots,n\}$ such that $u_{i_1}\cdots u_{i_k} = v_{i_1}\cdots v_{i_k}$. If such a list exists, then $s_1$ and $s_2$ are said to have a solution. It is well known [HU69] that it is undecidable to determine, for an arbitrary integer $n \ge 1$ and lists $s_1$ and $s_2$ (both of length $n$), whether there exists a solution. We shall "reduce" each PCP to the problem of deciding whether the answer set of an s-calculus query over an s-database instance is empty. Let $s_1$ and $s_2$ be lists of $n \ge 1$ nonempty sequences over $A$. Without loss of generality, suppose $\mathcal{U}$ is a superset of $A$ and contains two elements $\$$ and $¢$ that are not in $A$ (they serve as delimiters). Suppose further that $R$ is a binary relation name. Let $D$ be the s-database scheme $\{R\}$ and $I_D$ the s-instance such that $I_D(R) = \{(u_1,v_1),\ldots,(u_n,v_n)\}$.

In order to build an s-calculus query "simulating" the PCP, some auxiliary SL formulas are first constructed. Let $\mathrm{Not\text{-}in}(a,x)$ be the formula $\forall y(x \in [\![\alpha_2^*\alpha_1\alpha_2^*]\!]_2(y,-) \rightarrow \neg(y = a))$. Clearly, $\mathrm{Not\text{-}in}(a,u)$ is true if and only if the sequence $u$ does not contain $a$ as an element. Furthermore, let $\mathrm{No\text{-}delimiter}(x)$ be the formula $\mathrm{Not\text{-}in}(\$,x) \wedge \mathrm{Not\text{-}in}(¢,x)$. Thus, $\mathrm{No\text{-}delimiter}(u)$ is true if and only if $u$ contains no occurrence of $\$$ and no occurrence of $¢$. Let $\mathrm{All}(a,x)$ be the formula $\forall y(x \in [\![\alpha_1^*\alpha_2\alpha_1^*]\!]_2(-,y) \rightarrow y = a)$. Obviously, $\mathrm{All}(a,u)$ is true if and only if $u$ is in $a^*$.
Now let $F(z)$ be the formula $G_1(z) \wedge G_2(z)$, where $G_1(z)$ is the formula
(1) $\forall x\forall y((z \in [\![\alpha_1^*\alpha_2\alpha_3^*\alpha_4\alpha_5^*\alpha_6\alpha_1^*]\!]_6(-,\$,x,¢,y,\$) \wedge$
(2) $\mathrm{No\text{-}delimiter}(x) \wedge \mathrm{No\text{-}delimiter}(y))$
(3) $\rightarrow R(x,y))$
and $G_2(z)$ the formula
(4) $\exists x\exists y\exists z_1\exists z_2(z \in [\![\alpha_1\alpha_2^*\alpha_3\alpha_4^*(\alpha_1\alpha_2^*\alpha_3\alpha_4^*)^*\alpha_1]\!]_4(z_1,x,z_2,y) \wedge$
(5) $\mathrm{No\text{-}delimiter}(x) \wedge \mathrm{No\text{-}delimiter}(y) \wedge$
(6) $\mathrm{All}(\$,z_1) \wedge \mathrm{All}(¢,z_2) \wedge$
(7) $x = y)$.

Let $u$ be a sequence in $(A \cup \{\$,¢\})^*$ such that $I_D \models F(u)$. It is easy to see that lines (4), (5) and (6) above guarantee that $u$ can be written as $\$u_{i_1}¢v_{i_1}\$u_{i_2}¢v_{i_2}\$\cdots\$u_{i_k}¢v_{i_k}\$$ for some $k \ge 1$, where $u_{i_j}$ and $v_{i_j}$ ($1 \le j \le k$) are in $A^*$. Lines (1) and (2) select the sequences $u_{i_j}$ and $v_{i_j}$ for all $1 \le j \le k$, and line (3) ensures that $(u_{i_j},v_{i_j})$ is in $I_D(R)$ for each $1 \le j \le k$. Lines (4), (5) and (6) assign $u_{i_1}\cdots u_{i_k}$ and $v_{i_1}\cdots v_{i_k}$ to $x$ and $y$, respectively. Line (7) says that $u_{i_1}\cdots u_{i_k} = v_{i_1}\cdots v_{i_k}$. Therefore, $I_D \models \exists z F(z)$ if and only if $s_1$ and $s_2$ have a solution. Let $T$ be the s-calculus query $\{(z) \mid F(z)\}$. Then $T[I_D]$ is nonempty if and only if $I_D \models \exists z F(z)$. The latter holds if and only if the PCP has a solution. Therefore, it is undecidable to determine whether $T[I_D]$ is empty. □

By the above theorem, there exists an s-calculus query that is not computable. Indeed, suppose all s-calculus queries are computable. Then it is decidable to determine whether the answer set of each query over an s-database instance is empty (by computing the answer set). However, this contradicts the above theorem.

4.5 Safe s-Calculus

As shown in the last section, some s-calculus queries are not computable. Also, analogous to the relational calculus, the answer set of an s-calculus query over an s-database instance may be infinite. (Indeed, the answer set of the query $\{(x) \mid \neg R(x)\}$ is infinite over each $I_D$.) In this section, a computable subset of s-calculus, called "safe s-calculus," is defined and shown to be equivalent to s-algebra in "expressive power." In order to define the notion of safe s-calculus, the "active domain" of each s-calculus query is first given.
Let $T = \{(x_1,\ldots,x_n) \mid F\}$ be a query over the s-database scheme $D$ and $I_D$ an s-database instance over $D$. Then the active domain of $T$ over $I_D$, denoted $adom(T,I_D)$, is the set $\{a \in \mathcal{U} \mid a$ appears in $F$ or in $I_D(R)$ for some $R$ in $F\}$. Let $adom_k(T,I_D) = \{(u) \mid u$ is in $adom(T,I_D)^*$ and $len(u) \le k\}$ for each $k \ge 1$. The active domain of a query over an s-database instance consists of all elements used in the formula and all elements in the s-instances of the relation names appearing in the formula. Intuitively, the answer set of a query over an s-instance should consist of only these elements, i.e., no "new" elements should be "invented." Furthermore, the answer set should not contain arbitrarily long sequences. These observations lead to the notion of "safe s-calculus queries." First, though, the following technical term is needed.

An SL formula $F_1$ is said to be a subformula of an SL formula $F$ if either (i) $F_1 = F$, (ii) $F = (F' \vee F'')$ and $F_1$ is a subformula of either $F'$ or $F''$, or (iii) $F = \neg F'$ or $F = \exists x F'$, and $F_1$ is a subformula of $F'$.

We are now ready for the notion of safe s-calculus. For each $k \ge 1$, an s-calculus query $T = \{(x_1,\ldots,x_n) \mid F\}$ over the s-database scheme $D$ is said to be $k$-safe if $T$ satisfies both of the following conditions for each s-database instance $I_D$:
(1) If $(u_1,\ldots,u_n)$ is in $T[I_D]$, then $u_i$ is in $adom_k(T,I_D)$ for each $1 \le i \le n$.
(2) If $\exists x F_1$ is a subformula of $F$ and $x, x_1,\ldots,x_m$ are the free variables in $F_1$, then $I_D \models F_1[x/u, x_1/u_1,\ldots,x_m/u_m]$, where $u_i$ is in $adom_k(T,I_D)$ for $1 \le i \le m$, implies that $u$ is in $adom_k(T,I_D)$.
A query $T$ is said to be safe if it is $k$-safe for some $k \ge 1$. The collection of all safe s-calculus queries is called safe s-calculus. It is easy to see that an s-calculus query $T$ is $k$-safe for each $k \ge k'$ if $T$ is $k'$-safe. Intuitively, condition (1) above ensures that the answer set of a safe s-calculus query is "limited" in some sense.
Since $k$ is a finite number and $I_D$ is also finite, the number of possible tuples in the answer is finite. Therefore, one "naive" algorithm to evaluate a safe s-calculus query is to loop through all possible answers (note that $k$ does not depend on $I_D$) and check whether each of them satisfies the query. The only problem with this naive algorithm is that, when testing whether $\exists x F_1$ is satisfied, there appear to be infinitely many possible values for $x$. However, condition (2) above eliminates this possibility. Thus, safe s-calculus queries are computable. This conclusion is formally proved below, where each safe s-calculus query is shown to be equivalent to an effectively constructible s-algebra query.

The notion of safe s-calculus is very similar to that of the safe relational calculus [Ull80]. In fact, if sequence predicates ($[\![Q]\!]_n$) are disallowed and each sequence is viewed as an element, then (safe) s-calculus is exactly the (safe) relational calculus. Also, the use of sequence predicates in s-calculus is essentially the same as that of the merger reconstructions and extractor reconstructions in s-algebra. Therefore, it should be no surprise that safe s-calculus is "equivalent" to s-algebra, as shown in Theorem 4.2 below. In order to prove Theorem 4.2, the following technical result is needed.

Lemma 4.1 Let $T$ be an s-calculus query over the s-database scheme $D$ and $k \ge 1$. Then there exists an s-algebra expression $E$ such that $E[I_D] = adom_k(T,I_D)$ for each s-instance $I_D$ over $D$.

Proof For each sequence $u$ in $\mathcal{U}^*$ and $l$-ary relation $R$ over $\mathcal{U}^*$, let $E_1(u)$ and $E_2(i,R)$, where $1 \le i \le l$, be s-algebra expressions (built from $\{(u)\}$ and from $R$, respectively) such that, intuitively, $E_1(u)$ gives the unary relation $\{(a) \mid a$ appears in $u\}$ and $E_2(i,R)$ the unary relation $\{(a) \mid a$ appears in the $i$-th column of $R\}$. Let $T$ be the s-calculus query $\{(x_1,\ldots,x_n) \mid F\}$. Suppose now $w_1,\ldots,w_p$ are the $p \ge 0$ sequences in $\mathcal{U}^*$ appearing in $F$ and $R_1,\ldots,R_m$ are the $m \ge 0$ relation names appearing in $F$. Let $E_3 = \bigcup_{1 \le j \le p}E_1(w_j)$ and $E_4 = \bigcup_{1 \le j \le m}\bigcup_{1 \le i \le l_j}E_2(i,R_j)$, where $l_j$ is the arity of $R_j$. (Note that $E_3 = \emptyset$ if $p = 0$, and $E_4 = \emptyset$ if $m = 0$.) Given an s-instance $I_D$ over $D$, (i) $E_3[I_D]$ obviously returns all elements appearing in $F$ and (ii) $E_4[I_D]$ gives all elements appearing in $\bigcup_{1 \le j \le m}I_D(R_j)$. For each $1 \le i \le k$, let
$E^i = \pi_{i+1}[\![\alpha_1\cdots\alpha_i]\!]_{1,\ldots,i}((E_3 \cup E_4) \times \cdots \times (E_3 \cup E_4))$ ($i$ times),
$E^0 = \{(\epsilon)\}$ and $E = E^0 \cup E^1 \cup \cdots \cup E^k$. Clearly, $E[I_D] = adom_k(T,I_D)$ for each $I_D$. □

We are now able to show that s-algebra is at least as "powerful" as safe s-calculus. We say that an s-calculus query $T$ over an s-database scheme $D$ and an s-algebra expression $E$ over $D$ are equivalent if $E[I_D] = T[I_D]$ for each s-database instance $I_D$ of $D$. In what follows, $DOM_k^m(T,I_D)$ will be used as an abbreviation for $adom_k(T,I_D) \times \cdots \times adom_k(T,I_D)$ ($m$ times) for each $m \ge 1$ and $k \ge 1$. For each $k \ge 1$, let $DOM_k^0(T,I_D)$ be the 0-ary s-instance $\{()\}$.

Lemma 4.2 Let $D$ be an s-database scheme and $T$ a safe s-calculus query over $D$. Then there exists an s-algebra expression $E$ equivalent to $T$.

Proof Suppose $T = \{(x_1,\ldots,x_n) \mid F\}$ and $T$ is $k$-safe for some $k \ge 1$. By Lemma 4.1, there exists an s-algebra expression $E'$ such that $E'[I_D] = adom_k(T,I_D)$ for each s-database instance $I_D$ over $D$. By abuse of language, let $DOM_k^m(T,I_D)$ also denote the s-algebra expression $E' \times \cdots \times E'$ ($m$ times). (Notice that if $m = 0$, then $DOM_k^m(T,I_D)$ is the 0-ary s-algebra expression $\{()\}$.) Let $I_D$ be an s-database instance over $D$. Assume for the moment that the following statement is true:
(1) Suppose $F_1$ is a subformula of $F$ and $x_1,\ldots,x_m$, $m \ge 0$, are the free variables of $F_1$. Then there exists an s-algebra expression $E_1$ such that $E_1[I_D] = T_1[I_D] \cap DOM_k^m(T_1,I_D)$, where $T_1 = \{(x_1,\ldots,x_m) \mid F_1\}$.
(The s-algebra expression $E_1$ in (1) is said to be a corresponding s-algebra expression of $T_1$.) Note that $F$ is a subformula of $F$. Then by (1), there exists an s-algebra expression $E$ such that $E[I_D] = T[I_D] \cap DOM_k^n(T,I_D)$. By the hypothesis that $T$ is $k$-safe, it follows that $T[I_D] \subseteq DOM_k^n(T,I_D)$, whence $E[I_D] = T[I_D]$ as desired. To prove the lemma, it therefore suffices to establish (1).

We prove (1) by induction on the number of connectives appearing in $F$. To begin, consider the case when no connectives occur in $F$, i.e., $F$ is an atomic formula. (In this case, $F_1$ being a subformula of $F$ implies that $F_1 = F$ and $T_1 = \{(x_1,\ldots,x_n) \mid F_1\} = T$.) Two possibilities arise.

(i) $F = R(z_1,\ldots,z_l)$, where $R$ is a relation name in $D$ and each $z_i$ ($1 \le i \le l$) is a variable in $V$ or a sequence of constants in $\mathcal{U}^*$. For each $1 \le i \le l$, let $j_i = \min\{j \le i \mid z_j = z_i\}$ (note that $j_i$ is always defined) and let $C_i$ be the condition $\$i = u$ if $z_i = u$ is in $\mathcal{U}^*$, and the condition $\$i = \$j_i$ if $z_i$ is a variable. Let $C$ be the selection condition $C_1 \wedge \cdots \wedge C_l$. For each $1 \le j \le n$, let $l_j$ be the smallest number $m$ such that $z_m = x_j$. (Since $x_1,\ldots,x_n$ is the list of the (distinct) variables appearing in $F$, each $x_j$ ($1 \le j \le n$) occurs in $z_1,\ldots,z_l$. Hence, $l_j$ is well defined.) It is easily seen that $\pi_{l_1,\ldots,l_n}\sigma_C(R) \cap DOM_k^n(T,I_D)$ is a corresponding s-algebra expression of $T = \{(x_1,\ldots,x_n) \mid R(z_1,\ldots,z_l)\} = \{(x_1,\ldots,x_n) \mid F\}$.

(ii) $F = z_{l+1} \in [\![Q]\!]_l(z_1,\ldots,z_l)$, where $Q$ is a regular subset of $\{\alpha_1,\ldots,\alpha_l\}^*$ and each $z_i$ ($1 \le i \le l+1$) is a variable or a sequence. For each $1 \le i \le l+1$, let $A_i = DOM_k^1(T,I_D)$ if $z_i$ is a variable, and $A_i = \{(u)\}$ if $z_i = u$ is in $\mathcal{U}^*$. For each $1 \le j \le n$, let $l_j$ be the smallest integer $m$ such that $z_m = x_j$. (Since $x_1,\ldots,x_n$ are the (distinct) variables appearing in $F$, each $x_j$ ($1 \le j \le n$) occurs in $z_1,\ldots,z_{l+1}$. Hence, $l_j$ is well defined.)
Then
$\pi_{l_1,\ldots,l_n}\sigma_C([\![Q]\!]_{1,\ldots,l}(A_1 \times \cdots \times A_l) \cap (DOM_k^l(T,I_D) \times A_{l+1}))$,
where $C$ is the selection condition defined as in case (i) with $z_1,\ldots,z_{l+1}$ in place of $z_1,\ldots,z_l$ (so that repeated variables are forced to take equal values), is a corresponding s-algebra expression of $T = \{(x_1,\ldots,x_n) \mid z_{l+1} \in [\![Q]\!]_l(z_1,\ldots,z_l)\}$.

Combining the above two cases, (1) is true if $F$ has no connectives. Assume (1) is true if $F$ has fewer than $K \ge 1$ connectives. Suppose now $F$ has exactly $K$ connectives. Three cases arise.

(i) $F = F_1 \vee F_2$. Let $y_1,\ldots,y_l$ be the free variables in $F_1$ and $z_1,\ldots,z_m$ the free variables in $F_2$. Notice that $l \le n$ and $m \le n$. Also, $F_1$ ($F_2$, respectively) contains fewer than $K$ connectives. By induction, there exist $E_1$ and $E_2$ such that $E_1$ is a corresponding s-algebra expression of $T_1 = \{(y_1,\ldots,y_l) \mid F_1\}$ and $E_2$ is a corresponding s-algebra expression of $T_2 = \{(z_1,\ldots,z_m) \mid F_2\}$. Let $f_1$ be a 1-1 mapping from $\{1,\ldots,n\}$ onto $\{1,\ldots,n\}$ such that $f_1(i) = j$ if $y_j = x_i$ for some $j$. (Since $y_j$ appears in $x_1,\ldots,x_n$ at most once for each $1 \le j \le l$, such a 1-1 mapping $f_1$ exists.) Let $f_2$ be analogously defined with respect to $z_j$ instead of $y_j$. Now let $E_1' = \pi_{f_1(1),\ldots,f_1(n)}(E_1 \times DOM_k^{n-l}(T_1,I_D))$ and $E_2' = \pi_{f_2(1),\ldots,f_2(n)}(E_2 \times DOM_k^{n-m}(T_2,I_D))$. It is easily seen that $E_1' \cup E_2'$ is a corresponding s-algebra expression of $T$.

(ii) $F = \neg F_1$. Clearly, there are $n$ free variables in $F_1$. Since $F_1$ has exactly $K-1$ connectives, by induction, there exists a corresponding s-algebra expression $E_1$ of $\{(x_1,\ldots,x_n) \mid F_1\}$. Then $DOM_k^n(T,I_D) - E_1$ is a corresponding s-algebra expression of $T$.

(iii) $F = \exists x F_1$. Clearly, there are $n+1$ free variables in $F_1$. Let $T_1 = \{(x_1,\ldots,x_n,x) \mid F_1\}$. By induction, there exists $E_1$ such that
(2) $E_1[I_D] = \{(u_1,\ldots,u_n,u) \mid I_D \models F_1[x_1/u_1,\ldots,x_n/u_n,x/u]\} \cap DOM_k^{n+1}(T_1,I_D)$
for each s-database instance $I_D$ over $D$. Let $E = \pi_{1,\ldots,n}E_1$ and $I_D$ be an s-instance. It remains to show that
(3) $E[I_D] = \{(u_1,\ldots,u_n) \mid I_D \models F[x_1/u_1,\ldots,x_n/u_n]\} \cap DOM_k^n(T,I_D)$.
First suppose $(u_1,\ldots,u_n)$ is in $E[I_D]$. Since $E = \pi_{1,\ldots,n}E_1$ and the arity of $E_1$ is $n+1$, there exists $u$ such that $(u_1,\ldots,u_n,u)$ is in $E_1[I_D]$ by definition. Since $E_1$ is a corresponding s-algebra expression of $T_1$, $(u_1,\ldots,u_n,u)$ is in $T_1[I_D] \cap DOM_k^{n+1}(T_1,I_D)$. Thus, $I_D \models F_1[x_1/u_1,\ldots,x_n/u_n,x/u]$, whence $I_D \models \exists x F_1[x_1/u_1,\ldots,x_n/u_n]$. It is then easy to see that $(u_1,\ldots,u_n)$ is in the set on the right-hand side of (3). Now suppose $(u_1,\ldots,u_n)$ is in the set on the right-hand side of (3). Clearly, $u_i$ is in $adom_k(T,I_D)$ for each $1 \le i \le n$. Since $F = \exists x F_1$ and $I_D \models F[x_1/u_1,\ldots,x_n/u_n]$, there exists $u$ such that $I_D \models F_1[x_1/u_1,\ldots,x_n/u_n,x/u]$. Since $T$ is $k$-safe, $u$ is in $adom_k(T,I_D)$. Hence, $(u_1,\ldots,u_n,u)$ is in $E_1[I_D]$ by (2). Thus, $(u_1,\ldots,u_n)$ is in $\pi_{1,\ldots,n}E_1[I_D] = E[I_D]$. Therefore, (3) is true, thereby completing the proof for this case.

In each of the above three cases, (1) is true if $F$ has exactly $K$ connectives. Hence, the induction is extended and (1) is established. □

We now show that safe s-calculus is at least as "powerful" as s-algebra.

Lemma 4.3 Let $D$ be an s-database scheme and $E$ an s-algebra expression over $D$. Then there exists a safe s-calculus query that is equivalent to $E$.

Proof We prove the lemma by induction on the number of operators in $E$. Suppose there are no operators in $E$. Two cases arise.
(i) $E = \{(u_1,\ldots,u_n)\}$, where $u_i$ is in $\mathcal{U}^*$ for each $1 \le i \le n$. It is easily seen that the query $\{(x_1,\ldots,x_n) \mid x_1 = u_1 \wedge \cdots \wedge x_n = u_n\}$ is equivalent to $E$ and is safe.
(ii) $E = R$, where $R$ is an $n$-ary relation name. It is easily seen that the query $\{(x_1,\ldots,x_n) \mid R(x_1,\ldots,x_n)\}$ is equivalent to $E$ and is safe.
Thus, if $E$ contains no operators, then $E$ has an equivalent safe s-calculus query. Now assume that each s-algebra expression $E$ has an equivalent safe s-calculus query if $E$ contains fewer than $N \ge 1$ operators. Suppose $E$ contains exactly $N$ operators.
Eight cases arise.

(i) $E = E_1 \cup E_2$. Then $E_1$ and $E_2$ each have fewer than $N$ operators. For $1 \le i \le 2$, it follows from the induction assumption that there exists a safe s-calculus query $T_i = \{(x_1,\ldots,x_n) \mid F_i\}$ equivalent to $E_i$. Without loss of generality, suppose $T_1$ and $T_2$ are both $k$-safe. Then it is clear that $E$ is equivalent to $T = \{(x_1,\ldots,x_n) \mid F_1 \vee F_2\}$. Since $T_1$ and $T_2$ are both $k$-safe, $T_1[I_D] \subseteq DOM_k^n(T_1,I_D)$ and $T_2[I_D] \subseteq DOM_k^n(T_2,I_D)$. Hence, $T[I_D] \subseteq DOM_k^n(T_1,I_D) \cup DOM_k^n(T_2,I_D)$. It is easily seen that $DOM_k^n(T_i,I_D) \subseteq DOM_k^n(T,I_D)$ for $i = 1,2$. Then $T[I_D] \subseteq DOM_k^n(T,I_D)$, i.e., $T$ satisfies condition (1) for safe s-calculus queries. Furthermore, it is obvious that if a formula of the form $\exists x F'$ is a subformula of $F_1 \vee F_2$, then it is a subformula of either $F_1$ or $F_2$. It follows that $T$ satisfies condition (2) for safe s-calculus queries, since $T_1$ and $T_2$ are both safe. Therefore, $T$ is safe.

(ii) $E = E_1 - E_2$. By induction, there exist safe s-calculus queries $T_1 = \{(x_1,\ldots,x_n) \mid F_1\}$ and $T_2 = \{(x_1,\ldots,x_n) \mid F_2\}$ such that $T_i$ is equivalent to $E_i$ for $i = 1,2$. Clearly, $E$ is equivalent to $T = \{(x_1,\ldots,x_n) \mid F_1 \wedge \neg F_2\}$. Analogous to the proof of case (i) above, $T$ is safe.

(iii) $E = E_1 \cap E_2$. Since $E_1 \cap E_2 = E_1 - (E_1 - E_2)$, it is easy to see from case (ii) above that $E$ has an equivalent safe s-calculus query.

(iv) $E = E_1 \times E_2$. By induction, for each $1 \le i \le 2$, there exists a safe s-calculus query $T_i = \{(x_1,\ldots,x_{n_i}) \mid F_i\}$ equivalent to $E_i$. Let $z_1,\ldots,z_{n_1+n_2}$ be $n_1+n_2$ distinct variables. Furthermore, let $F_1'$ be the formula obtained from $F_1$ by changing $x_i$ to $z_i$ for each $1 \le i \le n_1$, and $F_2'$ the formula obtained from $F_2$ by changing $x_i$ to $z_{n_1+i}$ for each $1 \le i \le n_2$. Let $T = \{(z_1,\ldots,z_{n_1+n_2}) \mid F_1' \wedge F_2'\}$. Clearly, $T$ is equivalent to $E$. Let $I_D$ be an s-database instance over $D$. Without loss of generality, we may suppose $T_1$ and $T_2$ are both $k$-safe.
Let $1 \le i \le 2$. Clearly, $T_i[I_D] \subseteq DOM_k^{n_i}(T_i,I_D)$. Let $T_1' = \{(z_1,\ldots,z_{n_1}) \mid F_1'\}$ and $T_2' = \{(z_{n_1+1},\ldots,z_{n_1+n_2}) \mid F_2'\}$. Then $T_i'[I_D] \subseteq DOM_k^{n_i}(T_i',I_D)$. It is easily seen that $adom_k(T_i',I_D) \subseteq adom_k(T,I_D)$. Thus, $T[I_D] \subseteq DOM_k^{n_1}(T,I_D) \times DOM_k^{n_2}(T,I_D) = DOM_k^{n_1+n_2}(T,I_D)$, i.e., $T$ satisfies condition (1) for safe s-calculus queries. By the same argument used in case (i), $T$ also satisfies condition (2) for safe s-calculus queries. Thus, $T$ is safe.

(v) $E = \pi_{i_1,\ldots,i_l}E_1$. Let $T_1 = \{(x_1,\ldots,x_n) \mid F_1\}$ be a safe s-calculus query that is equivalent to $E_1$. Let $T = \{(x_{i_1},\ldots,x_{i_l}) \mid \exists x_{j_1}\cdots\exists x_{j_p}F_1\}$, where $\{j_1,\ldots,j_p\} = \{1,\ldots,n\} - \{i \mid i = i_j$ for some $1 \le j \le l\}$. It is easy to see that $T$ is equivalent to $E$. It remains to show that $T$ is safe. Let $I_D$ be an s-database instance. Since $T_1$ is safe, there exists $k \ge 1$ such that $T_1$ is $k$-safe. Then $T_1[I_D] \subseteq DOM_k^n(T_1,I_D)$ by definition. Suppose $F_1[x_1/u_1,\ldots,x_n/u_n]$ is true in $I_D$. Since $T_1$ is safe, $u_i$ is in $adom_k(T_1,I_D)$ for each $1 \le i \le n$. It is obvious that $adom_k(T_1,I_D) \subseteq adom_k(T,I_D)$. Then $u_i$ is in $adom_k(T,I_D)$ for each $1 \le i \le n$, i.e., $T$ satisfies condition (1) for safe s-calculus queries. It is easy to see that $T$ also satisfies condition (2) for safe s-calculus queries. Thus, $T$ is safe.

(vi) $E = \sigma_C E_1$. By induction, there exists a safe s-calculus query $T_1 = \{(x_1,\ldots,x_n) \mid F_1\}$ that is equivalent to $E_1$. Let $T = \{(x_1,\ldots,x_n) \mid C' \wedge F_1\}$, where $C'$ is obtained from $C$ by changing $\$i$ to $x_i$ for each $\$i$ appearing in $C$. Clearly, $T$ is equivalent to $E$. It is easy to see that $T$ is safe since $T_1$ is safe.

(vii) $E = [\![Q]\!]_{i_1,\ldots,i_l}(E_1)$. By induction, there exists a safe s-calculus query $T_1 = \{(x_1,\ldots,x_n) \mid F_1\}$ equivalent to $E_1$. Let $T = \{(x_1,\ldots,x_n,x_{n+1}) \mid F\}$, where $F = F_1 \wedge (x_{n+1} \in [\![Q]\!]_l(x_{i_1},\ldots,x_{i_l}))$. Clearly, $T$ is equivalent to $E$. It remains to show that $T$ is safe. Let $I_D$ be an s-database instance. Since $T_1$ is safe, there exists $k$ such that $T_1$ is $k$-safe. Suppose $I_D \models F_1[x_1/u_1,\ldots,x_n/u_n]$.
Then $u_i$ is in $adom_k(T_1,I_D)$ for each $1 \le i \le n$ since $T_1$ is safe. Suppose $I_D \models F[x_1/u_1,\ldots,x_n/u_n,x_{n+1}/u_{n+1}]$. Then $u_{n+1}$ is in $[\![Q]\!]_l(\{u_{i_1}\},\ldots,\{u_{i_l}\})$. Let $k' = kl$. Clearly, $u_{n+1}$ is in $adom_{k'}(T_1,I_D)$, since it is formed from at most $l$ sequences of length at most $k$. It is obvious that $adom_{k'}(T_1,I_D) \subseteq adom_{k'}(T,I_D)$. Thus, $T[I_D] \subseteq DOM_{k'}^{n+1}(T,I_D)$, i.e., $T$ satisfies condition (1) for safe s-calculus queries. Furthermore, since each subformula of $F$ of the form $\exists x F'$ is also a subformula of $F_1$ and $T_1$ is safe, it follows that $T$ satisfies condition (2) for safe s-calculus queries. Hence, $T$ is safe.

(viii) $E$ is obtained from $E_1$ by an extractor reconstruction with pattern $Q$ on column $l$. By induction, there exists a safe s-calculus query $T_1 = \{(x_1,\ldots,x_n) \mid F_1\}$ equivalent to $E_1$. Without loss of generality, suppose $l = n$. Let $F_1' = F_1 \wedge (x_n \in [\![Q]\!]_2(x_{n+1},x_{n+2}))$ and $T = \{(x_1,\ldots,x_n,x_{n+1}) \mid \exists x_{n+2}F_1'\}$. It is clear that $T$ is equivalent to $E$. It remains to show that $T$ is safe. Let $I_D$ be an s-database instance. Since $T_1$ is safe, it is $k$-safe for some $k \ge 1$. Suppose $I_D \models F_1[x_1/u_1,\ldots,x_n/u_n]$. Then $u_i$ is in $adom_k(T_1,I_D)$ since $T_1$ is $k$-safe. Suppose $I_D \models F_1'[x_1/u_1,\ldots,x_{n+2}/u_{n+2}]$. Thus, $u_n$ is in $[\![Q]\!]_2(\{u_{n+1}\},\{u_{n+2}\})$. Clearly, $u_{n+i}$ is in $adom_k(T_1,I_D)$ for $i = 1,2$, since $u_n$ is in $adom_k(T_1,I_D)$. It is obvious that $adom_k(T_1,I_D) \subseteq adom_k(T,I_D)$. Hence, $u_{n+1}$ and $u_{n+2}$ are in $adom_k(T,I_D)$. It is then easy to see that $T$ is safe.

In each of the above eight cases, $E$ has an equivalent safe s-calculus query. This completes the induction and, therefore, establishes the lemma. □

We say that safe s-calculus and s-algebra are equivalent in expressive power if for each safe s-calculus query there exists an equivalent s-algebra expression, and vice versa. Combining the above two lemmas, we immediately have the main result of this section, namely:

Theorem 4.2 Safe s-calculus and s-algebra are equivalent in expressive power.

The above theorem parallels the result on the equivalence of the relational algebra and the safe relational calculus [Ull80].
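The generate-and-test evaluation described after the definition of $k$-safety, together with the active domain $adom_k$ of Lemma 4.1, can be sketched as follows. This is a hypothetical illustration, not the s-algebra translation used in the proofs: sequences are Python strings whose characters are the elements, an instance maps relation names to sets of tuples of strings, and the formula $F$ is supplied as an ordinary Python predicate `holds` standing in for $I_D \models F[x_1/u_1,\ldots,x_n/u_n]$. The names `adom`, `adom_k` and `naive_eval` are our own.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical encoding: relation name -> set of tuples of sequences (strings).
Instance = Dict[str, Set[Tuple[str, ...]]]

def adom(formula_seqs: Iterable[str], inst: Instance, rel_names: Iterable[str]) -> Set[str]:
    """Elements appearing in the sequences of F or in I_D(R) for each R named in F."""
    elems: Set[str] = set()
    for seq in formula_seqs:
        elems.update(seq)
    for r in rel_names:
        for tup in inst.get(r, set()):
            for seq in tup:
                elems.update(seq)
    return elems

def adom_k(elems: Set[str], k: int) -> List[str]:
    """All sequences over elems of length at most k (the empty sequence included)."""
    return ["".join(p) for length in range(k + 1)
            for p in product(sorted(elems), repeat=length)]

def naive_eval(n: int, k: int, elems: Set[str],
               holds: Callable[..., bool]) -> Set[Tuple[str, ...]]:
    """Generate-and-test: by condition (1) of k-safety every answer lies in DOM_k^n."""
    candidates = adom_k(elems, k)
    return {t for t in product(candidates, repeat=n) if holds(*t)}

# Hypothetical query {(x) | exists y (R(y) and Prefix(x, y))}; the existential
# quantifier is evaluated by ranging y over R only.
inst: Instance = {"R": {("ab",)}}
elems = adom([], inst, ["R"])
answers = naive_eval(1, 2, elems, lambda x: any(y.startswith(x) for (y,) in inst["R"]))
```

For this particular instance, $k = 2$ suffices and the answers are the three prefixes of `ab` (including the empty sequence). The enumeration grows exponentially in $k$ and $n$, which is why the proofs above go through s-algebra rather than through this naive loop.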
Chapter 5
Two Additional Applications of Rs-Operations

In this chapter, we examine the use (and the extension) of rs-operations in two additional situations. The first is in an extended Datalog. We add sequences to the traditional Datalog and then employ rs-operations as special predicates (similar to those in SL) to deal with sequences. The second is on "nested sequences." We shall see that the rs-operations can easily be adapted to nested sequences. In contrast to the previous chapter, which detailed two query languages, the current chapter is intended to be illustrative. The reader is assumed to be familiar with the related material.

5.1 Rs-Operations in an Extended Datalog

In this section, we introduce an extension of Datalog [Ull88], called "S-datalog," which includes sequences. The introduction of sequences may result in infinite "answers" to S-datalog queries. Using special properties of rs-operations, we show that there exists a natural subclass of S-datalog, called "downward recursive," which allows a type of restricted recursion and whose answers are always finite.

We start by defining S-datalog. We assume that the reader is familiar with the standard syntax and semantics of logic programs described in [Llo84, Apt87]. S-datalog is based on the sequence logic SL (of the previous chapter). In order to define S-datalog, some preliminary notions are needed.

Let $L$ be an SL language. Each atomic formula of $L$ is called a positive literal of $L$ and denoted by $l$, possibly subscripted. For example, $P(x,aa)$ and each formula of the form $t \in [\![W]\!]_n(t_1,\ldots,t_n)$ are positive literals. Each formula of the form $\forall x_1\cdots\forall x_k(l_0 \vee \neg l_1 \vee \cdots \vee \neg l_m)$ is called a program clause over $L$ if each $l_i$ is a positive literal and $x_1,\ldots,x_k$ are the free variables in $l_0,\ldots,l_m$. The program clause $\forall x_1\cdots\forall x_k(l_0)$ will be written as $l_0 \leftarrow$ and called a fact.
The program clause $\forall x_1\cdots\forall x_k(l_0 \vee \neg l_1 \vee \cdots \vee \neg l_m)$ will be written as $l_0 \leftarrow l_1,\ldots,l_m$. In a program clause, $l_0$ is called the head of the clause and $l_1,\ldots,l_m$ the body of the clause. Each finite set of program clauses over $L$ is called an S-datalog program (over $L$) and denoted by $\mathcal{P}$, possibly subscripted.

$\mathrm{Sort}(x,y) \leftarrow x \in [\![(\alpha_1 \cup \alpha_2)^*\alpha_3]\!]_3(y_1,y_2,y_3),\ \mathrm{sLT}(y_1,y_3),\ \mathrm{sLT}(y_3,y_2),\ \mathrm{Sort}(y_1,y_1'),\ \mathrm{Sort}(y_2,y_2'),\ y \in [\![\alpha_1^*\alpha_2\alpha_3^*]\!]_3(y_1',y_3,y_2')$
$\mathrm{sLT}(x,y) \leftarrow x \in [\![\alpha_1\alpha_2^*]\!]_2(x_1,x_2),\ y \in [\![\alpha_1\alpha_2^*]\!]_2(y_1,y_2),\ \mathrm{Less}(x_1,y_1),\ \mathrm{sLT}(x_1,y_2),\ \mathrm{sLT}(x_2,y)$
$\mathrm{sLT}(x,\epsilon) \leftarrow$
$\mathrm{sLT}(\epsilon,y) \leftarrow$
Figure 5.1: Quick sort in S-datalog.

As an example, Figure 5.1 shows a "quick sort" program. Intuitively, the first program clause separates the "input" sequence $x$ into $y_1$ and $y_2$ according to $y_3$ (the last element of $x$) such that each element of $y_1$ is less than $y_3$ and each element of $y_2$ is greater than $y_3$. The "result" of the sort is the concatenation of $y_1'$ (the sorted version of $y_1$), $y_3$ and $y_2'$ (the sorted version of $y_2$). The second through the fourth program clauses give the definition of $\mathrm{sLT}(x,y)$. It is easy to see that $\mathrm{sLT}(x,y)$ is true if each element of $x$ is less than all elements of $y$. (The predicate Less is assumed to be a total order relation defined over the set of constants.)

Turning to the semantic aspect of S-datalog programs, let $\mathcal{P}$ be an S-datalog program over $L$ and $C$ the set of constants appearing in $\mathcal{P}$. Then $C$ is called the Herbrand base for $\mathcal{P}$. A Herbrand interpretation for $\mathcal{P}$ is an s-structure $S = (C,\varphi,\psi)$ for $L$, where $C$ is the Herbrand base for $\mathcal{P}$, $\varphi(c) = c$ for each $c$ in $C$, and
$\psi([\![W]\!]_n) = \{(u_0,\ldots,u_n) \in C^* \times \cdots \times C^* \ (n+1 \text{ times}) \mid u_0 \text{ is in } [\![W]\!]_n(\{u_1\},\ldots,\{u_n\})\}$
for each sequence predicate $[\![W]\!]_n$ in $\mathcal{P}$. (Since $C$ and $\varphi$ are always fixed in each Herbrand interpretation for an S-datalog program, by abuse of language, a Herbrand interpretation will usually be denoted by its third component, i.e., by $\psi$.)
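Two pieces of the above can be made concrete. First, the set that $\psi$ assigns to a sequence predicate is a merger-membership test. The sketch `in_merger` below assumes the usual reading of a merger: the $j$-th occurrence of $\alpha_i$ in a word of $W$ is filled by the $j$-th element of $u_i$, and every $u_i$ is used up; it also assumes $W$ is given as a Python regular expression in which the digit `i` stands for $\alpha_i$ (so $n \le 9$). Second, `s_lt` and `quick_sort` give a direct functional reading of the clauses of Figure 5.1. All three names are hypothetical, and `quick_sort` adds a base case $\mathrm{Sort}(\epsilon,\epsilon)$ that is not among the clauses listed in the figure.

```python
import re
from operator import lt
from typing import Callable, Optional, Sequence

def in_merger(w_regex: str, u0: Sequence, us: Sequence[Sequence]) -> bool:
    """Decide whether u0 is in [[W]]_n({u_1}, ..., {u_n}) by backtracking.

    Assumed encoding: W is a Python regex over the digits '1'..'9', digit i
    standing for alpha_i; the j-th alpha_i in a word of W takes the j-th
    element of u_i, and every u_i must be consumed completely.
    """
    pattern = re.compile(w_regex)
    if len(u0) != sum(len(u) for u in us):
        return False

    def search(pos: int, used: tuple, w: str) -> bool:
        if pos == len(u0):
            return pattern.fullmatch(w) is not None
        for i, u in enumerate(us):
            j = used[i]
            if j < len(u) and u[j] == u0[pos]:
                nxt = used[:i] + (j + 1,) + used[i + 1:]
                if search(pos + 1, nxt, w + str(i + 1)):
                    return True
        return False

    return search(0, (0,) * len(us), "")

def s_lt(x: list, y: list, less: Callable = lt) -> bool:
    """sLT(x, y), following the recursion of the sLT clauses in Figure 5.1."""
    if not x or not y:                            # facts sLT(x, eps) and sLT(eps, y)
        return True
    x1, x2, y1, y2 = x[:1], x[1:], y[:1], y[1:]   # x in [[a1 a2*]]_2(x1, x2), same for y
    return less(x1[0], y1[0]) and s_lt(x1, y2, less) and s_lt(x2, y, less)

def quick_sort(x: list, less: Callable = lt) -> Optional[list]:
    """Functional reading of the Sort clause; None means no Sort fact exists for x."""
    if not x:
        return []  # assumed base case Sort(eps, eps), not listed in Figure 5.1
    pivot, rest = x[-1], x[:-1]                   # y3 is the last element of x
    y1 = [a for a in rest if less(a, pivot)]
    y2 = [a for a in rest if less(pivot, a)]
    if len(y1) + len(y2) != len(rest):            # an element equal to the pivot cannot be placed
        return None
    s1, s2 = quick_sort(y1, less), quick_sort(y2, less)
    if s1 is None or s2 is None:
        return None
    return s1 + [pivot] + s2                      # y in [[a1* a2 a3*]]_3(y1', y3, y2')
```

Because Less is strict, this reading derives no Sort fact for an input with repeated elements (for example, `quick_sort([2, 2])` returns `None`), which mirrors the fact that the first clause cannot split such a sequence.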
For Herbrand interpretations $\psi_1$ and $\psi_2$, let $\psi_1 \le \psi_2$ if $\psi_1(P) \subseteq \psi_2(P)$ for each $P$ appearing in $\mathcal{P}$. A Herbrand interpretation $S = (C,\varphi,\psi)$ for an S-datalog program $\mathcal{P}$ is said to be a Herbrand model for $\mathcal{P}$ if each clause in $\mathcal{P}$ is true in $S$. A Herbrand model $\psi$ for an S-datalog program $\mathcal{P}$ is called a least Herbrand model for $\mathcal{P}$ if there exists no Herbrand model $\psi'$ for $\mathcal{P}$ such that $\psi' < \psi$.

It turns out that there is a unique least Herbrand model for each S-datalog program. In order to show this, the following preliminary notions and results are needed. Let $\mathcal{P}$ be an S-datalog program. It is easy to see that $(\mathcal{S},\le)$, where $\mathcal{S}$ is the set of all Herbrand interpretations for $\mathcal{P}$, is a complete lattice. [Recall that $(\mathcal{S},\le)$ is a complete lattice if, for each nonempty subset $\Psi$ of $\mathcal{S}$, there exists a unique least upper bound (greatest lower bound, respectively), denoted by $lub(\Psi)$ ($glb(\Psi)$, respectively), with respect to $\le$.] Indeed, let $\Psi$ be a nonempty set of Herbrand interpretations for $\mathcal{P}$. Then $lub(\Psi)$ is the Herbrand interpretation $\psi'$, where $\psi'(P) = \bigcup_{\psi \in \Psi}\psi(P)$ for each $P$ appearing in $\mathcal{P}$, and $glb(\Psi)$ is the Herbrand interpretation $\psi''$, where $\psi''(P) = \bigcap_{\psi \in \Psi}\psi(P)$ for each $P$ appearing in $\mathcal{P}$. Also, let $\psi_\emptyset$ be the Herbrand interpretation such that $\psi_\emptyset(P) = \emptyset$ for each $P$ in $\mathcal{P}$, and $\psi_{all}$ the Herbrand interpretation such that $\psi_{all}(P) = C^* \times \cdots \times C^*$ ($arity(P)$ times). Clearly, $\psi_\emptyset$ and $\psi_{all}$ are the bottom element and the top element of the lattice $(\mathcal{S},\le)$, respectively. It is easy to see that if $\Psi$ is a nonempty set of Herbrand models for $\mathcal{P}$, then $glb(\Psi)$ is also a Herbrand model for $\mathcal{P}$. Let $\Psi_m$ be the set of all Herbrand models for $\mathcal{P}$. Then $glb(\Psi_m)$ is a Herbrand model for $\mathcal{P}$. It is clear that $glb(\Psi_m)$ is the unique least Herbrand model for $\mathcal{P}$. Thus, each S-datalog program has a unique least Herbrand model, i.e.,

Theorem 5.1 There exists a unique least Herbrand model for each S-datalog program.
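Theorem 5.1 is proved above through the glb of all Herbrand models. For an ordinary, sequence-free Datalog program the same least model can also be reached from the bottom element $\psi_\emptyset$ by repeatedly applying the immediate-consequence operator, a standard result for definite logic programs [Llo84]; we swap in that technique here purely as an illustration. The sketch hard-codes a hypothetical two-clause program $\mathrm{Path}(x,y) \leftarrow \mathrm{Edge}(x,y)$ and $\mathrm{Path}(x,z) \leftarrow \mathrm{Edge}(x,y), \mathrm{Path}(y,z)$. For S-datalog programs with sequence predicates such an iteration need not terminate, since least models may be infinite.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]

def least_path_model(edges: Set[Pair]) -> Set[Pair]:
    """Least Herbrand model of Path for the hypothetical program
        Path(x, y) <- Edge(x, y)
        Path(x, z) <- Edge(x, y), Path(y, z)
    computed by naive bottom-up iteration from the empty interpretation."""
    path: Set[Pair] = set()                       # start at the bottom element
    while True:
        derived = set(edges) | {(x, z) for (x, y) in edges
                                for (y2, z) in path if y == y2}
        if derived == path:                       # fixpoint: no new facts derivable
            return path
        path = derived

model = least_path_model({("a", "b"), ("b", "c")})
```

Each round applies every clause once to the facts derived so far; since the operator is monotone and only finitely many Path facts exist over finitely many constants, the loop stops at the least fixpoint, which here is `{("a", "b"), ("b", "c"), ("a", "c")}`.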
A Herbrand interpretation $\psi$ for an S-datalog program $\mathcal{P}$ is said to be finite if $\psi(P)$ is finite for each logic predicate $P$ in $\mathcal{P}$. Clearly, if there exists a finite model $\psi$ for an S-datalog program $\mathcal{P}$, then the least Herbrand model for $\mathcal{P}$ is also finite.

Another way of proving the above theorem is to use traditional “logic programs” to “simulate” S-datalog programs. [Thus, S-datalog is a “subclass” of the traditional logic programs. This makes it easier to apply results about logic programs to S-datalog.] Since each traditional logic program has a unique least Herbrand model [Llo84], it follows that each S-datalog program also has a unique least Herbrand model. A brief (sometimes informal) presentation of this simulation is given below.

A logic program is a finite set of program clauses with only logic predicates, the binary function symbol $\cdot$ and the special constant $\varepsilon$ representing the empty sequence (formal details omitted). For example, the following is a logic program:

$\mathit{Eq}(x_1 \cdot x_2, y_1 \cdot y_2) \leftarrow \mathit{Eq}(x_1, y_1), \mathit{Eq}(x_2, y_2)$
$\mathit{Eq}(x, y) \leftarrow \mathit{Eq}(y, x)$
$\mathit{Eq}(x, z) \leftarrow \mathit{Eq}(x, y), \mathit{Eq}(y, z)$
$\mathit{Eq}(\varepsilon, \varepsilon) \leftarrow$
$\mathit{Eq}(x, x) \leftarrow$
$\mathit{Eq}(\varepsilon \cdot x, x) \leftarrow$
$\mathit{Eq}(x \cdot \varepsilon, x) \leftarrow$

Let $\mathcal{P}_{Eq}$ denote the above program. In $\mathcal{P}_{Eq}$, $\mathit{Eq}$ is a binary predicate symbol and $\cdot$ is a binary function symbol (written in infix form). The ground terms of such logic programs are defined recursively as follows: (1) each constant is a ground term, and (2) $(t_1 \cdot t_2)$ is a ground term if both $t_1$ and $t_2$ are ground terms. Two ground terms $t_1$ and $t_2$ are said to be equivalent, denoted $t_1 \equiv t_2$, if they represent the same sequence when $\cdot$ is viewed as the concatenation function on sequences. For example, $(a \cdot (b \cdot c))$ is equivalent to $((a \cdot b) \cdot c)$ since they both represent the sequence $abc$. For each finite set $C$, let $\mathcal{P}_{Eq}(C) = \mathcal{P}_{Eq} \cup \bigcup_{a \in C} \{\mathit{Eq}(a, a) \leftarrow\}$.
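The relation $\equiv$ on ground terms can be computed directly by flattening each term into the sequence it denotes. The Python sketch below assumes a hypothetical encoding: a ground term is a constant (a string), the string `"ε"` for the empty sequence, or a triple `("·", t1, t2)`; the helper `dot` is likewise only for illustration.

```python
from typing import Tuple, Union

EPS = "ε"                                        # the special constant ε
Term = Union[str, Tuple[str, "Term", "Term"]]    # constant, ε, or ("·", t1, t2)


def flatten(t: Term) -> Tuple[str, ...]:
    """The sequence a ground term denotes when · is read as concatenation."""
    if t == EPS:
        return ()
    if isinstance(t, str):
        return (t,)
    _, left, right = t
    return flatten(left) + flatten(right)


def equivalent(t1: Term, t2: Term) -> bool:
    """t1 ≡ t2 iff both ground terms represent the same sequence."""
    return flatten(t1) == flatten(t2)


def dot(a: Term, b: Term) -> Term:
    return ("·", a, b)


print(equivalent(dot("a", dot("b", "c")), dot(dot("a", "b"), "c")))   # True
```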
Clearly, the least Herbrand model of $\mathcal{P}_{Eq}(C)$ is the Herbrand interpretation $\psi$, where $\psi(\mathit{Eq}) = \{(t_1, t_2) \in C^* \times C^* \mid t_1 \equiv t_2\}$.

Let $C$ be a finite set of constants and $\llbracket W \rrbracket_n$ a merger. A logic program $\mathcal{P}$ is said to be equivalent to $\llbracket W \rrbracket_n$ over $C$ if $\mathcal{P}$ contains an $(n+1)$-ary predicate $P_{\llbracket W \rrbracket_n}$ and, for all $u_0, \ldots, u_n$ in $C^*$, $u_0$ is in $\llbracket W \rrbracket_n(\{u_1\}, \ldots, \{u_n\})$ if and only if $P_{\llbracket W \rrbracket_n}(u_0, u_1, \ldots, u_n)$ is true in the least Herbrand model of $\mathcal{P} \cup \mathcal{P}_{Eq}(C)$. For example, the following logic program is equivalent to $\llbracket \alpha_1 \rrbracket_1$ over $\{a, b\}$:

$P_{\llbracket \alpha_1 \rrbracket_1}(a, a) \leftarrow$
$P_{\llbracket \alpha_1 \rrbracket_1}(b, b) \leftarrow$

We now show the following:

Lemma 5.1 For each finite set $C$ and merger $\llbracket W \rrbracket_n$, there exists a logic program which is equivalent to $\llbracket W \rrbracket_n$ over $C$.

Proof Since $W$ is a regular subset of $V_n^*$, there exists a regular expression $e_W$ (see Section 2.1) which represents $W$ and consists of elements in $V_n$ and the operations $\cup$, $\cdot$ and $^*$. (By abuse of language, $W$ is used instead of $e_W$ as its representative regular expression.) Let $C$ be a finite set. We now establish the lemma by induction on the number of operations appearing in $W$.

Suppose $W$ has no operations in it. Then $W = \alpha_i$ for some $1 \le i \le n$. For each $a$ in $C$, let $\mathcal{P}_a$ be the following logic program:

$P_{\llbracket \alpha_i \rrbracket_n}(a, t_1, \ldots, t_n) \leftarrow$

where $t_i = a$ and $t_j = \varepsilon$ for each $j \ne i$, $1 \le j \le n$. Obviously, the logic program $\bigcup_{a \in C} \mathcal{P}_a$ is equivalent to $\llbracket \alpha_i \rrbracket_n$ over $C$.

Now suppose there exists an equivalent logic program for $W$ over $C$ whenever $W$ has fewer than $K$ operations. Assume $W$ has exactly $K$ operations. Three cases arise:

(1) $W = W_1 \cup W_2$. For $i = 1, 2$, $W_i$ clearly has fewer than $K$ operations, so by induction there exists a logic program $\mathcal{P}_i$ which is equivalent to $\llbracket W_i \rrbracket_n$ over $C$. Let $\mathcal{P}'$ be the following logic program:

$P_{\llbracket W \rrbracket_n}(x_0, x_1, \ldots, x_n) \leftarrow P_{\llbracket W_1 \rrbracket_n}(x_0, x_1, \ldots, x_n)$
$P_{\llbracket W \rrbracket_n}(x_0, x_1, \ldots, x_n) \leftarrow P_{\llbracket W_2 \rrbracket_n}(x_0, x_1, \ldots, x_n)$

and $\mathcal{P} = \mathcal{P}' \cup \mathcal{P}_1 \cup \mathcal{P}_2$. Clearly, $\mathcal{P}$ is equivalent to $\llbracket W \rrbracket_n$ over $C$.

(2) $W = W_1 \cdot W_2$. For $i = 1, 2$, by induction there exists a logic program $\mathcal{P}_i$ which is equivalent to $\llbracket W_i \rrbracket_n$ over $C$. Let $\mathcal{P}'$ be the following logic program:

$P_{\llbracket W \rrbracket_n}(x_0, \ldots, x_n) \leftarrow \mathit{Eq}(x_0, y_0 \cdot z_0), \ldots, \mathit{Eq}(x_n, y_n \cdot z_n), P_{\llbracket W_1 \rrbracket_n}(y_0, \ldots, y_n), P_{\llbracket W_2 \rrbracket_n}(z_0, \ldots, z_n)$

and $\mathcal{P} = \mathcal{P}' \cup \mathcal{P}_1 \cup \mathcal{P}_2$. Obviously, $\mathcal{P}$ is equivalent to $\llbracket W \rrbracket_n$ over $C$.

(3) $W = W_1^*$. By induction, there exists a logic program $\mathcal{P}_1$ which is equivalent to $\llbracket W_1 \rrbracket_n$ over $C$. Let $\mathcal{P}'$ be the following logic program:

$P_{\llbracket W \rrbracket_n}(x_0, \ldots, x_n) \leftarrow \mathit{Eq}(x_0, y_0 \cdot z_0), \ldots, \mathit{Eq}(x_n, y_n \cdot z_n), P_{\llbracket W_1 \rrbracket_n}(y_0, \ldots, y_n), P_{\llbracket W \rrbracket_n}(z_0, \ldots, z_n)$
$P_{\llbracket W \rrbracket_n}(\varepsilon, \ldots, \varepsilon) \leftarrow$

and $\mathcal{P} = \mathcal{P}' \cup \mathcal{P}_1$. It is easy to see that $\mathcal{P}$ is equivalent to $\llbracket W \rrbracket_n$ over $C$.

Each of the above three cases extends the induction. Thus, the induction is established and the lemma is proved. □

By the above lemma, it is easily seen that each S-datalog program has an “equivalent” logic program. (The converse is clearly not true.) Theorem 5.1 then immediately follows from the fact that each logic program has a unique least model (see Proposition 6.1 of [Llo84]).

We now turn to describing the use of S-datalog programs as queries over s-databases. Let $D$ be an s-database scheme, $R$ a relation name not in $D$, and $E$ a finite set of relation names containing $R$ and disjoint from $D$. Then each S-datalog program $\mathcal{P}$ over $L_{D \cup E}$ satisfying the following two conditions is called an S-datalog query (over $D$) (with target $R$):

(1) No relation names in $D$ and no sequence predicates appear in the heads of the clauses of $\mathcal{P}$.
(2) Each variable appearing in the head of a clause in $\mathcal{P}$ also appears in the body of the clause.
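The induction in the proof of Lemma 5.1 also suggests a direct way to decide whether $u_0 \in \llbracket W \rrbracket_n(\{u_1\}, \ldots, \{u_n\})$: recurse on the regular expression, splitting every component into a prefix and a suffix for concatenation (as the $\mathit{Eq}(x_j, y_j \cdot z_j)$ atoms do in case (2)) and peeling off one $W_1$-block for the star (case (3)). The Python sketch below uses a hypothetical tuple encoding of regular expressions over $V_n$ and takes sequences as tuples or strings. It relies on each $\alpha_i$ consuming exactly one symbol of $u_0$, so a star iteration that consumes nothing of $u_0$ contributes nothing and can be skipped. It is exponential in general and is meant only to make the three cases concrete.

```python
from functools import lru_cache
from itertools import product

# Hypothetical encoding of regular expressions over V_n:
#   ("sym", i)        alpha_i
#   ("union", A, B)   A ∪ B
#   ("concat", A, B)  A · B
#   ("star", A)       A*


def splits(parts):
    """All ways to cut every sequence in parts into a prefix y_j and a suffix z_j."""
    for cuts in product(*(range(len(p) + 1) for p in parts)):
        yield (tuple(p[:c] for p, c in zip(parts, cuts)),
               tuple(p[c:] for p, c in zip(parts, cuts)))


def in_merger(w, u0, us):
    """Decide u0 ∈ [W]_n({u1}, ..., {un}) by recursion on the expression w."""

    @lru_cache(maxsize=None)
    def ok(r, parts):                      # parts = (u0, u1, ..., un) to be generated
        kind = r[0]
        if kind == "sym":                  # base case: u0 = a, ui = a, other uj = ε
            i = r[1]
            return (len(parts[0]) == 1 and parts[i] == parts[0]
                    and all(not p for j, p in enumerate(parts) if j not in (0, i)))
        if kind == "union":                # case (1)
            return ok(r[1], parts) or ok(r[2], parts)
        if kind == "concat":               # case (2): every xj = yj · zj
            return any(ok(r[1], ys) and ok(r[2], zs) for ys, zs in splits(parts))
        if kind == "star":                 # case (3): all ε, or a W1-block then W1*
            if not any(parts):
                return True
            return any(ys[0] and ok(r[1], ys) and ok(r, zs)
                       for ys, zs in splits(parts))
        raise ValueError(f"unknown node {kind!r}")

    return ok(w, (tuple(u0),) + tuple(tuple(u) for u in us))


A1, A2, A3 = ("sym", 1), ("sym", 2), ("sym", 3)
# (α1 ∪ α2)* α3: the split performed by the first clause of Figure 5.1
PARTITION = ("concat", ("star", ("union", A1, A2)), A3)
print(in_merger(PARTITION, (3, 1, 2), [(1,), (3,), (2,)]))   # True
```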
The relations in $D$ are usually called the internal database relations, or IDB relations, and the relations in $E$ the external database relations, or EDB relations.

Let $D$ be an s-database scheme, $I_D$ an s-database instance over $D$ and $\mathcal{P}$ an S-datalog query over $D$ with target $R$. Let

$\mathcal{P}_{I_D} = \bigcup_{P \in D} \{P(u_1, \ldots, u_n) \leftarrow \mid (u_1, \ldots, u_n) \in I_D(P)\}$

and let $S = (C, \varphi, \psi)$ be the least Herbrand model for $\mathcal{P} \cup \mathcal{P}_{I_D}$. Then the answer of $\mathcal{P}$ over $I_D$ is defined to be $\psi(R)$.

Example Let $P$ be the relation name for the table in Figure 4.1. Consider the query over $\{P\}$: “give the lists of tour numbers such that each pair of consecutive tours visits some common city.” Expressed as an S-datalog query (with the target predicate $\mathit{Answer}$):

$\mathit{Answer}(x) \leftarrow R_1(x, y)$
$R_1(x, y) \leftarrow R_1(x_1, y_1), x \in \llbracket \alpha_1^* \alpha_2 \rrbracket_2(x_1, x_2), P(x_2, y, y_3, y_4, y_5), \mathit{CommonCity}(y, y_1)$
$R_1(x, y) \leftarrow P(x, y, z_3, z_4, z_5)$
$\mathit{CommonCity}(y_1, y_2) \leftarrow y_1 \in \llbracket \alpha_1^* \alpha_2 \alpha_1^* \rrbracket_2(z_1, z), y_2 \in \llbracket \alpha_1^* \alpha_2 \alpha_1^* \rrbracket_2(z_2, z)$

Note that the first argument of $R_1$ serves as an “accumulative” variable storing the result sequence and the second argument keeps the cities visited by the last tour of the result sequence. Also, the answer to this query may be infinite. Indeed, if $a$ is the number of a tour, then each sequence in the set $a^+$ is in the answer.

As shown in the previous example, the answer to an S-datalog query may be infinite because of the involvement of sequences. Obviously, queries with potentially infinite answers are not suitable for practical purposes. However, as the next result shows, it is undecidable to determine whether an S-datalog query will give an infinite answer.

Theorem 5.2 It is undecidable to determine whether the answer to an arbitrary S-datalog query over an arbitrary s-database instance is infinite.

Proof Let $G = (V, \Sigma, \Pi, S)$ be a phrase structure grammar [HU79], where $\Pi$ is a finite subset of $(V - \Sigma)^+ \times V^*$.
It is well known ([HU79]) that it is undecidable to determine if $L(G)$ is infinite. We now reduce this problem to the problem of deciding whether the answer to an arbitrary S-datalog query over an arbitrary s-database instance is infinite.

Let $D$ be the s-database scheme $\{R\}$, where $R$ is a binary s-relation name, and let $I_D$ be the s-database instance over $D$ such that $I_D(R) = \{(u, v) \mid (u, v) \in \Pi\}$. Consider the following S-datalog query with $\mathit{Answer}$ as the target predicate:

$\mathit{Answer}(x) \leftarrow \mathit{Answer}(y), y \in \llbracket \alpha_1^* \alpha_2^* \alpha_3^* \rrbracket_3(x_1, x_2, x_3), R(x_2, x_4), x \in \llbracket \alpha_1^* \alpha_2^* \alpha_3^* \rrbracket_3(x_1, x_4, x_3)$
$\mathit{Answer}(S) \leftarrow$

Obviously, the above query “simulates” the “moves” of $G$. It is also clear that the answer to the above query over $I_D$ is infinite if and only if $L(G)$ is infinite. Therefore, it is undecidable to determine if an arbitrary S-datalog query over an arbitrary s-database instance gives an infinite answer. □

By the above theorem, we cannot tell if a given query will have an infinite answer. To avoid these “bad” queries, we resort to syntactic restrictions. Specifically, we syntactically define a natural subclass of S-datalog queries, called “downward recursive” queries, which allow (restricted) recursion but always have finite answers.

An S-datalog program clause is called upward if there exists a variable $x$ appearing in the head of the clause such that $x \in \llbracket W \rrbracket_n(t_1, \ldots, t_n)$ appears in the body of the clause. For example, $R(x) \leftarrow x \in \llbracket \ldots \rrbracket(\ldots, x)$ is an upward clause. Intuitively, an upward clause has the potential to increase indefinitely the lengths of the sequences involved in an S-datalog program. A predicate is said to be defined in a clause if it appears in the head of the clause.
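The query in the proof of Theorem 5.2 repeatedly takes a derived sentential form $y = x_1 x_2 x_3$ with $(x_2, x_4) \in \Pi$ and adds $x_1 x_4 x_3$ to the answer. The Python sketch below performs the same rewriting, cut off after a fixed number of steps since the full answer may be infinite. It assumes, for illustration only, that grammar symbols are single characters so that sentential forms can be Python strings; the example grammar is hypothetical.

```python
from typing import Iterable, Set, Tuple


def sentential_forms(productions: Iterable[Tuple[str, str]], start: str,
                     steps: int) -> Set[str]:
    """Forms reachable from start in at most `steps` rewrites x1 u x3 -> x1 v x3."""
    productions = list(productions)
    forms = {start}                       # Answer(S) <-
    frontier = {start}
    for _ in range(steps):
        nxt = set()
        for y in frontier:                # Answer(y) with y = x1 x2 x3
            for u, v in productions:      # R(x2, x4) with x2 = u and x4 = v
                i = y.find(u)
                while i != -1:            # rewrite every occurrence of u in y
                    nxt.add(y[:i] + v + y[i + len(u):])
                    i = y.find(u, i + 1)
        frontier = nxt - forms            # only new forms need further rewriting
        forms |= nxt
    return forms


print(sorted(sentential_forms([("S", "aS"), ("S", "")], "S", 2)))
# ['', 'S', 'a', 'aS', 'aaS']
```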
An S-datalog program $\mathcal{P}$ is called downward recursive if the clauses in $\mathcal{P}$ can be partitioned into $n \ge 1$ sets of clauses $\mathcal{P}_1, \ldots, \mathcal{P}_n$ with the following property: for each predicate $P$ appearing in the body of an upward clause in $\mathcal{P}_k$, where $1 \le k \le n$, $P$ is not defined in $\bigcup_{k \le i \le n} \mathcal{P}_i$. It is easy to see that the query in the proof of Theorem 5.2 is not downward recursive.

For each S-datalog program $\mathcal{P}$, let $T_{\mathcal{P}}$ be the mapping from the set of all Herbrand interpretations for $\mathcal{P}$ to itself defined as follows: for each Herbrand interpretation $\psi$, let $T_{\mathcal{P}}(\psi)$ be the Herbrand interpretation $\psi'$ such that $(u_1, \ldots, u_n)$ is in $\psi'(R)$ if and only if (1) $R(t_1, \ldots, t_n) \leftarrow l_1, \ldots, l_m$ is a clause of $\mathcal{P}$ and (2) $l_1\theta, \ldots, l_m\theta$ are true in $\psi$ and $R(u_1, \ldots, u_n) = R(t_1, \ldots, t_n)\theta$ for some variable assignment $\theta$. Clearly, $T_{\mathcal{P}}$ is continuous, i.e., $T_{\mathcal{P}}(\mathrm{lub}(\Psi)) = \mathrm{lub}(\{T_{\mathcal{P}}(\psi) \mid \psi \in \Psi\})$ for each directed set $\Psi$ of interpretations. By Proposition 5.4 of [Llo84], there is a unique least fixed point for each S-datalog program $\mathcal{P}$, and that fixed point is $\mathrm{lub}(\{T_{\mathcal{P}}^0(\psi_\emptyset), T_{\mathcal{P}}^1(\psi_\emptyset), \ldots, T_{\mathcal{P}}^i(\psi_\emptyset), \ldots\})$, where $\psi_\emptyset$ is the bottom element of the set of all Herbrand interpretations for $\mathcal{P}$ with respect to $\le$.

Now let $\mathcal{P}$ be a downward recursive S-datalog program. By the definition of downward recursive, it is easy to see that there exists an integer $K$ such that the length of each sequence appearing in $T_{\mathcal{P}}^i(\psi_\emptyset)$ is at most $K$ for each $i \ge 0$. (Note that $T_{\mathcal{P}}^0(\psi_\emptyset) = \psi_\emptyset$.) Since $C$ is finite, there are only finitely many tuples of sequences over $C$ of length at most $K$; thus, the least fixed point for $\mathcal{P}$ is finite. Clearly, the least fixed point for $\mathcal{P}$ is also a model for $\mathcal{P}$. By the comment after Theorem 5.1, we immediately obtain the finiteness of the answers of downward recursive S-datalog queries:

Theorem 5.3 The answer to each downward recursive S-datalog query over each s-database instance is finite.

5.2 Rs-Operations and Nested Sequences

In this section, we define a “nested-sequence” data model.¹
We show that the traditional relational data model [Cod70], s-databases (Section 3.1), the nested-tuple sequence model [GZC89] and the grammatical data model [GPV89] are all special cases of the nested-sequence data model. We then “convert” the rs-operations to operations over “nested sequences.” Using these converted rs-operations, we construct an algebraic query language, called “ns-algebra,” over the nested-sequence data model.

To motivate the nested sequences, consider the following:

Example Figure 5.2 describes the “skeleton” of the books stored in a database. Under the node “Authors” is a list of authors. There is a list of chapters under the node “Chapters” and a list of sections under each “Chapter” node. Clearly, “nested sequence” is a natural “structure” for storing such data about books. Specifically, a book can be viewed as a sequence of length 3 where the first element is a sequence of authors, the second element the title of the book and the third element a list of chapters, each chapter being a list of sections.

[Figure 5.2: A structure for books. Tree with the nodes Authors (a list of Author nodes), Title and Chapters (Chapter_1, ..., Chapter_m, with sections 1.1, ..., 1.k_1 through m.1, ..., m.k_m).]

¹This should not be confused with the well-known nested-relation data model [FT83, VF86].

Before describing the data model for such databases as in the above example, the concepts of “nested elements” and “nested sequences” are first defined. Let $A$ be an arbitrary non-empty set of elements. Assume that “$\varepsilon$,” “$\llparenthesis$” and “$\rrparenthesis$” are new symbols not in $A$. The set of nested elements (over $A$) is defined recursively as follows: (1) each element in $A$ is a nested element over $A$; and (2) $\llparenthesis u \rrparenthesis$ is a nested element if $u$ is a sequence of $n \ge 0$ nested elements over $A$. The set of the nested elements over $A$ is denoted by $\hat{A}$. For example, $\llparenthesis ab \llparenthesis cc \rrparenthesis \rrparenthesis$ and $a$ are both nested elements. Nested elements are denoted by $\mathbf{a}$ and $\mathbf{b}$ etc., possibly subscripted.
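For experimenting with these definitions, a concrete encoding helps. The Python sketch below is one possible encoding, chosen only for illustration: an atom is a one-character string, and a nested element $\llparenthesis u \rrparenthesis$ is the tuple of the elements of $u$, so a nested sequence is itself a tuple. The helper names `parse`, `show` and `depth` are hypothetical; they convert to and from the bracket notation and measure the nesting depth.

```python
from typing import Tuple, Union

Elem = Union[str, tuple]          # an atom (one character) or a nested element


def parse(s: str) -> Tuple[Elem, ...]:
    """Parse bracket notation such as 'a⦇ab⦈' into nested tuples."""
    stack = [[]]
    for ch in s:
        if ch == "⦇":
            stack.append([])
        elif ch == "⦈":
            if len(stack) == 1:
                raise ValueError("unbalanced ⦈")
            inner = tuple(stack.pop())
            stack[-1].append(inner)
        else:
            stack[-1].append(ch)
    if len(stack) != 1:
        raise ValueError("unbalanced ⦇")
    return tuple(stack[0])


def show(u) -> str:
    """Inverse of parse: render a nested sequence in bracket notation."""
    return "".join(e if isinstance(e, str) else "⦇" + show(e) + "⦈" for e in u)


def depth(u) -> int:
    """Maximum nesting depth of ⦇ ⦈ in a nested sequence (0 for a flat one)."""
    return max((1 + depth(e) for e in u if isinstance(e, tuple)), default=0)


print(parse("⦇ab⦇cc⦈⦈"))         # (('a', 'b', ('c', 'c')),)
print(depth(parse("⦇ab⦇cc⦈⦈")))  # 2
```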
Each sequence over $\hat{A}$ is called a nested sequence over $A$. Thus, the set of the nested sequences over $A$ is denoted by $\hat{A}^*$. For example, $abc$, $\llparenthesis ab \rrparenthesis \llparenthesis bc \rrparenthesis$ and $a \llparenthesis ab \rrparenthesis$ are all nested sequences. Nested sequences are denoted by $u$ and $v$ etc., possibly subscripted.

We are now able to define the nested-sequence data model. Let $\mathcal{U}$ be a non-empty set of elements called atoms and denoted by $a$ and $b$ etc., where “$\varepsilon$,” “$\llparenthesis$” and “$\rrparenthesis$” are symbols not in $\mathcal{U}$. Each nested sequence in $\hat{\mathcal{U}}^*$ is called nested-sequence data, or ns-data. For example, $a \llparenthesis bcc \rrparenthesis$ is an ns-data. Each finite set of ns-data is called a nested-sequence instance, or ns-instance.

Now let $\mathcal{R}$ be a non-empty set of elements called ns-relation names. Each element in $\mathcal{R}$ is denoted by $R$. Each non-empty finite subset of $\mathcal{R}$ is called an ns-database scheme and denoted by $D$. Each mapping from an ns-database scheme $D$ to the set of the ns-instances is called an ns-database instance.

Example In the following examples, we point out four kinds of special ns-instances. It turns out that these ns-instances “correspond” to, respectively, the relations of the relational data model [Cod70], the s-instances of Section 3.1, the nested-tuple sequences of [GZC89] and the instances of the grammatical data model of [GPV89].

(1) Relational data model. An $n$-ary relation in the relational data model is a finite set of $n$-ary tuples [Cod70]. A tuple can be viewed as a fixed-length sequence. Therefore, each $n$-ary relation “corresponds” to a finite set of sequences of the form $a_1 \cdots a_n$, where each $a_i$ is a basic element (in $\mathcal{U}$). Thus, by restricting the sequences appearing in ns-instances, relations in the traditional relational data model are “imitated” by ns-instances.

(2) S-databases. When a tuple is viewed as a sequence, each sequence tuple (see Section 3.1) can be regarded as a sequence of sequences.
Then each $n$-ary s-instance can be thought of as a finite set of sequences of the form $\llparenthesis s_1 \rrparenthesis \cdots \llparenthesis s_n \rrparenthesis$, where each $s_i$ is a sequence of “basic” elements (in $\mathcal{U}$). Clearly, each $n$-ary s-instance “corresponds” to a finite set of nested sequences, i.e., an ns-instance.

(3) Nested-tuple sequence data model. Informally speaking, a nested-tuple sequence is a finite sequence of tuples (of the same arity) with each entry of the tuples being a basic element or a nested-tuple sequence [GZC89]. It is easy to see that a nested-tuple sequence is a nested sequence if tuples are viewed as sequences. Hence, by restricting the form of the nested sequences appearing in an ns-instance, the nested-tuple sequences can be “simulated” by ns-instances.

(4) Grammatical data model. Intuitively speaking, an instance in a grammatical data model [GPV89] is a nested sequence with names attached to each “component” sequence. By using the first element of a sequence as the name of the sequence, each instance of the grammatical data model can be “simulated” by a nested sequence.

We now turn to defining an algebraic query language over the ns-databases. First, we extend the rs-operations to “nested mergers” and “nested extractors” over ns-instances. In order to do this, we introduce “nested patterns,” “cross products” of nested patterns with nested sequences, and an auxiliary mapping from such cross products to nested sequences.

Let $A$ be a non-empty set of elements. Assume that the symbols “$\varepsilon$,” “$\llparenthesis_i$” and “$\rrparenthesis_i$” ($i \ge 1$) are not in $A$. For each $k \ge 1$, the set of nested ($k$-)pattern elements (over $A$) is recursively defined as follows: (1) each element of $A$ is a nested $k$-pattern element over $A$; and (2) $\llparenthesis_i w \rrparenthesis_i$ is a nested $k$-pattern element over $A$ if $1 \le i \le k$ and $w$ is a sequence of $n \ge 0$ nested $k$-pattern elements over $A$. The set of the nested $k$-pattern elements over $A$ is denoted by $\mathit{Pat}_k(A)$.
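For example, $\llparenthesis_1 \alpha_1 \rrparenthesis_1$ and $\alpha_1$ are both nested 1-pattern elements. Nested pattern elements are denoted by $\beta$, possibly subscripted. Each sequence of nested $k$-pattern elements over $A$ is called a nested ($k$-)pattern (over $A$) and denoted by $w$, possibly subscripted. Thus, the set of the nested $k$-patterns over $A$ is denoted by $\mathit{Pat}_k(A)^*$. For example, $\alpha_1 \alpha_2$ and $\llparenthesis_1 \alpha_2 \rrparenthesis_1 \alpha_1$ are both nested patterns.

We now define, recursively, the “cross products” of the nested patterns over the set $V_\infty = \{\alpha_i \mid i \ge 1\}$ and the nested sequences over $\mathcal{U}$ as follows. For each nested pattern $w = \beta_1 \cdots \beta_n$ over $V_\infty$ and nested sequence $u = \mathbf{a}_1 \cdots \mathbf{a}_m$ over $\mathcal{U}$, let

$w \otimes u = (\beta_1 \otimes \mathbf{a}_1) \cdots (\beta_n \otimes \mathbf{a}_n)$

if $n = m$ and each $\beta_i \otimes \mathbf{a}_i$ is defined, where for each $1 \le i \le n$,

$\beta_i \otimes \mathbf{a}_i = (\beta_i, \mathbf{a}_i)$ if $\beta_i$ is in $V_\infty$;
$\beta_i \otimes \mathbf{a}_i = \llparenthesis_j (w_i \otimes u_i) \rrparenthesis_j$ if $\beta_i = \llparenthesis_j w_i \rrparenthesis_j$, $\mathbf{a}_i = \llparenthesis u_i \rrparenthesis$ and $w_i \otimes u_i$ is defined;
$\beta_i \otimes \mathbf{a}_i$ is undefined otherwise.

Otherwise, let $w \otimes u$ be undefined. For example, let $w = \llparenthesis_1 \alpha_1 \alpha_2 \alpha_1 \rrparenthesis_1$ and $u = \ldots$; then $w \otimes u = \llparenthesis_1 (\alpha_1, a)(\alpha_2, \ldots) \cdots$. It is easy to see that $\llparenthesis_1 \alpha_1 \alpha_2 \rrparenthesis_1 \otimes ab$ is undefined. Clearly, the results of the above cross products are nested patterns over the set of the pairs $(\alpha, \mathbf{a})$, where $\alpha$ is in $V_\infty$ and $\mathbf{a}$ is a nested element, i.e., nested patterns over $V_\infty \times \hat{\mathcal{U}}$.

We now turn to the auxiliary mapping. For each $i \ge 1$, let $\gamma_i$ be the mapping from the nested patterns over $V_\infty \times \hat{\mathcal{U}}$ to $\hat{\mathcal{U}}^*$ defined recursively as follows. For each nested pattern $w = \beta_1 \cdots \beta_n$ over $V_\infty \times \hat{\mathcal{U}}$, let $\gamma_i(w) = \gamma_i(\beta_1) \cdots \gamma_i(\beta_n)$, where for each $1 \le j \le n$,

$\gamma_i(\beta_j) = \varepsilon$ if $\beta_j = (\alpha_l, \mathbf{a})$ and $l \ne i$;
$\gamma_i(\beta_j) = \mathbf{a}$ if $\beta_j = (\alpha_i, \mathbf{a})$;
$\gamma_i(\beta_j) = \gamma_i(w_j)$ if $\beta_j = \llparenthesis_l w_j \rrparenthesis_l$ and $l \ne i$;
$\gamma_i(\beta_j) = \llparenthesis \gamma_i(w_j) \rrparenthesis$ if $\beta_j = \llparenthesis_i w_j \rrparenthesis_i$.

Intuitively, the mapping $\gamma_i$ changes all $\llparenthesis_i$ to $\llparenthesis$, $\rrparenthesis_i$ to $\rrparenthesis$ and $(\alpha_i, \mathbf{a})$ to $\mathbf{a}$, and deletes everything else. For example, $\gamma_1(\llparenthesis_1 (\alpha_1, a) \llparenthesis_2 (\alpha_1, b) \rrparenthesis_2 (\alpha_2, c) \rrparenthesis_1) = \llparenthesis ab \rrparenthesis$.

We are now able to define the “nested mergers.” For each $n \ge 1$, an ($n$-ary) nested merger is a construct $\llbracket W \rrbracket_n$, where $W$ is a subset of $\mathit{Pat}_n(V_n)^*$.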
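The cross product $\otimes$ and the mapping $\gamma_i$ just defined are plain structural recursions. The Python sketch below uses hypothetical encodings that are not part of the text: a pattern element is `("sym", i)` for $\alpha_i$ or `("br", j, [...])` for $\llparenthesis_j \cdots \rrparenthesis_j$; a cross-product element is `("pair", i, a)` for $(\alpha_i, \mathbf{a})$ or `("br", j, [...])`; and nested sequences are tuples whose items are atoms (strings) or nested elements (tuples). `cross` returns `None` wherever $w \otimes u$ is undefined.

```python
from typing import List, Optional


def cross(w: List[tuple], u: tuple) -> Optional[List[tuple]]:
    """w ⊗ u, or None when it is undefined."""
    if len(w) != len(u):
        return None
    out = []
    for beta, a in zip(w, u):
        if beta[0] == "sym":                      # beta in V_inf: pair it with a
            out.append(("pair", beta[1], a))
        elif isinstance(a, tuple):                # beta = ⦇_j w_i ⦈_j and a = ⦇u_i⦈
            inner = cross(beta[2], a)
            if inner is None:
                return None
            out.append(("br", beta[1], inner))
        else:                                     # a bracket facing an atom
            return None
    return out


def gamma(i: int, w: List[tuple]) -> tuple:
    """γ_i: keep ⦇_i, ⦈_i and the a of (alpha_i, a); drop everything else."""
    out = []
    for beta in w:
        if beta[0] == "pair":
            if beta[1] == i:
                out.append(beta[2])
        elif beta[1] == i:
            out.append(gamma(i, beta[2]))         # ⦇_i w ⦈_i becomes ⦇ γ_i(w) ⦈
        else:
            out.extend(gamma(i, beta[2]))         # other brackets are transparent
    return tuple(out)


# The example from the text: γ_1(⦇_1 (α1,a) ⦇_2 (α1,b) ⦈_2 (α2,c) ⦈_1) = ⦇ab⦈
w = [("br", 1, [("pair", 1, "a"), ("br", 2, [("pair", 1, "b")]), ("pair", 2, "c")])]
print(gamma(1, w))   # (('a', 'b'),)
```

On the same input, `gamma(2, w)` yields `((), "c")`, i.e. $\llparenthesis\rrparenthesis c$: the $\llparenthesis_1$ pair is transparent for $\gamma_2$, while the $\llparenthesis_2$ pair survives with nothing from index 2 inside it.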
Each $n$-ary nested merger $\llbracket W \rrbracket_n$ defines a mapping from $2^{\hat{\mathcal{U}}^*} \times \cdots \times 2^{\hat{\mathcal{U}}^*}$ ($n$ times) to $2^{\hat{\mathcal{U}}^*}$ as follows: for all subsets $L_1, \ldots, L_n$ of $\hat{\mathcal{U}}^*$, let

$\llbracket W \rrbracket_n(L_1, \ldots, L_n) = \{u \mid \text{there exists } w \text{ in } W \text{ such that } \gamma_i(w \otimes u) \text{ is defined and is in } L_i \text{ for each } 1 \le i \le n\}$.

As an example, let $W = \{\llparenthesis_1 \alpha_1 \llparenthesis_2 \alpha_2 \llparenthesis_3 \alpha_3 \rrparenthesis_3 \rrparenthesis_2 \rrparenthesis_1\}$, $u_1 = \llparenthesis a \rrparenthesis$, $u_2 = \llparenthesis b \rrparenthesis$ and $u_3 = \llparenthesis c \rrparenthesis$. Then $\llbracket W \rrbracket_3(\{u_1\}, \{u_2\}, \{u_3\})$ is the set containing only the nested sequence $\llparenthesis a \llparenthesis b \llparenthesis c \rrparenthesis \rrparenthesis \rrparenthesis$.

The “nested extractors” are now defined. A nested extractor is a construct of the form $\pi_i \llbracket W \rrbracket_n^{-1}$, where $n \ge 1$, $1 \le i \le n$ and $W$ is a subset of $\mathit{Pat}_n(V_n)^*$. Each nested extractor defines a mapping from $2^{\hat{\mathcal{U}}^*}$ to $2^{\hat{\mathcal{U}}^*}$ as follows: for each subset $L$ of $\hat{\mathcal{U}}^*$, let

$\pi_i \llbracket W \rrbracket_n^{-1}(L) = \{u_i \mid \text{there exist } u \text{ in } L \text{ and } u_j,\ 1 \le j \le n,\ j \ne i, \text{ such that } u \text{ is in } \llbracket W \rrbracket_n(\{u_1\}, \ldots, \{u_n\})\}$.

For example, let $W = \ldots$ and $u = \llparenthesis abc \rrparenthesis \llparenthesis def \rrparenthesis$. Then $\pi_1 \llbracket W \rrbracket_2^{-1}(\{u\}) = \{ad\}$.

Similar to the “regular” rs-operations, a nested rs-operation is either a nested merger $\llbracket W \rrbracket_n$ or a nested extractor $\pi_i \llbracket W \rrbracket_n^{-1}$, where $W$ is a regular subset of $\mathit{Pat}_n(V_n)^*$. We now present some additional examples to illustrate the nested rs-operations.

Example In the following, $u$ and $v$ are assumed to be nested sequences and $L = (\llparenthesis\rrparenthesis)^*$.

(1) Let $\mathit{Nest}(u) = \llbracket (\llparenthesis_2 \alpha_1 \rrparenthesis_2)^* \rrbracket_2(\{u\}, L)$. Clearly, $\mathit{Nest}(u)$ is the set containing only the nested sequence obtained by placing a pair of $\llparenthesis$ and $\rrparenthesis$ “around each first-level element” of $u$. For example, $\mathit{Nest}(a \llparenthesis ab \rrparenthesis bc) = \{\llparenthesis a \rrparenthesis \llparenthesis \llparenthesis ab \rrparenthesis \rrparenthesis \llparenthesis b \rrparenthesis \llparenthesis c \rrparenthesis\}$.

(2) Let $\mathit{UnNest}(u) = \pi_1 \llbracket (\llparenthesis_2 \alpha_1 \rrparenthesis_2)^* \rrbracket_2^{-1}(\{u\})$. Obviously, when each first-level element of $u$ has the form $\llparenthesis \mathbf{a} \rrparenthesis$ for a single nested element $\mathbf{a}$, $\mathit{UnNest}(u)$ is the set containing only the nested sequence obtained by deleting the pair $\llparenthesis$ and $\rrparenthesis$ “around the first-level elements”; otherwise $\mathit{UnNest}(u)$ is empty. For example, $\mathit{UnNest}(\llparenthesis a \rrparenthesis \llparenthesis \llparenthesis ab \rrparenthesis \rrparenthesis \llparenthesis b \rrparenthesis \llparenthesis c \rrparenthesis) = \{a \llparenthesis ab \rrparenthesis bc\}$. It is easy to see that $\mathit{UnNest}(v) = \{u\}$ if $v$ is in $\mathit{Nest}(u)$.

(3) $\llbracket \ldots \rrbracket_3(L, \{u\}, \{v\}) = \ldots$

We next describe the “selection” operations on ns-instances. Before doing this, we define the “ns-selection conditions.” An atomic ns-selection condition, denoted by $C$, is a construct either of the form $(u \in \pi_1 \llbracket W \rrbracket_2^{-1})$ or of the form $(\pi_1 \llbracket W_1 \rrbracket_2^{-1} \subseteq \pi_1 \llbracket W_2 \rrbracket_2^{-1})$, where $u$ is in $\hat{\mathcal{U}}^*$ and $W$, $W_1$ and $W_2$ are regular subsets of $\mathit{Pat}_2(V_2)^*$. For each atomic ns-selection condition $C$ and each $v$ in $\hat{\mathcal{U}}^*$, let $C(v) = T$ if (1) $C = (u \in \pi_1 \llbracket W \rrbracket_2^{-1})$ and $u$ is in $\pi_1 \llbracket W \rrbracket_2^{-1}(\{v\})$, or (2) $C = (\pi_1 \llbracket W_1 \rrbracket_2^{-1} \subseteq \pi_1 \llbracket W_2 \rrbracket_2^{-1})$ and $\pi_1 \llbracket W_1 \rrbracket_2^{-1}(\{v\})$ is a subset of $\pi_1 \llbracket W_2 \rrbracket_2^{-1}(\{v\})$; and let $C(v) = F$ otherwise. An ns-selection condition, denoted by $C$, is a boolean formula constructed from atomic ns-selection conditions (formal details omitted). The mappings defined by atomic ns-selection conditions can be extended to (non-atomic) ns-selection conditions in an obvious manner (formal details omitted). For example, let $C$ be the ns-selection condition $(a \in \pi_1 \llbracket \llparenthesis_2 \alpha_1 \alpha_2^* \rrparenthesis_2 \alpha_2^* \rrbracket_2^{-1}) \wedge (c \in \pi_1 \llbracket \alpha_2 \alpha_1 \alpha_2^* \rrbracket_2^{-1})$. Then $C(\llparenthesis abc \rrparenthesis c) = T$ and $C(a \llparenthesis ab \rrparenthesis) = F$. Clearly, $C(v) = T$ if and only if (i) the first element of $v$ is a sequence which has $a$ as its first element and (ii) the second element of $v$ is $c$.

We are now ready for the “selection operations” over ns-instances. For each ns-selection condition $C$ and subset $L$ of $\hat{\mathcal{U}}^*$, let $\sigma_C(L)$, called the ns-selection of $L$ by $C$, be the ns-instance $\{v \in L \mid C(v) = T\}$.

For each ns-database scheme $D$, the set of ns-algebra expressions (over $D$) is defined recursively as follows:

(1) each ns-instance is an ns-algebra expression;
(2) $R$ is an ns-algebra expression if $R$ is in $D$;
(3) $\llbracket W \rrbracket_n(E_1, \ldots, E_n)$ is an ns-algebra expression if $n \ge 1$, each $E_i$ is an ns-algebra expression and $W$ is a regular subset of $\mathit{Pat}_n(V_n)^*$;
(4) $\pi_i \llbracket W \rrbracket_n^{-1}(E)$ is an ns-algebra expression if $1 \le i \le n$, $E$ is an ns-algebra expression over $D$, and $W$ is a regular subset of $\mathit{Pat}_n(V_n)^*$;
(5) $\sigma_C(E)$ is an ns-algebra expression if $E$ is an ns-algebra expression and $C$ is an ns-selection condition; and
(6) $E_1 \cup E_2$, $E_1 \cap E_2$ and $E_1 - E_2$ are all ns-algebra expressions if both $E_1$ and $E_2$ are ns-algebra expressions.

The collection of all ns-algebra expressions is called ns-algebra. Each ns-algebra expression represents a mapping from the set of ns-database instances over $D$ to ns-instances. Formally, let $D$ be an ns-database scheme, $E$ an ns-algebra expression over $D$ and $I_D$ an ns-database instance over $D$. Then the value of $E$ over $I_D$, denoted $E[I_D]$, is

(1) $I$ if $E = I$, where $I$ is an ns-instance;
(2) $I_D(R)$ if $E = R$, where $R$ is in $D$;
(3) $\llbracket W \rrbracket_n(E_1[I_D], \ldots, E_n[I_D])$ if $E = \llbracket W \rrbracket_n(E_1, \ldots, E_n)$;
(4) $\pi_i \llbracket W \rrbracket_n^{-1}(E_1[I_D])$ if $E = \pi_i \llbracket W \rrbracket_n^{-1}(E_1)$;
(5) $\sigma_C(E_1[I_D])$ if $E = \sigma_C(E_1)$; and
(6) $E_1[I_D] \cup E_2[I_D]$ ($E_1[I_D] \cap E_2[I_D]$ or $E_1[I_D] - E_2[I_D]$, respectively) if $E = E_1 \cup E_2$ ($E = E_1 \cap E_2$ or $E = E_1 - E_2$, respectively).

Note that the “projection” operation is not included. The “usual” projections can be “simulated” by the extractors. Indeed, let $R$ be an ns-relation name and $I_D$ an ns-database instance over $\{R\}$ such that $I_D(R)$ is a set of sequences of length $n$. Then $I_D(R)$ “simulates” an “ordinary” relation. (See the example “relational data model” given earlier in this section.) The expression $\pi_1 \llbracket \alpha_1 \alpha_1 \alpha_2^* \rrbracket_2^{-1}(R)$ gives the first and the second columns of the tuples of the original “relation,” i.e., a projection of the original “relation.”

We conclude this section with some query examples illustrating the use of ns-algebra.

Example Let $R$ be an ns-relation name and $D = \{R\}$. Suppose each ns-database instance over $D$ assigns to $R$ a set of nested sequences about books with the structure depicted in Figure 5.2. The following are three queries over $D$.

(1) “Get the titles of the books each of which has only one author.” This is expressed in ns-algebra by:

$\pi_1 \llbracket \llparenthesis_2 \alpha_2 \rrparenthesis_2 \alpha_1 \alpha_2 \rrbracket_2^{-1}(R)$.

The above ns-algebra expression uses a nested extractor to “pick” the second nested element of a given sequence whose first element is a sequence of length 1. (It is interesting to see that such a query can be expressed succinctly in ns-algebra.)

(2) “Get the set of the first authors of the books.” This is expressed in ns-algebra by:

$\pi_1 \llbracket \llparenthesis_2 \alpha_1 \alpha_2^* \rrparenthesis_2 \alpha_2 \alpha_2 \rrbracket_2^{-1}(R)$.

(3) “Return the writers who author at least two books.” This is expressed in ns-algebra by:

$\pi_1 \llbracket \llparenthesis_2 \alpha_1 \rrparenthesis_2 \alpha_2 \rrbracket_2^{-1}(\sigma_C(\llbracket \alpha_1 \alpha_2 \rrbracket_2(\pi_1 \llbracket \ldots \rrbracket_2^{-1}(R), \pi_1 \llbracket \ldots \rrbracket_2^{-1}(R))))$,

where $C = \ldots$.

Chapter 6

Conclusion

In this thesis, a theoretical study was initiated on the sequence operations used in database query languages. Specifically, two types of sequence operations, called mergers and extractors respectively, and rs-operations collectively, were introduced and extensively studied. Using a type of mechanical device called the generic a-transducer, it was shown that the set of mergers (extractors, respectively) is closed under composition. It was also shown that there is no finite set of rs-operations (mergers and extractors, respectively) which yields all rs-operations (mergers and extractors, respectively) under composition.

The second half of the thesis was devoted to showing that the rs-operations are readily applicable to database query languages. In particular, an extended relational data model containing sequences was introduced, and an algebraic query language called s-algebra, which uses rs-operations to deal with sequences, was presented. Furthermore, a calculus-like query language based on SL, a logic about sequences using rs-operations as special predicates on sequences, was constructed on the extended relational data model. (Intuitively, the calculus here extends the relational calculus by adding special “tools” to deal with sequences in the extended relational data model.) A subclass of the calculus-like query language, called safe s-calculus, was shown to be equivalent to s-algebra in expressive power. Rs-operations and their extensions were also applied to two other situations, namely an extended Datalog and a nested-sequence data model.

In closing, we mention the following five areas which deserve examination:

1. Identify “interesting” subfamilies of rs-operations which have finite “generating” subsets.
2. Find “intuitive” user interfaces which have the expressive power of safe s-calculus.
3. Study properties of SL, e.g., (i) the relationship between SL and various temporal logics and (ii) the computational complexity of SL.
4. Explore computational aspects and/or optimizations of rs-operations.
5. Investigate the applicability of the techniques and results of this thesis to the Human Genome Project, an area where sequences are extensively dealt with.

Reference List

[ABD+89] M. Atkinson, F. Bancilhon, D. DeWitt, K. Dittrich, D. Maier, and S. Zdonik. The object-oriented database manifesto. In Proceedings of the International Conference on Deductive and Object-Oriented Databases, 1989.
[ACO85] A. Albano, L. Cardelli, and R. Orsini. Galileo: A strongly typed language for complex objects. ACM Transactions on Database Systems, 10(2):230-260, 1985.
[AHU74] A. Aho, J. Hopcroft, and J. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.
[Apt87] K. R. Apt. Introduction to logic programming. Technical Report TR-87-35, Dept. of CS, University of Texas, Austin, 1987.
[BCD89] F. Bancilhon, S. Cluet, and C. Delobel. A query language for the O2 object-oriented database system. In Database Programming Languages: 2nd International Workshop. Morgan Kaufmann, Inc., June 1989.
[CDV88] M. J. Carey, D. J. DeWitt, and S. L. Vandenberg. A data model and query language for EXODUS. In Proc. of SIGMOD International Conference on Management of Data, pages 413-423, 1988.
[Cod70] E. F. Codd. A relational model of data for large shared data banks. Communications of the ACM, 13(6):377-387, 1970.
[Deu91] O. Deux et al. The O2 system. Communications of the ACM, 34(10):34-48, October 1991.
[FT83] P. C. Fischer and S. J. Thomas. Operators for non-first-normal-form relations. In Proc. IEEE Computer Software and Applications Conference, pages 464-475, 1983.
[Gin75] S. Ginsburg. Algebraic and Automata-Theoretic Properties of Formal Languages. McGraw-Hill Book Company, 1975.
[GPV89] M. Gyssens, J. Paredaens, and D. Van Gucht. A grammar-based approach towards unifying hierarchical data models. In Proc. of SIGMOD International Conference on Management of Data, pages 263-272, 1989.
[GT86] S. Ginsburg and K. Tanaka. Computation-tuple sequences and object histories. ACM Transactions on Database Systems, 11(2):186-212, June 1986.
[GT87] S. Ginsburg and C. Tang. Canonical forms for interval functions. Theoretical Computer Science, 54:299-313, 1987.
[GZC89] R. H. Güting, R. Zicari, and D. M. Choy. An algebra for structured office documents. ACM Transactions on Information Systems, 7, April 1989.
[Hor84] E. Horowitz. Fundamentals of Programming Languages, chapter 7. Computer Science Press, 2nd edition, 1984.
[HU69] J. E. Hopcroft and J. D. Ullman. Formal Languages and Their Relation to Automata. Addison-Wesley, 1969.
[HU79] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 1979.
[Llo84] J. W. Lloyd. Foundations of Logic Programming. Springer-Verlag, 1984.
[Mai83] D. Maier. The Theory of Relational Databases. Computer Science Press, 1983.
[MH89] D. Metzler and S. Haas. The constituent object parser: Syntactic structure matching for information retrieval. ACM Transactions on Information Systems, 7(3):292-316, 1989.
[Ont87] Ontologic, Inc. Vbase Technical Overview, version 1.0 edition, March 1987.
[PT86] P. Pistor and R. Traunmueller. A database language for sets, lists and tables. Information Systems, 11(4):323-336, 1986.
[SSU90] A. Silberschatz, M. Stonebraker, and J. D. Ullman. Database systems: Achievements and opportunities. SIGMOD Record, 19(4):6-22, 1990.
[The90] The Committee for Advanced DBMS Function. Third-generation database system manifesto. SIGMOD Record, 19(3):31-44, 1990.
[Ull80] J. D. Ullman. Principles of Database Systems. Computer Science Press, 1980.
[Ull88] J. D. Ullman. Principles of Database and Knowledge-base Systems. Computer Science Press, 1988.
[VD91] S. Vandenberg and D. DeWitt. Algebraic support for complex objects with arrays, identity, and inheritance. In Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data, Denver, Colorado, 1991.
[VF86] D. Van Gucht and P. C. Fischer. Some classes of multilevel relational structures. In Proc. of the 5th ACM Symposium on Principles of Database Systems, pages 60-69, 1986.
Asset Metadata
Core Title: 00001.tif
Tag: OAI-PMH Harvest
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC11255781
Unique identifier: UC11255781
Legacy Identifier: DP22856