TRANSFORMATION TECHNIQUES FOR PARALLEL PROCESSING OF PRODUCTION SYSTEMS

by

Vishweshwar V. Dixit

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Engineering)

October 1987

Copyright 1987 Vishweshwar V. Dixit

UMI Number: DP22764. All rights reserved. INFORMATION TO ALL USERS: The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346.

UNIVERSITY OF SOUTHERN CALIFORNIA, THE GRADUATE SCHOOL, UNIVERSITY PARK, LOS ANGELES, CALIFORNIA 90089. This dissertation, written by Vishweshwar V. Dixit under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY. Dean of Graduate Studies. Dissertation Committee: Chairperson.

To my parents

Acknowledgements

I am deeply grateful to my advisor Prof. Dan Moldovan for his guidance, support, understanding, and encouragement. He has been a constant source of ideas and insights upon which this dissertation rests. I thank Prof. Raghavendra and Prof.
Les Gasser for being on my dissertation committee and Profs. E. Blum, Kai Hwang, Gerard Medioni, and Sarma Sastry for serving on my guidance committee. Their advice and criticisms have greatly affected this work. My thanks to Prof. Francesco Parisi-Presicce for his ideas and criticisms on the mathematical aspects. My previous work on Computer Vision with Profs. Ram Nevatia and G. Medioni has been helpful in this research. Thanks to Yu-Wen Tung and Fernando Tenorio whose early works gave me a start; Matt Staker and Fernando Tenorio developed the early versions of the software. I cannot thank enough my parents, sisters, and brothers for all their support and sacrifices. Many thanks to my roommates Ramarao Unnam and B. Srinivas for their understanding and many interesting discussions on diverse topics. My appreciation goes to the staff of the Electrical Engineering Department, especially Bill Bates, Diane Demetras, Christine Estrada, and Carol Gordon. My sincere thanks to Frank Moskal and Abacus Programming Corporation for providing an opportunity to finish my dissertation. Many thanks to all my colleagues who have made the last couple of years go by smoothly.

Contents

Acknowledgements
List of Figures
Symbols and Notations
Abstract

1 Introduction
  1.1 Production Systems
    1.1.1 Control in Production Systems
    1.1.2 Parallelism in Production Systems
  1.2 Comparison with Previous Studies
  1.3 Overview

2 System Models
  2.1 Production System Model
  2.2 Hardware Model
  2.3 Summary

3 Dependence Analysis
  3.1 Parallel Productions
    3.1.1 Dependences
    3.1.2 Conditions for Parallel Firing of Productions
  3.2 Communicating Productions
  3.3 The RETE Analysis
  3.4 Summary

4 State Space Transformations
  4.1 Search Space Reduction
    4.1.1 Search Procedure
    4.1.2 An Example
  4.2 Sequential to Parallel Transformation
    4.2.1 Parallel Search Procedure
    4.2.2 An Example
  4.3 π-λ Transformation
    4.3.1 Conditions for π-λ Transformability
    4.3.2 Firability Condition
    4.3.3 Reachability Condition
  4.4 An Example
  4.5 Cycles
  4.6 Summary

5 Rule Space Transformations
  5.1 Composition
  5.2 Decomposition
  5.3 Graph Theoretic Transformations
  5.4 Summary

6 The Allocation Problem
  6.1 Formulation of the Allocation Problem
    6.1.1 Communication Cost
    6.1.2 Parallelism Loss
    6.1.3 The Constraints
    6.1.4 0-1 Linear-Quadratic Programming Formulation
    6.1.5 Solutions
    6.1.6 Reduction to 0-1 Linear Formulation
  6.2 A* Algorithm: A Heuristic Method
    6.2.1 State Space Search
    6.2.2 The A* Algorithm
  6.3 Distribution of Knowledge Base
  6.4 Summary

7 Modeling Control and Variables
  7.1 Control
    7.1.1 Control Specification
    7.1.2 Transformations on Control Strings
  7.2 Variables in a Production System
    7.2.1 Handling Variables
  7.3 Summary

8 Conclusions

A Software Tools
  A.1 Productions
  A.2 Programs
  A.3 An example

Bibliography
Index

List of Figures

1.1 Notation for state expansion
2.1 Graph representation of the production in Example 2.1
2.2 Derivation via the production of Example 2.1
2.3 Cube processor
3.1 Possible reductions via two productions
3.2 Production system of Example 3.1
4.1 Case 1 in the proof of theorem 4.1
4.2 Case 2.1 in the proof of theorem 4.1
4.3 Original search space (80 states)
4.4 Reduced search space (sequential; 37 states, 54% reduction)
4.5 Search space reduction
4.6 Serial to parallel transformation
4.7 Fully parallel reduced search space
4.8 Parallel productions in a sequence
4.9 Situation and serial to parallel transformation
4.10 π-λ transformation
4.11 Original search space for PS of Example 4.3
4.12 Serial to parallel transformation of PS4
4.13 π-λ transformation of PS4
5.1 Transformations on Dependence Graphs
6.1 A tree of 3 processors
6.2 Search tree generated by A* algorithm for Example 6.1
7.1 Database and a rule for example 7.1
A.1 Program flow

Symbols and Notations

Section  Symbol/Notation  Meaning
1.1      S_ij       State obtained by firing production P_j from state S_i.
1.1      P_i        Production numbered i.
2.1      C          Context graph of a production P.
2.1      K          Interface graph of a production P.
2.1      L          LHS graph of a production P.
2.1      R          RHS graph of a production P.
2.2      D          Minimum distance matrix.
3        G =P=> H   Derivation of graph H from graph G via production P.
3.1      S = (B, id)  State S is defined by its knowledge base B and the derivation sequence id.
3.1.1    P1 . P2    Productions P1 and P2 are input dependent.
3.1.1    P1 < P2    Production P1 is output-input dependent on production P2.
3.1.1    P1 > P2    Production P1 is input-output dependent on production P2.
3.1.1    P          Parallelism matrix.
3.1.1    C          Communication matrix.
3.1.2    A + B      Union of A and B in the context of graphs and sets.
3.1.2    A B        Intersection of A and B in the context of graphs and sets.
3.1.2    L*, C*, R*  LHS, context, and RHS graphs, respectively, excluding the interface graph.
4.1.1    C_i        Conflict set of state S_i.
4.1.1    F_i        Forbidden set of state S_i.
4.1.1    Pi_i       Parallel set of production P_i.
4.1.1    Pi         Set of all productions.
4.2.1    Pi_I       Parallel set of a set of rules {P_i | i in I}.
5.1      P1 o P2    Composition of productions P1 and P2.
5.1      D_io       The elements causing the input-output dependence.
5.1      D_oi       The elements causing the output-input dependence.
5.2      P1 ⊑ P2    P1 is a partial production of P2.
6        r          Number of rules in a production system.
6        n          Number of processors.
6        X          The allocation matrix.
6.1.1    A . B      For two matrices A and B, Σ_ij a_ij b_ij.
6.1.2    E_c        Communication costs.
6.1.2    E_p        Parallelism loss.
6.1.2    E          The objective function E_c + E_p.
6.1.6    x_ij^kl    Equals 0 or 1; 1 if rule k is assigned to processor i and rule l is assigned to processor j.
6.2.2    g(s)       Path cost to reach node s.
6.2.2    g_ij(s)    Path cost assuming rule i is assigned to processor j.
6.2.2    h(s)       Heuristic cost to reach goal from node s.
6.2.2    h*(s)      Minimal cost to reach goal from node s.
6.2.2    [X_ij = 1]  An allocation matrix with just the element x_ij = 1.
7.1      x ⊑ y      x is an acceptable value for variable y.
7.1.1.2  (P)*       Zero or more successive firings of production P.
7.1.2    P1 || P2   P1 and P2 are parallel; execute P1 and P2 in parallel.
7.1.2    ε          Null production (identity).
7.2.1    dom(x)     Domain of x: the set of legal values for variable x.
7.2.1    mgu(U, V)  Most general unifier of U and V.

Abstract

Rule-based systems or production systems form a general and important computational mechanism of Artificial Intelligence. Nevertheless, the poor performance of production systems on sequential machines has been an impediment to their widespread acceptance. Parallel processing can be effectively used to meet the symbolic processing demands of production systems. Present parallel implementations have not reported substantial speedups.
It then becomes necessary to go beyond apparent parallelism to reveal all parallelism, perhaps by transforming the production system into an equivalent highly parallel system. In this thesis, transformation techniques to detect and utilize parallelism in production systems are presented. The concept of dependences among rules is used in the analysis to reveal the inherent parallelism and communication requirements. This leads to a procedure for reduction of the search space. The procedure does not exclude the use of heuristics and can be used for sequential or parallel processing. A straightforward parallelization may reduce the height of the search tree but broadens the base exponentially. Another transformation is introduced to shorten the base of the search tree. Composition and decomposition of rules and graph theoretic transformations to further reduce the search space are presented. Partitioning of the knowledge base and rules among processors can be done based on the dependences. This allocation problem is reduced to 0-1 linear programming. A heuristic algorithm to directly solve the allocation problem is presented. Also, control in a production system can be distributed. Control, when specified by a context-free language, can be transformed into control productions which can be mixed with other productions. Dependence analysis of control productions could lead to 'parallelizing compilers' for production systems.

Chapter 1

Introduction

Problems in Artificial Intelligence (AI) employ various knowledge representation schemes such as semantic networks, first-order logic, and frames. Complex operations such as matching, unification, and search are needed for reasoning. Many of these AI problems turn out to be combinatorial in nature and end up placing heavy demands on system requirements.
Conventional von Neumann type architectures fail to meet these symbolic processing requirements. With the distribution of control and knowledge, parallel processing may be effectively used for AI problems. Current VLSI technology, with its ability to build chips with a large number of processors, can make parallel processing a reality. Active memory elements may be constructed to support the representation schemes and operations of AI. Along with the benefits, parallel processing brings forth new problems and issues: communication overhead, partitioning of the knowledge base, mapping of algorithms into architectures, and others. These issues, in the context of production systems, have become the focus of this research.

1.1 Production Systems

Production systems, first proposed by Post [Pos43], have become an important and general computational mechanism of AI. A rule-based system or production system is a knowledge representation and processing mechanism widely used in AI applications where the problem can be expressed as a set of rules called productions and a knowledge base on which these productions operate. A pure production system [DK76] is composed of (1) a declarative knowledge base or database, (2) productions which act upon the database, and (3) control knowledge or an interpreter of the rules. The database contains objects and relations. Each production is a condition-action pair of the form

if C1 and C2 and ... Cn then A1, A2, ... Am

The condition part of a production must be satisfied before the actions can be performed upon the knowledge base. The knowledge base is transformed from one state to another by applying or firing a production whose condition part is satisfied. The states that can be generated by application of all possible rule sequences constitute the state space. The state space is searched for one or more states satisfying the given goal conditions.
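The condition-action mechanism just described can be sketched concretely. The following is a minimal, set-based reading (an assumption for illustration: the thesis itself models the knowledge base as a labeled graph, not a set of tuples), using the monkey-and-bananas rule that appears in the Appendix example:

```python
def satisfied(conditions, db):
    """The condition part C1 and C2 and ... Cn must all hold in the database."""
    return conditions <= db

def fire(production, db):
    """Firing transforms the knowledge base from one state to another."""
    conditions, deletes, adds = production
    assert satisfied(conditions, db)
    return (db - deletes) | adds

# The monkey-grasps-bananas rule: conditions, (-) actions, (+) actions.
grasp = (
    {("monkey", "on", "table"), ("monkey", "hungry", "bananas"),
     ("monkey", "holding", "nil"), ("bananas", "above", "table")},
    {("monkey", "holding", "nil")},     # removed on firing
    {("monkey", "has", "bananas")},     # added on firing
)

state = {("monkey", "on", "table"), ("monkey", "hungry", "bananas"),
         ("monkey", "holding", "nil"), ("bananas", "above", "table")}
next_state = fire(grasp, state)
```

Applying `fire` to every applicable production of every reachable state generates exactly the state space referred to above.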
As shown in figure 1.1, the state S_ij is obtained by the application of rule P_j to state S_i. Application of rules in this manner to reach a goal state unfolds an exponentially growing search tree.

1.1.1 Control in Production Systems

Given an initial state and a goal condition, the set of states that can be generated which satisfy the goal condition is called the solution set. Determination of the solution set involves generating the states in some order. With h as the height of the search tree and a branching factor b, there are O(b^{h+1}) possible states in the search space. In general, generation of all the possible states is neither feasible nor required. Control knowledge determines which of the possible states are generated and in which order they are generated. Control in a pure production system is non-deterministic in the sense that the exact path from the initial conditions to the goal state remains unspecified and unrestricted. The control for the non-deterministic production system simply evaluates the conditions to determine the applicable productions. All the productions applicable at a given state form the conflict set. A production from the conflict set is chosen non-deterministically and fired to obtain a new state. This process is repeated until a state is reached which satisfies the goal.

Implementation of a production system on a deterministic machine requires the specification of the order in which the rules are to be applied. The part of the control mechanism that selects a production from the conflict set is called the conflict resolution strategy. The mechanism which determines the order in which the states are expanded is called the backtracking strategy. An implementation which starts from the initial state is called forward chaining or data driven strategy. The state space may also be searched starting from the goal, which is called backward chaining or goal directed strategy.
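The deterministic recognize-act cycle can be sketched as follows. This is a hedged illustration, not the thesis's model: rules are the set-based tuples of the earlier sketch, and the conflict-resolution strategy shown is simply "select the first production in an ordered set" (one of the strategies mentioned below), with no backtracking:

```python
def conflict_set(rules, db):
    """All productions whose condition part is satisfied in this state."""
    return [r for r in rules if r[1] <= db]

def forward_chain(rules, db, goal, limit=100):
    """Data-driven (forward chaining) control: repeatedly pick one rule
    from the conflict set and fire it until the goal conditions hold."""
    for _ in range(limit):
        if goal <= db:
            return db
        cs = conflict_set(rules, db)
        if not cs:
            return None  # dead end; a real interpreter would backtrack
        _name, _cond, removes, adds = cs[0]   # conflict resolution: first rule
        db = (db - removes) | adds
    return None

# Two hypothetical toy rules, for illustration: a -> b, then b -> c.
rules = [
    ("r1", {("a",)}, {("a",)}, {("b",)}),
    ("r2", {("b",)}, {("b",)}, {("c",)}),
]
final = forward_chain(rules, {("a",)}, {("c",)})
```

A backward-chaining implementation would instead run the same loop over inverted rules, starting from the goal description.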
One may also employ both forward and backward chaining, which would then be called a bidirectional strategy. Thus, control consists of strategies for conflict resolution, backtracking, and rule chaining. Ultimately, control depends upon the application domain. Nevertheless, strategies such as selecting the first production in an ordered set, selecting a production on the basis of repeated usage and recency [MF78], and selecting the most general rule [LBM85] are used in practice.

1.1.2 Parallelism in Production Systems

Parallelism exists at several levels in a production system, including

• parallelism between rules: simultaneous matching and simultaneous execution of productions, and
• parallelism within a rule: simultaneous processing of relations forming a rule.

Parallelism is apparent when the rules do not have anything in common. It is not so obvious when the rules interact. A state may permit two interacting rules to be fired in parallel depending on the type of the interaction.

1.2 Comparison with Previous Studies

Parallelism in production systems has been studied by Gupta [Gup84a], Tenorio and Moldovan [TM84], and Oflazer [Ofl84]. Forgy [For82] proposed the RETE algorithm primarily for speeding up matching operations in uniprocessor production systems and then adapted it for parallel environments. Parallel architectures were investigated by Oflazer [Ofl84], Stolfo [Sto85], Shaw [Sha85], and others. Gupta [Gup84b] has estimated that, in a sequential production system, 90% of the time is spent in matching productions, 8% in conflict resolution, and only 2% in actual execution of rules.

The processing model described by Moldovan [Mol87] almost eliminates the matching at the expense of increased communication time. Unlike the RETE algorithm, productions are fired in parallel in this model. Processors communicate by message passing. Portions of the database and the rules are distributed among the processors.
Throughput is maximized by exploiting parallelism and minimizing interprocessor communications. The task then is to partition the database and assign the rules to processors to maximize the throughput, which precisely is the allocation problem.

Studies by Gupta concentrate on measurements of available parallelism in production systems. No attempt is made to discover the hidden parallelism. Implementations of production systems using the RETE algorithm by Gupta and Forgy do not post high speedups. The problem here is that the production systems were written in a serial language and only apparent parallelism was utilized. A parallel architecture with parallel control was not simulated; only statistical information on nodes of the RETE network was gathered. Thus, control issues, search space reduction, and composition and decomposition of rules were not addressed. From these studies it is apparent that much of the parallelism in production systems is hidden. In this work, transformations to expose parallelism are proposed. It is shown that static analysis of rules leads to reduction of the search space and improved heuristics for the allocation problem.

Tenorio [Ten86] recognizes the fact that transformations are necessary to discover parallelism in production systems. Essentially the same model of production systems as in this work is used. Nevertheless, Tenorio does not provide a rigorous treatment of parallelism and search space reduction. The assumption that two rules are parallel if they are free of input and input-output dependences is incorrect (cf. lemma 3.1). A rigorous and uniform method for handling variables is not developed.

In this research, a rigorous procedure for search space reduction is given. The search space thus produced is proven to be minimal and complete. The composition and decomposition theorems give the methods to compose and decompose rules in addition to the necessary conditions. The production system model is extended and the dependences are redefined to handle variables. Also, control and its parallelization are addressed.

1.3 Overview

This research considers the parallel processing of production systems. Issues considered are reduction of the combinatorial search space, partitioning of the knowledge base, and mapping of productions into processors. Techniques that transform from the sequential to the parallel are of three kinds: search space, rule-space, and rule-space to processor-space transformations. The search space techniques reduce the size of the combinatorial space. The rule-space techniques transform the given productions into other productions in order to restrict and reduce the search space and to reveal and exploit parallelism. The remaining concerns focus upon partitioning and mapping of rules and knowledge base into a multiprocessor system. The specification and parallelization of control and the handling of variables are also major issues.

System Models: Chapter 2 contains the descriptions of the models used for production systems and hardware. Graph grammar is used to model production systems. Given a rule in if-then form, its representation in the graph grammar model is illustrated. With the simple need to capture the communication costs, the hardware model is presented as a modified connectivity matrix.

Dependences: In Chapter 3 the dependences between two rules are defined, which form the crux of the techniques described herein. The input-output relations among the rules are statically analyzed to detect parallelism and communication requirements among rules. The following chapters discuss the transformation techniques utilizing the dependence analysis.

Search Space Reduction: Enormous savings in computation costs can be made by reducing the size of the search tree or by limiting its exponential growth. Detection of parallel rules leads to reduction of the search space: in the sequential processing case the duplicate states are not visited, and in the parallel processing case the rules can be fired in parallel besides avoiding duplicate states. Again, based on the dependences among the rules, conditions for merging of branches of the search tree are discovered. These transformations for search space reduction are covered in Chapter 4.

Transformation of Rules: Duplication of common elements and actions may be eliminated by combining rules. This may reduce communications and increase performance. By decomposing complex rules into simpler, perhaps parallel, rules, parallelism and performance may be increased. Conditions for composition and decomposition of rules are discovered and are shown to be related to the basic dependences among rules. Composition and decomposition of rules are discussed in Chapter 5.

Allocation Problem: In order to be able to execute a production system on a multiprocessor system, the rules and the knowledge base must be partitioned and mapped into hardware. The problem is stated as a 0-1 quadratic programming problem. It is then reduced to a 0-1 linear programming problem at the cost of introducing new variables and constraints. A heuristic algorithm also is given to solve the allocation problem directly. This partitioning and allocation problem is considered in Chapter 6.

Control and Variables: Production systems use variables and control knowledge. The dependence analysis of rules with variables can be performed by unification. With this observation, the dependences are redefined. Distribution of control, much like the distribution of productions, is suggested. These issues of control and handling variables in productions and in the database are discussed in Chapter 7.

Finally, the Appendix summarizes the software tools developed in the course of this research.
As the experience with rule-based programming grows, larger and larger production systems will be written. History has shown, as in the case of the XCON expert system for configuring VAX systems, which grew to hundreds of rules from the initial 30 rules, that production systems tend to grow rapidly as expertise is accumulated. It then becomes very crucial to be able to process production systems in parallel. And in the light of the studies by Gupta and others who reported limited speedups, techniques to transform production systems for parallel processing become necessary.

Since the techniques described herein are based on the static analysis of rules, it is hoped that they would lead to 'parallelizing compilers' for production systems. However, static analysis of the rules does have its limitations and cannot reveal or utilize all the available parallelism. Dynamic analysis and explicit introduction of parallelism in productions and in control by knowledge engineers may be considered.

[Figure 1.1: Notation for state expansion]

Chapter 2

System Models

In this chapter, models for production systems and hardware are presented. Production systems are modeled using graph grammar, where the rules and the working memory are represented by graphs. An example illustrates how a rule in the if-then form may be represented in this model.

A model for hardware is necessary for the allocation problem, where productions and the knowledge base are mapped into processing and memory elements. With the simple need to capture communication costs, the hardware model is a modified connectivity matrix.

2.1 Production System Model

Graph grammars can be effectively used to describe the behavior of production systems. Productions are considered as manipulations on labeled graphs representing the database [Ehr78,Nag78].
A simplified model as in [Mol86] is adopted here.

Definition 2.1 A production or rule is a 7-tuple

P : (L, K, R, C, l, r, c), written L <-l- K -r-> R with c : K -> C,

where L, R, K, and C are, respectively, the left, right, interface, and context graphs and l, r, and c are graph morphisms. For the sake of simplicity, the morphisms l, r, and c are assumed to be identities. The interface, K, is the persistent part of the production which remains unchanged upon firing the production.

Example 2.1 Consider the following production from the example in the Appendix:

(rule (2 monkey grasps bananas)
  (if (and (monkey on table)
           (monkey hungry for bananas)
           (monkey holding nil)
           (bananas above table)))
  (then (+ (monkey has bananas))
        (- (monkey holding nil))))

This rule can be written in the graph notation as shown in figure 2.1. The conditions in the if part represent the context graph under which this production becomes eligible for firing. The conditions, except those that are removed, can be considered to be in the gluing graph. The elements removed along with the gluing graph form the L graph, and the elements added along with the gluing graph form the R graph.

[Figure 2.1: Graph representation of the production in Example 2.1]

Let G be the knowledge base before applying P, and H be the knowledge base after applying P.

Definition 2.2 A derivation G => H via P is given by the gluing diagrams which commute:

    L <--- K ---> R
    |      |      |
    v      v      v
    G <--- D ---> H

Since the diagrams commute, the path K -> L -> G is equivalent to K -> D -> G, and K -> R -> H is equivalent to K -> D -> H. Also, K -> C -> G is equivalent to K -> D -> G.
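The derivation of Definition 2.2 admits a direct set-theoretic sketch, using the correspondence to the set domain that this model is designed to preserve. The following is one consistent reading under stated assumptions (morphisms are identities, graphs are flattened to sets of labeled edges, and the partition of Example 2.1's conditions into K and C is an illustrative choice): D = G - (L - K) and H = D + (R - K).

```python
def derive(G, L, K, R, C):
    """Apply production P = (L, K, R, C) to knowledge base G, producing H.
    The non-persistent part of L is removed; the new part of R is glued in
    over the interface K; the context C must merely be present."""
    assert C <= G and L <= G, "production not applicable to G"
    D = G - (L - K)        # the gluing remainder: G with L \ K deleted
    H = D | (R - K)        # add the elements of R \ K
    return H

# Graphs of the production in Example 2.1, written as edge sets:
K = {("monkey", "on", "table")}
L = K | {("monkey", "holding", "nil")}
R = K | {("monkey", "has", "bananas")}
C = K | {("monkey", "hungry", "bananas"), ("bananas", "above", "table")}

G = C | L                      # a knowledge base satisfying both C and L
H = derive(G, L, K, R, C)
```

As in Example 2.2, the context edges survive unchanged in H; only `(monkey holding nil)` is deleted and `(monkey has bananas)` added.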
The morphisms associated with graphs indicate mappings; for example, lK is the image of K in L, essentially a subgraph of L, and glK is the image of lK in G, which in this case may contain variable instantiations.

Example 2.2 The application of the production in Example 2.1 to an intermediate state is shown in figure 2.2.

This model differs from that described in [Ehr78,Nag78] by the explicit representation of the context graph C. Separation of the context from the manipulations enhances the expressiveness of the grammar. More importantly, it allows the results to be carried over to and from the set theoretic domain.

It may be noted that the graph grammar model is an abstract tool and, with the extensions proposed by Parisi et al. [PEM87], has expressive power comparable to the languages used for expert systems such as OPS5 [LBM85].

[Figure 2.2: Derivation via the production of Example 2.1]

2.2 Hardware Model

The hardware model consists of an ensemble of processor and memory elements and an interconnection network. Shared memory is not the target architecture here since it allows for limited growth. It may also become the bottleneck due to synchronization and other implementation considerations and limit the exploitation of parallelism. A message passing system seems to be the most flexible alternative in terms of providing parallelism as the system grows.

Productions are partitioned and allocated to processing elements. Similarly, the knowledge base is distributed among the local memory elements. Each processor has the responsibility of matching and firing the productions allocated to it. The processors may be independent (MIMD) or centrally controlled and synchronized (SIMD).
Either the processor which fires a production sends messages to the other processors which contain the affected productions, or the affected processors inquire when needed. In both cases some communication is needed among processors holding related productions or sharing the knowledge base.

The primary goal of parallel processing, namely increased throughput, is heavily dependent on the communication costs. Those costs are dictated largely by the interconnection network used. For the purposes herein, it suffices to capture the communication costs in the hardware model.

Definition 2.3 The minimum distance matrix D in R^{n x n} characterizing a system of n processors is constructed such that d_ij represents the minimum cost of communicating from processor i to processor j.

Without loss of generality, all processors may be assumed to be identical. A system with non-identical processors is handled by suitably modifying the elements of the distance matrix. Differences in the processors, characterized by differences in their speeds, can be reflected in the distance matrix.

Example 2.3 Consider the system of 8 processors with cube connections as shown in figure 2.3. Assuming identical links with unit costs, the minimum distance matrix is

    0.0 1.0 1.0 2.0 1.0 2.0 2.0 3.0
    1.0 0.0 2.0 1.0 2.0 1.0 3.0 2.0
    1.0 2.0 0.0 1.0 2.0 3.0 1.0 2.0
    2.0 1.0 1.0 0.0 3.0 2.0 2.0 1.0
    1.0 2.0 2.0 3.0 0.0 1.0 1.0 2.0
    2.0 1.0 3.0 2.0 1.0 0.0 2.0 1.0
    2.0 3.0 1.0 2.0 1.0 2.0 0.0 1.0
    3.0 2.0 2.0 1.0 2.0 1.0 1.0 0.0

The algebraic characterization is simple and amenable to mathematical manipulation. It is capable of handling

• non-identical links (links with different costs),
• unidirectional or bidirectional links between processors,
• multiple links between two processors.

For more accurate modeling of the communication costs, and to be able to handle multistage networks, enhancements are needed.
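The minimum distance matrix of Definition 2.3 can be obtained from any description of link costs by an all-pairs shortest-path computation. The sketch below is an illustration, not part of the original text: it builds the matrix of Example 2.3 for the 3-cube, where two processors are linked exactly when their binary labels differ in one bit (labels 0..7 here, versus 1..8 in the figure).

```python
def min_distance_matrix(n, link_cost):
    """All-pairs minimum communication cost via Floyd-Warshall.
    link_cost(i, j) returns the direct-link cost, or None if no link."""
    INF = float("inf")
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            c = link_cost(i, j)
            if i != j and c is not None:
                d[i][j] = c
    for k in range(n):                      # relax paths routed through k
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# 3-cube of Example 2.3: unit-cost link iff labels differ in exactly one bit
cube_link = lambda i, j: 1.0 if bin(i ^ j).count("1") == 1 else None
D = min_distance_matrix(8, cube_link)
```

The first row of D reproduces the first row of the matrix in Example 2.3; non-identical, unidirectional, or multiple links are handled simply by changing what `link_cost` returns.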
[Figure 2.3: Cube processor]

On close observation, this model does not exclude a shared-memory implementation. Assuming that productions are still distributed among the processors and the shared memory is used for the working memory of a production system, the shared memory can be considered a processor which acts as a communicator among the processors. The communicator, in this view, is a passive element, and the processors have to demand the data as they need it. Thus the communication cost between any two processors is the sum of the time to demand the data from the shared memory and the delay introduced by the shared memory. The shared memory may introduce delays due to limitations on the number of ports available for input and output and the necessity for memory-conflict resolution. In spite of the above observation, shared memory is to be avoided because it limits parallelism as the system grows.

2.3 Summary

The hardware model is a collection of processing and memory elements operating in MIMD or SIMD mode. Communication costs, as required by the allocation problem discussed in Chapter 6, are captured by the modified connectivity matrix.

Graph grammar is used to model production systems. The context graph is specified explicitly to simplify proofs and to allow the results to be carried between the graph grammar model and the set-theoretic domain. Graph grammar, when extended with variables as discussed in Chapter 7, is capable of expressing practical production systems. The next chapter defines the basic dependences among productions using this model.

Chapter 3

Dependence Analysis

The analysis of rule interdependences is of fundamental importance to parallel processing of production systems. There are five possible ways to reduce a graph G to a graph H by applying two productions, as shown in figure 3.1. The results and the execution times for the five possible ways of reduction are not necessarily identical.
It is expected that the execution times t_p, t_c, and t_s for parallel, composed, and sequential reductions respectively satisfy the relations t_p < t_c < t_s. So, from time considerations alone, reductions in non-sequential ways are to be preferred. Observation reveals that the dependences between the rules dictate which of the five ways of reduction are possible. It may be noted that these results can be extended to more than two rules at a time. In this chapter, it is shown that the analysis of dependences leads to the detection of parallelism and communication requirements.

[Figure 3.1: Possible reductions via two productions]

3.1 Parallel Productions

If two rules P1 and P2 produce identical states independent of their order of application, then both of them may be fired simultaneously to produce the final state. Hence the key to the detection of parallelism is to discover commutative rules.

Definition 3.1 The id of a state S_{ij...k} is its derivation sequence ij...k.

Definition 3.2 A state S is a tuple (B, id) where B is the associated database.

Definition 3.3 Two states S and S' are equivalent, written S = S', if B = B' and the id of S is a permutation of the id' of S'.

Definition 3.4 Two rules P_i and P_j eligible to be fired from a state S are commutative relative to state S if B_ij = B_ji.

If the resulting database is the same without regard to the order of application of two rules, then the rules are commutative.

3.1.1 Dependences

Commutative rules can be discovered based on the dependence relations among the rules. Several types of dependences are defined below. When the context graph of a production is destroyed by the firing of another production, the two productions are said to be input dependent.
Definition 3.5 Two rules P and P' are said to be input dependent, written P |x| P', if

    C ∩ (L' - K') ≠ ∅  or  C' ∩ (L - K) ≠ ∅

When a production stands to destroy the elements added by another production, the latter production is said to be output-input dependent on the former.

Definition 3.6 A rule P is output-input dependent on a rule P', written P < P', if

    R ∩ (L' - K') ≠ ∅

Similarly, when a production is enabled by firing another production, it is said to be input-output dependent on the latter.

Definition 3.7 A rule P is input-output dependent on a rule P', written P > P', if

    C ∩ (R' - K') ≠ ∅

Based on the above dependences between rules, two Boolean matrices called the parallelism matrix P and the communication matrix C are constructed as follows:

Definition 3.8 The parallelism matrix P in {0,1}^{r x r} for a production system with r rules is constructed such that the element in row i and column j is

    p_ij = 1 if P_i |x| P_j or P_i < P_j or P_j < P_i, and 0 otherwise.

Definition 3.9 The communication matrix C in {0,1}^{r x r} for a production system with r rules is constructed such that the element in row i and column j is

    c_ij = 1 if P_i |x| P_j or P_i > P_j or P_j > P_i, and 0 otherwise.

It may be noted that the parallelism matrix P and the communication matrix C are symmetric by definition. As can be seen later, p_ij = 0 is a necessary condition for two rules P_i and P_j to be parallel.

Example 3.1 Consider a production system with 6 rules with the starting state and the goal as given below. The elements are of the form P(a, b), where a and b are objects or nodes and P is a predicate or relation between the objects. See the following table of productions and the corresponding figure 3.2.
For example, the colored-graph representation of rule 1 may correspond to the following English sentence:

P1: if Adam owns a car and the car is registered in California and the car has color green
    then Adam owns a car and Adam lives in California and the car has color green

The following notation was used for representing the production in the graph grammar framework:

    a : Adam          b : car           c : California    d : green
    P : owns          Q : registered    R : has color     S : lives in

The rules shown in figure 3.2 are summarized in the following table:

Rule | C = L                                      | K                      | R
P1   | P(a,b) Q(b,c) R(b,d)                       | P(a,b) R(b,d)          | P(a,b) R(b,d) S(a,c)
P2   | P(a,b) S(b,c)                              | P(a,b)                 | P(a,b) T(c,f) S(e,g)
P3   | P(a,b) T(a,e) T(g,a) R(g,d) S(e,g)         | R(b,d) S(e,g) R(g,d)   | R(e,d) R(g,d) S(e,g) T(e,g)
P4   | S(e,g) S(h,e) R(g,h) R(g,k) S(k,i) S(k,j)  | R(g,k) S(e,g)          | R(g,k) S(e,g)
P5   | R(e,d) S(e,g) R(g,d)                       |                        | N(f,g)
P6   | R(e,d) S(e,g) R(g,d)                       | R(e,d) S(e,g) R(g,d)   | R(e,d) S(e,g) R(g,d) N(g,h)

The initial and the goal states are:

initial state: P(a,b) Q(b,c) R(b,d) S(b,c) T(a,e) T(g,a) S(h,e) S(e,g) R(g,d) S(a,e) R(g,h) R(g,k) S(k,i) S(k,j)

goal state: S(c,a) T(c,f) T(e,g) N(g,h) S(h,e) R(g,h) R(g,k) S(k,i) S(k,j) S(j,g) S(a,e) R(e,k)

Rule 1 and rule 3 are input dependent because of the common element P(a,b). Also, P1 < P3 and P3 < P5. A rule is not necessarily input dependent on itself; see, e.g., rule 6. All the dependences are summarized in the following arrays:
[Figure 3.2: Production System of Example 3.1]

Input dependence:

    1 0 1 0 0 0
    0 1 1 0 0 0
    1 1 1 0 1 0
    0 0 0 1 1 0
    0 0 1 1 1 1
    0 0 0 0 1 0

Output-input dependence:

    0 0 1 0 0 0
    0 0 1 0 1 0
    0 0 0 0 1 0
    0 0 0 0 1 0
    0 0 0 0 0 0
    0 0 0 0 1 0

Input-output dependence:

    0 0 0 0 0 0
    0 0 1 1 1 1
    0 1 0 0 1 1
    0 1 0 0 0 0
    0 1 1 0 0 0
    0 1 1 0 0 0

These lead to the parallelism matrix

    P = 1 0 1 0 0 0
        0 1 1 0 1 0
        1 1 1 0 1 0
        0 0 0 1 1 0
        0 1 1 1 1 1
        0 0 0 0 1 0

and the communication matrix

    C = 1 0 1 0 0 0
        0 1 1 1 1 1
        1 1 1 0 1 1
        0 1 0 1 1 0
        0 1 1 1 1 1
        0 1 1 0 1 0

3.1.2 Conditions for Parallel Firing of Productions

It is obvious that commutativity is a necessary condition for two rules to be fired simultaneously. Some other conditions may be imposed by the parallelism model used [MY80]. Assuming the basic model,

Proposition 3.1 Parallelism = Commutativity.

The term parallelism is reserved for the ability to fire two rules in an arbitrary order. This neither implies nor requires the total absence of interactions between the two rules. This is demonstrated by the following lemma, which shows that the parallelism matrix does indeed capture the essence of parallelism. The condition for two rules to be commutative relative to a given state is given by the following lemma.

Lemma 3.1 Two rules P_i and P_j applicable in a state S are parallel, or commutative relative to S, if and only if p_ij = 0.

Proof: (p_ij = 0 implies parallel) In what follows, A + B stands for A ∪ B, AB stands for A ∩ B, and ~A stands for the complement of A.

    p_ij = 0  implies  not(P_i |x| P_j or P_i < P_j or P_j < P_i)

i.e.,

    C_i(L_j - K_j) = ∅  and  C_j(L_i - K_i) = ∅        (3.1.1)

and

    R_i(L_j - K_j) = ∅  and  R_j(L_i - K_i) = ∅        (3.1.2)

In order to show that the two rules are parallel it is enough, by Proposition 3.1, to show that they are commutative when they are both eligible to fire in a given state.
Hence, let S = (B, id) be the current state such that both rules P_i and P_j are applicable. Then C_i ⊆ B and C_j ⊆ B, i.e.,

    C_i B = C_i  and  C_j B = C_j                      (3.1.3)

Applying rule P_i to S we get S_i:

    B_i = (B - L_i) + R_i                              (3.1.4)

Without loss of generality, for a rule L <- K -> R let L*, C*, and R* be defined such that

    L* = L - K  and  L = L* + K
    C* = C - K  and  C = C* + K                        (3.1.5)
    R* = R - K  and  R = R* + K

Thus,

    L* ∩ R* = ∅                                        (3.1.6)

Now, from (3.1.4) and (3.1.5),

    B_i = B - (L*_i + K_i) + R_i = B ~L*_i ~K_i + R_i  (3.1.7)

Hence,

    C_j B_i = C_j B ~L*_i ~K_i + C_j R_i
            = C_j ~L*_i ~K_i + C_j R_i     from (3.1.3)
            = C_j ~K_i + C_j R_i           from (3.1.1), since C_j L*_i = ∅
            = C_j (~K_i + R_i)
            = C_j (~K_i + K_i + R*_i)      from (3.1.5)
            = C_j
    implies  C_j ⊆ B_i                                 (3.1.8)

Therefore rule P_j is applicable to state S_i, producing the state S_ij with

    B_ij = (B_i - L_j) + R_j = B ~L_i ~L_j + R_i ~L_j + R_j    (3.1.9)

Similarly,

    B_ji = B ~L_j ~L_i + R_j ~L_i + R_i                        (3.1.10)

Now, since L_j = L*_j + K_j,

    R_i ~L_j = R_i ~L*_j ~K_j = R_i ~K_j   from (3.1.2)        (3.1.11)

Hence,

    R_i ~L_j + R_j = R_i ~K_j + R*_j + K_j = R_i + R*_j + K_j = R_i + R_j

Similarly, R_j ~L_i + R_i = R_j + R_i. Thus

    B_ij = B ~L_i ~L_j + R_i + R_j = B ~L_j ~L_i + R_j + R_i = B_ji

and hence S_ij = S_ji.

A similar proof for "rules P_i and P_j parallel implies p_ij = 0" can be provided. Since this result is not used here, the proof is omitted.

3.2 Communicating Productions

Interactions among productions give rise to communication requirements in a multiprocessor environment. Any knowledge of such interactions should help the allocation strategy minimize communications and increase throughput. In a production system there are only a few ways in which a production can interact with another. For example, when a production is fired it destroys the preconditions of those productions which are input dependent upon it, and they are removed from the conflict set; and it deposits elements required by input-output dependent productions, which may then be added to the conflict set. A prior analysis of the productions should detect these interactions.
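Such a prior analysis is a direct set computation over the rules. The sketch below is illustrative and not from the original text; the encoding of a rule as a tuple (L, K, R, C) of sets is an assumption. It computes the dependence relations of Definitions 3.5-3.7 and from them the P and C matrices of Definitions 3.8-3.9.

```python
def dependence_matrices(rules):
    """rules: list of (L, K, R, C) tuples of sets, with K a subset of L and R.
    Returns the parallelism matrix P and communication matrix C."""
    r = len(rules)

    def inp(i, j):      # input dependence, Def. 3.5 (symmetric)
        Li, Ki, _, Ci = rules[i]
        Lj, Kj, _, Cj = rules[j]
        return bool(Ci & (Lj - Kj)) or bool(Cj & (Li - Ki))

    def oi(i, j):       # output-input dependence P_i < P_j, Def. 3.6
        return bool(rules[i][2] & (rules[j][0] - rules[j][1]))

    def io(i, j):       # input-output dependence P_i > P_j, Def. 3.7
        return bool(rules[i][3] & (rules[j][2] - rules[j][1]))

    P = [[int(inp(i, j) or oi(i, j) or oi(j, i)) for j in range(r)]
         for i in range(r)]
    C = [[int(inp(i, j) or io(i, j) or io(j, i)) for j in range(r)]
         for i in range(r)]
    return P, C
```

Because each relation is a static test on the rule texts alone, both matrices can be precomputed before execution, as the allocation strategy of Chapter 6 requires.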
Note that, whether or not the conflict set is changed, there still needs to be communication by which the processors, somewhat like the nodes of a RETE network, update the status of the rules they contain. Hence,

Proposition 3.2 The input or input-output dependences among productions induce communications in a multiprocessor environment.

Clearly, then, the C matrix represents the communication requirements.

3.3 The RETE Analysis

The RETE algorithm [For82] is an indexing scheme. The indexing functions are simple pattern or feature recognizers. The patterns in the productions are analyzed and compiled into a network of nodes where each node tests for a specific pattern. Patterns common to productions are tested only once. When a pattern traverses the network starting at the root, it sends the instantiating token to all the productions containing that pattern. The status of each production is maintained in the network. Thus, iterating over working memory as well as production memory is avoided.

Parallel interpreters for the RETE algorithm have been studied by Gupta [Gup84a], Stolfo [Sto85], and Oshisanwo and Dasiewicz [OD87]. Node-level parallelism activates many nodes of the RETE network in parallel. This requires [OD87] fast process creation, scheduling, activation, suspension, and termination mechanisms. It also requires a global communication network or shared memory providing equal access to all processors. Meeting these requirements needs special hardware schedulers, communicators, and/or complex interconnection networks.

The RETE algorithm also allows production-level parallelism. Here, productions are matched in parallel and the actions in a given production are processed serially [OD87]. This does not require complex hardware. However, Amdahl's law sets a bound on the achievable speedup due to the serial processing of actions.
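The Amdahl bound mentioned above is a one-line formula; the numbers in the sketch below are hypothetical and only illustrate how the serial action phase caps the speedup regardless of processor count.

```python
def amdahl_speedup(serial_fraction, n_processors):
    """Upper bound on speedup when serial_fraction of the work
    cannot be parallelized (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

# Hypothetical example: if 10% of the match/act cycle is the serial
# action phase, even unlimited processors cannot exceed a 10x speedup.
bound_8 = amdahl_speedup(0.10, 8)
```

For instance, with a 10% serial fraction the bound on 8 processors is already below 5x, which is the hardware-independent limitation the text refers to.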
In comparison, the dependence analysis also avoids iterating over the production and working memories, since the processors maintain the status of the productions. The execution of an action by a processor usually results in communications to the other processors containing the affected productions. Communication traffic should be minimal, since only a small number of productions are affected by a change to the working memory; moreover, good allocation strategies can minimize communications and maximize parallelism. Thus, although the actions are processed serially, processing time is traded for communication time. Updating the working memory, matching, and updating the status of productions are done in parallel. Firing a production is reduced to message passing. Though not necessary, a sophisticated communication network could improve the speed. This scheme is comparable to the 'condition level parallelism' described in [OD87].

3.4 Summary

Interdependences among rules are analyzed to discover parallel and communicating rules. The processing scheme relies heavily upon a message-passing mechanism. The dependence analysis is compared to the RETE analysis. Communication traffic is expected to be small, since productions when fired cause few changes to the working memory. Nevertheless, communication bottlenecks may still develop. For this reason it is important to limit the search space, transform the rules, and have a good allocation strategy. The dependence analysis is crucial to the methods described in the following chapters. This analysis leads to the search space reduction described in Chapter 4. Chapter 5 discusses transformations of rules for increasing parallelism and decreasing communications. Chapter 6 addresses the allocation problem.

Chapter 4

State Space Transformations

In this chapter the conditions for rules to be parallel are presented. Then a rigorous procedure for search space reduction is given.
The reduced search space is shown to be minimal and complete. A distinct advantage of this method is that it does not exclude the use of heuristics for further reductions. The search procedure is then shown to be usable for parallel processing. The remainder of the chapter discusses transformations to further reduce the search space.

4.1 Search Space Reduction

A reduction in the search space can be obtained by eliminating duplicate states. If two rules P1 and P2 produce identical states independent of their order of application, then only one of the orders need be specified in the control. Hence the key to search space reduction is to discover commutative rules. The method for discovering commutative rules has already been presented in section 3.1.2.

Now consider a rule that has no preconditions. Trivially, its preconditions are always satisfied and the rule is always in the conflict set. It is conceivable that the rule might be fired in succession. However, repeated firing is not necessary, since the rule does not add facts that are not already in the knowledge base. Hence,

Lemma 4.1 Repeated firing of a rule P_i does not alter the knowledge base, i.e., B_ii = B_i.

Proof: Without loss of generality, let S = (B, id) be the current state from which the rule P_i : L_i <- K_i -> R_i is repeatedly firable. This implies P_i is not self input-dependent, that is, (C_i - K_i)L_i = ∅ and (L_i - K_i)C_i = ∅. The first firing results in the state S_i = (B_i, i) where B_i = B - (L_i - K_i) + R_i. The second firing results in the state S_ii = (B_ii, ii) where B_ii = B_i - (L_i - K_i) + R_i = B - (L_i - K_i) + R_i = B_i. The argument is completed by induction.

Using parallelism in rules and the above repetition lemma, a procedure for search space reduction can be developed.

4.1.1 Search Procedure

Based upon the preceding analysis of dependences, a procedure for searching the states can be written.
This procedure eliminates visiting duplicate states during the search. If two rules are parallel, then only one sequence of application need appear in the search. The rules are arbitrarily ordered for the purpose of identifying the possible sequences of parallel rules. Let the rules be numbered from 1 to r in an arbitrary manner, and let the firings be such that the subsequences of parallel rules are increasing in their rule numbering. Thus a rule P_j which is parallel with P_i can be fired on state S_i only if j > i.

Let the parallel set Π_i of rule P_i be the set consisting of rule P_i and all the rules that can be parallel with rule P_i, that is,

    Π_i = {P_i} ∪ {P_j | p_ij = 0}

Also, let Π denote the set of all rules, that is, Π = {P_i, i = 1..r}.

For each state S_i it is desirable to construct a forbidden set F_i consisting of those rules that must not be fired from state S_i, in order that redundant states not be produced. How is the forbidden set computed? Consider the situation in figure 4.1. Following the imposed order on the rules, any rule P_k commutative with rule P_j must not be fired from state S_ij if it has already been fired from state S_i and k ≤ j. Also, if a parallel rule P_k is forbidden in state S_i, then it must be forbidden in state S_ij. Hence the following definition.

Definition 4.1 The forbidden set is computed as

    F = ∅                                               (4.1.1)
    F_ij = Π_j ∩ (F_i ∪ {P_k | S_ik in Tree and j ≥ k})

Note the use of '≥' in the definition of the forbidden set above. The '>' part is due to the imposed order to avoid duplicate sequences of rules. The equality part is due to the fact, from the repetition lemma 4.1, that firing a rule more than once in succession does not add any more facts to the database. The forbidden set for each state is computed as part of the search procedure. This strategy, together with any heuristics, can be used to effectively reduce the search space. The sequential search procedure is given next.
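The forbidden-set bookkeeping of Eq. 4.1.1 can be sketched concretely. The code below is an illustration, not the original procedure: the rule encoding as (L, K, R) sets with context C = L, and the breadth-first expansion order, are assumptions (the original leaves the expansion order to heuristics).

```python
from collections import deque

def reduced_search(rules, P, B0, max_depth):
    """Expand states breadth-first, pruning duplicate derivations with
    the forbidden sets of Eq. 4.1.1.  rules[i] = (L, K, R) as frozensets,
    with context C = L; P is the parallelism matrix (0-indexed)."""
    r = len(rules)
    # parallel set of each rule: itself plus all rules with p_ij = 0
    Pi = [{i} | {j for j in range(r) if P[i][j] == 0} for i in range(r)]
    tree = []                                  # visited (database, id) pairs
    queue = deque([(B0, (), frozenset())])     # (B, id-sequence, forbidden)
    while queue:
        B, ident, forbid = queue.popleft()
        tree.append((B, ident))
        if len(ident) == max_depth:
            continue
        fired = []                             # siblings expanded so far
        for j in range(r):
            L, K, R = rules[j]
            if L <= B and j not in forbid:     # applicable, not forbidden
                fired.append(j)
                child = (B - (L - K)) | R
                # Eq. 4.1.1: F_ij = Pi_j & (F_i | {k : S_ik in tree, j >= k})
                f = Pi[j] & (forbid | {k for k in fired if j >= k})
                queue.append((child, ident + (j,), frozenset(f)))
    return tree
```

On two disjoint (hence parallel) rules, only the increasing sequence (0, 1) is expanded; the equivalent permutation (1, 0) is pruned, as in Example 4.1.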
Search Procedure (sequential firing)

1. Preprocessing
   (a) Construct the parallelism matrix P.
   (b) For each rule i construct the parallel set Π_i.
2. Initialization
   (a) Let the current state S_i <- (B, id); F_i <- {}.
   (b) If the current state is the goal then stop.
3. Search
   (a) matching: compute the conflict set C_i for state S_i.
   (b) forbidden set: compute the forbidden set F_i according to Eq. 4.1.1.
   (c) resolution: from the applicable rules (conflict set) select a rule P_j not in F_i according to heuristics: P_j in C_i and P_j not in F_i.
   (d) firing/state expansion:
       i. Fire the rule P_j to obtain S_ij.
       ii. If S_ij is the goal then stop.
   (e) If dictated by the search strategy, go to 3(c) to select another rule.
   (f) Select, according to heuristics, the next state to expand.
   (g) Go to 3(a).

The theorem below proves that all the states are visited by the above procedure and that the search tree does not contain equivalent derivations.

Definition 4.2 A search tree is complete if and only if the addition of a legal path to the tree produces a state which is equivalent to some other state already in the tree.

Definition 4.3 A search tree is minimal if the removal of a state from the tree renders the tree incomplete.

The following theorems conclude that the search procedure produces a minimal and complete tree.

Theorem 4.1 The search procedure produces a complete Tree.

Proof: (by induction and contradiction) Assume that there exists a legal state

    S_{p1 p2 ... pi ... pn} not in Tree                 (4.1.2)

Further, let

    S_{p1 p2 ... pi} in Tree                            (4.1.3)

The basis of the induction is the initial state S, which must be in the Tree. Now it is necessary to show that S_{p1 p2 ... pi p(i+1)} is in the Tree. The situation where both (4.1.2) and (4.1.3) hold could arise only if rules P_{pi} and P_{p(i+1)} are commutative, and only in two (mutually non-exclusive) cases:

Case 1. p(i+1) < pi and S_{p1 p2 ... p(i+1)} in Tree. See figure 4.1. This implies
    S_{p1 p2 ... p(i-1) p(i+1) pi} in Tree

[Figure 4.1: Case 1 in the proof of theorem 4.1]

However, this is equivalent to S_{p1 p2 ... p(i-1) pi p(i+1)} in Tree, since P_{pi} and P_{p(i+1)} are commutative. Thus, by induction, S_{p1 p2 ... pi ... pn} is in the Tree. Contradiction!

Case 2. P_{p(i+1)} in F_{p1 p2 ... pi}. Note that F_x is the set of forbidden rules for state S_x. This case could arise only if the rules P_{p(i-1)}, P_{pi}, and P_{p(i+1)} are commutative, and only in the following two cases:

Case 2.1. p(i+1) < p(i-1) < pi and S_{p1 p2 ... p(i-1) p(i+1)} in Tree. See figure 4.2. This implies

    S_{p1 p2 ... p(i-2) p(i+1) p(i-1) pi} in Tree

However,

    S_{p1 p2 ... p(i-2) p(i+1) p(i-1) pi} = S_{p1 p2 ... p(i-2) p(i-1) pi p(i+1)}

since the rules P_{p(i+1)}, P_{p(i-1)}, and P_{pi} are commutative. Now, by induction, this leads to a contradiction as before.

Case 2.2. P_{pi} in F_{p1 p2 ... p(i-1)}. This is the same as Case 2 but with one less reduction. This will have to be repeated until the initial state S is reached, which clearly is in the Tree as its root. This is a contradiction.

Theorem 4.2 The search procedure produces a minimal Tree.

Proof idea: To the contrary, assume that two equivalent states could be found in the Tree. Let p1 p2 ... pn be a non-identity permutation of q1 q2 ... qn with

    S_{p1 p2 ... pn} in Tree  =  S_{q1 q2 ... qn} in Tree

The proof runs along the same lines as for theorem 4.1 to show that p1 p2 ... pn = q1 q2 ... qn, which is a contradiction. The details are omitted.

4.1.2 An Example

Example 4.1 Consider the production system in example 3.1. The original search space to depth 5 consists of 80 states, as shown in figure 4.3. The parallel sets are as follows:

    Π1 = {1 2 4 5 6}
    Π2 = {1 2 4 6}
    Π3 = {3 4 6}
    Π4 = {1 2 3 4 6}
    Π5 = {1 5}
    Π6 = {1 2 3 4 6}

The forbidden sets after the initial firings are as follows:

    F1 = {1}
    F2 = {1 2}
    F3 = {3}
    F4 = {1 2 3 4}

The reduced search space obtained with the search procedure, shown in figure 4.4, contains only 37 states.
This amounts to a 54% reduction over the original search space of depth 5. The reduction may be very significant for practical production systems.

[Figure 4.2: Case 2.1 in the proof of theorem 4.1]
[Figure 4.3: Original search space (80 states)]
[Figure 4.4: Reduced search space (sequential; 37 states, 54% reduction)]

4.2 Sequential to Parallel Transformation

The search procedure in section 4.1.1 still did not fire the productions in parallel. The procedure effectively made the search tree leaner, but the height remained the same. Firing productions in parallel will shorten the tree height. This is depicted in figure 4.5.

If there is a chain of parallel productions, they can potentially be executed in parallel. Recognizing such chains with a look-ahead mechanism would be prohibitive, and no guarantee can be made about state skipping. However, it may be observed that a chain of two productions occurs in the (reduced) search space if and only if the two productions also appear in the same level; i.e., the states S_ij, S_i, S_j always occur together. The occurrence of S_i [S_j] in the tree and the productions being parallel implies that P_j [P_i] is applicable to state S_i [S_j], resulting in the state S_ij [S_ji]. A simple justification for the observation is that the states S_ij, S_i, S_j are not equivalent, and hence a complete tree, minimal or not, must contain all of them. Clearly, this claim holds for chains having more than two productions.

Now consider the transformation shown in figure 4.6. It adds to the tree all combinations of parallel productions in a given level. The addition of such states keeps the tree complete but not minimal, since equivalent states obtainable by applying the rules in increasing order of rule number already appear in the tree.
[Figure 4.5: Original, reduced sequential, reduced parallel, and π - λ reduced search spaces]
[Figure 4.6: Serial to parallel transformation]

4.2.1 Parallel Search Procedure

It is possible to keep the tree minimal by suitable modifications to the procedure in section 4.1.1. In order to avoid duplication of the states introduced by the transformation shown in figure 4.6, one must totally forbid the application of parallel rules in sequence.

Let S_I denote the state obtained from state S by applying the productions {P_i | i in I} in parallel. Let the parallel set Π_I of a set of rules {P_i | i in I} be the set of mutually parallel rules {P_i | i in I}, that is,

    for all i, j in I, i ≠ j: p_ij = 0,   or   Π_I = the intersection of the Π_i for i in I

Again, let Π denote the set of all rules, that is, Π = {P_i, i = 1..r}.

The computation of the forbidden set can then be done by

Definition 4.4

    F = {}                                              (4.2.4)
    F_IJ = Π_J ∩ (F_I ∪ {Π_K | S_{I∪K} in Tree})

Partial parallelism means that not all possible combinations of mutually parallel rules can be fired in parallel. This may reflect the number of available processors and other such limitations. For example, suppose that, according to heuristics, 10 mutually parallel rules are to be fired in parallel; there being only 8 processors, it is not possible to fire them all in parallel. It may also reflect conditions in the knowledge base which do not enable all parallel rules at a given step. For example, of the input-output dependent but parallel rules P1 and P2, only P1 is firable at a given step; firing P1 then enables P2 to be fired. In such a case, obviously, sequential firing of parallel rules must be allowed to preserve the completeness of the search tree.
If full parallelism is utilized, no chains of parallel productions will ever be encountered in the tree, and hence the recursive definition of the forbidden set can be simplified:

    F = ∅                                               (4.2.5)
    F_IJ = Π_J

However, it is very rare that full parallelism exists in a production system, due to the state of the knowledge base and the rule conditions as explained above. The parallel search procedure can now be written as follows:

Search Procedure (full/partial parallel firing)

1. Preprocessing
   (a) Construct the parallelism matrix P.
   (b) For each rule i construct the parallel set Π_i.
2. Initialization
   (a) Let the current state S_I <- (B, id); F_I <- {}.
   (b) If the current state is the goal then stop.
3. Search
   (a) matching: compute the conflict set C_I for state S_I.
   (b) forbidden set: compute F_I according to Eq. 4.2.4 or 4.2.5.
   (c) resolution: from the applicable rules (conflict set) select mutually parallel rules P_J = {P_j | P_j not in F_I} according to heuristics: P_J a subset of C_I, P_J a subset of Π_J, and P_J ∩ F_I = ∅.
   (d) firing:
       i. Fire the rules P_J in parallel to obtain S_IJ.
       ii. If S_IJ is the goal then stop.
   (e) If search strategies dictate, go to 3(c) to select another set of rules.
   (f) Select, according to heuristics, the next state to expand.
   (g) Go to 3(a).

Theorem 4.3 The parallel search procedure produces a complete and minimal search tree.

Proof idea: Clearly, by the additive nature of the transformation, the tree remains complete. It is helpful to consider the set of mutually parallel rules fired as a single rule.

Full parallelism: The firing of two parallel rules in sequence is clearly forbidden by Eq. 4.2.5, and hence there is no possibility of equivalent states occurring in the search tree. The tree remains minimal.

Partial parallelism: This is a combination of full parallelism and no parallelism. By considering the mutually parallel rules that are fired as a single rule, the proof is essentially the same as the proofs of theorems 4.1 and 4.2.
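Step 3(c) needs a set of pairwise parallel, non-forbidden rules. One simple realization, sketched below, grows the set greedily in increasing rule order and caps it at the number of available processors; the greedy choice is an illustration only, since the original leaves the selection to heuristics. The test uses the parallelism matrix reconstructed for Example 3.1, 0-indexed.

```python
def select_parallel_set(conflict, P, forbidden, max_fire):
    """Greedily pick rules from the conflict set that are pairwise
    parallel (p_ij = 0), not forbidden, and at most max_fire in number.
    conflict: set of rule indices; P: parallelism matrix (0-indexed)."""
    chosen = []
    for j in sorted(conflict):
        if j in forbidden:
            continue
        if all(P[j][k] == 0 for k in chosen):   # pairwise parallel check
            chosen.append(j)
            if len(chosen) == max_fire:         # partial parallelism cap
                break
    return chosen
```

Because only pairwise tests against already-chosen rules are needed, the selection is O(|conflict| * max_fire); the cap models the processor-count limitation discussed above.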
4.2.2 An Example

Example 4.2 Consider the six-rule production system of example 3.1. A fully parallel search tree is shown in figure 4.7. Compare this with the reduced sequential search space shown in figure 4.4. By coincidence the number of states has remained the same, but the tree height has been reduced to 5 from 7. In general, the number of states is expected to increase and the height to decrease.

[Figure 4.7: Fully parallel reduced search space. Note: a node labeled {1,2} denotes P1 and P2 executed in parallel.]

4.3 π - λ Transformation

It has already been shown that a sequence of commutative rules can be executed in parallel, which reduces the state space as shown in figure 4.8. The side effect, however undesirable it may be, is that the intermediate state is skipped. Consider the situation in figure 4.9(a). The rules P1 and P2 are parallel, but not P1 and P3. Firing productions P1 and P2 in parallel could preclude visiting the state S13, and hence S13 appears explicitly in the serial to parallel transformation in figure 4.9(b). The transformation sought in figure 4.10 is such that the state S13 is contained in S_(12)3. This, in spite of skipping the intermediate state by parallel firing, still allows one to reach the goals reachable from the original state S13. This transformation keeps the parallel search space leaner, and yet the goal is guaranteed to be reachable.

4.3.1 Conditions for π - λ Transformability

Firing of rule P2 must not prevent the firing of rule P3 or of the rules fired after rule P3; i.e., the following two conditions must be satisfied:

• firability condition: the rule P3 must be firable from state S_(12), and
• reachability condition: the goals reached by states S13, S134, S1345, ... must still be reachable.
A mathematical formulation of these conditions is given in the following sections.

[Figure 4.8: Parallel productions in a sequence. In the figures, i = rule number and B = knowledge base at state S_i.]

[Figure 4.9: Situation and serial-to-parallel transformation.]

[Figure 4.10: π−λ transformation.]

4.3.2 Firability Condition

The fact that P₃ must be firable from S₍₁₂₎ (firability condition) implies

C₃ ⊆ (S₁ − L₂) ∪ R₂    (see Eq. 3.1.5)

That is, writing SX̄ for S − X,

C₃ = ((S₁ − L₂) + R₂)C₃ = (S₁L̄₂ + R₂)C₃ = S₁L̄₂C₃ + R₂C₃

Since P₃ was firable from S₁, S₁C₃ = C₃. Hence,

C₃ = L̄₂C₃ + R₂C₃ = (L̄₂ + R₂)C₃

Since L₂ ∩ R₂ = K₂ by definition,

L̄₂ + R₂ = (L₂ − K₂)‾

Hence C₃ = C₃ − (L₂ − K₂), or

∅ = C₃ ∩ (L₂ − K₂)    (4.3.6)

Note that Equation 4.3.6 is satisfied if P₃ and P₂ are not input dependent.

4.3.3 Reachability Condition

In order that the state S₁₃ and its children be unaffected by the parallel firing of P₁ and P₂ (reachability condition), it must be that

S₁₃ ⊆ S₍₁₂₎₃    (4.3.7)

This implies

S₍₁₂₎₃ ∩ S₁₃ = S₁₃    (4.3.8)

Now,

S₁₃ = S₁L̄₃ + R₃    (4.3.9)
S₍₁₂₎₃ = (S₁L̄₂ + R₂)L̄₃ + R₃    (4.3.10)

From Eq. 4.3.9 and 4.3.10,

S₍₁₂₎₃ ∩ S₁₃ = ((S₁L̄₂ + R₂)L̄₃ + R₃)(S₁L̄₃ + R₃)
            = (S₁L̄₂ + R₂)L̄₃S₁L̄₃ + R₃
            = (L̄₂ + R₂)S₁L̄₃ + R₃    (4.3.11)

Since L̄₂ + R₂ = (L₂ − K₂)‾, from 4.3.11

S₍₁₂₎₃ ∩ S₁₃ = S₁(L₂ − K₂)‾L̄₃ + R₃    (4.3.12)

From 4.3.8, 4.3.9, and 4.3.12, equality holds when

L₂ − K₂ ⊆ L₃ ∪ R₃    (4.3.13)

As expected, this means P₂ must not take away anything that P₃ does not itself take away or restore. This, too, can be determined statically, like the dependences. As can be seen, these conditions are related to the basic dependences.
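Both conditions can be evaluated from the rule definitions alone, without consulting the working memory. A small sketch on a token-set model (rules as condition, left, gluing, and right sets; the helper name is illustrative), exercised on the rules of the four-rule system PS4 used in Example 4.3:

```python
# Static check of the pi-lambda transformability conditions on a token-set
# model. Rules are (C, L, K, R) sets; pi_lambda_ok is an illustrative name,
# and the rule literals are those of PS4 in Example 4.3.

def pi_lambda_ok(p2, p3):
    C2, L2, K2, R2 = p2
    C3, L3, K3, R3 = p3
    firable   = not (C3 & (L2 - K2))    # Eq. 4.3.6: P2 deletes nothing P3 needs
    reachable = (L2 - K2) <= (L3 | R3)  # Eq. 4.3.13: P3 deletes or restores it
    return firable and reachable

P2 = ({"C", "D"}, {"C", "D"}, {"C"}, {"C", "E", "F"})
P3 = ({"B", "C", "H"}, {"B", "C", "H"}, {"H"}, {"H", "G", "D"})
P4 = ({"G", "H", "D"}, {"G", "H", "D"}, {"D"}, {"D", "A", "C"})

ok  = pi_lambda_ok(P2, P3)   # the pair used in Example 4.3
bad = pi_lambda_ok(P4, P3)   # fails: P4 deletes H, which P3's condition needs
```

Because the test uses only the static rule sets, it can be run once, before execution, for every candidate pair.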
Hence π−λ transformations can be carried out based on the dependence matrices and without checking the working memory.

4.4 An Example

Example 4.3 Consider the production system PS4 of 4 rules, written in the form P: L* = C* ← K → R*:

P₁: A ← → B C H
P₂: D ← C → E F
P₃: B C ← H → G D
P₄: G H ← D → A C

Initial state: A C D

The rules P₂ and P₃ satisfy the conditions 4.3.6 and 4.3.13. Also, the rules P₁ and P₂ are parallel. The original search space is shown in Figure 4.11. The serial-to-parallel transformation produces a tree of smaller height, as shown in Figure 4.12. Further application of the π−λ transformation produces a leaner search space, as shown in Figure 4.13.

[Figure 4.11: Original search space for the PS of Example 4.3.]
[Figure 4.12: Serial-to-parallel transformation of PS4.]
[Figure 4.13: π−λ transformation of PS4.]

4.5 Cycles

When a previously fired rule becomes eligible again and, if fired, could take the knowledge base back to a 'previous state', an endless cycle could be created. Cycles are detrimental to production systems, since the goal may never be reached once the search is stuck in a cycle. Although cycles may be characterized in many ways, it is difficult to derive any efficient means of detecting or avoiding them. An example of a cycle follows.

Example 4.4 Consider the three productions P₁: A B → C; P₂: C D → E F; and P₃: E F → A B D. If the initial state contains the tokens A B D, then a cycle P₁P₂P₃ clearly exists. In this cycle, it may be noted that application of the three rules in sequence leaves the knowledge base unchanged.

Example 4.4 leads to the observation: a cycle P_{i1}P_{i2}...P_{in} of length n is completed in state S_x if the knowledge bases satisfy the relation B_x ⊆ B_{x,i1,i2,...,in}. This means that if the net change in the database is only additive after a sequence of rules, then those rules could give rise to a cycle. This may suggest that by checking the previous states in backward order S_{n−1}, S_{n−2}, ..., for a net non-additive change, a cycle could be detected and avoided. However, this requires memorizing, or computing again, the net changes for every state on the path, neither of which is efficient or acceptable.

4.6 Summary

Dependence analysis leads to transformations for reduction of the search space. Sequential and parallel search procedures that produce minimal and complete search trees were presented. Heuristics may also be used to further reduce the search space. The π−λ transformation for further reduction was introduced and conditions for its applicability were derived. These conditions were found to be related to the basic dependences, and hence π−λ transformations can be applied without regard to the state of the knowledge base.

Chapter 5 Rule Space Transformations

The techniques for search-space reduction considered in Chapter 4 are applicable during execution. The state space may be further reduced if the productions, or rules, are preprocessed. Composing productions into larger productions may improve efficiency, and decomposing productions into primitive productions may expose parallelism and improve speedup. The io-dependent composition lemma is an adaptation from [EK78,ER78]. Usually there will be groups of productions that deal with specific items and tasks of an expert system. Partitioning the productions into such groups may lead to faster allocation and better control strategies. The allocation problem is considered in Chapter 6. Control issues and the handling of variables in productions and in the knowledge base are discussed in Chapter 7.
| Com position and decomposition of rules discussed in this chapter may help J regroup the rules. i _________ 62 j 5.1 Composition W hen heuristics perm it, the derivation sequence via input-output dependent productions may be collapsed by applying a single production which is the composition of the productions in the sequence. L e m m a 5.1 Two productions Pi and P2 are composable into a third pro duction Ps, denoted P 3 = Pi ° Pi, if Pi is not input dependent on Pi. Proof: Follows from the definition of the input dependence. Now, C iB \ = C2{B — L \) + C iR \. Since P2 is not input dependent on P i, C2 n (Li — K i) = 0. Hence, C2B 1 = C2{B + K i + R i) = C2(B + R i) Ci C (P i " i- P i) Therefore the production P2 is firable after P i is fired. L e m m a 5.2 The io-dependent composition P3 of two input-output depen dent productions P\ and P 2 when P2 is firable after Pi is given by Ci U {Ci - D i0) t P3 : L i U L i • < — K i U {K2 — Di0) — ^ P i U P 2 where D i0 , called io-dependence, stands for some of the elements causing the production P2 to be input-output dependent on production Px, namely, D ^ c ((P i - P i) n c2 ). .63 Proofidea: A proof in the graph gram m ar framework can be found in [EK78,ER78], though the same proof technique cannot be used here. It is sufficient to show th a t SPlp2 = Spl 0 p2- By definition, Sp 3 = (B — (L i + L i)) + R \ + Ri- Assuming the production P\ is not output-input dependent on Pi i.e., R \ n (L2 — K i) = 0, Sp3 ~ BL±Li -t~ Ri " h R2 = B L \L i + JR1L2 + Ri = (BL\ + R\)Li + Ri = ((jB — L\) -f- — Li) + Ri = s PiPi Such productions reduce the burden of m atching of the common gluing and context elements. A sim ilar definition and lemma can be w ritten for output-input dependent productions. 
Lemma 5.3 The oi-dependent composition P₃ of two output-input dependent productions P₁ and P₂, when P₂ is firable after P₁, is given by

P₃: L₁ ∪ (L₂ − D_oi) ← K₁ ∪ K₂ → (R₁ − D_oi) ∪ R₂, with context C₁ ∪ C₂,

where D_oi, called the oi-dependence, stands for some of the elements causing the production P₂ to be output-input dependent on production P₁, namely D_oi ⊆ (R₁ ∩ (L₂ − K₂)).

Such productions avoid first adding and then deleting the elements causing the output-input dependence. This could increase the system throughput.

Based on the above results, a generalized composition theorem can be obtained:

Theorem 5.1 The composition P₃ of two productions P₁ and P₂, denoted P₃ = P₁ ∘ P₂, when P₂ is firable after P₁, is given by

P₃: L₁ ∪ (L₂ − D_oi) ← K₁ ∪ (K₂ − D_io) → (R₁ − D_oi) ∪ R₂, with context C₁ ∪ (C₂ − D_io).

Proof: Follows from Lemmas 5.2 and 5.3.

It is important to note that dependences other than the input, input-output, and output-input dependences do not affect the composition theorem. The composition can be carried out without regard to the database; hence it is static and can be performed prior to execution, and only once.

Example 5.1 Consider the input-output dependent productions P₂ and P₅ from Example 3.1. The dependences D_io and D_oi are both S(e,g). The two rules and their composition are shown in the table below.

            L = C                      K         R
P₂:         P(a,b) S(b,c)              P(a,b)    P(a,b) T(c,f) S(e,g)
P₅:         R(e,d) S(e,g)                        R(g,d) N(f,g)
P₂ ∘ P₅:    P(a,b) S(b,c) R(e,d)       P(a,b)    P(a,b) T(c,f) R(g,d) N(f,g)

5.2 Decomposition

The composition theorem 5.1 allows merging of communicating productions to reduce or eliminate the communication overhead. Reduction of communication may be desirable in itself. Nevertheless, the composition of productions has the undesirable side effect of destroying parallelism. Trade-offs can be made between the two conflicting effects.
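On a token-set reading of the rules, the generalized composition of Theorem 5.1 can be sketched as follows. Here `compose` takes D_io and D_oi maximal, and the check uses rules P₁ and P₂ of PS4 from Example 4.3; the function names and the firing helper are illustrative, not from the dissertation.

```python
# Sketch of Theorem 5.1 on a token-set model; D_io and D_oi are taken maximal.
# Rules are (C, L, K, R) sets; the names and firing helper are illustrative.

def fire(base, rule):
    C, L, K, R = rule
    assert C <= base                  # the rule must be applicable
    return frozenset((base - (L - K)) | (R - K))

def compose(p1, p2):
    C1, L1, K1, R1 = p1
    C2, L2, K2, R2 = p2
    Dio = (R1 - K1) & C2              # io-dependence (Lemma 5.2)
    Doi = R1 & (L2 - K2)              # oi-dependence (Lemma 5.3)
    return (C1 | (C2 - Dio),          # context
            L1 | (L2 - Doi),          # left side
            K1 | (K2 - Dio),          # gluing
            (R1 - Doi) | R2)          # right side

# PS4 rules P1 and P2: P2 is not input dependent on P1, so they compose.
P1 = ({"A"}, {"A"}, set(), {"B", "C", "H"})
P2 = ({"C", "D"}, {"C", "D"}, {"C"}, {"C", "E", "F"})

B = frozenset({"A", "C", "D"})
# Firing the composition equals firing P1 then P2 -- a static, database-free fact.
same = fire(B, compose(P1, P2)) == fire(fire(B, P1), P2)
```

Since `compose` uses only the rule sets, the composition can indeed be computed once, before execution, as the theorem's discussion notes.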
To do so, an analogous procedure for decomposition of a compound production into simpler, but possibly communicating, productions can be proposed.

Definition 5.1 A production P′: L′ ← K′ → R′ with context C′ is said to be a partial production of a production P: L ← K → R with context C, denoted P′ ⊑ P, if

1. K′ ⊆ K, and
2. L′ ⊆ L or R′ ⊆ R; and
3. the morphism l′ [r′] is a restriction of the morphism l [r] with respect to (L′, K′) [(R′, K′)].

Definition 5.2 Two partial productions P₁ and P₂ of a production P are said to cover the production P if

1. K = K₁ ∪ K₂,
2. L ⊆ L₁ ∪ L₂, and
3. R ⊆ R₁ ∪ R₂.

Definition 5.3 An ordered pair of productions (P₁, P₂) is a decomposition of a production P if P₁ ∘ P₂ = P; that is, given any arbitrary state S containing the context graph of P and P₁, B_P = B_{P₁P₂}.

Theorem 5.2 An ordered pair of productions (P₁, P₂) is a decomposition of a production P if and only if

1. C₁ = C and L₁ ⊆ L,
2. C₂ ⊆ (C₁ − L₁) + R₁,
3. P₁ and P₂ are partial productions of P,
4. P₁ and P₂ together cover P,
5. R₁ − R = L₂ − L ⊆ U, where U is the token domain of the production system, and
6. R₂ − R = ∅.

Comment: Conditions (3)-(6) make sure the effect of the component productions is the same as that of the original production. Condition (1) ensures that P₁ is firable under the same context. Condition (2) allows the second production to be fired after the first one. Condition (5) requires that any additional tokens introduced must be brand new, i.e., they do not occur in any other productions. The new tokens are usually introduced for synchronization purposes.

Proof: (If part) Let R₁ − R = L₂ − L = T, the set of new tokens. Since the two productions are partial and together cover P,

L₁ + L₂ = L + T    (5.2.1)
R₁ + R₂ = R + T    (5.2.2)

Also, from condition (6) and 5.2.2,

L₂ − T ⊆ L,  R₁ − T ⊆ R    (5.2.3)

Let the current state S be such that P is firable. Since C = C₁, P₁ is also firable from S.
This yields (writing SX̄ for S − X)

B_{P₁} = (B − L₁) + R₁ = BL̄₁ + R₁

Since C₁ ⊆ B and C₂ ⊆ (C₁ − L₁) + R₁ ⊆ B_{P₁}, the production P₂ is firable from B_{P₁}, yielding

B_{P₁P₂} = BL̄₁L̄₂ + R₁L̄₂ + R₂
         = B(L₁ + L₂)‾ + R₁L̄₂ + R₂
         = B(L + T)‾ + R₁L̄₂ + R₂    (from 5.2.1)

Since the knowledge base B, and the LHS of production P, cannot contain the new tokens introduced in T,

B_{P₁P₂} = BL̄ + R₁L̄₂ + R₂    (5.2.4)

Noting that R₂L̄₂ ⊆ R₂,

B_{P₁P₂} = BL̄ + R₁L̄₂ + R₂L̄₂ + R₂
         = BL̄ + (R₁ + R₂)L̄₂ + R₂
         = BL̄ + (R + T)L̄₂ + R₂    (from 5.2.2)

Since T ⊆ L₂,

B_{P₁P₂} = BL̄ + RL̄₂ + R₂
         = BL̄ + (R − K₂) + R₂
         = BL̄ + R    (since K₂ ⊆ R₂)
         = B_P

(Only if part) It is required to show that conditions (1)-(6) are necessary.

Condition 1: Without condition (1), the production P₁ cannot be fired in the same situations as the original production P. If C₁ ⊂ C, then P₁ can be fired more often than P; if C ⊂ C₁, then P₁ may not be applicable when P is.

Condition 2: P₂ may not be firable after P₁ if C₂ ⊄ (C₁ − L₁) + R₁.

Condition 3: Arguments must be given for the three conditions for P₁ to be a partial production of P. The condition on the context graph has already been shown to be necessary. If the gluing graph K₁ is not a subgraph of K, then unrelated elements may be replaced. If L₁ − L ≠ ∅, then the application of P₁ will annihilate some tokens that P would not. It may be argued that P₂ could be arranged to complement the effect of L₁ so as to produce the same net effect; however, that is possible only if the annihilated tokens already exist before P₁ is fired, since otherwise P₂ would add those tokens and the result would not be the same as firing the production P. Hence P₁ must necessarily be a partial production of P.

Condition 4: Clearly, the net effect must be at least as much as that of the original production P, both in annihilation and in creation of tokens, under the same gluing conditions.

Condition 5: In order not to affect the other productions, only new tokens can be used for synchronization.
Also, the final result must have neither more nor fewer tokens than the result that can be obtained by the original production.

Condition 6: R₂ may not add extra elements not in R, because that effect cannot be undone, as it is the last action of the ordered pair (P₁, P₂).

Clearly, the decomposition of a production into partial productions that together cover the original production is not unique. However, the decomposition where no new tokens are added is of particular interest:

Corollary A decomposition of a production P, not input dependent on itself, into (P₁, P₂) such that R₁ − R = L₂ − L = ∅ results in a pair of parallel productions P₁ ∥ P₂.

Proof: Check the input and output-input dependences.

Input dependence: Self input independence of P implies C ∩ (L − K) = ∅. Since no new tokens are added and the production is covered, L − K = (L₁ − K₁) ∪ (L₂ − K₂), so C ∩ (L₁ − K₁) = ∅ and C ∩ (L₂ − K₂) = ∅. Now, since C₁ = C, C₁ ∩ (L₂ − K₂) = ∅. Also, from condition (2) of the decomposition theorem, C₂ ⊆ (C₁ − L₁) + R₁; since C₁ ∩ (L₁ − K₁) = ∅ and R₁ ∩ (L₁ − K₁) = ∅, it follows that C₂ ∩ (L₁ − K₁) = ∅. This means the productions are not input dependent.

Output-input dependence: R₁ ∩ (L₂ − K₂) ⊆ R ∩ (L − K) = ∅, since no new tokens are added. Similarly, R₂ ∩ (L₁ − K₁) = ∅. Hence the two productions are not output-input dependent.
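The conditions of Theorem 5.2 can be exercised on a small token-set example. Below, a hypothetical rule P is split into P₁ and P₂ that communicate through a brand-new synchronization token t; all rules and names are illustrative, not from the dissertation.

```python
# Sketch: a decomposition (P1, P2) of P with a new synchronization token "t",
# in the spirit of Theorem 5.2, on a token-set model. All rules are made up.

def fire(base, rule):
    C, L, K, R = rule
    assert C <= base
    return frozenset((base - (L - K)) | (R - K))

# P deletes A and B and adds C and D.
P  = ({"A", "B"}, {"A", "B"}, set(), {"C", "D"})
# P1 keeps P's context (condition 1), deletes only A, and emits C plus the
# new token t (condition 5: R1 - R = {"t"}).
P1 = ({"A", "B"}, {"A"}, set(), {"C", "t"})
# P2 waits for t (condition 2), deletes B and t, and emits D (condition 6:
# R2 - R is empty).
P2 = ({"B", "t"}, {"B", "t"}, set(), {"D"})

B = frozenset({"A", "B", "X"})
one_shot = fire(B, P)
two_step = fire(fire(B, P1), P2)
# The pair has the same net effect as P; the token t is consumed on the way.
```

The synchronization token forces the order P₁ then P₂, which is exactly the output-input dependence the corollary says any new-token decomposition must introduce.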
Consider the following rule from the monkey-and-bananas problem in the Appendix:

(rule (5 monkey fetches bananas)
  (if (and (monkey holding nil)
           (monkey hungry for bananas)
           (bananas on floor) ))
  (then (- (monkey holding nil))
        (- (bananas on floor))
        (+ (monkey has bananas)) ))

It can be decomposed into the input-output dependent rules:

(rule (5-1 monkey bends down)
  (if (and (monkey holding nil)
           (monkey hungry for bananas)
           (bananas on floor) ))
  (then (- (monkey holding nil))
        (+ (monkey has bent down))
        (+ (monkey has bananas)) ))

and

(rule (5-2 monkey rises with bananas)
  (if (and (monkey hungry for bananas)
           (monkey has bent down)
           (bananas on floor) ))
  (then (- (monkey has bent down))
        (- (bananas on floor)) ))

However, nothing is gained by this decomposition, as the original rule contains only a couple of action items to be divided among the resulting rules. It does not increase parallelism, but it does add communication. Such decompositions should be avoided.

When the given production is self input dependent it is still possible, as seen in Example 5.2, to decompose it into primitive productions. However, this automatically implies output-input dependence and hence a non-parallel decomposition. Nevertheless, decomposition into non-parallel productions may still be considered, since it exploits parallelism within a rule and could be gainful.

5.3 Graph Theoretic Transformations

The dependences among productions can be represented by graphs where nodes represent the productions and arcs represent the dependences. Graphs for each type of dependence are considered separately. Consider the three transformations shown in Figure 5.1.

Splitting/Joining These are special cases of the decomposition and composition theorems. The dotted edge in the figure represents the new dependences added. This kind of splitting, which connects two groups of nodes through a newly added edge, could simplify the problem of partitioning productions. It also restricts communications to narrow channels.

Deletion Deletion of a node distributes its role among its neighbors. This may not be very helpful, since it adds dependences.

Duplication Duplication of a rule in the graph corresponds to allocating the rule to more than one processor. It increases communications by the added dependences and by the synchronization efforts required of the copies of the production to keep the database consistent. However, it may increase parallelism and remove communication bottlenecks.

[Figure 5.1: Transformations on dependence graphs: split/join, deletion, duplication.]

5.4 Summary

In this chapter, transformations on productions were considered. Composition of rules was shown to eliminate communications at the expense of parallelism in the matching phase. Composition could also reduce the execution time. Decomposition of a production into many primitive productions can reveal parallelism. This could also lead to an even distribution of rules and hence a balanced load on processors. Graph theoretic transformations were suggested; further study of the properties of these transformations is required.

Chapter 6 The Allocation Problem

The transformations in the previous chapters did not take into account the machine on which the production system is to be executed. Productions and the knowledge base must be partitioned and each partition allocated to one processor. This is precisely the mapping problem, as discussed in [Bok81,TM84,Sto79,Pri81], in the context of production systems.

6.1 Formulation of the Allocation Problem

The objective is to allocate r productions to n processing elements such that the total cost, consisting of communication costs and loss of parallelism, is minimized. Let the production system be characterized by its dependence matrices C, P ∈ {0,1}^{r×r}. Let D ∈ ℝ^{n×n} be the distance matrix characterizing the interconnection network.
Deletion Deletion of a node distributes its role among its neighbors. This m ay not be very helpful since it adds dependences. Duplication Duplication of a rule in the graph corresponds to allocating the rule to more than one processor. It increases comm unications by the added dependences and the required synchronization efforts by the copies of the production to keep the database consistent. However, it may increase parallelism and remove comm unication bottlenecks. 72 S plit/Join Deletion © — © — © Duplication Figure 5.1: Transform ations on Dependence Graphs 73 5.4 Summary In this chapter transform ations on productions were considered. Compo sition of rules was shown to elim inate communications at the expense of parallelism in the m atching phase. Composition could also reduce the exe cution time. Decomposition of a production into m any prim itive productions can reveal parallelism. This could also lead to even distribution of rules and hence a balanced load on processors. G raph theoretic transform ations were suggested. Further study of the properties of these transform ations is re quired. i 74 Chapter 6 The Allocation Problem The transform ations in the previous chapters did not take into account the machine on which the production system was to be executed. Productions and the knowledge base m ust be partitioned and each partition allocated to one processor. This is precisely the m apping problem , as discussed in [Bok81,TM84,Sto79,Pri81], in the context of production systems. 6.1 Formulation of the Allocation Problem The objective is to allocate r productions to n processing elements such th a t the total cost consisting of comm unication costs and loss of parallelism is minimized. Let the production system be characterized by its depen dence m atrices: C ,P € {0, l} rXr. Let D € 9?nxn be the distance m atrix characterizing the interconnection network. 
Further, let d ,-y represent the cost of comm unicating a unit message from processor i to processor j . Let X € { 0 ,l} rxn be the allocation m atrix where xtJ- = 1 if production P, is assigned to processor j and x y = 0 otherwise. 75 6 .1 .1 C o m m u n ica tio n C ost For the sake of simplicity, the communication m atrix may be assum ed to represent not ju st the comm unication requirem ents but the actual costs of comm unication; the non-zero entries in the comm unication m atrix may be i modified to reflect the actual costs if necessary. Such a cost may be a function i i of the ‘am ount of dependence’ which may be autom atically m easured by the num ber of common elements th a t cause the dependence. This is justified ! since the length of comm unication packet, and the tim e to encode and decode it, m ay depend on the common elements. Although the costs incurred at the transm itting and the receiving processors m ay not be equal, the communication m atrix is defined to be sym m etric, again, to keep the m atters simple. The comm unication cost between the productions P, and P} is XikCijXjidki i.e., the cost of comm unicating between processors P E k and P E \ provided Pi is assigned to P E k (x,k = 1), Pj is assigned to P E i (xyj = l) , and the two productions comm unicate (ctJ = 1). The total communication cost of the j allocation J2i 1 2j 12k 1 2i XikCijXjidki can be compactly w ritten as i i E c = (X T C X ) • D (6.1.1) where • represents the ‘scalar product’, i.e., A • B = J2j 12i a*jbij. 6 .1 .2 P a ra llelism L oss W hen two parallel productions are assigned to the same processor there is a loss of potential parallelism. The parallelism loss between two productions \ | Pi and Pj is some m ultiple of XikXjkPiji the productions are parallel (Pij — l) and are assigned to the same processor. 
The total parallelism loss of the allocation can be written compactly as

E_p = XXᵀ · P    (6.1.2)

Several improvements could be made to make the loss of parallelism E_p a bit more realistic. If m parallel rules are assigned to the same processor, the time required to execute them is O(m); if they are assigned to different processors, then the time required is O(1) in the best case. Thus the potential loss of speedup or parallelism is O(m), not O(m²) as given by formula 6.1.2. Also, some productions get executed more often than others. These differences can easily be modeled by modifying the communication and parallelism matrices: simply replace each element c_kl in C by w_k c_kl, and each element p_kl in P by w_k p_kl, where w_k is the execution frequency of the production P_k. Similarly, the specific execution time of each production can be incorporated into the P matrix.

The objective function E is a suitable combination of the communication cost E_c and the loss of parallelism E_p. For simplicity, just take the sum of the two.

6.1.3 The Constraints

Obviously, every production must be assigned to a processor. In addition, let every rule be assigned to one and only one processor. In order to avoid trivial solutions and load imbalances, such as all rules being assigned to just one processor, assume that each processor is capable of handling only a few productions. Let the 'production capacity' of processor i be R_i, i.e., processor i cannot handle more than R_i rules. This limitation may arise because of limited local memory. These constraints are reasonable and effectively restrict the range of solutions.

6.1.4 0-1 Linear-Quadratic Programming Formulation

The allocation problem can now be stated as follows:

0-1 Linear-Quadratic Programming Formulation

Minimize
E = E_c + E_p = (XᵀCX) · D + XXᵀ · P

subject to

Σ_{j=1}^{n} x_ij = 1,  1 ≤ i ≤ r
Σ_{i=1}^{r} x_ij ≤ R_j,  1 ≤ j ≤ n

Evidently, a solution is possible only if the number of rules is no more than the total production capacity of the processors, i.e., r ≤ Σ_{j=1}^{n} R_j.

6.1.5 Solutions

The problem as formulated is quadratic in nature and very difficult to solve. The problem has been discussed extensively, and numerous references are provided, by Garfinkel and Nemhauser [GN72], Salkin [Sal75], and Gillette [Gil76]. Implicit enumeration algorithms have been proposed by Balas [Bal67], Hansen [Han72], Glover [Glo65,Glo68], and others. Also, taking advantage of the fact that the variables are Boolean, the problem can be reduced to 0-1 linear programming, as suggested by Zangwill [Zan65] and Watters [Wat67], at the expense of introducing more variables, as shown in section 6.1.6.

Directly attacking the Boolean problem, methods of solution have been suggested by Hansen [Han72] and by Hammer and Rudeanu [HR66]. An implicit enumeration algorithm is provided by Laughhunn [Lau70] for quadratic binary programming, which is directly applicable to the allocation problem. Iterative and heuristic methods suggested by Breuer [Bre72] for the placement problem in the VLSI domain can also be employed. Methods to solve the optimal task assignment problem in distributed computing systems have been proposed by Stone [Sto79], Bokhari [Bok81], Price [Pri81], and Shen and Tsai [ST85]. Recently, heuristic methods have been proposed for similar problems, although not formulated as 0-1 linear-quadratic programming, by Chu and Lan [CL87] and Baba et al. [BYH87]. The heuristic method provided in section 6.2 is similar to the graph matching approach of Shen and Tsai [ST85].

In the next section it is shown how the 0-1 linear-quadratic programming formulation can be reduced to a 0-1 linear programming problem.
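The two cost terms can be evaluated directly from X, C, D, and P. A small pure-Python sketch follows; the matrices and names are illustrative, not taken from the dissertation.

```python
# Sketch: evaluating E = Ec + Ep = (X^T C X) . D + X X^T . P for a candidate
# allocation. Matrices are plain lists of lists; the example data is made up.

def total_cost(X, C, D, P):
    r, n = len(X), len(X[0])
    Ec = sum(X[i][k] * C[i][j] * X[j][l] * D[k][l]      # Eq. 6.1.1
             for i in range(r) for j in range(r)
             for k in range(n) for l in range(n))
    Ep = sum(X[i][k] * X[j][k] * P[i][j]                # Eq. 6.1.2
             for i in range(r) for j in range(r) for k in range(n))
    return Ec + Ep

C = [[0, 1], [1, 0]]          # the two rules communicate
P = [[1, 1], [1, 1]]          # ... and are also parallel
D = [[0, 3], [3, 0]]          # expensive link between the two processors

together = total_cost([[1, 0], [1, 0]], C, D, P)   # both rules on processor 0
apart    = total_cost([[1, 0], [0, 1]], C, D, P)   # one rule per processor
```

With this data, co-locating the rules costs 4 (pure parallelism loss) while separating them costs 8 (communication over the expensive link), illustrating the trade-off the objective function captures.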
6.1.6 Reduction to 0-1 Linear Formulation

Expanding the objective function,

E = Σ_{i=1}^{n} Σ_{j=1}^{n} Σ_{k=1}^{r} Σ_{l=1}^{r} x_ki c_kl x_lj d_ij + Σ_{k=1}^{r} Σ_{l=1}^{r} Σ_{i=1}^{n} x_ki x_li p_kl

Letting q_kl^ij = x_ki x_lj, with the understanding that q_kl^ij = 1 if production P_k is assigned to processor i and production P_l is assigned to processor j, and q_kl^ij = 0 otherwise,

E = Σ_{i=1}^{n} Σ_{j=1}^{n} Σ_{k=1}^{r} Σ_{l=1}^{r} q_kl^ij c_kl d_ij + Σ_{k=1}^{r} Σ_{l=1}^{r} Σ_{i=1}^{n} q_kl^ii p_kl

The constraints, too, can be rewritten in terms of the Boolean variables q_kl^ij as follows:

• Each rule is assigned to one and only one processor, so each pair of rules is assigned to exactly one pair of processors:

Σ_{i=1}^{n} Σ_{j=1}^{n} q_kl^ij = 1,  1 ≤ k ≤ l ≤ r

• No more than R_i rules are assigned to processor i:

Σ_{k=1}^{r} q_kk^ii ≤ R_i,  1 ≤ i ≤ n

Note that q_kl^ij = q_lk^ji. Taking this symmetry into account, the number of variables and constraints can be reduced and the problem transformed into a 0-1 linear program:

0-1 Linear Programming Formulation

Minimize
E = Σ_{i=1}^{n} Σ_{j=1}^{n} Σ_{k=1}^{r} Σ_{l=1}^{r} q_kl^ij c_kl d_ij + Σ_{k=1}^{r} Σ_{l=1}^{r} Σ_{i=1}^{n} q_kl^ii p_kl

subject to

Σ_{i=1}^{n} Σ_{j=1}^{n} q_kl^ij = 1,  1 ≤ k ≤ l ≤ r
Σ_{k=1}^{r} q_kk^ii ≤ R_i,  1 ≤ i ≤ n

This corresponds to a total of n²r(r+1)/2 variables and n + r(r+1)/2 constraints. Compare this with Watters' method [Wat67], where one obtains (r + n + r²n²) inequalities in r²n²/2 variables, though which of the two is to be preferred is not clear.

6.2 A* Algorithm: A Heuristic Method

Let P_S be the set of productions and P_E the set of processors. The goal of the allocation problem is to find a mapping A: P_S → P_E, possibly many to one, such that for any two communicating productions P_i, P_j ∈ P_S there exists a path between the processors A(P_i), A(P_j) ∈ P_E. This can be formulated as a state space search problem, and the A* algorithm described by Nilsson [Nil71] can be used to solve it. The search can be hastened with good heuristics. The algorithm is guaranteed to find an optimal solution if the heuristic used is an underestimator.
6.2.1 State Space Search

The state space is a tree and each node in the tree is a state. The search starts with the root and proceeds by expanding the current node, adding new nodes to the tree. The search stops when the goal node is picked up by the expansion procedure. The exact nature of the states and of the expansion procedure depends on the application domain. In the present case they may be described as follows.

1. A state is represented by (X, Q), where X is the partially developed allocation matrix itself and Q ⊆ {1, 2, ..., r}. If i ∈ Q, then the production P_i is assigned to the processor j for which x_ij = 1.

2. The initial state is the root node. It is an empty state, i.e., none of the rules has been assigned to a processor: X = 0 and Q = {}.

3. The expansion procedure: Select an unassigned production P_i, i ∉ Q, according to some selection criterion. Generate a new node for each possible assignment of P_i to a processor j = 1...n. The new node, besides inheriting the state information from its parent, is updated to reflect the new assignment: x_ij = 1 and Q = Q ∪ {i}. An assignment x_ij is possible if the rule capacity of processor j is not exceeded (the second constraint in the 0-1 linear-quadratic programming formulation) and a communication path exists from processor j to every other processor l which already contains a production P_k that communicates with production P_i. The latter restriction simply ensures that every pair of communicating rules can communicate, so that the communication cost of the new assignment x_ij is not driven to infinity.

4. The goal state is any state that has all the rules assigned, i.e., Q = {1, 2, ..., r}. The search stops when such a state is the next to be expanded.

6.2.2 The A* Algorithm

The above scheme lacks a strategy for selecting the next node to expand; it does not specify any quantity to optimize.
However, it can be augmented with procedures and heuristics for computing the path costs. The cost of a node s is defined by f(s) = g(s) + h(s), where g(s) is the path cost from the root to the node s and h(s) is an estimate of the cost of the path from node s to a goal node. The state with the least cost f(s) is selected for expansion. The goal node sought is one of minimum cost, which is the sum of communication cost and parallelism loss. Thus it is apparent that the functions g and h must be defined in terms of communication costs and parallelism losses. If the heuristic h(s) is guaranteed never to exceed the minimum of the actual costs to goal nodes, denoted h*(s), then this scheme becomes an A* algorithm and is guaranteed to find the goal node with minimum cost.

Let the path cost g(s) be computed by the objective function using the partial allocation matrix X:

g((X, Q)) = (XᵀCX) · D + XXᵀ · P

The heuristic function h(s) may be defined in many ways. For example, h(s) = 0 results in a uniform-cost search. It never exceeds h*(s) and hence is guaranteed to find the minimum-cost goal node; however, the search will be inefficient. The more realistic the heuristic function, and the closer it is to h*, the more efficient the search becomes, by not exploring the costlier paths.

The heuristic h() is computed as follows. Every unassigned rule P_i, i ∉ Q, must be assigned to some processor by the time the goal is reached. Assignment of P_i to different processors results in different increases in the path cost. Let h() be the sum, over the unassigned rules, of the minimum such increase in the total cost:

h() = Σ_{i∉Q} min_{j=1..n} { g()|x_ij − g() }

where g()|x_ij is the path cost of the node as computed by making x_ij = 1. Clearly, considering only the minimum increases ensures that h() never exceeds the actual cost to the goal. This heuristic can be verified to be closer to the actual costs than the heuristic used by Shen and Tsai [ST85].
The complete A* algorithm can now be written:

A* Algorithm for the 0-1 Linear-Quadratic Programming Problem

1. OPEN ← {(X = 0, Q = {})}.
2. If OPEN = {} then the goal is unsatisfiable; stop.
3. Select a state to expand:
   S ← (X, Q) ∈ OPEN such that g(S) + h(S) is minimum.
   OPEN ← OPEN − {S}.
4. If Q = {1, 2, ..., r} then S is an optimal goal state; stop.
5. Select a rule P_i to assign:
   let i ∉ Q be such that min_j { g(S)|x_ij } is maximum.
6. Expand node S:
   for all j ∈ {1, 2, ..., n} with possible(X, i, j):
   OPEN ← OPEN ∪ {(X + [x_ij = 1], Q ∪ {i})}.
7. Go to 2.

The function possible(X, i, j) returns true if the assignment of rule P_i to processor j is possible:

possible(X, i, j) ≝ R_j > Σ_{k=1}^{r} x_kj  ∧  the communication cost induced by x_ij is finite

Example 6.1 Consider a production system with the first 4 rules of Example 3.1 and a tree interconnection of three processors, as in Figure 6.1. The production capacity of each processor is set to 2. The parallelism and communication matrices are

      | 1 0 1 0 |        | 1 0 1 0 |
P  =  | 0 1 1 0 |   C =  | 0 1 1 1 |
      | 1 1 1 0 |        | 1 1 1 0 |
      | 0 0 0 1 |        | 0 1 0 1 |

[Figure 6.1: A tree of 3 processors, with PE1 connected to PE2 and to PE3.]

The search tree generated by the A* algorithm is shown in Figure 6.2. The program also takes advantage of the symmetries in the interconnection and in the rules to further reduce the size of the search tree. For example, of the three nodes that can be generated from the root node, the node with rule 1 in processor 1 and the node with rule 1 in processor 3 are identical, and hence only one of them is generated.
[Figure 6.2: Search tree generated by the A* algorithm for Example 6.1. Each node is annotated with n = node number, r = rule number, p = processor, g = g(n), and h = h(n).]

6.3 Distribution of Knowledge Base

Communication requirements may differ depending on the distribution of the tokens among the memory elements and of the actions among the processing elements. Such distribution must be guided by the allocation of rules; perhaps the allocation of rules must itself take this distribution into consideration. This can be done by reformulating the allocation problem.

The token distribution strategy will have to make a fundamental decision, namely, whether a given token is duplicated in more than one memory element. If the token is not duplicated, then all the rules that refer to it may induce communications. If the token is duplicated, then communications are induced to maintain consistency.

When RETE-type matching is used, where every rule knows whether it is firable at every instant, duplication of tokens serves no purpose, since communicating rules will have to send and receive messages regardless of duplication. In this case, just to minimize communications, a token should reside with one of the communicating rules. A token should be assigned to a memory element where none of the rules referring to it resides only when it is not possible to assign it otherwise. An added benefit of this strategy is that it also minimizes memory usage.

This greedy strategy seems to be good enough, since no extra communication is introduced and hence there is no need to reformulate the allocation problem.

6.4 Summary

Rule interdependences, in addition to revealing parallelism, dictate the communication requirements in a parallel system with the rules distributed over the processing elements. This leads to the problem of partitioning the database and the set of rules among the processors.
The problem of rule allocation can be reduced to a 0-1 linear programming problem. That method, however, is impractical for large systems. The allocation problem may instead be solved using the A* algorithm; the problem then becomes one of finding good heuristics to use with A*. Finding suboptimal solutions to the allocation problem, not addressed in this thesis, should be considered for large systems.

Chapter 7

Modeling Control and Variables

The production system model assumed so far lacks controllability and the ability to handle variables. While the former may not be a deficiency, the latter certainly is a major drawback. This chapter addresses these two issues. Representing control and how it can be used in parallel processing are discussed. Enhancements to the production system model are proposed to handle variables in productions and in the database.

7.1 Control

Control in a pure production system is solely data driven and is exercised indirectly through interacting tokens, the conflict resolution strategy, and the search strategy. This lack of direct control is the main reason for the modularity and flexibility of production systems, and a good reason to keep it that way.

First, there are strong objections to achieving control indirectly by the above means. For example, a conflict resolution strategy makes the selection irrevocable, which cannot model the non-deterministic nature of production systems. Achieving control via specialized tokens violates the modularity principle; it also introduces dependences and hence performance degradation. Usually a search strategy is selected for efficiency considerations, and different search strategies may give rise to different solutions; exercising control via the search strategy foregoes those efficiency considerations.
Secondly, unrestricted application of productions in specific domains may not be meaningful, and control could be used to restrict the solution set and the solution paths to achieve better performance. Plans and scripts require knowledge about sequences of productions rather than the actions of a single production [Sac77,Sac79,Sha77,Wil79]. These considerations have led researchers to propose direct and separate control over the order of application of productions.

7.1.1 Control Specification

Programmed or controlled string grammars proposed by Rosenkrantz [Ros87] were extended to graph grammars by Bunke [Bun78] and others. This model specifies a control diagram that not only restricts the order of applications but also provides the start and termination conditions. Georgeff [Geo82], on the other hand, proposes to use regular expressions to provide procedural control. A brief description of the two models follows; the original papers should be consulted for full details.

7.1.1.1 Bunke Model

The control diagram is a graph whose nodes are labeled with productions and two special symbols, START and STOP, and whose edges are labeled with YES or NO. There is exactly one START node and exactly one STOP node; the START node has no incoming edges and the STOP node has no outgoing edges. The derivation starts with the direct YES-successor of the START node. If the production at a given node is applicable, then it is applied and its YES edge is taken; otherwise the NO edge is taken. The derivation stops when the STOP node is reached.

7.1.1.2 Georgeff Model

Control is specified by context-free languages over control words, which are regular expressions of productions. For example, (P1P1 + P1P2 + P2P1 + P2P2)* represents the derivation sequences of even length. Only those derivations specified by one of the control words are legal derivations.
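Since control words are ordinary regular expressions over production names, membership of a candidate firing sequence in the control language can be checked mechanically. A minimal sketch, with the hypothetical encoding of P1 as 'a' and P2 as 'b':

```python
import re

# The control word (P1P1 + P1P2 + P2P1 + P2P2)* from the text, i.e.
# derivation sequences of even length, encoded over the letters a, b.
EVEN_LENGTH = re.compile(r'(aa|ab|ba|bb)*\Z')

def legal(derivation):
    """True iff the firing sequence is sanctioned by the control word."""
    return EVEN_LENGTH.match(derivation) is not None
```

Any odd-length sequence, or one mentioning a production outside {P1, P2}, is rejected.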
Moreover, procedures containing these control words can be written. For example, the following construct is taken from [Geo82]:

while b do P1, P2, ..., Pn od

The advantage of procedural control using a context-free grammar is that such constructs can easily be transformed into control productions and mixed with the regular productions. For example, the above construct can be represented by introducing the productions P0 : b -> nil and Pn+1 : not b -> nil and using the control language (P0 P1 P2 ... Pn)* Pn+1. Context-free control languages can be used to specify recursive procedures or sub-goal control constructs. This naturally corresponds to the hierarchy and partitioning needed for parallel processing. Informally, the Bunke model is the flowchart equivalent of the regular-language model of Georgeff, and it is easier to manipulate regular strings than graphs. Therefore, the latter model is adopted for further study.

7.1.2 Transformations on Control Strings

Clearly, the reduction of the state space addressed in Chapter 4 can be further enhanced by the use of control information. One way to do so is to incorporate the control in the computation of forbidden sets. Alternatively, the control can be kept as a separate decision box. The following (non-deterministic) algorithm may be written for the state-space search:

while current-state != goal do
1. select active productions using control
2. compute conflict set
3. optionally, compute forbidden set
4. select production(s) for firing
5. fire the selected production(s)
od

Based on the dependence analysis of Chapter 3, transformations on the control itself can be defined which further reduce the search space. In addition to the operators of regular expressions, let || stand for parallel execution, i.e., P1||P2 executes productions P1 and P2 in parallel. Also, let the operator || have precedence higher than that of the concatenation operator in regular expressions.
Then the following properties of || may be noted:

P1||P2 = P2||P1  (commutative)
P1||(P2||P3) = (P1||P2)||P3  (associative)
P1||(P2 + P3) = P1||P2 + P1||P3  (distributive over +)
e||P1 = P1  (e is the identity for ||)

Proposition 7.1 A chain of productions (P1 P2 ... Pm) may be collapsed to a single composed production P1 o P2 o ... o Pm.

Proposition 7.2 For two parallel productions P1 and P2 the following substitutions may be introduced:

1. (P1||P2)* -> e + P1||P2
2. (P1 + P2)* -> e + P1 + P2 + P1||P2

These substitutions are direct consequences of Lemma 4.1. Also, the pi-delta transformation of Section 4.3 can be used to operate on the regular expressions. A direct consequence of the pi-delta transformation is

Lemma 7.1 Let P1, P2, and P3 be productions satisfying the firability and reachability conditions (see Sections 4.3.2 and 4.3.3). The following parallelization may be introduced:

(P1 + P2 P3 E) -> (P1||P2) P3 E

where E is another control expression.

Given the dependences among the rules, the above properties may be used and the substitutions carried out automatically by a program, much like a vectorizing compiler, to obtain parallelized control. The excellent treatment of concurrency by Hoare [Hoa85] should also be consulted in this regard.

7.2 Variables in a Production System

In the production system model used thus far, no provision was made for variables, either in the rules or in the database: two different names represented two different entities or objects. This is not sufficient for practical production systems.

Example 7.1 Consider the database in Figure 7.1 in the light of Example 3.1. Using production P1 of Example 3.1 it is possible to infer (Adam lives-in California). However, it is not possible to infer (Ahur lives-in Arizona), since the rule does not match the database.
It is desirable to arrive at both inferences with the single rule shown in Figure 7.1, given that P, V, C, and S stand for a person, a vehicle, a color, and a state respectively.

7.2.1 Handling Variables

Extensions to handle variables in the graph grammar context have been proposed by Parisi-Presicce et al. [PEM87], who suggest a structured alphabet and a relaxed condition on the graph morphism required for matching. A similar treatment is given here for the current model.

Consider each node in a production or in the database as a variable. Associated with each variable x, let there be a domain set, denoted dom(x), of values it can assume. Then c in dom(x) is a legal value for x. A constant simply has a singleton set as its domain, and in such a case the variable may be replaced by its value for convenience.

Figure 7.1: Database and a rule for Example 7.1

Definition 7.1 Given two variables x and y, x is an acceptable substitution (value) for y, written x <= y, if and only if dom(x) is a subset of dom(y). Also, x = y if and only if x <= y and y <= x.

Now, in order to match the variables in the productions and in the database with one another, they need to be unified. Clearly, z such that dom(z) = dom(x) intersect dom(y) is the most general unifier (mgu) of x and y. Two structures L1 and L2 can be matched, or unified, if and only if a substitution s can be found such that s(L1) = s(L2).

Rule Applicability: A production is applicable only if there exists a consistent substitution s that unifies its context graph C with a portion of the database.
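Because domains are plain sets, Definition 7.1 and the mgu reduce to set operations. A minimal sketch of this domain-based unification (function names and sample values are hypothetical):

```python
def acceptable(x, y):
    """x <= y: x is an acceptable substitution for y (Definition 7.1),
    i.e., dom(x) is a subset of dom(y).  Domains are Python sets."""
    return x <= y

def mgu(x, y):
    """Most general unifier of two variables: the domain intersection.
    An empty result means the two variables do not unify."""
    return x & y

# A constant is simply a variable with a singleton domain:
adam = {'Adam'}
person = {'Adam', 'Ahur', 'Eve'}
```

For instance, the constant Adam is an acceptable substitution for a person variable, while two disjoint constants fail to unify (their mgu is the empty set).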
Rule Firing: When a rule is fired, the portion of the database unified with the LHS L is restricted by the mgu of L and that portion; i.e., the domain of each variable x unified with a variable y in L is replaced by dom(x) - mgu(x, y). Any variable in G whose domain becomes empty is dropped from the database, along with its arcs, if any.

Dependences: The dependence relations between rules can be redefined as follows:

Definition 7.2 Two rules P and P' are said to be input dependent, written P >< P', if there exist U, V, U', V' such that

U subset of (L - K) and V subset of C and U' subset of (L' - K') and V' subset of C' and (mgu(U, V') != {} or mgu(U', V) != {})

Definition 7.3 A rule P is output-input dependent on a rule P', written P < P', if there exist U and V' such that

U subset of R and V' subset of (L' - K') and mgu(U, V') != {}

Definition 7.4 A rule P is input-output dependent on a rule P', written P > P', if there exist U and V' such that

U subset of C and V' subset of (R' - K') and mgu(U, V') != {}

With these enhancements the search procedures of Sections 4.1.1 and 4.2.1 remain the same. The results on the pi-delta transformation, and the composition and decomposition theorems of Chapter 5, can be extended in the light of the new definitions of the dependences.

7.3 Summary

Controlled production systems are introduced. A model for control specification using regular expressions is adopted. Dependences among rules lead to transformations on the control strings; these may be used in building "parallelizing compilers" for production systems.

The model of production systems is enhanced to handle variables in the productions and in the database. Dependences are redefined. The analysis of the thesis, with possible extensions, remains valid in the enhanced model.

Chapter 8

Conclusions

Due to their poor performance on sequential machines, production systems have not gained widespread acceptance. Parallel processing of production systems is an alternative.
However, previous studies have indicated that only limited apparent parallelism exists in production systems. These studies were based on measurements made on production systems written for sequential processing. This leads to the question of transforming a production system to expose all of its parallelism. In this context, this research has focused on the issues of reducing the combinatorial search space, partitioning the knowledge base, and mapping productions onto processors.

Three kinds of transformation techniques for parallel processing are presented: state-space, rule-space, and rule-space to processor-space transformations. The state-space techniques reduce the combinatorial search space. The rule-space techniques transform given productions into other productions in order to restrict and reduce the search space and to reveal and exploit parallelism.

The notion of dependence between rules is introduced, which forms the crux of the techniques described herein. The dependence analysis leads to the detection of parallelism and of communication requirements. Techniques to reveal and utilize parallelism in production systems are presented.

A rigorous procedure for search space reduction is given. The reduced search space is shown to be minimal and complete. A distinct advantage of this method is that it does not exclude the use of heuristics for further reductions. The search procedure is shown to be usable for parallel processing. The pi-delta transformation is shown to further restrict the search space.

The state space may be further reduced if productions are preprocessed. Composing productions into larger productions may improve efficiency, and decomposing productions into primitive productions may expose parallelism and improve speedup. Composition of rules is shown to eliminate communications at the expense of parallelism in the matching phase. Composition could also reduce the execution time.
Decomposition of a production into many primitive productions can reveal parallelism. This can also lead to an even distribution of rules and hence a balanced load on the processors.

The detection of communication requirements leads to the problem of partitioning the database and the set of rules among the processors. The problem of rule allocation can be reduced to a 0-1 linear programming problem, but it may become impractical to use integer programming for solving large problems. The allocation problem may instead be solved with heuristics, and the problem becomes one of finding good heuristics to use with the A* algorithm.

Controlled production systems are introduced. A model for control specification using regular expressions is adopted. Dependences among the rules then lead to transformations on the control strings. These may be used in building a "parallelizing compiler" for production systems.

Finding suboptimal solutions to the allocation problem, not addressed in this thesis, should be considered for large systems. Also, issues such as probabilistic knowledge of rule firing frequencies and data token occurrences, and the use of control and its distribution, need further attention. Since the techniques described herein are based on static analysis of rules, it is hoped that they will lead to parallelizing compilers for production systems. However, static analysis of rules has its limitations and cannot reveal or utilize all the available parallelism. Dynamic analysis, and the explicit introduction of parallelism in productions as well as in control by programmers, may be considered.

Appendix A

Software Tools

This appendix provides a brief description of the software tools developed to test and utilize the concepts theorized in the thesis.

A.1 Productions

Productions are represented by lists. They may contain variables. The
format of a production is defined as:

<production>   ::= (rule <ID> <body> {<domains>}*)
<ID>           ::= string | list | number
<body>         ::= ((if <contextGraph>) (then <actionGraph>))
<contextGraph> ::= ({<context>}*) | (<andOr> <contextGraph>)
<context>      ::= {<token>}*
<token>        ::= ATOM | <variable> | {(<token>)}*
<variable>     ::= {ATOM}
<andOr>        ::= AND | OR
<actionGraph>  ::= {<action>}*
<action>       ::= (<addOrDelete> <token>)
<addOrDelete>  ::= + | -
<domains>      ::= (<variable> {ATOM}*)

A.2 Programs

Programs to solve the allocation problem tend to be combinatorial in nature. Symmetries in the hardware and the productions lead to symmetric or isomorphic solutions. Enormous savings in computational time are made if such symmetries are used to prune the search for a solution. The symmetries, for now, are input manually.

DEPEND is a program that takes a list of productions and produces the selected dependence matrices. Automatic unification of variables is carried out to detect dependences among rules. It produces a description of the production system containing the number of rules and the selected matrices.

ASTAR is a program implementing the A* algorithm to solve the allocation problem. It takes the description of a production system from DEPEND and the description of the hardware. The hardware description contains the number of processors, the minimum-cost matrix, and the symmetries among the processors. It produces the allocation matrix.

PS2IP is a program that takes the descriptions of the production system and the hardware and formulates the 0-1 linear integer programming problem by introducing extra variables.

IP is a program to solve the integer programming problem.

IP2ALO is a program that deciphers the output of IP and produces the allocation matrix.

SIMULATE is a parallel production system simulator.
Productions, descriptions of the production system and the hardware, and the allocation matrix are its inputs. This program remains incomplete.

Figure A.1 shows the relations among the above programs and their usage.

Figure A.1: Program flow

A.3 An Example

This example illustrates the use of some of the programs described above. Since variables are not used, the domain specifications are omitted.

Productions

(rule (0 bring bananas into room)
  (if ())
  (then (+ (bananas above table))))

(rule (1 monkey sees bananas)
  (if ())
  (then (+ (monkey hungry for bananas))))

(rule (2 monkey grasps bananas)
  (if (and (monkey on table)
           (monkey hungry for bananas)
           (monkey holding nil)
           (bananas above table)))
  (then (+ (monkey has bananas))
        (- (monkey holding nil))))

(rule (3 monkey grasps stick)
  (if (and (monkey on floor)
           (monkey holding nil)
           (stick on floor)))
  (then (- (monkey holding nil))
        (- (stick on floor))
        (+ (monkey holding stick))))

(rule (4 use stick to hit bananas)
  (if (and (monkey holding stick)
           (bananas above table)))
  (then (- (bananas above table))
        (+ (bananas on floor))
        (+ (monkey hungry for bananas))))

(rule (5 monkey fetches bananas)
  (if (and (monkey holding nil)
           (monkey hungry for bananas)
           (bananas on floor)))
  (then (- (monkey holding nil))
        (- (bananas on floor))
        (+ (monkey has bananas))))

(rule (6 monkey drops stick)
  (if (monkey holding stick))
  (then (- (monkey holding stick))
        (+ (monkey holding nil))
        (+ (stick on floor))))

(rule (7 monkey eats bananas)
  (if (and (monkey hungry for bananas)
           (monkey has bananas)))
  (then (- (monkey hungry for bananas))
        (- (monkey has bananas))
        (+ (monkey eats bananas))
        (- (monkey has not eaten today))))

(rule (8 monkey climbs table)
  (if (and (monkey on floor)
           (table in room)))
  (then (- (monkey on floor))
        (+ (monkey on table))
        (+ (monkey hungry for bananas))))

(rule (9 monkey jumps off table)
  (if (monkey on table))
  (then (- (monkey on table))
        (+ (monkey on floor))))

(rule (10 monkey is hungry)
  (if (monkey has not eaten today))
  (then (+ (monkey hungry for bananas))))

(rule (11 stack a on b)
  (if (and (a on table)
           (b on table)
           (monkey on table)))
  (then (- (a on table))
        (+ (a on b))
        (+ (monkey hungry for bananas))))

Hardware Model

Mesh: 4 processors, capacity 3

; number of processors
4
; distance matrix (mesh of 4 processors)
0.0 1.0 1.0 2.0
1.0 0.0 2.0 1.0
1.0 2.0 0.0 1.0
2.0 1.0 1.0 0.0
; rule capacities of each processor
3 3 3 3
; infinity
99.0
; symmetry lists
(1) (1 2 4) 0

The DEPEND program produced the following matrices:

; number of rules
12
; Dependence matrix: +II+IO
0 0 1 0 1 0 0 0 0 0 0 0
0 0 1 0 0 1 0 1 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1
0 0 1 1 1 1 1 0 1 1 0 0
1 0 1 1 1 1 1 1 0 0 0 0
0 1 1 1 1 1 1 1 1 0 1 1
0 0 1 1 1 1 1 0 0 0 0 0
0 1 1 0 1 1 0 1 1 0 1 1
0 0 1 1 0 1 0 1 1 1 0 1
0 0 1 1 0 0 0 0 1 1 0 1
0 0 1 0 0 1 0 1 0 0 0 0
0 0 1 0 0 1 0 1 1 1 0 1

; Dependence matrix: +II+OI
0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0
0 0 1 1 1 1 1 1 0 1 0 0
0 0 1 1 0 1 1 0 1 0 0 0
1 0 1 0 1 1 1 1 0 0 0 0
0 0 1 1 1 1 1 1 0 0 0 0
0 0 1 1 1 1 1 0 0 0 0 0
0 1 1 0 1 1 0 1 1 0 1 1
0 0 0 1 0 0 0 1 1 1 0 0
0 0 1 0 0 0 0 0 1 1 0 1
0 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 1 0 1 0 1

Feeding the above descriptions to the ASTAR program produced the following allocation:

; Allocation matrix for #P"mabl.dep" + #P"m4.r3"
;; 16425 level=12 g=71.0 h=0.0
(13.0 (11 5 1))
(22.0 (8 6 2))
(23.0 (7 4 3))
(13.0 (12 10 9))
; number of nodes generated = 12312

Productions 11, 5, and 1 were allocated to processor 1 with a relative cost of 13.0. The total cost was 71.0. It may be noted that all dependence matrices for the examples were produced by the DEPEND program.

Bibliography

[Bal67] E. Balas. Discrete programming by the filter method. Operations Research, 15:915-957, 1967.
[Bok81] S.H. Bokhari. On the mapping problem. IEEE Trans. on Computers, C-30(3):207-214, March 1981.
[Bre72] M.A. Breuer. Design Automation of Digital Systems. Prentice-Hall, Englewood Cliffs, N.J., 1972.
[Bun78] H. Bunke. Programmed graph grammars. Pages 155-166, Springer-Verlag, 1978.
[BYH87] T. Baba, S.B. Yao, and A.R. Hevner. Design of a functionally distributed, multiprocessor database machine using data flow analysis. IEEE Trans. on Computers, C-36(6):650-666, June 1987.
[CL87] W.W. Chu and L. M-T. Lan. Task allocation and precedence relations for distributed real-time systems. IEEE Trans. on Computers, C-36(6), June 1987.
[DK76] R. Davis and J. King. An Overview of Production Systems. Machine Intelligence, Wiley, 1976.
[DM87] V. Dixit and D.I. Moldovan. Semantic network array processor and its applications to image understanding. IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-9(1):153-160, January 1987.
[Ehr78] H. Ehrig. Introduction to the Algebraic Theory of Graph Grammars. Volume 73 of Lecture Notes in Computer Science, Springer-Verlag, 1978.
[EK78] H. Ehrig and H.-J. Kreowski. Algebraic theory of graph grammars applied to consistency and synchronization in data bases. In Proc. of Workshop on Graph Theoretic Concepts in Computer Science, 1978.
[ER78] H. Ehrig and B.K. Rosen. Decomposition of Graph Grammar Productions and Derivations. Volume 73 of Lecture Notes in Computer Science, Springer-Verlag, 1978.
[For82] C.L. Forgy. RETE: a fast algorithm for the many pattern/many object match problem. Artificial Intelligence, 19(1), September 1982.
[Geo82] M.P. Georgeff. Procedural control in production systems. Artificial Intelligence, 18:175-201, 1982.
[Gil76] B.E. Gillette. Introduction to Operations Research. McGraw-Hill, N.Y., 1976.
[Glo65] F.A. Glover. A multiphase-dual algorithm for the zero-one integer programming problem. Operations Research, 13:879-919, 1965.
[Glo68] F.A. Glover. Surrogate constraints. Operations Research, 16:741-749, 1968.
[GN72] R.S. Garfinkel and G.L. Nemhauser. Integer Programming. Wiley, N.Y., 1972.
[Gup84a] A. Gupta. Implementing OPS5 on DADO. In Proc. of the Internat. Conf. on Parallel Processing, August 1984.
[Gup84b] A. Gupta. Parallelism in Production Systems: The Sources and the Expected Speed-up. Technical Report, Dept. Comp. Sci., Carnegie-Mellon Univ., Dec. 1984.
[Han72] P. Hansen. Quadratic zero-one programming by implicit enumeration. In F.A. Lootsma, editor, Numerical Methods for Non-linear Optimisation, Academic Press, 1972.
[Hoa85] C.A.R. Hoare. Communicating Sequential Processes. Prentice-Hall, U.K., 1985.
[HR66] P.L. Hammer and S. Rudeanu. Pseudo-Boolean Methods for Bivalent Programming. Springer-Verlag OHG, Berlin, 1966.
[Lau70] D.J. Laughhunn. Quadratic binary programming with application to capital budgeting problems. Operations Research, 18:454-461, 1970.
[LBM85] E. Kant, L. Brownston, R. Farrel, and N. Martin. Programming Expert Systems in OPS5. Addison-Wesley, Reading, Mass., 1985.
[MF78] D. McDermott and C. Forgy. Pattern Directed Inference Systems, chapter: Production system conflict resolution strategies, pages 177-199. Academic Press, London, 1978.
[Mol86] D.I. Moldovan. A model for parallel processing of production systems. In Proc. International Conf. on Systems, Man, and Cybernetics, Atlanta, Ga., October 1986.
[Mol87] D.I. Moldovan. RUBIC: A Multiprocessor for Rule-Based Systems. Technical Report, Dept. of Electrical Engg., Univ. of Southern Calif., 1987.
[MY80] R. Miller and C-K. Yao. On formulating simultaneity for studying parallelism and synchronization. Journal of Computer and Systems Sciences, 20:203-218, 1980.
[Nag78] M. Nagl. A Tutorial and Bibliographical Survey on Graph Grammars. Volume 73 of Lecture Notes in Computer Science, Springer-Verlag, 1978.
[Nil71] N.J. Nilson. Problem Solving Methods in AI. McGraw-Hill, N.Y., 1971.
[Ofl84] K. Oflazer. Partitioning in parallel processing of production systems. In Proc. of the Internat. Conf. on Parallel Processing, August 1984.
[OD87] A.O. Oshisanwo and P.P. Desiewicz. A parallel model and architecture for production systems. In Proc. of the Internat. Conf. on Parallel Processing, August 1987.
[PEM87] F. Parisi-Presicce, H. Ehrig, and U. Montanari. Graph rewriting with unification and composition. Unpublished, 1987.
[Pos43] E. Post. Formal reductions of the general combinatorial problem. American Journal of Mathematics, 65:197-268, 1943.
[Pri81] C.C. Price. The assignment of computational tasks among processors in a distributed system. In Proc. of National Computer Conf., pages 291-295, AFIPS, 1981.
[Ros87] D.J. Rosenkrantz. Programmed graph grammars and classes of formal languages. JACM, 16:107-131, 1987.
[Sac77] E.D. Sacerdoti. A Structure for Plans and Behaviour. North-Holland, N.Y., 1977.
[Sac79] E.D. Sacerdoti. Problem solving tactics. In Proc. IJCAI, pages 1077-1085, 1979.
[Sal75] H.M. Salkin. Integer Programming. Addison-Wesley, Reading, MA., 1975.
[Sha77] R.C. Shank. Scripts, Goals, Plans, and Understanding. Erlbaum, Hillside, NJ, 1977.
[Sha85] D.E. Shaw. Non-von's applicability to three AI task areas. In Proceedings of the IJCAI, 1985.
[ST85] C-C. Shen and W-H. Tsai. A graph matching approach to optimal task assignment in distributed systems using a minimax criterion. IEEE Trans. on Computers, C-34(3):197-203, March 1985.
[Sto79] H.S. Stone. Multiprocessor scheduling with the aid of network flow algorithms. IEEE Trans. on Software Engg., SE-3(7):85-93, July 1979.
[Sto85] S.J. Stolfo. Five parallel algorithms for production system execution on the DADO machine. In Proc. of the National Conf. on Artificial Intelligence, AAAI, 1985.
[TM84] F.M. Tenorio and D.I. Moldovan. Mapping production systems into multiprocessors. In Proc. of the Internat. Conf. on Parallel Processing, August 1984.
[Ten86] F.M. Tenorio. Parallel processing techniques for production systems. Ph.D. Dissertation, Dept. of Elec. Engg., Univ. of Southern Calif., Los Angeles, 1986.
[Wat67] L.J. Watters. Reduction of integer polynomial programming problems to zero-one linear programming problems. Operations Research, 15:1171-1174, 1967.
[Wil79] D. Wilkins. Using plans in chess. In Proc. IJCAI, pages 960-967, 1979.
[Zan65] W. Zangwill. Media selection by decision programming. Journal of Advertising Research, 5:30-36, 1965.

Index

action 2
AI problems 1
algorithm 4: RETE 5
allocation 62: matrix 75, 83; problem 5, 75, 81
Amdahl's law 30
applying, see firing
assignment problem 79
associative 93
ASTAR 102
Baba, T. 79
backtracking 3
backward chaining 3
Balas, E. 79
bidirectional strategy 3
Bokhari, S. 79
branching factor 3
Breuer, M. 79
Bunke, H. 90
capacity 78, 85
chaining 3: backward 3; forward 3; of parallel productions 43
chain of rules 93
Chu, W.W. 79
combinatorial 1, 6
communicating productions 29, 66, 81, 82
communication 1, 5, 15, 65: cost 75, 76; matrix 21, 22, 29; path 82; requirement 29; time 4
commutative 20, 93: rules 20, 32
commutativity 26
complete tree 36, 49
compose 19
composition 63: io-dependent 63; oi-dependent 64; theorem 65
concatenation 93
condition 2: firability 55; goal 2; reachability 55
conflict 3, 89: set 3
constraints 77
context 11
control 2, 3, 62: diagram 90, 91; in a production system 2; in pure PS 89; model 89; parallelized 93; procedural 90, 91; specification 90; words 91
cost 75, 76
cover 66
cube 16
cycle 57
Dasiewicz, P.P. 30
data 3
database, see knowledge base
decomposition 65, 67: into parallel productions 70; non-parallel 72; theorem 67
DEPEND 102
dependence 19, 21: amount of 76; analysis 19; with variables 96
dependent 21, 29: input 21; input-output 21, 29; io 62, 63; oi 64; output-input 21
derivation 20, 63
distance matrix 15, 16
distribution strategy 87
distributive 93
domain 13: application 3, 82; set 94
duplication 72, 87
element 76: context 64; gluing 64; memory 15; processing 15
equivalence of states 20
execution 4: frequency 77; time 19
expansion 82
expert system 13
firability 93
firability condition 55
firing 2, 15, 35, 96: parallel 26, 46, 48, 55; repeated 33
forbidden set 34, 46, 47: computing 34, 92; definition of 34
Forgy, C.L. 4
forward chaining 3
frequency of execution 77
full parallelism 47
Garfinkel, R.S. 78
Georgeff, M.P. 90
Gillette, B.E. 79
Glover, F.A. 79
gluing 13
goal 2: directed strategy 3
grammar 10, 13
graph 11: grammar 10, 13, 64; interface 11; theoretic transformation 72
greedy strategy 87
Gupta, A. 4, 30
Hammer, P.L. 79
handling variables 94
Hansen, P. 79
hardware 15
heuristic 79, 81
heuristics 34, 35, 63, 81, 83
implicit enumeration 79
initial state 2
input 21, 29
input-output 29: dependent rules 21, 29
interface graph 11
io dependence 63
io dependent 62
IP 102
IP2ALO 102
iterative method 79
joining 72
knowledge 2: base 2, 13, 15; base, distribution of 87; control 2; declarative 2
Lan, L.M-T. 79
Laughhunn, D.J. 79
linear formulation 80
linear-quadratic 78, 79
link 16: bidirectional 16; unidirectional 16
loss of parallelism 75, 76
mapping 1, 81
matching 4, 15, 35
matrix 75, 83: communication 21, 22, 29; distance 75; minimum distance 15; parallelism 21, 22, 35
memory usage 87
message 15: passing 5
mgu 96
MIMD 15
minimal 36: tree 38, 46, 49
model 91: Bunke 91; control 89; Georgeff 91; hardware 15; parallelism 26; system 10; variable 89
modularity 89, 90
Moldovan, D.I. 4
monkey-banana problem 71
morphism 11, 13, 66
Nemhauser, G.L. 78
net change 61
network 75: interconnection 15; RETE 4, 87
Nilson, N.J. 81
non-deterministic 3
null production 93
numbering 34
objective function 77, 80, 83
Oflazer, K. 4
oi dependence 64
OPS5 13
optimal solution 81
Oshisanwo, A.O. 30
output-input 21
parallel 19: firing 26, 46, 55; processing 1, 6; productions 20, 26, 33; set 34, 46
parallelism 4, 26, 66: condition level 31; full 47; loss 75, 76; matrix 21, 22; model 26; node level 30; partial 46; production level 30
parallelized control 93
Parisi-Presicce, F. 13, 94
partial parallelism 46
partial production 66, 69
path cost 83
persistent 11
placement problem 79
Post, E. 2
precedence 93
Price, C.C. 79
procedural control 90, 91
procedure, expansion 82
processor, transmitting 76
production 10, 101: capacity 78, 85; chain 93; communicating 29, 66; covering of 66; identity 93; most general 4; null 93; parallel 20; partial 66
production system 2, 3: model 10; non-deterministic 3; sequential 4
program flow 103
programming 81: linear-quadratic 78, 79
PS2IP 102
reachability 93
reachability condition 55
reduction 38: composed 19; parallel 19; search space 92; sequence 20; sequential 19; state space 32; to 0-1 linear programming 80
regular expression 90, 91
repeated firing 33
restriction 66
RETE algorithm 4, 29: network 29
Rosenkrantz, D.J. 90
Rudeanu, S. 79
rule, see production: based system 2; numbering 34
rule-space 6: transformations 62
Salkin, H.M. 78
search 3: procedure 33, 35, 38; procedure, parallel 46, 47, 48; procedure, sequential 34, 35; space 3, 6; strategy 89; tree 36; tree, minimal 36; uniform cost 83
sequential 19: firing 35; production system 4; search procedure 34, 35; to parallel transformation 43
set 3: domain 94; forbidden 34, 46, 47; parallel 34, 35, 46; solution 2
Shaw, D.E. 4
Shen, C-C. 79
SIMD 15
SIMULATE 103
solution set 2
space 6: rule 6, 62; search 3, 6
splitting 72
state 2, 20, 82: expansion 35; initial 2; skipping 43, 51
state space 82: reduction 32; search 81, 82, 92; transformations 32
Stolfo, S.J. 4, 30
Stone, H. 79
strategy 3: bidirectional 3; conflict resolution 3, 89; data driven 3; distribution 87; goal directed 3; greedy 87; search 89
string grammar 90: programmed 90
substitution 96
symmetry 76, 80, 85, 102
synchronization 72: token 69
system 2, 3: expert 13; model 10; production 2, 3; rule based 2
Tenorio, F. 4
throughput 5
time 19
token 69
transformation 72: on control strings 92; pi-delta 51; rule space 62; sequential to parallel 43; state space 32
tree 36: height 43; search 36
Tsai, W-H. 79
underestimator 81
unification 96
unifier 96
variable 94: handling 94; in PS 94; modeling 89
Watters, L.J. 79, 81
Zangwill, W. 79
Asset Metadata
Creator: Dixit, Vishweshwar V. (author)
Core Title: Transformation techniques for parallel processing of production systems
Degree: Doctor of Philosophy
Degree Program: Computer Engineering
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: Computer Science, OAI-PMH Harvest
Language: English
Contributor: Digitized by ProQuest (provenance)
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c17-769345
Unique Identifier: UC11345323
Identifier: DP22764.pdf (filename), usctheses-c17-769345 (legacy record id)
Legacy Identifier: DP22764.pdf
Dmrecord: 769345
Document Type: Dissertation
Rights: Dixit, Vishweshwar V.
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA