PARALLEL PROCESSING OF PRODUCTION SYSTEMS ON DATA-FLOW MULTIPROCESSORS

by

Andrew Sohn

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Engineering)

August 1991

Copyright 1991 Andrew Sohn

UMI Number: DP22834

All rights reserved

INFORMATION TO ALL USERS
The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.

UMI Dissertation Publishing
UMI DP22834
Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author.
Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code.

ProQuest LLC
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106-1346

UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90089-4015

This dissertation, written by ANDREW SOHN under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Dean of Graduate Studies

Date: August 12, 1991

DISSERTATION COMMITTEE
Chairperson

Acknowledgments

I would like to express my profound gratitude to Professor Jean-Luc Gaudiot, the chairman of my dissertation committee, for his valuable guidance, for his constant encouragement, and for his faith and trust in me. I have benefited greatly from his extensive technical knowledge and experience during the course of this research.
I will continue to value his insights and encouragement. I am grateful to the other members of the dissertation committee: Professor V. K. Prasanna Kumar, for his helpful comments on parallel algorithms and architectures, and Professor Christoph von der Malsburg, for his valuable suggestions on the connectionist and neural network aspects of production systems. Their suggestions and constructive criticism were of particular help in refining this thesis. I also want to thank Professor Paul S. Rosenbloom for his invaluable criticism on production system processing, and Professor Dan I. Moldovan for sharing his view on parallel knowledge processing.

The Macro Data-flow Multiprocessor Project is an ongoing project at the University of Southern California. Thanks are due to my fellow members of the Data-flow Research Group at USC: Dr. Robert Lin, Namhoon Yoo, Chinhyun Kim, and Daekyun Yoon. Thanks also go to Mary Zittercob for her administrative help. I would like to acknowledge the financial support received from the National Science Foundation under grant No. CCR-9013965.

My most sincere thanks go to my best friend, my wife Sun Kyung, for her moral support and endless encouragement. Without her support, this thesis would not have been possible. Finally, I would like to dedicate this work to the Lord Jesus Christ, who has given me wisdom and courage through countless incidents, and will be with me for the rest of my life and forever.

Contents

Acknowledgments
List of Figures
List of Tables
Abstract

1 Introduction
  1.1 Parallel Processing Perspective
  1.2 Adaptive Processing Perspective
  1.3 Organization of the Thesis
2 Background
  2.1 Production Systems
    2.1.1 The production system paradigm
    2.1.2 The Rete match algorithm
  2.2 Data-flow Principles and Machines
    2.2.1 Basic data-flow principles of execution
    2.2.2 The static data-flow principle
    2.2.3 The dynamic data-flow principle
    2.2.4 A dynamic data-flow machine
    2.2.5 The Manchester data-flow machine
    2.2.6 The SIGMA-1 data-flow machine
    2.2.7 The macro data-flow principle
  2.3 Processing Production Systems
    2.3.1 Inefficiencies in the production system paradigm
    2.3.2 Parallel processing approaches to solving the inefficiencies
    2.3.3 Adaptive processing approaches to solving the inefficiencies

3 Production System Processing in Data-flow
  3.1 Suitabilities
    3.1.1 Suitability of data-flow principles to production systems
    3.1.2 Mapping production systems onto a multiprocessor
    3.1.3 The Rete algorithm in a multiprocessor environment
  3.2 The MRN-based Match Algorithm
    3.2.1 The MRN network
    3.2.2 Allocation of productions
    3.2.3 Distribution of multiple working memory elements

4 Parallel Implementation on a Micro Data-flow Multiprocessor
  4.1 A Simulator Model
  4.2 An Example
    4.2.1 Activities at time T0: Steps 1-3
    4.2.2 Activities at time T1: Steps 4-7
    4.2.3 Activities at time T2: Steps 8-11
  4.3 Analysis of the Example
  4.4 Simulation and Performance Evaluation
    4.4.1 One-input nodes and array operations
    4.4.2 Independent execution of two-input nodes
    4.4.3 Parallel execution of two-input nodes
    4.4.4 Performance evaluation
  4.5 Summary
5 Parallel Implementation on a Macro Data-flow Multiprocessor
  5.1 Production System Processing in Macro Data-flow
    5.1.1 The macro data-flow principle
    5.1.2 Macro actors from an AI processing perspective
  5.2 List Comparison Operations in Data-flow
    5.2.1 A micro-actor perspective
    5.2.2 A macro actor/token perspective
  5.3 Formation of Macro Actors
    5.3.1 Guidelines for well-formed macro actors
    5.3.2 An example of the conversion process
  5.4 Simulation and Performance Evaluation
    5.4.1 Simulation results
    5.4.2 Performance evaluation
  5.5 Summary

6 Performance Evaluation of the Multiple Root Node (MRN) Approach
  6.1 Implementation of the MRN-based Interpreter
    6.1.1 Characteristics of the implementation
    6.1.2 Descriptions of data structures
    6.1.3 An example of data structures
    6.1.4 Organization of the program
  6.2 Measurements at Compile Time
    6.2.1 Benchmark production system programs
    6.2.2 Measurements on grouping
  6.3 Runtime Measurements
    6.3.1 Execution time on one-input nodes
    6.3.2 Number of comparison operations on one-input nodes
    6.3.3 Distribution of groups
  6.4 Performance Evaluation
    6.4.1 Comparison of MRN and OPS5
    6.4.2 Discrepancy in the distribution of wmes and condition elements
  6.5 Summary

7 Conclusions and Future Research
  7.1 Conclusions
  7.2 Future Research: Implementation of Production Systems on the Intel iPSC/2 Multicomputer
  7.3 Future Research: Implementation of Production Systems in SISAL
  7.4 Future Research: Connectionist Production Systems
    7.4.1 Feature hierarchy in production systems
    7.4.2 Representing production systems in feature space

Appendix

Bibliography

List of Figures

2.1 An architecture of production systems
2.2 A Rete network for Rule 1
2.3 Snapshot of the data-flow graph for an expression x = a*b - c+d
2.4 Snapshot of a data-flow graph for the dynamic tagged-token principle
2.5 Organization of a single processing element
2.6 Organization of the Manchester data-flow computer
2.7 Organization of a processing element of the SIGMA-1
2.8 Production systems as a search. (a) Local latencies. (b) Global latency.
2.9 Parallel processing of production systems. (a) Reduction in processing time of the matching step. (b) Simultaneous exploration of the search tree by having many PEs follow multiple paths, path 1, ..., path n.
2.10 Adaptive processing of production systems, where the inference engine learns new rules or heuristics at either compile time or runtime.
2.11 Reduction of global latency in the search space using adaptive processing techniques. (a) Original search space with 19 states and 13 paths. (b) Reduced search space with 10 states and 5 paths.
3.1 Two bottlenecks of the Rete algorithm in a multiprocessor environment. (1) Piling up of wmes on an arc of the root node, which results in a sequential distribution of wmes to all CEs one at a time.
(2) O(n) or O(n²) comparisons in two-input nodes.
3.2 An MRN network. RNn distributes wmes to CEs under RN1 through RNn. A wme (i,j) refers to a wme with i AVPs, where j signifies its arrival order. The MRN network also demonstrates a parallel distribution of wmes, where n RNs can simultaneously distribute n different wmes to the network.
3.3 An MRN network for the three rules
3.4 A redundant allocation policy. Twelve PEs are used to allocate the three productions. CEs with n AVPs are allocated to PEs of Group-n.
3.5 Sequential distribution of wmes. Only one wme can be distributed to all PEs at a time. To distribute 12 wmes, it takes at least 12 steps (or time units).
3.6 Parallel distribution of wmes. Three wmes can be distributed to PEs at a time. To distribute 12 wmes, it takes max{number of wmes in each group} steps.
4.1 Snapshot of the MRN network after the first match cycle T0
4.2 Snapshot of the MRN network after the second match cycle T1
4.3 Snapshot of the MRN network after the third match cycle T2
4.4 A data-flow graph for nodes 11 through 14 of Rule 2. 'ror+4' is to rotate right 4 times; 'rol-4' is to rotate left 4 times.
4.5 Simulation results by 1 PE for independent two-input node processing
4.6 Simulation results by 2 PEs for parallel two-input node processing
5.1 A data-flow graph in micro actors for the comparison of two lists
5.2 Mapping micro actors into macro actors. A set of macro actors, B = {b1, ..., bm}, is derived from a set of micro actors, A = {a1, ..., an}, based on the guidance criteria, where m < n.
5.3 A conversion process for the comparison operation on two lists. (a) A micro-actor data-flow graph. (b) A macro actor.
5.4 Converting a micro-actor data-flow graph to macro actors: (a) micro actors, (b) macro actors
5.5 Simulation results. (a) Execution time. (b) Network load.
5.6 Ratio of sequential distribution to parallel distribution. (a) Execution time. (b) Network load.
5.7 Speedup of sequential distribution vs. parallel distribution
6.1 Data structure used in the MRN implementation
6.2 Organization of the MRN-based production system interpreter
6.3 Distribution of condition elements over groups measured at compile time
6.4 Execution time profile of matching one-input nodes
6.5 Number of comparison operations on one-input nodes
6.6 Runtime distribution of wmes
6.7 Summary of the runtime distribution of wmes
6.8 Ratio of one-input match time
6.9 Ratio of number of comparison operations
6.10 Average ratio on four production systems
6.11 Discrepancy between CE and wme distribution
6.12 A complete execution time for the two approaches
7.1 A parallelism profile of the Tower of Hanoi with 10 disks
7.2 Partitioned Pattern Tree (PPT) for PM. (a) f1 partitions PM into three groups, (b) f2 into four groups, (c) f3 into six groups. (d) All three subtrees in (a), (b), (c) are merged to form a partitioned pattern tree. Numbers next to arcs uniquely define each pattern in the feature space.
7.3 A production memory in 2-dimensional feature space

List of Tables

4.1 Summary of the 11 steps in the example
4.2 Simulation time units for one-input nodes and array operations
4.3 Matching CE1 with RM17 of Rule 2 by 1 PE
4.4 Parallel execution of CE1 and CE2 of Rule 2
5.1 Simulation time, T, and network load, L, for a production system with 15 rules executed on the MDFM. SD = Sequential Distribution, PD = Parallel Distribution.
5.2 Ratio of sequential distribution to parallel distribution
5.3 Speedup of a generic production system executed on the MDFM
6.1 Characteristics of benchmark production systems
Abstract

The importance of production systems in artificial intelligence has been repeatedly demonstrated by the large number of rule-based expert systems developed and used over the last decade. As the number and size of expert systems grow, however, an obstacle has emerged in the processing of such an important application: the large match time. Much effort has therefore been expended on developing special algorithms and architectures dedicated to the efficient execution of production systems.

This thesis presents methods to improve the processing time of production systems. In particular, two approaches are undertaken to reduce the matching time of an inference cycle: a software approach at the algorithm level and a hardware approach at the implementation level. Bottlenecks of the most widely used match algorithm have been identified, and based on them a new algorithm, the MRN match algorithm, is developed at the algorithm level. Experimental results on benchmark production systems indicate that the MRN match algorithm can give a 4-6 fold speedup over the best match algorithm, regardless of the type of computer used. The MRN match algorithm can give a multiplicative effect when coupled with any match algorithm used for production systems.

At the implementation level, data-flow principles of execution have been employed as architectural models. A generic production system is implemented under the micro actor data-flow principle to explore the parallelism in fine grain processing. Parallelism in the medium grain processing of production systems is explicated through the implementation of a production system under the macro actor data-flow principle. Simulation results indicate that the data-flow multiprocessors can give a 17-fold speedup on 32 processing elements when coupled with the MRN match algorithm.
The approaches taken in this thesis demonstrate that the data-flow principles of execution are not limited to numerical computation but also find applications in non-numeric computation.

Chapter 1

Introduction

The importance of production systems in artificial intelligence (AI) has been repeatedly demonstrated by a large number of expert systems. Earlier rule-based expert systems such as Prospector [13], R1 [42], and Mycin [10] have had a major impact on the development of expert systems. This rapid development of production systems is due mostly to the fact that the way productions are structured is very similar to the way people talk about how they solve problems. Among other characteristics, the production system paradigm offers modularity, uniformity, and naturalness.

Modularity refers to the fact that individual rules in the rule base can be added, deleted, or changed independently. They behave much like independent pieces of knowledge. Changing one rule, although it may affect the performance of the system, can be accomplished without having to worry about direct effects on the other rules, since rules communicate only by means of the context data structure; they do not call each other directly. This relative modularity of the rules is important in building the large rule bases of current AI systems.

Another general attribute of production systems is the uniform structure imposed on the knowledge in the rule base. Since all information must be encoded within the rigid structure of production rules, it can often be more easily understood, by another person or by another part of the system itself, than would be possible in the relatively free form of a semantic net or a procedural representation scheme, for example.

A further advantage of the production system paradigm is the ease with which one can express certain important kinds of knowledge.
In particular, statements about what to do in predetermined situations are naturally encoded into production rules. Furthermore, it is these kinds of statements that are most frequently used by human experts to explain how they do their jobs.

As the number and size of expert systems grow, however, an obstacle has emerged in the processing of such an important AI application: the large match time. In rule-based production systems, for example, it is often the case that the number of rules needed to represent a particular production system is on the order of hundreds to thousands. Simply applying straightforward software techniques to the matching process is known to yield intolerable delays. Indeed, the time taken to match patterns over rules can reach 90% of the total computation time spent in production systems [16]. The need for faster execution of production systems has spurred research in both the software and hardware domains, including connectionist architectures.

1.1 Parallel Processing of Production Systems

In the software domain, the Rete state-saving match algorithm was developed for fast pattern matching in production systems [16]. The motivation behind the Rete algorithm was an observation called temporal redundancy: only a small fraction of the working memory changes between production cycles, so the algorithm saves in memory the information concerning those changes and utilizes it at a later time [9]. The algorithm has been put into practice in numerous production systems and is known to be the most efficient algorithm for pattern matching in production system interpreters.

Inefficiencies in the state-saving Rete algorithm were identified, and on that basis the non-state-saving Treat match algorithm was developed [45]. The improvements made over the Rete algorithm were motivated by McDermott's conjecture, which states that the retesting cost will be less than the cost of maintaining the network of sufficient tests [43].
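The match-select-act cycle underlying this discussion can be made concrete with a small sketch. This is a toy interpreter, not the OPS5/Rete machinery discussed in the text; the rule format and working-memory tuples are invented for illustration. Note how the naive match step re-tests every rule against the entire working memory on each cycle; that repeated cost is exactly what Rete's temporal redundancy avoids by saving match state between cycles.

```python
# A minimal match-select-act interpreter (illustrative sketch only).
# Working memory (wm) is a set of tuples; each rule holds a list of
# condition elements (predicates on wm) and an action that mutates wm.

def match(rules, wm):
    """Return the conflict set: rules whose condition elements are all
    satisfied by the current working memory.  Everything is re-tested
    on every cycle -- the cost Rete's state saving eliminates."""
    return [r for r in rules if all(ce(wm) for ce in r["conditions"])]

def run(rules, wm, max_cycles=100):
    for _ in range(max_cycles):
        conflict_set = match(rules, wm)    # match (the dominant cost)
        if not conflict_set:
            break                          # no rule satisfied: halt
        conflict_set[0]["action"](wm)      # select (trivially), then act
    return wm

# Two toy rules that communicate only through working memory:
rules = [
    {"name": "make-goal",
     "conditions": [lambda wm: ("start",) in wm,
                    lambda wm: ("goal",) not in wm],
     "action": lambda wm: wm.add(("goal",))},
    {"name": "finish",
     "conditions": [lambda wm: ("goal",) in wm,
                    lambda wm: ("done",) not in wm],
     "action": lambda wm: wm.add(("done",))},
]

final = run(rules, {("start",)})
# final == {("start",), ("goal",), ("done",)}
```

The two rules never reference each other, only the shared working memory, which is the modularity property described above: either rule could be deleted or replaced without editing the other.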
Experimental results of the Treat algorithm on various benchmark production system programs demonstrated that this conjecture can be substantiated to a limited extent.

However, the Rete match algorithm and its improved pattern matchers are based on a sequential processing environment, and running them on parallel machines requires much effort to yield higher performance. The bottlenecks of the matching algorithm in parallel environments were identified, and on that basis the MRN approach was introduced to parallelize the matching step [21]. When the MRN approach was initially developed, it was intended to run in parallel environments. However, running it on a sequential machine also gave a significant speedup over the sequential Rete algorithm, and it can give a multiplicative effect on any pattern matching algorithm [59].

To further speed up the processing of production systems, another approach has been taken in the rule firing stage of production systems, called multiple rule firing [35, 40, 46]. By firing many rules that do not cause any conflict in the database, this approach attempts to manifest the potential parallelism in the rule firing stage, if any. Furthermore, the removal of a selection stage from the production system paradigm would completely parallelize the entire production cycle, thereby resulting in truly parallel production systems. To do this, however, a careful analysis of the rule dependencies has to be performed to ensure the correctness of the endeavor. In other words, the solution resulting from multiple rule firing must be equivalent to the one resulting from single rule firing.

In the hardware domain, requirements of special architectures for production systems have been identified, and various machines have been designed and/or built along the conventional von Neumann model of execution [1, 25, 26, 27, 37, 38, 47].
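Before turning to the hardware approaches, the distribution bottleneck that the MRN approach removes can be illustrated with a back-of-the-envelope model (a sketch only; the group sizes below are invented, and the step counts follow the reasoning of Figures 3.5 and 3.6): a single root node admits one wme into the network per step, while multiple root nodes, one per attribute-value-pair count, let all groups proceed concurrently.

```python
from collections import defaultdict

def distribution_steps(wmes):
    """Compare single-root (sequential) vs. multiple-root (parallel)
    wme distribution.  `wmes` is a list of attribute-value-pair counts,
    one per wme; the count selects the root-node group (cf. Fig. 3.6)."""
    groups = defaultdict(int)
    for avp_count in wmes:
        groups[avp_count] += 1
    sequential = len(wmes)             # one wme enters per step
    parallel = max(groups.values())    # all groups distribute at once
    return sequential, parallel

# 12 wmes falling into three groups of 4, 5, and 3 (invented sizes):
wmes = [2] * 4 + [3] * 5 + [4] * 3
seq, par = distribution_steps(wmes)
# seq == 12 steps, par == 5 steps (the largest group)
```

The gain thus depends on how evenly wmes spread across the groups, which is why the runtime distribution of wmes (Chapter 6) matters to the MRN approach.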
Among these architectures, a special architecture called DADO, dedicated to production system processing, has been built and operational since 1985 [64]. A performance evaluation of that machine with 1023 8-bit processors has been reported in [63].

Production systems have also been implemented on a general purpose computer, the Encore Multimax, a tightly coupled shared memory multiprocessor with 2-20 processing elements [25]. An objective behind this study was to investigate the very fine grain parallelism in production systems. Experimental results indicated that a shared memory multiprocessor can give a significant speedup when the fine grain parallelism is utilized.

The suitability of message passing computers for production system processing has been investigated, and their performance has been evaluated on the Nectar simulator, a message passing multicomputer with low message overhead [1, 27]. Simulation results of the benchmark production systems on Nectar indicated that a loosely coupled message passing multicomputer can be effective in processing production systems, provided that tasks can be evenly distributed.

The conventional control-flow model of execution described above, however, is limited by the "von Neumann bottleneck" [7]. Architectures based on this model cannot easily deliver large amounts of parallelism [3]: the potential parallelism in the problem will not be fully manifested due to the von Neumann bottleneck. Furthermore, the lack of programmability in the von Neumann model of execution would limit the potential parallelism in the given problem. The data-driven model of execution has therefore been proposed as an alternative to solve these problems. Its principles have been surveyed in [62, 68]. Various research efforts on the development of data-flow multiprocessors and languages have been ongoing for the last two decades [20].
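The data-driven rule of execution can be made concrete with a toy token-driven evaluator for the expression x = a*b - c+d of Figure 2.3 (reading it as a*b - (c+d) is an assumption about that figure; the scheduler below is an illustrative sketch, not a model of any machine surveyed here). An actor fires as soon as all of its input tokens are available; no program counter imposes an order, so the multiply and the add are enabled simultaneously.

```python
import operator

def run_dataflow(graph, tokens):
    """Fire any actor whose input arcs all hold a token; repeat until
    no actor can fire.  Availability of operands alone enables an
    actor -- the data-driven principle."""
    values = dict(tokens)                  # arc name -> token value
    fired = set()
    progress = True
    while progress:
        progress = False
        for name, (op, ins, out) in graph.items():
            if name not in fired and all(i in values for i in ins):
                values[out] = op(*(values[i] for i in ins))
                fired.add(name)
                progress = True
    return values

# Data-flow graph for x = a*b - (c+d): 'mul' and 'add' have no data
# dependence on each other and could fire in parallel on a real machine.
graph = {
    "mul": (operator.mul, ("a", "b"), "t1"),
    "add": (operator.add, ("c", "d"), "t2"),
    "sub": (operator.sub, ("t1", "t2"), "x"),
}
result = run_dataflow(graph, {"a": 2, "b": 3, "c": 4, "d": 1})
# result["x"] == 2*3 - (4+1) == 1
```

The same enabling rule, applied to the matching actors of a Rete-style network rather than to arithmetic, is what Chapters 3-5 exploit.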
Prototype data-flow multiprocessors, including the Epsilon project of the Sandia National Laboratory [20] and the SIGMA-1 data-flow supercomputer of the Electro-Technical Laboratory in Japan, have been built and are currently operational [31, 32].

The applicability of data-flow principles of execution to various numeric computations, such as partial differential equations [41] and digital signal processing [19], has been investigated, where issues related to mapping the numeric algorithms onto data-flow multiprocessors are discussed. Simulation results have indicated that the data-flow principles of execution can indeed extract from the given problems as much potential parallelism as the problems contain.

The applicability of data-flow principles of execution to symbolic computation (or AI processing) has also been investigated [21]. The pattern matching operation of production systems has been successfully mapped onto a fine grain dynamic data-flow multiprocessor, and this success has demonstrated that data-flow principles of execution can also find applications in symbolic computation [22].

To suit the characteristics of AI problems, a medium grain processing approach, called multilevel macro actor/token, has been applied to implement a subset of symbolic computation. A generic production system has been mapped onto a macro data-flow multiprocessor simulator [59]. This medium grain approach utilizes the distinctive characteristics that AI problems exhibit and has given a significant performance improvement over fine grain processing.

1.2 Adaptive Processing of Production Systems

In an effort to find a completely new way of processing production systems, several attempts have recently been made along the lines of connectionist networks [17, 54, 55, 66, 67]. The underlying argument in these various attempts is that the ultimate goal of developing AI production systems is the modeling, or at best the simulation, of human intelligence.
At the same time, connectionist architectures or neural networks (NN) try to model or simulate the way in which the human brain behaves. Both AI and NN appear to have been moving toward an ultimate common goal: achieving human intelligence. However, the two fields have followed two completely different paths toward this goal. If these two different approaches are both to achieve human-like intelligence, one should come to believe that there must be a common ground.

If the above argument is put into a different perspective, we would conclude that (1) a set of efficient techniques developed in AI may be able to help solve the problems that NN have difficulty solving, and (2) by the same token, a set of efficient techniques developed in neural networks may be able to help solve the problems AI finds difficult. For example, conventional AI search problems such as the Traveling Salesman Problem, the N-Queens Puzzle, etc., become difficult for large problem sizes due mostly to the fact that the time to search the state space is exponential. However, neural networks are known to be effective in solving such AI search problems [33].

The argument just described is where connectionist production systems come in: production systems can utilize the techniques developed in neural networks. By employing techniques developed in connectionist architectures, the production system paradigm is believed to gain the following capabilities, which would otherwise be difficult to achieve: massive parallelism, quick search, and fault tolerance.

The first advantage of connectionist production systems would be in achieving massive parallelism. A set of simple neuron-like processing elements may hold the key to some important aspects of intelligence [29]. The second advantage, quick search, refers to the fact that neural networks can retrieve relevant items very quickly and without apparent effort.
Even when incomplete information is present, neural networks can sometimes interpolate across the missing data [30]. The third advantage, fault tolerance, lies in the fact that in many domains the recognition abilities of neural networks far exceed those of other systems. When some of the neurons malfunction, the results produced by the neural network would be imperfect but still usable, since each macroscopically important behavior of the network is implemented by many different microscopic units.

However, there are problems that neural networks have to overcome. Among these, neural networks find it difficult to implement the variable binding that AI systems commonly use. Another disadvantage stems from the fact that neural networks have difficulty solving problems associated with a temporal sequence, such as production systems and planning problems. Furthermore, neural networks need good learning algorithms in order to handle many different types of problems in the real world.

Multiple layers of feedforward networks have been proposed to suit the production system paradigm [17]. In order to represent the search space in neural networks, local representation is used, where a concept is directly assigned to a particular neuron [55]. A particular implementation of this local representation for production systems has been reported, where three layers of a ring-structured network were used [56]. The main objective behind using such layers is to reduce the pattern matching time. However, the local representation has proven difficult for implementing variable binding. Another important drawback of the local representation is its lack of scalability, i.e., it would be impractical to use the local representation when the problem size is large and varying.
To avoid such drawbacks of local representation, a distributed representation has been developed, where a concept is assigned to a pattern of activity over many or all neurons [67]. While it partially permits variable binding, the distributed representation requires a sophisticated learning algorithm in order to be practical [14]. To avoid this impracticality for large problem sizes, the hierarchical representation has been introduced, based on which production systems are transformed and represented in n-dimensional feature space [54]. This thesis limits its scope to the parallel processing perspective and leaves the adaptive processing perspective for future research. However, the new representation technique is given to guide the development of production systems in connectionist architectures.

1.3 Organization of the Thesis

This thesis attacks the problem of parallel processing of production systems in two phases: (1) the development of a parallel match algorithm for production systems to suit parallel processors, and (2) the implementation of the parallel algorithm on a data-flow multiprocessor and on a conventional sequential processor.

Chapter 2 gives a brief introduction to production systems and the Rete match algorithm. A brief introduction to data-flow principles of execution is presented in the chapter as well. Three different data-flow machine models, a static machine, a dynamic micro machine model, and a new machine model, our macro data-flow multiprocessor, follow the basic principles. Problems of processing production systems in general are identified, and approaches to the problems are described at two different levels: the sequential and parallel algorithm level, and the parallel machine implementation level.

Chapter 3 identifies the suitability of the data-flow principles of execution for the implementation of the production system paradigm. Issues related to mapping production systems onto data-flow architectures are presented along with this suitability.
The inefficiencies of the Rete algorithm in multiprocessor environments are identified, based on which a new algorithm, called the Multiple Root-Node (MRN) match algorithm, is introduced. Allocation and distribution policies to implement the MRN algorithm on a data-flow multiprocessor are developed in the chapter as well.

Chapter 4 presents a parallel implementation of the MRN algorithm on a micro data-flow multiprocessor. A specific example which uses our strategies is worked out. The modifications to the actor set as well as the program graph design are shown for execution on the tagged-token micro data-flow multiprocessor. The program graph design techniques of the Rete algorithm in a data-flow environment are described and simulations are carried out. Performance observations obtained for a data-driven environment are compared to those of a conventional control-flow approach. The results of a deterministic simulation of this multiprocessor architecture demonstrate that artificial intelligence production systems can be efficiently mapped onto data-driven architectures.

Chapter 5 presents a parallel implementation of the MRN algorithm on a macro data-flow multiprocessor to further explore the applicability of data-flow principles of execution to production systems. In particular, an attempt is made to manifest the medium-grain parallelism in production systems. Some characteristics of production systems are identified from the parallel processing perspective, based on which the macro data-flow principle is explained from an AI perspective. A brief analysis is presented to show why medium-grain macros are preferred to fine-grain micros. Several strategies for deriving well-formed macros from micros are presented for the problem domain of production systems. A generic production system is selected as a target program and is written as a micro data-flow graph.
These strategies are exercised to convert the micro data-flow graph into a macro data-flow graph. Simulation of the generic production system is carried out on our execution model, the macro data-flow simulator. Performance evaluation is also discussed in the chapter. Experimental results on the new match algorithm are intended to verify that the macro approach would indeed manifest the medium-grain parallelism in production systems and that the introduction of multiple root nodes would give severalfold speedup over the sequential Rete algorithm.

Chapter 6 contains a slightly different discussion from Chapters 4 and 5, in that it departs from the multiprocessor environments and comes back to a single-processor environment. A complete MRN-based production system interpreter is implemented in Common Lisp to verify its performance improvement at the algorithm level. Issues related to the implementation are discussed, followed by the presentation of several distinctive features of the MRN-based production system. The five benchmark production system programs chosen for this study are introduced along with various statistics measured on them at compile time. Among these statistics, it is the grouping information which is central to the MRN-based production system interpreter. Various experiments using the benchmark production system programs are performed on a sequential machine, a Sun 4/90, in real time, and various runtime statistics are collected. All the statistics gathered at runtime are plotted in terms of production cycle numbers. To ensure the correctness of the measurement, an important criterion, the number of comparison operations, is measured in terms of production cycle numbers. Performance evaluations of the two approaches, the MRN approach and the Rete-based OPS5, are made in terms of the number of comparison operations and execution time.
Chapter 7 addresses the conclusions of this dissertation and the possible directions in parallel and adaptive processing of production systems. Among the future research topics, a connectionist adaptive production system is presented in detail. A new technique, called hierarchical representation of production systems, is presented along with an example to guide the direction of connectionist production systems. Other topics for future research include an ongoing effort on a SISAL (Streams and Iterations in a Single Assignment Language) implementation of production systems, and an implementation of the MRN-based production system interpreter on a message-passing multicomputer, the Intel iPSC. The SISAL effort is intended to better understand the potential parallelism in the production system paradigm. The Intel iPSC effort, when complete, will allow us to obtain a more accurate performance measure of the MRN-based production system interpreter as well as of the Rete-based OPS5 on conventional message-passing multicomputers.

The Appendix gives a complete listing of the Lisp code of the MRN-based production system interpreter. Lisp code for data collection and performance evaluation is omitted from the Appendix.

Chapter 2
Background

Necessary background is briefly presented in this chapter. A brief introduction to production systems is given, followed by a description of the Rete match algorithm for production systems. An example is also given to illustrate the underlying concepts. The three data-flow principles of execution are presented: static, dynamic, and macro actor. Various machine architectures developed or under development are presented to contrast the principles. Problems associated with production systems are discussed. Possible solutions to the problems which have been reported or are currently being undertaken are summarized from the parallel processing perspective and the adaptive processing perspective.
2.1 Production Systems

The production system paradigm is described in this section, followed by the Rete algorithm, the most widely used match algorithm for production systems.

2.1.1 The production system paradigm

A production system paradigm, shown in Figure 2.1, consists of a production memory (PM), a working memory (WM), and an inference engine (IE). PM (or rulebase) is composed entirely of conditional statements called productions (or rules). These productions perform some predefined actions when all the necessary conditions are satisfied. The left-hand side (LHS) is the condition part of a production rule, while the right-hand side (RHS) is the action part. The LHS consists of one to many elements, called condition elements (CEs), while the RHS consists of one to many actions. The productions operate on WM, which is a database of assertions called working memory elements (wmes). Both condition elements and wmes have a list of elements, called attribute-value pairs (AVPs). The value of an attribute can be either constant or variable: the former in lower case and the latter in upper case (similar to the Prolog representation). To further illustrate, consider the following simple production system:

[Figure 2.1: An architecture of production systems: the inference engine operates on the production memory (rules) and the working memory (wmes); a rule firing produces a change in the working memory.]

Rule1:
  CE1: [(c X) (d Y)]
  CE2: [(b Y)]
  CE3: [(p 1) (q 2) (r X)]
  Action 1: [Remove (b Y)]
  Action 2: [Make (c 1) (d Y)]

wme1: [(p 1) (q 2) (r *)]
wme2: [(r =) (d +)]
wme3: [(c *) (d +)]
wme4: [(b 3)]
wme5: [(b +)]
wme6: [(p 1) (q 3) (r 7)]

The rule above will perform Action 1 and Action 2 when all its conditions are verified in the working memory.
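Matching this rule against the working memory can be sketched with a few lines of code. The following is a minimal illustrative matcher, not the interpreter developed in this thesis (which is written in Common Lisp); it treats upper-case values as variables and enumerates the consistent bindings for the example above.

```python
# Naive LHS matcher for the example: variables are upper-case strings,
# and a binding must be consistent across all condition elements.

def match_ce(ce, wme, bindings):
    """Match one condition element against one wme, extending bindings."""
    if len(ce) != len(wme):
        return None
    b = dict(bindings)
    for (attr, val), (wattr, wval) in zip(ce, wme):
        if attr != wattr:
            return None
        if val.isupper():                  # variable: bind or check consistency
            if b.get(val, wval) != wval:
                return None
            b[val] = wval
        elif val != wval:                  # constant: must match exactly
            return None
    return b

def match_lhs(ces, wmes, bindings=None):
    """Return all consistent bindings satisfying every condition element."""
    if bindings is None:
        bindings = {}
    if not ces:
        return [bindings]
    results = []
    for wme in wmes:
        b = match_ce(ces[0], wme, bindings)
        if b is not None:
            results.extend(match_lhs(ces[1:], wmes, b))
    return results

# Rule1 and the working memory from the text.
rule1 = [[('c', 'X'), ('d', 'Y')],                 # CE1
         [('b', 'Y')],                             # CE2
         [('p', '1'), ('q', '2'), ('r', 'X')]]     # CE3

wm = [[('p', '1'), ('q', '2'), ('r', '*')],        # wme1
      [('r', '='), ('d', '+')],                    # wme2
      [('c', '*'), ('d', '+')],                    # wme3
      [('b', '3')],                                # wme4
      [('b', '+')],                                # wme5
      [('p', '1'), ('q', '3'), ('r', '7')]]        # wme6

instantiations = match_lhs(rule1, wm)
print(instantiations)   # [{'X': '*', 'Y': '+'}]
```

As the output shows, wmes 1, 3, and 5 jointly instantiate the rule with X bound to * and Y bound to +, agreeing with the matching step described next.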
The inference engine executes an inference cycle which consists of three steps: pattern matching, conflict resolution, and rule firing.

• Pattern Matching: The LHSs of all the production rules are matched against the current wmes to determine the set of satisfied productions. In the above example, wme1 can satisfy CE3 with the variable X instantiated to *, whereas wme6 cannot. This step will eventually identify three wmes, 1, 3, and 5, which together satisfy the above rule with X and Y instantiated to * and +, respectively.

• Conflict Resolution: If the set of satisfied productions is non-empty, one rule is selected for execution in the next step. Otherwise, the execution cycle halts because there are no satisfied productions. In the above example, only one rule is satisfied; it is therefore selected.

• Rule Firing: The actions specified in the RHS of the selected production are performed. In the above example, a new wme, [(c 1) (d +)], is added to the working memory and wme5, [(b +)], is deleted from the working memory upon rule firing.

The inference engine will halt the production system either when there are no satisfied productions or when the desired solution is found.

2.1.2 The Rete match algorithm

The Rete match algorithm is a highly efficient approach used in the matching of objects in production systems [16]. The simplest possible matching algorithm would consist of going through all the rules and wmes one by one until one or more matches have been found. The Rete algorithm, however, does not iterate over the wmes to match all the rules. Instead, it constructs a condition dependency network, saves in memory the information concerning the changes in the working memory between production cycles, and then utilizes this information at a later time. This is based on the observation, called temporal redundancy, that there is little change in the working memory between production cycles [9]. The Rete
algorithm further reduces the matching time by sharing identical tests among productions. This stems from the fact that productions have many similar or identical parts, a property called structural similarity. This second improvement, however, is useful only in sequential uniprocessor environments. When structural similarity is implemented in a multiprocessor environment, it causes a substantial amount of communication overhead [22,24].

Given a set of rules, a network is therefore built which contains information extracted from the LHSs of the rules. Figure 2.2 shows the network built for Rule1 shown earlier, which consists of several types of nodes:

[Figure 2.2: A Rete network for Rule1: a root node feeding chains of one-input nodes (with tests such as Test X and Test Y), two-input nodes with attached memories, and a terminal node at which Rule1 is instantiated.]

• The Root Node (RN) distributes incoming data tokens (or wmes) to sequences of children nodes, called one-input nodes.

• One-Input Nodes (OIN) test intra-element features contained in a condition element, i.e., they compare values of the incoming wmes to preset values in the condition element. For example, the first condition element of Rule1 contains 3 intra-element features, and therefore 3 OINs are needed to test them. The test results of the one-input nodes are propagated to nodes called two-input nodes.

• Two-Input Nodes (TIN) are designed to test inter-condition features contained in two or more condition elements. The variable X, which appears in both CE1 and CE3, must be bound to the same value for rule instantiation. Attached to the two-input nodes are left and right memories in which wmes matched through one-input nodes are saved. Another type of two-input node is the Negated Two-Input Node (NTIN), which is designed to process condition elements preceded by - (not). An NTIN tests to determine whether no wme satisfies it. The results from two-input nodes, when successful, are passed to nodes called terminal nodes.
• Terminal Nodes (TN): Each terminal node represents a rule and triggers it when all the preceding nodes have completed their tests over the incoming wmes. A predefined conflict resolution strategy is then invoked to select and fire a rule.

2.2 Data-flow Principles and Machines

The three data-flow principles of execution are now presented: static, dynamic, and macro actor. Various machine architectures developed or under development are presented to contrast the principles.

2.2.1 Basic data-flow principles of execution

Developing and using a large-scale MIMD (Multiple Instruction Multiple Data) multiprocessor involves much effort both at the software level and at the hardware level [3]. Among the many issues at the software level is programmability. Programmers cannot be expected to schedule and synchronize the hundreds or thousands of tasks that are required to fully utilize the resources of such a machine. The current state of the art in programming MIMD machines relies on the programmer to express all parallelism using low-level synchronization and communication constructs.

Another important issue that must be addressed when developing and using a large multiprocessor is synchronization, which refers to the ordering of instruction execution according to data dependencies. Cooperating processes in a multiprocessor environment must often communicate and synchronize. Typical synchronization methods in the conventional von Neumann model of execution are the use of shared variables and message passing. When using shared variables, either mutual exclusion or conditional synchronization is commonly employed to ensure that no other process enters the critical section and modifies the shared variables.
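Shared-variable synchronization via mutual exclusion can be sketched as follows; this is a generic illustration of the conventional scheme described above, not code from any of the machines discussed here.

```python
# Mutual exclusion on a shared variable: each thread must acquire the
# lock before entering the critical section, so increments never race.
import threading

counter = 0
lock = threading.Lock()

def worker(increments):
    global counter
    for _ in range(increments):
        with lock:            # mutual exclusion: one thread at a time
            counter += 1      # critical section on the shared variable

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 4000
```

A thread that finds the lock held must block until it is released, which is precisely the waiting cost discussed next.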
Problems associated with these synchronization methods are that, if the critical section is busy, the process attempting to enter it must wait, wasting precious computing resources if the wait is long. Furthermore, if members of a group of processes which hold resources are prevented indefinitely from accessing resources held by other processes within the group, deadlocks will arise in this otherwise highly concurrent multiprocessor system.

The data-flow model of computation, proposed in the early 1970s as an alternative to the conventional control-driven model of computation, explicitly addresses the issues of programmability and synchronization. The first problem, programmability, is resolved by the use of functional languages, where any variable can be assigned a data value only once, called the single-assignment principle. Programs are compiled into data-flow graphs, which represent the data dependencies among instructions. An instruction executes as soon as all its required operands are available. Since instruction execution is triggered by the availability of operands, the computation is capable of tolerating arbitrary memory latencies and allows data to arrive in an arbitrary order.

Data-flow principles of execution offer runtime synchronization of operations, which conventional multiprocessor systems have difficulty achieving. Synchronization is enforced at the instruction level, because every instruction waits for all its operands to be produced before executing. Data-flow principles are parallel in nature and thus allow a very large number of different tasks to be efficiently and safely allocated to the entire machine. Comprehensive surveys of data-flow machines can be found in [62,68].
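This operand-availability rule can be sketched with a tiny software scheduler. The node structure, operand values, and the expression grouping (a*b) - (c+d) are all chosen here purely for illustration; no real machine schedules instructions this way.

```python
# Operand-driven execution: an actor fires as soon as a token is present
# on every input arc, consuming one token per input and forwarding the
# result to its consumers. Evaluates x = (a*b) - (c+d).
import operator

class Actor:
    def __init__(self, op, n_inputs):
        self.op = op
        self.arcs = [[] for _ in range(n_inputs)]   # one token list per arc
        self.outputs = []                           # (consumer actor, port)

    def enabled(self):
        return all(self.arcs)                       # token on every input arc

    def fire(self):
        operands = [arc.pop(0) for arc in self.arcs]
        result = self.op(*operands)
        for actor, port in self.outputs:
            actor.arcs[port].append(result)
        return result

mul = Actor(operator.mul, 2)        # a*b
add = Actor(operator.add, 2)        # c+d
sub = Actor(operator.sub, 2)        # (a*b) - (c+d)
mul.outputs = [(sub, 0)]
add.outputs = [(sub, 1)]

# Initial tokens: a=1, b=2; c=4, d=3.
mul.arcs[0].append(1); mul.arcs[1].append(2)
add.arcs[0].append(4); add.arcs[1].append(3)

# Data-driven scheduling loop: repeatedly fire any enabled actor.
actors, result = [mul, add, sub], None
while any(a.enabled() for a in actors):
    for a in actors:
        if a.enabled():
            result = a.fire()

print(result)   # (1*2) - (4+3) = -5
```

Note that no central program counter orders the firings: mul and add could fire in either order (or concurrently on a real machine), and sub fires only once both of its operands have arrived.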
2.2.2 The static data-flow principle

The fundamental principles of data-flow developed by Jack Dennis in the early 1970s [11] form the foundations of what became known as the static data-flow architecture and also served as the basis for subsequent developments, including dynamic data-flow systems. A data-flow program is represented as a directed graph consisting of actors, which represent instructions, and arcs, which represent data dependencies among actors. Operands are propagated along the arcs in the form of data packets, called tokens. The execution of an instruction is called the firing of an actor. The basic instruction firing rule common to all data-flow systems is that an actor fires as soon as tokens are present on all its input arcs. When an actor fires, a token is removed from each input arc and a result token is placed on each of its output arcs. Figure 2.3 shows an example of a data-flow graph for evaluating the expression x = a*b - c+d.

[Figure 2.3: Snapshot of the data-flow graph for the expression x = a*b - c+d.]

In the above example, the actor labeled 1 is enabled when two tokens, 1 for a and 2 for b, have arrived on its input arcs. The firing of this actor produces a new token, 2, which is received by actor 3. Similarly, actor 2 fires when the two tokens, 4 for c and 3 for d, arrive at its input arcs, resulting in a new token, 1. These two new tokens, 2 and 1, in turn enable the firing of actor 3, producing another new token, 3.

For a simple graph, such as the one just discussed, the basic firing rule is sufficient to guarantee proper execution. Problems arise when a graph is reentrant, as for example in the case of a loop body. In this case, an actor could be enabled while a token is still present on one of its output arcs. If it fired, more than one token could be present on a single arc. Several schemes have been proposed to deal with this problem.
The static data-flow approach simply disallows more than one token to reside on any arc. The single-token-per-arc constraint is enforced through acknowledgment signals, implemented as additional destination address fields in an activity template. In the data-flow graph, acknowledgment signals are shown as additional arcs leading from consuming to producing actors. A token on an acknowledgment arc indicates that the corresponding data arc is empty, i.e., ready to receive a token. When an actor fires, it sends one signal along each acknowledgment arc.

A small model consisting of four micro-coded processors was built; however, it suffered from a number of serious drawbacks [12]. Among them, the feedback interpretation of data-flow programs using acknowledgment arcs allows only a limited amount of parallelism to be exploited, because consecutive iterations of a loop can only partially overlap in time rather than proceed concurrently. Thus, with acknowledgment signals, only a pipelining effect may be achieved, where the maximum length of the pipeline is bounded by the critical path through the loop body. Another undesirable effect of the feedback interpretation, in addition to making the machine more complex to design and operate, is that the token traffic is doubled. Since communication is a serious problem in any multiprocessor system, keeping the number of tokens to a minimum is an important requirement.

2.2.3 The dynamic data-flow principle

The performance of a parallel machine increases significantly when loop iterations and subprogram invocations can proceed in parallel [2]. To achieve this, each iteration or subprogram invocation should be able to execute as a separate instance of a reentrant subgraph. To distinguish between activities belonging to different instances, the basic activity names are extended: each carries an iteration count and also the procedure context within which it executes.
Thus, each actor is replicated for every loop iteration and every procedure invocation. This replication, however, is only conceptual: in the implementation, only one copy of any data-flow graph is actually kept in memory, but each instruction is fetched and executed independently for any iteration number and procedure context.

With this approach, each arc can be viewed as a bag that may contain an arbitrary number of tokens with different tags (colors). Each activity is uniquely identified by a tag of the form c.s.i, where c is the context, s is the activity name (instruction number), and i is the iteration number. The context uniquely identifies the current procedure invocation, while the iteration number uniquely identifies the current iteration instance. An actor under the tagged-token data-flow principle fires when tokens carrying identical tags are present on each of its input arcs. This eliminates the need for any feedback signals, thus increasing parallelism and decreasing token traffic. Data-flow machines that employ this method are called tagged-token or dynamic data-flow machines. Figure 2.4 illustrates the tagged-token principle. On the left arc there are four tokens, each carrying a different tag, while on the right arc five tokens are present, again with different tags. The only identically tagged pair found on both arcs is the filled one, which enables the actor labeled 1. When actor 1 fires, it consumes the two filled tokens and produces a new result token, whose color (or tag) is again filled, as depicted in Figure 2.4.

[Figure 2.4: Snapshot of a data-flow graph illustrating the dynamic tagged-token principle.]

2.2.4 A dynamic data-flow machine

The MIT Tagged-Token Data-flow Machine is based on the principles just outlined [5]. It consists of a number of identical PEs connected through an n-dimensional hypercube packet-switching network. Figure 2.5 shows the organization of a single PE of the Tagged-
Figure 2.5 shows the organization of a single PE of the Tagged- Token Data-flow Machine. It consists of a Matching Store Unit (MSU), an Instruction Fetch Unit (IFU) with access to a Program and Constant Memory, an Arithmetic Logic Unit (ALU), a Token Formatting Unit (TFU), and a Token Queue (TQ). Matching Store Unit Program and Constant Memory Instruction Fetch Unit j'lil lllj 1)1111'"' ALU Token Formatting Unit To/from the communication network Token Queue Figure 2.5: Organization of a single processing element A program under execution is distributed over the program memories of the different PEs. The Token Queue contains data tokens produced by previous firings of actors in the current PE or other PEs. In the latter case, they have arrived in the Token Queue through 21 the communication network. Tokens are removed from the queue by the MSU. If a token is destined for a monadic operator, it is passed directly to the IFU. Otherwise, it is for a dyadic operation; a matching phase is initiated. This involves comparing the tag of a token to the tags of all tokens currently in the MSU. If a match is found, the token pair is for warded to the IFU; otherwise, the token is stored in the MSU to await the arrival of its partner. The IFU retrieves an instruction from the Program Memory according to the tag car ried by the operand token(s). The fetched instruction may also include a literal or a reference to a constant to be used as an operand. In the latter case, the constant is fetched jmmediately from the Constant Memory. When a complete executable packet is assem bled, consisting of an opcode, operand values (literals, constants, or values carried on tokens), destination references, and a tag, it is passed to the ALU for execution. The ALU computes a new value according to the opcode and, in parallel, the new tags are derived. 
The result and the new tags are sent to the TFU, where new tokens are assembled and forwarded to the local Token Queue, to another PE if the destination address is non-local, or to an I-Structure Storage if the operation is an access to a structure.

To solve the problem of large data structures, the concept of I-structures has been proposed [6]. An I-structure may be viewed as a repository for data values which obey the single-assignment principle. That is, each element of the I-structure may be written into only once. After it has been filled, it may be read any number of times.

The MIT Tagged-Token Data-flow Machine was the first to use the dynamic data-flow principles and the concept of I-structures. It has been simulated on a Multiprocessor Emulation Facility consisting of 32 TI Explorer Lisp Machines interconnected by a high-speed network, which has been operational since January 1986. The experience gained with the MIT Tagged-Token Data-flow Machine is being used to build a prototype of its successor, the Monsoon architecture.

2.2.5 The Manchester data-flow machine

The same principles of tagged-token data-flow were also developed independently at the University of Manchester. The resulting Manchester Data-flow Computer is the first actual hardware implementation of a dynamic data-flow computer ever built [28]. The prototype, consisting of a single execution ring, became operational in October 1981. The block structure of this ring is shown in Figure 2.6.

[Figure 2.6: Organization of the Manchester data-flow computer: Token Queue, Matching Store Unit, Instruction Fetch Unit, 20 ALUs, and a switch to/from the communication network.]

The inadequacy of handling complex data structures has been perhaps the most serious drawback of the Manchester Data-flow Computer. Another problem was the implementation of an efficient Matching Store Unit.
First, the pipelined hashing strategy used for the Matching Store Unit requires a rather long pipeline and thus degrades performance under low parallelism. Second, a separate Overflow Unit was necessary to prevent overflows of the Matching Store Unit; this was later enhanced by dynamic resource scheduling in the form of a hardware throttle. Finally, the ability to feed 20 concurrently working ALUs from just one pipeline did not prove feasible, due to the Matching Store Unit bottleneck. Though the Manchester Data-flow Computer was too small to run any significant applications, it was able to demonstrate that pipelines in a data-flow computer can be kept busy almost effortlessly [5].

2.2.6 The SIGMA-1 data-flow machine

The SIGMA-1, built at the Electro-Technical Laboratory in Japan, is the most ambitious tagged-token data-flow computer to date [32,49]. It is a supercomputer for large-scale numerical computations and has been operational since early 1988. It consists of 128 Processing Elements and 128 Structure Elements interconnected by 32 Local Networks (10-by-10 crossbar packet switches) and one global two-stage Omega Network. Sixteen Maintenance Processors are also connected with the Structure Elements and with a Host Computer for I/O operations, system monitoring, and maintenance operations.

Figure 2.7 shows the structure of one Processing Element, implemented with several gate array chips and a large memory. It operates as a synchronous two-stage pipeline. The first stage is the firing stage, consisting of a FIFO Input Buffer, an Instruction Fetch Unit accessing a Program Memory, and a Waiting-Matching Unit supported by chained hashing hardware. The Input Buffer receives data tokens generated locally or by other PEs. The processing element number is stripped before a token enters the Input Buffer. The Instruction Fetch Unit and the Waiting-Matching Unit work simultaneously on the same token, transmitted from the Input Buffer.
If no match is detected, the fetched instruction is discarded. Otherwise, it is passed to the second stage for execution. This stage consists of an Execution Unit and a Destination Unit. The former includes a Floating-Point Arithmetic Unit, a Multiplier, and a Structure Address Generator. The Destination Unit produces the destination addresses for the result tokens, which are routed to the local Input Buffer or to a remote PE.

[Figure 2.7: Organization of a processing element of the SIGMA-1: Input Buffer, Instruction Fetch Unit with Program Memory, Waiting-Matching Unit, Execution Unit, and Destination Unit.]

The SIGMA-1 also contains the first implementation of I-Structure Storage for handling large structures, and it provides hardware support for local memory management. This includes buddy allocation and several waiting-matching functions, such as "sticky/non-sticky" read operations. Unlike the Manchester machine, however, these functions are used only for maintaining loop constants, rather than for handling general data structures.

The implementation of each PE as a synchronous two-stage pipeline keeps the total size of a PE small and suitable for scaling up. The short pipeline is especially advantageous when the parallelism in a program is small. The SIGMA-1 demonstrated a performance of 170 MFLOPS on a small program, while its peak performance reaches 427 MFLOPS [31].

2.2.7 The macro data-flow principle

One of the main problems of tagged-token data-flow machines has always been the implementation of the Matching Store Unit. For reasons of performance, an associative memory would be ideal. Unfortunately, the amount of memory needed to store tokens waiting for a match tends to be very large, which renders this approach impractical or, at least, not cost-effective. As a result, all existing data-flow machines use some form of hashing, sometimes supported by hardware hash tables.
However, hashing techniques are typically not fast enough to be used effectively as a single stage in the instruction processing pipeline. This often results in a long hashing pipeline, like that of the Manchester Data-flow Computer, thus degrading performance of sequential applications.

The macro data-flow principle has therefore been proposed by Gaudiot and Ercegovac to alleviate the problems associated with the number of tokens to be matched at the Matching Store Unit [18]. A macro actor is a collection of scalar instructions. The objective behind lumping instructions into one larger unit is to improve performance by exploiting locality within these larger units. In fine grain data-flow computation, the high overhead needed to preserve functionality during execution results in poor performance at low levels of parallelism. Indeed, to execute fine grain graphs (where each actor represents a single instruction), execution, communication, and other non-compute overhead must accompany the actual computation actors to ensure correct execution. This overhead is therefore inevitable and degrades performance. When the grain size of a graph is increased, in other words, when an actor represents several operations instead of a single operation, the overhead problem can be alleviated. Indeed, with actors of various sizes, the amount of non-compute operations and the cost of communication can be significantly reduced. However, one should also note that increasing the size of actors (larger granularity) may reduce the available parallelism in programs and increase memory and communication latency. Hence, forming macro actors from fine grain micro actors is a trade-off between latency and parallelism. The Macro Data-flow Project currently underway at USC is based on the macro principle.
The machine architecture is organized in two levels: the dynamic data-flow model at the high level and the conventional von Neumann model at the low level. The dynamic data-flow principle employed at the high level exploits the programmability and the inherent synchronization methods that the dynamic data-flow principle offers. The conventional von Neumann model at the low level exploits program locality to reduce the overhead incurred in token traffic and tag matching. The machine consists of 64 PEs, connected through a six-dimensional hypercube interconnection network. Between two neighboring PEs are three facilities: two communication nodes and a link. Each PE has four facilities connected in a pipe, as is done in the MIT Tagged-token Data-flow Computer, shown in Figure 2.5. An initial assessment of this hybrid machine indicates that the Macro data-flow approach indeed alleviates the problems associated with the dynamic principles, where the tag matching time is of major concern in developing a data-flow multiprocessor [41].

2.3 Processing Production Systems

Problems associated with the production system paradigm are identified and approaches to the problems are iterated from two different perspectives: the parallel processing and adaptive processing perspectives. The approaches taken in this thesis are then presented from the two perspectives.

2.3.1 Inefficiencies in the production system paradigm

The production system paradigm described earlier presents two inefficiencies. First, the processing mechanism itself is inherently sequential, i.e., the three steps must be performed in sequence by an inference engine. The definition embedded in the production system prevents its efficient evaluation on a parallel machine. Second, there are heavy memory dependencies in the matching step. All the condition elements and wmes to be matched must be repeatedly stored and recalled whenever a new inference cycle is started.
Indeed, the time taken to match conditions over rules can reach 80%-90% of the total computation time spent in production systems [16]. The production system paradigm can be viewed as a state space search, consisting of local and global latencies. The search space with both latencies is depicted in Figure 2.8, where PM, CR, and RF stand respectively for pattern matching, conflict resolution, and rule firing.

Figure 2.8: Production systems as a search. (a) Local latencies. (b) Global latency.

The local latency, τ, refers to the processing time of a particular inference cycle in the production system paradigm. Each step in the production cycle such as matching, conflict resolution, or rule firing is considered a local latency, as shown in Figure 2.8(a). The global latency, T, depicted in Figure 2.8(b), is the processing time of a nondeterministic number of inference cycles in the search tree of the state space. Given the initial state, the inference engine finds the next state by executing the inference cycle. Based on the control strategy or heuristics, the system will select the state in the search tree it will explore. The global latency, T, is thus linearly proportional to the number of states, n, to be explored in the search tree, i.e., T ∝ nτ, when no backtracking is made.

2.3.2 Parallel processing approaches to solving the inefficiencies

A set of techniques that have been identified to reduce the latencies in the PS paradigm can be classified basically into two categories: (1) parallel hardware/software processing and (2) adaptive/heuristic processing [69]. The first approach, parallel processing of production systems, attempts to reduce the total processing time by using many processing elements (PEs). From the hardware perspective, applying many processing elements to pattern matching will reduce the local latency in an inference cycle, thereby reducing the total processing time, as depicted in Figure 2.9(a).
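The latency model above can be expressed in a minimal sketch; the individual step times are illustrative placeholders, not measurements from this study:

```python
def local_latency(t_match, t_cr, t_rf):
    # tau: one pass through pattern matching (PM), conflict
    # resolution (CR), and rule firing (RF)
    return t_match + t_cr + t_rf

def global_latency(n_states, tau):
    # T = n * tau when the search explores n states with no backtracking
    return n_states * tau
```

For example, with a matching step that dominates the cycle (say 5 units against 2 and 1 for CR and RF), exploring 10 states costs 10 passes through the full cycle.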
Exploring the search tree simultaneously by many PEs would eventually reduce the total processing time, i.e., a reduction in the global latency. Figure 2.9(b) illustrates this parallel processing of the search tree. This simple hardware approach with infinitely many PEs can eliminate problems associated with backtracking and can hopefully find a desired solution in a finite amount of time. However, this technique is impractical and too costly since for most practical AI problems the number of possible states in the search tree would be exponential. This leads to the development of parallel software to help parallel machines work efficiently.

Figure 2.9: Parallel processing of production systems. (a) Reduction in processing time of the matching step. (b) Simultaneous exploration of the search tree by having many PEs follow multiple paths, path 1, ..., path n.

2.3.3 Adaptive processing approaches to solving the inefficiencies

The second approach, adaptive/heuristic processing, aims at reducing the global latency, T, i.e., the total number of inference cycles. In this approach, production systems can be viewed as searching the state space to find a desired goal. The search can be performed to prune nonpromising branches in the search tree by use of additional information such as heuristics and/or new rules learned either at compile time or at runtime. Putting together this additional information, the conventional production system architecture shown in Figure 2.1 can be transformed into an adaptive production system depicted in Figure 2.10. There the inference engine generates not only the change in the working memory, ΔWM, but also the change in the production memory, ΔPM. A similar approach we have attempted for the planning problem along the line of neural networks stems from the fact that by learning knowledge at run time the system will take less processing time to solve a similar problem at a later time [58].
Figure 2.10: Adaptive processing of production systems, where the inference engine learns new rules or heuristics at either compile time or runtime. ΔPM denotes the change in production memory; ΔWM denotes the change in working memory.

Another important approach to this adaptive processing can be made by firing multiple rules in parallel. When many rules are fired in parallel, many paths in the search tree can be simultaneously explored, thereby reducing the total number of inference cycles executed in production systems [39,40,46]. Figure 2.11 shows an example of the effect of the reduction in the search space when multiple rules are fired in parallel [47].

Figure 2.11: Reduction of global latency in the search space using adaptive processing techniques. (a) Original search space with 19 states and 13 paths. (b) Reduced search space with 10 states and 5 paths.

The original search space depicted in Figure 2.11(a) has 19 states and 13 paths whereas the reduced search space shown in Figure 2.11(b) now has 10 states and only 5 paths. The reduction of 8 paths is significant. When the problem size is substantial, the parallel firing of multiple rules would significantly reduce the search space, thereby rendering intractable problems tractable. However, there are many problems to be resolved for this to be practical, including rule dependency analysis. Much effort is currently being expended along this direction throughout the research community.
Chapter 3 Production System Processing in Data-flow

As we have discussed earlier, the production system paradigm has been implemented in various computer systems, ranging from conventional sequential uniprocessors to general purpose multiprocessors, as well as special purpose multiprocessors dedicated to production systems. The study in this thesis has focused on the data-flow principles of execution. Since the underlying architectural model employed in this study is drastically different from the architectural models investigated thus far, many issues such as mapping production systems to data-flow principles, allocation of resources, etc. have to be clarified before production systems are implemented on the target data-flow multiprocessor. We shall first identify in this chapter the suitability of the data-flow principles of execution to processing production systems, followed by a discussion of the issues related to mapping production systems onto a data-flow multiprocessor. In the course of mapping, any necessary modifications to the chosen algorithm will be identified and resolved to suit the target multiprocessor. Bottlenecks of the Rete algorithm in a data-flow multiprocessor environment are identified and solutions to them are presented, resulting in the MRN network. This new match algorithm invalidates the conventional policies on resource allocations. A new policy for the allocation of productions has therefore been developed and is presented in this chapter, followed by a dynamic wme distribution policy.

3.1 Suitabilities

Based on the background information discussed in the previous chapter, we present the suitability of data-flow processors for production system processing. The necessary mapping schemes to fit the Rete match algorithm and the data-flow multiprocessor are identified in this section. Bottlenecks in the Rete algorithm are identified, which will result in the development of a new match algorithm.
3.1.1 Suitability of data-flow principles to production systems

The applications of data-flow computers studied thus far fall basically into the area of numerical computations such as signal processing [19], partial differential equation solvers [41], matrix manipulation, etc. Indeed, data-flow execution is generally thought to be more applicable to numerical applications rather than symbolic processing because:

• The data structures used in symbolic computations are irregular and nondeterministic compared to the fairly regular and predictable data structures created and used in numerical computations.
• The basic entity used in numerical computations is a number (either floating point or fixed point) while in symbolic computations, it is an object or a set (list) of objects. It requires good modeling techniques to represent the structure of objects as numerical values.

For the foregoing reasons, the following are identified as necessary modifications to a data-flow multiprocessor in order to accommodate symbolic computations:

• Due to the larger size of the data elements, data tokens must be allowed to carry more information than the single scalar element allowed in, say, the basic Tagged Token Data-flow Architecture.
• Fewer primitive functions are needed in symbolic computations than are required for numerical computations, where complex functions are often executed. In order to effectively utilize this advantage, the ALU needs major modifications. By adding several simple functional units to each PE, throughput will substantially increase.

As we shall discuss later in Chapter 5, the above observations lead to the development of a macro actor/token approach among data-flow principles of execution. Details on the necessary modifications as well as the objectives behind using data-flow principles of execution for the implementation of production systems are iterated in [22].
As we have demonstrated in the previous chapter, data-flow principles of execution and the Rete match algorithm present a match both at the level of the implementation and at the level of execution principles. Indeed, executing the Rete algorithm on a data-flow multiprocessor has the following advantages over execution on a conventional control-flow computer:

• The execution principles of the Rete algorithm are driven by incoming data tokens, i.e., execution may proceed whenever data are available. In any situation, multiple firings of actors in data-flow and comparison tests in the Rete algorithm are possible unless PEs are busy.
• Both are based on the single assignment principle, i.e., no data modifications except arrays.
• Both a data-flow machine and the Rete algorithm need dependency graphs which are obtained from the problem domain.
• The requirement for the memorization capability in two-input nodes of the Rete algorithm assumes a good structure handling technique. This can be effected by using the I-Structure Controller in the dynamic data-flow machine.
• The dynamic data-flow architecture allows an easy manipulation of the counters attached to the wmes. The counter for negated-pattern processing can be treated the same as other tags in the dynamic architectures.

3.1.2 Mapping production systems onto a multiprocessor

Mapping production systems onto multiprocessor systems has been investigated in several ways in the recent literature. Direct mapping employed in the DADO project uses "full distribution," which allocates a production to an available PE [63]. In this approach, production-level parallelism can be achieved by having several PEs operate simultaneously on wmes to match productions. In [23] a relevancy between the rules and the wmes is identified and used to directly allocate rules to PEs.
The relevancy is defined as follows: "A working memory element is relevant to a production if it matches at least one of its condition elements." However, the direct mapping of a rule to a PE is not likely to yield good performance, as Gupta has reported in his thesis [24]. It has been suggested by Bic that the semantic network can be directly viewed as a data-flow graph [8]. Each node in the semantic network corresponds to an active element capable of accepting, processing, and emitting value tokens traveling asynchronously along the arcs. The other approach, suggested by Tenorio and Moldovan, may be considered an indirect mapping [65]. In this approach, all productions are analyzed and grouped according to the dependency between productions to enable firing of multiple rules.

The mapping scheme adopted for our simulation, however, is different from the aforementioned approaches. The motivation for the choice of an alternative method lies in two facts. First, the architecture we have adopted is based on data-flow principles of execution. Since the parallel model employed here exploits parallelism at the production level, condition level, and further subcondition level (attribute-value pair level), the mapping scheme must be efficient to utilize all the possible forms of parallelism inherent to both data-flow principles and the Rete algorithm. Second, the Rete algorithm presents two bottlenecks which substantially degrade the performance of the production system in our parallel machine.

3.1.3 The Rete algorithm in a multiprocessor environment

The Rete algorithm is a highly efficient matching algorithm for production systems in a sequential processing environment [16]. When used in a parallel machine environment, however, it presents two apparent bottlenecks: one in the root node and the other in two-input nodes. Figure 3.1 illustrates the two bottlenecks.
Assuming that each condition element (CE) is ideally distributed to a different processing element (PE), tokens coming into the root node will immediately pile up on the input arc of the root node since there is one and only one root node, which can distribute tokens only one at a time to all CEs. For the network shown in Figure 3.1, where there are n condition elements, the root node will have to make nm distributions to the network when m wmes are present on the input arc of the root node.

The second inefficiency can also be seen in Figure 3.1. Assume that m tokens are received and matched on the left input arc of the two-input node. Further assume that a token is received and matched on the other input of the two-input node. The arrival of this last token will trigger the invocation of m comparisons with the values received and stored in the left memory of the two-input node. On the average, there will be O(m) such tests. Should the situation have been reversed and n tokens be in the right memory, a token on the left side would provoke O(n) comparisons. The internal workings of this two-input node are therefore purely sequential. In order to avoid wasting time in searching through the entire memory, an effective allocation of two-input nodes and one-input nodes should be devised.

Figure 3.1: Two bottlenecks of the Rete algorithm in a multiprocessor environment. (1) Piling up of wmes on an arc of the root node, which results in a sequential distribution of wmes to all CEs one at a time. (2) O(n) or O(m) comparisons in two-input nodes.

3.2 The MRN-based Match Algorithm

A new match algorithm is introduced to overcome the bottlenecks of the Rete match algorithm described above. Policies on allocation of productions and distribution of multiple wmes at runtime are introduced to support the new match algorithm.
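The sequential behavior of the two-input node described in Section 3.1.3 can be sketched as follows; the token representation and the consistency test passed to the constructor are illustrative assumptions, not the dissertation's implementation:

```python
class TwoInputNode:
    """Sketch of a Rete two-input (join) node with left and right memories."""

    def __init__(self, consistent):
        self.left_mem = []            # tokens matched on the left input
        self.right_mem = []           # tokens matched on the right input
        self.consistent = consistent  # inter-element feature test

    def left_activate(self, token):
        self.left_mem.append(token)
        # one new left token is compared against all n stored right
        # tokens: O(n) sequential tests
        return [(token, r) for r in self.right_mem if self.consistent(token, r)]

    def right_activate(self, token):
        self.right_mem.append(token)
        # one new right token is compared against all m stored left
        # tokens: O(m) sequential tests
        return [(l, token) for l in self.left_mem if self.consistent(l, token)]
```

With a consistency test that requires both sides to bind the same value for a shared variable, each arrival scans the opposite memory in its entirety, which is exactly the sequential behavior the allocation policy below tries to contain within a single PE.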
3.2.1 The MRN network

The first bottleneck described above can be resolved by introducing multiple root nodes (MRN) in the network, as depicted in Figure 3.2. This introduction of multiple root nodes is based on the observation that a wme that has n AVPs never matches a condition element (CE) that has m AVPs where n<m. For example, a wme, [(a 1) (b 2)], cannot match a CE, [(a X) (b Y) (c Z)], where X, Y, Z are variables, since the wme is missing the third attribute-value pair (AVP) (c Z). However, a wme [(a 1) (b 2) (c 3) (d 4)] can match the CE.

Figure 3.2: An MRN network. RNn distributes wmes to CEs under RN1 through RNn. A wme (i,j) refers to a wme with i AVPs, where j signifies its arrival order. The MRN network also demonstrates a parallel distribution of wmes, where n RNs can simultaneously distribute n different wmes to the network.

Constructing an MRN network is straightforward. All LHSs are split into condition elements (CEs). All CEs are grouped based on the number of AVPs in a CE, i.e., a CE with n AVPs belongs to group n. Associated with each group is a root node which distributes a set of wmes to a particular group of CEs of the MRN network. For example, RN2 of Figure 3.2 distributes wmes with 2 AVPs to those CEs where each CE has not more than 2 AVPs. Simple algebra clarifies the obvious advantage of using the MRN network. Suppose that the network has n groups, each of which has equally m CEs, i.e., the total number of CEs is nm. Assuming that the number of wmes generated due to a rule firing in each production cycle is a constant k, the original Rete network will need nmk distributions. Assuming that the k wmes are equally distributed over the n groups, i.e., k/n wmes per group, the MRN network will only need (1+2+...+n)mk/n distributions.
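The two distribution counts can be checked with a small sketch; the parameter names are ours:

```python
def rete_distributions(n, m, k):
    # single root node: each of the k wmes is sent to all n*m CEs
    return n * m * k

def mrn_distributions(n, m, k):
    # root node i serves groups 1..i (i*m CEs) and receives k/n wmes,
    # giving (1 + 2 + ... + n) * m * k / n distributions in total
    return sum(i * m * (k / n) for i in range(1, n + 1))
```

For instance, with n = 4 groups of m = 3 CEs each and k = 8 new wmes per cycle, the single root node makes 96 distributions while the multiple root nodes make (1+2+3+4)·3·2 = 60; the ratio nmk / ((1+2+...+n)mk/n) simplifies to 2n/(n+1).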
For an equal distribution of wmes over the groups, the MRN approach is guaranteed to yield at least a 2-fold speedup over the conventional Rete network. In the remainder of this chapter, this claim will be substantiated using various production system programs.

3.2.2 Allocation of productions

Productions are partitioned into LHSs and RHSs. LHSs are further partitioned into patterns. All patterns are logically grouped together according to the number of attribute-value pairs (AVPs) in the patterns. There are two ways of allocating productions onto the PEs: redundant allocation and minimum allocation. The first one, redundant allocation, does not follow the structural similarity of the Rete algorithm. There is no sharing of productions in this strategy. All patterns are copied and independently allocated. The major advantage of using this strategy stems from the fact that there is less communication overhead between PEs. However, it will consume a lot of processor space and become costly as the number of productions that share patterns or parts of patterns increases. The second policy, minimum allocation, follows the structural similarity. The major advantage behind adopting this concept is the fact that reducing the computation time in the matching step can also be achieved by keeping all the PEs busy. At the same time storage usage can be substantially reduced. However, it will increase the overhead in inter-processor communication.

In the data-flow multiprocessor environment, the redundant allocation strategy is chosen as the production allocation policy. A major reason for adopting this policy is the fact that the runtime communication overhead is much more expensive than simply using more memory space. Depending on the number of groups in all condition elements, the processor space is partitioned logically into a two-dimensional array PE[i,j], where i is a group number and j is a PE identification number within the group.
Allocating one-input nodes is straightforward. A condition that has i AVPs (or OINs) is allocated to PE[i,j], where j≥0. There are however many factors affecting the allocation of two-input nodes. The I-structure controller is used to solve the second bottleneck issue since two-input nodes require a structure handling capability due to the saving of information about changes in the working memory. A two-input node is split into two memories: a left and a right memory. A memory MEM[i,j] is allocated to PE[i,j], where the corresponding one-input nodes are allocated. Allocating a memory to a PE will ensure an even distribution of processing load across the processor space. At the same time, we can realize parallel matching at the condition level. In what follows, we informally describe our allocation algorithm:

Procedure Allocate_Condition_Elements_to_PEs
1. NRULE ← number of rules in the system
2. For i=1 to NRULE do
3.   NCOND ← number of conditions in RULE[i]
4.   For j=1 to NCOND do
5.     NAVP[i,j] ← number of AVPs in CONDITION[j] of RULE[i]
6.     n ← USED[NAVP[i,j]]            ;number of PEs already used in this group
7.     For k=1 to NAVP[i,j] do
8.       PE[NAVP[i,j],n] ← OIN[i,j,k] ;one-input node allocation
9.     PE[NAVP[i,j],n] ← MEM[i,j]     ;memory allocation
10.    USED[NAVP[i,j]] ← n+1

Terminal nodes are not explicitly allocated to PEs for our simulation. Instead, we make the last cycle of a two-input node notify a matching status. If the last two-input node for a certain rule says matched, then the rule is said to be instantiated.
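The procedure can be rendered as a Python sketch; the representation of a rule as a list of condition elements, each a list of AVPs, is our assumption:

```python
def allocate(rules):
    """Allocate condition elements to PE[group, pe] slots.

    rules: list of rules; each rule is a list of condition elements,
    and each condition element is a list of attribute-value pairs.
    Returns a map from (rule i, condition j) to (group, pe index).
    """
    used = {}        # PEs already consumed in each group (USED[] above)
    placement = {}
    for i, rule in enumerate(rules, start=1):
        for j, cond in enumerate(rule, start=1):
            group = len(cond)        # group number = AVP count of the CE
            n = used.get(group, 0)   # next free PE in this group
            # the condition's OINs and its memory MEM[i,j] all land
            # on the same PE[group, n]
            placement[(i, j)] = (group, n)
            used[group] = n + 1
    return placement
```

Because every condition is copied rather than shared, two identical conditions in different rules consume two distinct PEs of the same group, which is precisely the redundant allocation policy described above.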
To illustrate the above allocation policy, consider the following production memory of three rules and the MRN network constructed for the three rules (Figure 3.3):

Rule 1: [(a Z) (b Y)] [(c X) (d Y)] [(p 1) (q 2) (r X)] → [Make (p 1) (q 2) (r X)]
Rule 2: [(p 1) (q 2) (r X)] [(c X) (d W)] [(1 5) (m 6) (n W) (o Z)] → [Modify (c Y) (d X)]
Rule 3: [(c X) (d W)] -[(a *) (c 5) (e 7) (f W)] → [Remove 1]

Figure 3.3: An MRN network for the three rules.

Note that Rules 1 and 3 are designed to demonstrate the performance of the negated two-input node. Ellipses in Figure 3.3 correspond to one-input nodes. Two-input nodes are represented by boxes whereas negated two-input nodes are represented by double boxes. Numbers labeled on nodes are used to indicate where nodes are allocated in the processor space. Note also that for simplicity of presentation, OINs 11-13 are shared by NTIN8 and TIN14, and OINs 15 and 16 by TINs 17 and 26. The actual implementation does not share the similar nodes, to avoid the overhead in inter-processor communication. Based on the above allocation policy, the network is allocated to PEs, as shown in Figure 3.4.

Figure 3.4: A redundant allocation policy. Twelve PEs are used to allocate the three productions. CEs with n AVPs are allocated to PEs of Group n.

PEs are partitioned into 5 different groups, where PEs in Group n contain patterns having n AVPs. Group 1 is not used in our example since no condition pattern has only one AVP. Consider the first pattern of Rule 2, [(p 1) (q 2) (r X)], for example. The sequence of nodes in the pattern and the left memory for that pattern are labeled 11 through 14 in Figure 3.4 (11 through 13 are one-input nodes). Since the pattern has 3 AVPs, it is classified into Group 3 and allocated to PE1 of Group 3, denoted by PE3,1. The second pattern of Rule 2 has 2 AVPs and a right memory, labeled 15 through 17.
It is classified into Group 2 and allocated to PE2 of Group 2, denoted by PE2,2. In the above allocation policy, we observe that the number of PEs needed to allocate productions is proportional to the number of inter-element feature tests in the productions. For example, suppose that a certain system has n productions and that there are on the average m inter-element feature tests per production. For each inter-element feature test, two memories are needed. The number of PEs needed to allocate the n productions would then be 2mn. For the three rules shown above, there are 3 rules and on the average 2 inter-element feature tests per rule. In total, 12 PEs are used, as depicted in Figure 3.4.

3.2.3 Distribution of multiple working memory elements

Although the Rete algorithm is designed to save computation time in matching patterns over wmes, there is a bottleneck at the root node, as discussed at the beginning of this section. In order to overcome this barrier, we propose a scheme which simultaneously distributes many different tokens to many PEs at a time, provided that many wmes are available at the same time for distribution. It is based on the fact that certain wmes eventually fall into the PEs of a certain group, where they may be matched; wmes that have i AVPs never match patterns that have j AVPs such that i<j. Whenever the new wmes that are generated due to the rule firings become ready for distribution to the network, the PEs perform the following operations:

Procedure Distribute_wmes_to_PEs
1. NAVP[n] ← number of AVPs in wme[n]
2. Attach the NAVP[n] tag to wme[n]
3. Route wme[n] to PE[i,j] for all j such that i=NAVP[n]

Assume that the three rules listed in the previous section are compiled and allocated to the PEs according to the allocation policy described in Section 3.2.2. Suppose further that the set of wmes shown below is available and is about to be distributed into the network in Figure 3.4 at a certain time t.
We will show the efficiency of our distribution policy as follows:

Working Memory
wme1: [(p 1) (q 2) (r *)]
wme2: [(p 1) (q 2) (r =)]
wme3: [(p 1) (q +) (r 3)]
wme4: [(1 5) (m 6) (n +) (o *)]
wme5: [(1 5) (m 6) (n 6) (o 2)]
wme6: [(a *) (c 5) (e 7) (f 6)]
wme7: [(c *) (d 6)]
wme8: [(c 3) (d +)]
wme9: [(c 3) (d 6)]
wme10: [(a +) (b 6)]
wme11: [(a 2) (b 6)]
wme12: [(c 2) (d =)]

If the Rete algorithm distributes one wme at a time to the network through the root node, it would take 12 time units to distribute them. This is depicted in Figure 3.5, where one wme at a time is sequentially distributed to all PEs.

Figure 3.5: Sequential distribution of wmes. Only one wme can be distributed to all PEs at a time. To distribute 12 wmes, it takes at least 12 steps (or time units).

The number of comparison tests performed at the very first one-input nodes (1, 4, 9, 11, 15, and 19) will reach 108 (= 9 PEs × 12 wmes). For example, when wme1 is distributed, all 9 PEs to which patterns are allocated make a comparison test in parallel. Only two PEs, PE3,0 and PE3,1, will succeed in matching. This forces the machine to operate in Single-Instruction-stream-Multiple-Data-stream (SIMD) execution mode although it has a Multiple-Instruction-stream-Multiple-Data-stream (MIMD) processing capability.

Applying our distribution policy, the 12 wmes are partitioned into 3 groups and the group numbers are assigned to the wmes. Wmes 7 through 12 get group #2 while wmes 1 through 3 get #3 and wmes 4 through 6 get #4. The total number of comparison tests performed at the very first one-input nodes in three sequences reduces to 36 (= 6×4 + 3×2 + 3×2), as shown in Figure 3.6. There are three bins in Figure 3.6, where each bin corresponds to a certain group. In each group, wmes are sequentially distributed to the PEs belonging to the corresponding group in the PE space. However, between groups wmes are simultaneously distributed.
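The comparison-test counts can be reproduced with a small sketch; the per-group pattern-PE counts of 4, 2, and 2 are read off the 6×4 + 3×2 + 3×2 breakdown above and are otherwise our assumption:

```python
def sequential_tests(total_pes, n_wmes):
    # single root node: every wme visits the first one-input node
    # of every pattern-holding PE
    return total_pes * n_wmes

def grouped_tests(pes_per_group, wme_groups):
    # MRN: each wme is routed only to the pattern PEs of its own group
    return sum(pes_per_group[g] for g in wme_groups)

# the 12 wmes above, tagged with their AVP counts (wme1..wme12)
wme_groups = [3, 3, 3, 4, 4, 4, 2, 2, 2, 2, 2, 2]
pes_per_group = {2: 4, 3: 2, 4: 2}   # assumed pattern PEs per group
speedup = sequential_tests(9, 12) / grouped_tests(pes_per_group, wme_groups)
```

Running the sketch gives 108 sequential tests against 36 grouped tests, i.e., the speedup S = 3 derived below.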
Figure 3.6: Parallel distribution of wmes. Three wmes can be distributed to PEs at a time. To distribute 12 wmes, it takes max{number of wmes in each group} steps.

If we define the speedup of the distribution policy as S = Ns/Np, where Ns and Np are respectively the numbers of comparisons to be performed for sequential distribution and parallel distribution, we obtain the speedup S = 108/36 = 3 for the given set of wmes. The number of groups in working memory determines the speedup S. In the worst case, only one wme can be distributed to all PEs at a time, as shown in Figure 3.5. Note that in the original Rete algorithm, a sequential distribution, analogous to our worst case, would be implemented. Instead, our improvement provides extra parallelism, although this scheme depends heavily on the fact that wmes will be evenly classified over all groups.

A detailed analysis of the MRN approach will be given in Chapter 6, where several benchmark production systems are implemented on the MRN-based production system interpreter, which was written for the purpose of verifying the performance of the MRN approach. In the meantime, we shall come back to data-flow, and discuss issues related to the implementations of production systems on data-flow multiprocessors.

Chapter 4 Parallel Implementation on a Micro Data-flow Multiprocessor

The MRN-based production system interpreter is implemented on a Micro data-flow multiprocessor. The dynamic data-flow principles are chosen and the MIT Tagged-Token Data-flow Computer is employed as a simulator model. Several assumptions at the hardware level are made to suit the processing of production systems in the simulator. To give a better understanding of the execution of production systems on the target machine, the simple production system with 3 rules given in the previous chapter is used along with a set of 12 wmes. A complete execution sequence is presented in detail.
The allocation and distribution policies described in the previous chapter are exercised in the execution. Various runtime statistics are measured to identify the behavior of production systems on the micro data-flow machine. Experimental results are analyzed and compared with results drawn from conventional von Neumann computers.

4.1 A Simulator Model

A simulation approach has been taken to investigate the performance of the MRN-based match algorithm. A dynamic data-flow multiprocessor, described in Chapter 2, is chosen as a simulator model. Each PE has several simple functional units which can enable the parallel matching of Attribute-Value Pairs (AVPs). The number of simple functional units in the PE would range from 1 to 10 due to the fact that there are no more than 5 attribute-value pairs in any condition of the left-hand side of the rules. Furthermore, the following assumptions are made in the simulation for the sake of evaluation:

• A simulation time unit, τ, is set to 1 μsec.
• Each PE runs at a 3 MHz clock (1 μsec/instruction).
• A wme can match a condition element in 1τ.
• The routing time for a token to reach any PE is set to 1τ.
• Each unit in the PE shown in Figure 2.5 takes 1τ.
• Each PE can execute 10 comparison tests at a time.
• The time taken for the I-Structure Controller (ISC) is the same as for the other units in the PE.
• On the average, there are 3 elements (1 TIN and 1 NTIN) per rule.

Note above that the simulation time units taken for the ISC and the other units are equally set to 1. In fact the ISC takes longer than the other units. However, there are other units that take relatively less time than the ISC, which therefore offsets our original assumption.

Besides the assumptions listed above, three time units are defined which are used in the following example. They are a unit time t, a loop, and an abstract time T. A unit time t is the time taken for a token to pass through any physical unit in the PE.
A loop is the time taken by a wme to go through a PE and come back to the input switch. Again, we interchangeably use tokens and wmes throughout the paper. T denotes an abstract time to demonstrate parallel matching performed at the condition level and to distinguish various events occurring in a high-level execution sequence.

4.2 Example

The execution sequence of the MRN-based production system interpreter on a data-flow multiprocessor system is presented. We use the three rules and the 12 wmes shown in Chapter 3, as well as the corresponding network shown in Figure 3.3. The allocation and distribution policies discussed in the previous section are used to show an implementation of the Rete match algorithm in data-flow multiprocessors.

4.2.1 Activities at time T0: Steps 1-3

Assuming wme1, wme4, and wme7 are simultaneously injected into the network at time T0, the following steps will take place:

Step 1: wme1 [(p 1) (q 2) (r *)] comes into the network as a data token and is distributed to 2 PEs in GROUP3 since it has 3 Attribute-Value Pairs. Let us consider an execution sequence in PE3,1.

1. Loop 0 in PE3,1 for the intra-element feature test:
a) At time t0, the switch in PE3,1 forwards the token to the Matching/Store Unit (MSU), where the token is identified as a monadic comparison actor. Indeed, the other term of the comparison is "built-in" the comparison actor, therefore no matching is necessary.
b) At time t1, the token can be sent to the Instruction Fetch Unit (IFU), where a built-in operand (two one-input nodes and one binding node labeled 11, 12, and 13, depicted in Figure 3.3 and also Figure 4.1) and an opcode (comparison function) are fetched from a program memory (PM) and sent to the ALU.
c) At time t2, with the two operands and the opcode received by the ALU, five comparison operations are simultaneously performed on five pairs, i.e., on 'p' and 'p', '1' and '1', 'q' and 'q', '2' and '2', and 'r' and 'r' in five functional units.
As pointed out in section 2.3, we assume that each ALU has several simple functional units to support a parallel execution at the sub-condition level. Note that a variable X is automatically bound to * when the comparisons are successful.
d) At time t3, after the two one-input nodes are successfully compared in the ALU of PE3,1, the data token [(p 1) (q 2) (r *)] is sent to the Token Formatting Unit (TFU), where the necessary tagging operation is done (since the architecture model adopted is a dynamic data-flow architecture). The output module (not shown in Figure 2.5) routes the token back to PE3,1 for two-input node operations as it receives it from the TFU.

Figure 4.1: Snapshot of the MRN network after the first match cycle T0. (The top portion is omitted for clarity; see Figure 3.3 for this part.)

2. Loop 1 in PE3,1 for the array operation:
a) At t4, the switch in PE3,1 sends the token to the MSU.
b) At t5, the IFU fetches an append opcode.
c) At t6, wme1 is sent back to the switch in PE3,1.

3. Loop 2 in PE3,1 for saving wme1:
a) At t8, the switch in PE3,1 sends the token to the MSU.
b) At t9, the I-Structure Controller (ISC) copies LM14 and appends wme1 to it (shown in Figure 4.1).

4. Loop 3 through 4 in PE3,1 for the inter-element feature test: PE3,1 checks the MSU to see if any wme has arrived from PE2,2, in which RM17 is allocated. Assuming that step 1 finishes before step 2, no wme has arrived at the MSU of PE3,1. It sends out wme1 to PE2,2.

Step 2: wme7 [(c *) (d 6)] is distributed to 4 PEs in GROUP2 since it has 2 AVPs.

1. Loop 0 through 2 in PE2,2 for the intra-element feature test and saving wme7 in RM17 (shown in Figure 4.1).
2. Loop 3 through 5 in PE2,2 for the inter-element feature test about X: PE2,2 checks the MSU to see if any wme has arrived from PE3,1.
As assumed in step 1.4, the matching operation on wme1 is performed before step 2, so wme1 has been stored in LM14 and sent to the MSU of PE2,2. To check the consistency in variable instantiations, the values of attribute r in wme1 of LM14 and c in wme7 of RM17 are compared and found equal. The two wmes are put together into wme1,7, which is sent to LM18 of PE0,1. See step 4 for the next sequence.

Step 3: wme4 [(l 5) (m 6) (n +) (o *)] is distributed to 2 PEs in GROUP4 since it has 4 AVPs.

1. Loop 0 through 2 in PE4,0 for the intra-element feature test and saving wme4 in RM23 (shown in Figure 4.1).
2. Loop 3 and 4 in PE4,0 for the inter-element feature test: PE4,0 checks its MSU but no wme has arrived from LM18 of PE0,1 since step 4 is not yet completed. It therefore routes wme4 to PE0,1.

4.2.2 Activities at time T1: Steps 4-7

Step 4: wme1,7 is received by PE0,1.

1. Loop 0 and 1 in PE0,1 for saving wme1,7 in LM18 (shown in Figure 4.2).
2. Loop 2 through 4 in PE0,1 for a second inter-element feature test: PE0,1 checks the MSU and finds wme4, which has been sent from step 3. The values of attribute 'd' in wme1,7 of LM18 and 'n' in wme4 of the MSU are compared. The test fails due to the inconsistent variable instantiations: the values of W in wme4 and wme1,7 are respectively '+' and '6', and certainly different. It then sends out wme1,7 to PE4,0. See step 7.

Figure 4.2: Snapshot of the MRN network after the second match cycle T1. (The top portion is omitted for clarity; see Figure 3.3 for this part.)

Step 5: wme10 [(a +) (b 6)] is distributed to 4 PEs in GROUP2.

1. Loop 0 through 2 in PE2,0 for the intra-element feature test and saving wme10 in RM3 (shown in Figure 4.2).
2. Loop 3 through 5 in PE2,0 for the inter-element feature test about Y: PE2,0 checks the MSU and finds wme7, which has been received from step 2.
The values of attribute 'b' in wme10 of LM3 and 'd' in wme7 in the MSU are compared and found equal. wme10 and wme7 are put together into wme7,10, which is sent to LM7 of PE0,0. See step 8 for the next sequence.

Step 6: wme3 [(p 1) (q +) (r 3)] is distributed to the 2 PEs in GROUP3. No PE succeeds in matching since the built-in operand [(p 1) (q 2) (r X)] and wme3 are different.

Step 7: wme5 [(l 5) (m 6) (n 6) (o 2)] is distributed to 3 PEs in GROUP4.

1. Loop 0 through 2 in PE4,0 for the intra-element feature test and saving wme5 in LM23.
2. Loop 3 and 5 in PE4,0 for the second inter-element feature test about W: PE4,0 checks the MSU and finds wme1,7, which has been sent from step 4. The values of attribute 'd' in wme1,7 of the MSU and 'n' in wme5 of RM23 are compared and found equal.
3. Loop 6 in PE4,0 for rule instantiation: wme1,7 and wme5 are put together into wme1,5,7, which is sent to the terminal node for the selection step. At this time, Rule 2 is said to be satisfied with X, W, and Z instantiated respectively to '*', '6', and '2'.

4.2.3 Activities at time T2: Steps 8-11

Step 8: wme7,10 is received by PE0,0.

1. Loop 0 and 1 in PE0,0 for saving wme7,10 in LM7 (shown in Figure 4.3).
2. Loop 2 and 3 in PE0,0 for a second inter-element feature test: It checks the MSU to see if any wme has arrived from PE3,0. Assuming step 8 completes before step 9, no wme is found in the MSU of PE0,0. wme7,10 is then routed to PE3,0. See step 9 for the next sequence.

Step 9: Assume that the conflict is resolved. Rule 2 fires and -wme1 is distributed to PE3,0 and PE3,1 through RN3 since -wme1 has 3 AVPs.

1. Loop 0 in PE3,0 for the intra-element feature test: Nodes 11 through 13 are executed on -wme1.
2. Loop 1 through 3 in PE3,0 for memory examination: Recall that PE3,0 contains a negated element. PE3,0 checks if wme1 exists in RM8 and selects wme1 from RM8. (The top portion of the network is omitted for clarity; see Figure 3.3 for this part.)
Figure 4.3: Snapshot of the MRN network after the third match cycle T2.

3. Loop 4 and 5 in PE3,0 for counter manipulation: Assume that there is only one wme1 in RM8, as is the case, and that the counter attached to wme1 is 1. Here again, the counter on a wme is treated in a similar fashion as the other tags attached to a data token. The counter tag is decremented by one and found zero. Of course, at the same time PE3,1 deletes wme1 from RM14 at T2 and in turn wme1,7 from LM18 at T3 in the same manner.
4. Loop 6 and 7 in PE3,0 for the second inter-element feature test about variable X: It checks the MSU and determines if any wme has arrived from PE0,0. As we assumed in step 8.2, there is wme7,10 in the MSU. The values of attribute c in wme7,10 of the MSU and r in -wme1 of RM8 are compared and found equal. Now, Rule 1 is satisfied with X, Y, and Z instantiated respectively to *, 6, and +. Any conflict resolution strategy will proceed.

Step 10: wme11 [(a 2) (b 6)] is distributed to 4 PEs in GROUP2 and goes through the intra-element feature test in PE2,0. Upon matching, wme11 is stored in LM3. PE2,0 checks its MSU to determine whether any wme has arrived from PE2,1. It finds wme7 in it. To check an inter-element feature test about Y, the values of attribute b in wme11 of LM3 and d in wme7 of RM6 are compared. The test succeeds. wme7 and wme11 are put together into wme7,11, which is sent to LM7 of PE0,0 (shown in Figure 4.3).

Step 11: wme6 [(a *) (c 5) (e 7) (f 6)] is distributed to 3 PEs in GROUP4 and goes through an intra-element feature test in PE4,1. Upon matching, wme6 is stored in RM27. The counter on wme6 is examined and found nonzero. Recall that PE4,1 contains a negated element. No inter-element feature test about W is necessary.
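The two match phases exercised repeatedly in the example above can be condensed into a small illustrative sketch. The '?'-prefix notation for variables and all function names here are our own, not the thesis's; the sketch only mirrors the logic of the one-input and two-input nodes.

```python
# Condensed sketch of the two match phases of the example:
# the intra-element feature test (one-input nodes) and the
# inter-element feature test (a two-input node joining the
# bindings produced by two condition elements).

def intra_test(condition, wme):
    """One-input nodes: compare AVPs; bind variables on success."""
    bindings = {}
    for (attr, val), (w_attr, w_val) in zip(condition, wme):
        if attr != w_attr:
            return None
        if val.startswith('?'):            # variable: bind it
            bindings[val] = w_val
        elif val != w_val:                 # constant: must be equal
            return None
    return bindings

def inter_test(left_bindings, right_bindings):
    """Two-input node: shared variables must bind consistently."""
    shared = set(left_bindings) & set(right_bindings)
    if all(left_bindings[v] == right_bindings[v] for v in shared):
        return {**left_bindings, **right_bindings}
    return None

# Steps 1 and 2 of the example: wme1 vs. CE1, wme7 vs. CE2,
# then the join on the shared variable X.
ce1 = [('p', '1'), ('q', '2'), ('r', '?X')]
ce2 = [('c', '?X'), ('d', '?W')]
b1 = intra_test(ce1, [('p', '1'), ('q', '2'), ('r', '*')])  # wme1
b7 = intra_test(ce2, [('c', '*'), ('d', '6')])              # wme7
print(inter_test(b1, b7))   # consistent: X = '*', W = '6' -> wme1,7
```

On the data-flow machine, of course, the two `intra_test` calls run on different PEs and the join is driven by token arrival at the MSU; the sketch captures only the matching logic.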
4.3 Analysis of the Example

The condition-level parallel matching was demonstrated in steps 1, 2, and 3, where the dynamic parallel distribution of wmes is made. Steps 5, 6, 7, and 8 as well as steps 9, 10, and 11 also show the condition-level parallel matching. Negated-element handling and deleting wmes are detailed in steps 9 and 11. The advantage of the allocation policy we adopted is apparent in the above example. By allocating memories to different PEs, all the cross-checking activities for the inter-element features are distributed throughout the system so that memory operations are performed asynchronously.

Table 4.1 summarizes the above 11 steps. T, A, C, and R in Table 4.1 stand respectively for a test of equivalence, an array operation, a check of arrival, and routing. Each operation takes 1 loop except A, which takes 2 loops.

Step  Executed at  No. of loops  Type of operations   PEs involved
1     T0           5             T, A, C, R           PE3,0, PE3,1
2     T0           6             T, A, C, T, R        PE2,0 thru PE2,3
3     T0           5             T, A, C, R           PE4,0 thru PE4,2
4     T1           4             A, C, T, R           PE0,1
5     T1           6             T, A, C, T, R        PE2,0 thru PE2,3
6     T1           1             T                    PE3,0, PE3,1
7     T1           7             T, A, C, T, R        PE4,0 thru PE4,2
8     T2           4             A, C, R              PE0,0
9     T2           8             T, T, A, T, T, T, R  PE3,0, PE3,1
10    T2           6             T, A, C, T, R        PE2,0 thru PE2,3
11    T2           6             T, A, T, T           PE4,0 thru PE4,2

Table 4.1: Summary of the 11 steps in the example

From the above example, we identify the following six observations, each of which performs a typical operation in the Rete algorithm. Each operation is expressed in terms of the number of loops. For observations 2 and 3, we assume that there is only one wme in any memory.

1. To, intra-element feature test (a set of OINs) by 1 PE, = 1 (T from any step)
2. Tt, inter-element feature test (TIN) by 2 PEs, = 4 (A + C + T from step 2)
3. Tn, negated-element test (NTIN) by 2 PEs, = 6 (C + A + T + T + C from step 9)
4. Ta, adding a wme to a memory, = 4 (T + A + C from step 1)
5. Td, deleting a wme from a memory, = 6 (T + C + A + T + T from step 9)
6. Tr, routing a token from a PE to a PE, = 1 (R from any step)

Furthermore, Tr2, the number of loops executed to instantiate Rule 2, can be approximated as the summation of max{step1, step2, step3} and max{step4, step6, step7}. Using Table 4.1, we find Tr2 = 6 + 7 = 13. By the same token, Tr1 = Tr2 + max{step8, step9, step10} = 13 + 8 = 21.

Based on the above observations, we identify below the results that are to be compared with the simulation results in the following section:

1. R1, ratio of the processing time of a TIN to an OIN, = Tt/To = 4/1 = 4
2. R2, ratio of the processing time of an NTIN to a TIN, = Tn/Tt = 6/4 = 1.5
3. R3, ratio of the processing time of routing to min{Tt, Tn}, = Tr/Tt = 1/4 = 0.25
4. Rr1, ratio of instantiating Rule 1 to an OIN, = Tr1/To = 21/1 = 21
5. Rr2, ratio of instantiating Rule 2 to an OIN, = Tr2/To = 13/1 = 13

4.4 Simulation and Performance Evaluation

A simulation approach has been taken to investigate the performance of the MRN-based match algorithm in a data-flow processing environment.

4.4.1 One-input nodes and array operations

In this simulation, a set of 12 wmes and three rules (shown in Chapter 3) are used. Rule 2 has been converted into a data-flow graph. Figure 4.4 shows the first condition element of Rule 2, i.e., nodes 11 through 14 of Figure 3.3.

Figure 4.4: A data-flow graph for nodes 11 through 14 of Rule 2. 'ror+4' rotates right 4 times; 'rol-4' rotates left 4 times.

One-input nodes 11 through 13 of Figure 3.3 for the intra-element feature tests are implemented through decision actors labeled 0, 2, 3, and 39 in Figure 4.4. All the others in Figure 4.4 are for the two-input node 14 of Figure 3.3. Only one PE is used in this set of experiments.
First, the one-input nodes are tested, and then those successful in the tests are appended and copied to another array. The results of these simulation runs are displayed in Table 4.2. Note that the memories are allocated to the same PE where the OINs are. Table 4.2 shows that a sequence of one-input nodes takes about 17 time units, or 17τ, using one PE. Each additional matching takes 13τ.

Trial  No. of wmes  OINs only  Append & Select
1      1            17         29
2      2            30         47
3      3            43         61

Table 4.2: Simulation time units for one-input nodes and array operations

Figure 4.5: Simulation results by 1 PE for independent two-input node processing: (a) 76τ, (b) 142τ, (c) 199τ.

4.4.2 Independent execution of two-input nodes

Three conditions are tested separately, one at a time. The condition element 1, [(p 1) (q 2) (r X)], is matched against a set of wmes with variations in the order (see Figure 4.5). Wmes 1 and 2 are injected into the left sequence of Rule 2, assuming that RM17 is filled with wmes 7, 8, 9, and 12 that have been matched with the middle sequence, [(c X) (d W)], of the same rule. Table 4.3 summarizes the simulation time. Trial 1, shown in Figure 4.5(a), indicates that wme1 matches against wme7 of RM17, which results in 76τ. Notice in trial 5 that when no match occurs, i.e., when wme2 is placed into the network, the simulation time becomes 286τ due to an exhaustive search in RM17.

Trial  Input wme  Order of RM17  Simulation Time
1      wme1       7 8 9 12       76
2      wme1       12 7 8 9       142
3      wme1       9 12 7 8       199
4      wme1       8 9 12 7       260
5      wme2       8 9 12 7       286

Table 4.3: Matching CE1 with RM17 of Rule 2 by 1 PE

4.4.3 Parallel execution of two-input nodes

To identify the behavior of the parallel execution, two condition elements are executed in parallel, as depicted in Figure 4.6. It takes about 200-500τ depending upon the number of wmes that have reached either LM14 or RM17 of the two-input node in Figure 4.6.
Table 4.4 summarizes the results with various wmes coming into the network.

Figure 4.6: Simulation results by 2 PEs for parallel two-input node processing: (a) 125τ, (b) 108τ, (c) 229τ.

The first two columns in Table 4.4 show the wmes randomly coming into the network without any order and going to either the left or the middle sequence of Rule 2. X's in the table represent wmes that will never match wmes coming from the other sequence, whereas O's represent wmes that will match those from the other sequence. For example, the 1st row with X and X shows that 1 wme is distributed to each element, as depicted in Figure 4.6(a), and there is no match. The 7th row with X O O and X X O shows that there are 3 wmes distributed to each element and that there are 2 matches.

Trial  Incoming wmes falling into    Simulation Time
       CE1        CE2                1 PE    2 PEs
1      X          X                  166     125
2      O          O                  130     108
3      O O        O O                207     146
4      X X        X X                379     243
5      O X        X O                327     229
6      O X O      X O                396     256
7      X O O      X X O              521     337
8      X O X      X O X              585     374

Table 4.4: Parallel execution of CE1 and CE2 of Rule 2

4.4.4 Performance evaluation

With the simulation results and the assumptions listed above, the following results are identified:

1. To, the simulation time for a PE to process one-input nodes and variable bindings with one wme, is 17τ, plus 13τ for an additional wme (see Table 4.2).
2. Tt, the time for a PE to process a two-input node with one wme, is 76τ, plus 50τ for an additional wme (see Table 4.3 and Figure 4.5). This fact validates the approximation made earlier in section 4.2, where R1, the ratio of Tt to To, is 4, since Tt/To = 76/17 ≈ 4.
3. Executing a two-input node with various wmes takes 125τ, as shown in Table 4.4 and Figure 4.6. The 2 PEs to which the 2 elements are allocated simultaneously match wmes that are randomly coming into the 2 elements. This fact again validates R1 being 4.
Since this test is done by 2 PEs, R1 = (125/2)/17 ≈ 4.

4. We now calculate the time units for negated-element processing as follows. Given R2 = Tn/Tt = 1.5, R3 = Tr/Tt = 0.25, Rr1 = Tr1/To = 21, Rr2 = Tr2/To = 13, To = 17τ (simulation result shown in Table 4.2), and Tt = 125τ (simulation result shown in Table 4.4 and Figure 4.6(a)), we find Tn = R2 Tt = 1.5 × 125 = 188 ≈ 200τ. When the routing time Tr is considered, we now find Tt = Tt(1 + R3) = 125(1 + 0.25) ≈ 156 and Tn = Tn(1 + R3) = 200(1 + 0.25) = 250. The time to process either an NTIN or a TIN by 2 PEs is, therefore, not more than 300. Note that the approximation for Tn is based on the simulation result in Figure 4.6(a), where there are only two wmes, one in each memory of the two-input node.

Tr1, the time taken to process Rule 1, which has one regular TIN and one NTIN, is therefore approximately Tt + Tn = 156 + 250 ≈ 400τ. In fact, in our earlier discussion in section 4.2, we approximated Tr1 = 21 loops, and this approximation is validated as follows: Given Rr1 = 21 and To = 17τ, Tr1 = Rr1 To = 17 × 21 ≈ 400τ. For Rule 2, which has 2 TINs, Tr2 = 2 Tt = 2 × 156 ≈ 300τ.

Suppose that a certain production system has rules with an average of two inter-element features (1 two-input node and 1 negated two-input node) per rule and that there is only one wme matched through the one-input nodes and stored in each memory. The data-flow model would then instantiate a rule in 400τ, which is equivalent to 0.4 msec. If there is more than 1 wme matched through the one-input nodes and stored in each memory, Tr, the time taken to fire a rule, will be proportional to the number of wmes stored in each memory [45], as verified by our simulation results shown in Table 4.4 and Figure 4.6(a) and (c). When there are on the average n wmes in each memory, Tr = 400n = 0.4n msec in the absence of conflict resolution.
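The timing chain above can be replayed numerically. This is a sketch only; the variable names are ours, the base times are the measured values from the tables, and the ratios come from the loop-count analysis of the example.

```python
# Timing chain of the performance evaluation, in simulation time
# units (tau). To and Tt are measured; R2, R3, Rr1 are the ratios
# derived from the loop-count analysis of the example.
To, Tt = 17, 125              # OINs (Table 4.2), TIN by 2 PEs (Fig. 4.6(a))
R2, R3, Rr1 = 1.5, 0.25, 21   # NTIN/TIN, routing, Rule-1 ratios

Tn = R2 * Tt                  # NTIN: 187.5, rounded up to ~200 in the text
Tt_routed = Tt * (1 + R3)     # TIN incl. routing: 156.25 ~ 156
Tn_routed = 200 * (1 + R3)    # NTIN incl. routing: 250.0

Tr1 = Tt_routed + Tn_routed   # Rule 1 (one TIN + one NTIN): ~406 ~ 400
Tr2 = 2 * Tt_routed           # Rule 2 (two TINs): ~312 ~ 300
loop_estimate = Rr1 * To      # cross-check: 21 loops x 17 tau = 357 ~ 400

print(Tn, Tt_routed, Tn_routed, Tr1, Tr2, loop_estimate)
```

The loop-count estimate (357τ) and the measurement-based estimate (about 406τ) land on the same 400τ figure, which is the consistency the text is claiming.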
When the conflict resolution step (about 10% of the total computation time [16]) is taken into account, Tr = 0.4(1 + 10/90)n ≈ 0.5n msec, where n is the average number of wmes stored in a memory. This Tr in turn gives 1000/0.5n = 2000/n rule firings/second. Compared to the analysis of the implementation of OPS5 on DADO [23], the choice of a data-flow multiprocessor gives a 2000/100n = 20/n fold speed-up, since DADO is estimated to be able to fire below 100 rules/second.

4.5 Summary

The potential of data-flow multiprocessor systems for the efficient implementation of non-numeric computations has been investigated in this chapter. Among the various data-flow architectures proposed, the dynamic principles have been chosen since they provide maximum parallelism in the problem domain. We have identified modifications to the implementation of the basic data-flow principles of execution before these can be used in an AI computation environment.

The Rete algorithm has been chosen as a benchmark of symbolic computations on a data-flow multiprocessor because it is the most widely used match algorithm. Inefficiencies in the implementation of the Rete algorithm on parallel machines have been identified, and possible solutions to the problems have been worked out in our data-flow environment. Simultaneous distribution of many wmes to many PEs and allocation of conditions and O(n) iterations to different PEs have proven effective in delivering the parallelism inherent to the Rete algorithm and allowed by a given configuration of our data-flow architecture.

The Tagged-Token Data-flow Machine has been chosen for our simulation model. The allocation and distribution schemes we developed are exercised in our simulation. The Rete algorithm has been successfully implemented in a data-flow processing environment. The complete graph for a rule has been created for execution in a data-flow multiprocessor.
To detect and estimate the different levels of parallelism in the production matching step, various simulations have been undertaken. Conditions in a rule are executed in parallel. Our simulation results show that our data-flow multiprocessor can fire at a rate of 1000 rules per second in the absence of a conflict resolution implementation. Although conflict resolution is not taken into account in implementing a production system here, the results we obtained reveal that symbolic computations on a data-flow multiprocessor can indeed be processed efficiently. Comparison with conventional computers has shown that a high speed-up can be obtained from this approach.

However, some problems in applying the data-flow principles of execution remain unsolved. One of the problems is the programmability in a high-level language. Also, a complete implementation of conflict resolution algorithms will be undertaken next. In conclusion, it appears that the data-flow principles of execution are not limited to numerical processing but will also find applications in some AI problems.

Chapter 5
Parallel Implementation on a Macro Data-flow Multiprocessor

The applicability of data-flow principles of execution to matching operations for production systems has been presented in the previous chapter. In this chapter, we further explore the applicability of data-flow principles of execution to production systems. It has been our observation that AI problems exhibit a behavior characteristically different from conventional numeric computations. We demonstrate in this chapter that a macro actor/token approach will best match these characteristics. Section 3 describes those characteristics of production systems from the parallel processing perspective, which we optimize by the utilization of macro data-flow principles. Characteristics of the production system paradigm are identified, based on which we introduce the concept of macro tokens as a companion to macro actors.
A brief analysis along with simulation results is presented to show why medium-grain macro actors are preferred to fine-grain micro actors. Section 4 discusses several strategies for deriving well-formed macro actors from micro actors for production systems. A set of guidelines is identified in the context of production systems to derive well-formed macro actors from primitive micro actors. Parallel pattern matching is written in macro actors/tokens to be executed on our macro data-flow simulator. Section 5 gives simulation results based on our execution model, the macro data-flow simulator, as well as a performance evaluation. The simulation results demonstrate that the macro approach can be an efficient implementation of production systems.

5.1 Production System Processing in Macro Data-flow

The basic macro data-flow principle is presented and its implications for production system processing are identified from the macro perspective.

5.1.1 The macro data-flow principle

A macro actor is a collection of scalar instructions. The objective behind lumping instructions into one larger unit is to improve performance by exploiting locality within these larger units. Figure 5.3(b) shows a typical macro actor with a number of micro actors within it. In fine-grain data-flow computation, the high overhead needed to respect the functionality in execution will result in poor performance at low levels of parallelism. Indeed, to execute fine-grain graphs (where each actor represents a single instruction), execution, communication, and computation overhead must be associated with the actual computation actors to insure correct execution. Therefore, overhead problems will inevitably be created and will degrade performance.

When the grain size of a graph is increased, an actor now represents several operations instead of one single operation. This can allow the overhead problem to be alleviated. The concept of macro actors (several operations are grouped into a single actor), described by Gaudiot and Ercegovac [18], has been shown to bring a solution to the problems of a fine-grain computation model. Indeed,
The concept of macro-actors (several operations are grouped into a single actor) described by Gaudiot and Ercegovac [18], has been shown to bring a solution to the problems of a fine jrain computation model. Indeed, with actors of various sizes, the amount of non-compute operations and the cost of communication can be significantly reduced. However, one should also note that the increasing size of actors (with larger granularity) may reduce the 67 available parallelism in programs and increase memory and communication latency. Hence, forming macro-actors from fine grain micro actors is a trade-off between latency and parallelism. 5.1.2 Macro from an AI processing perspective The macros, when viewed from the AI processing perspective, preserve the AI paradigm at the high level. The basic data object in AI is a list which is a set of elements. When the Lemantics of facts or rules under question are concerned, operations on facts or rules are preferred in the form of lists rather than in the individual elements of the lists since each individual element does not carry useful information. To see the difference between macros and micros, consider for example an assertion such as BELIEVE(X Y), which means X believes the fact Y. This assertion, when imple mented for processing, can be represented as a list of three elements (BELIEVE X Y). If we break it into three elements and form three data tokens (BELIEVE), (X), and (Y) as a ^asic element to operate on, each of these three tokens does not carry useful information, lach data token by itself semantically stands for little but a data token. For the above as sertion, all three elements must be merged to give some useful information, i.e., they ^collectively operate to make an assertion. Furthermore, breaking the assertion into many elements and collecting them back to restore it will unnecessarily complicate processing. 
It is therefore indispensable for the data-flow principles of execution to process AI problems in the form of a list rather than as individual tokens. We would like to have a primitive data token as a collection or list of simple elements to carry more information than the single scalar element allowed in the generic data-flow architecture. Similarly, a macro token is a collection of primitive data tokens. Consider an assertion IS(X Y). This assertion, when implemented, can be represented as a list of three elements (IS X Y). If we break it into three elements and form three data tokens (IS), (X), and (Y) as basic elements to operate on, each of these three tokens carries little useful information.

When viewed from the architectural perspective, macro actors will substantially reduce the overhead in matching the tags of data tokens. When using dynamic data-flow principles [2], tokens carry tags which consist of the context, code block, or instance of a loop to which the token belongs. If the fact (IS X Y) is split into three data tokens and is compared with another three data tokens (IS), (X), and (Z), the tag matching time for the three pairs of six data tokens is no less than three time units. However, when the two facts are compared as two lists, the tag matching time is only 1!

There are, however, drawbacks in using macros. If the primitive actors are grouped and formed into macros, the parallelism in fine-grain processing will be lost. The grouping must be carefully made so as to avoid inefficient macros. Putting too many micros into a macro will apparently decrease the parallelism in fine-grain processing, resulting in degradation of the performance. Forming a macro with too few micros will not give a noticeable improvement in performance. The production system paradigm which we are considering for our application domain is known to have data parallelism existing in many patterns and wmes.
By putting too many micros into a macro, the data parallelism will likely diminish. The formation of macros is heavily dependent on the problem domain. There must be a set of guidance criteria for the formation of macros. In section 3 we shall identify several heuristics from the production system paradigm and establish criteria to guide the grouping process, thereby producing efficient and well-formed macros. In the meantime, we will discuss in detail the effectiveness of the macros for pattern matching operations.

5.2 List Comparison Operations on Macro and Micro

To give an intuitive overview of the macro over the micro, a simple list comparison is given to illustrate the performance of the macro. A complete analysis will be given shortly.

5.2.1 A micro-actor perspective

Consider a typical match operation in OPS5 which compares the following two condition elements: P1 = [P (a X) (b Y)] and P2 = [Q (a 1) (b 2)], where X and Y are variables and all others constants. In order to achieve the maximum parallelism in fine-grain micro-actor operations, a data-flow graph for the match operation can be drawn much like the one in Figure 5.1. The two lists are split into ten primitive data tokens: P, a, X, b, Y, Q, a, 1, b, and 2. The ten data tokens form five pairs: (P Q), (a a), (X 1), (b b), and (Y 2), each of which would then be allocated to a different PE. Five comparison operations are executed in parallel by five PEs to achieve the maximum parallelism existing in the fine-grain processing. The results of all the comparison operations are reported to actors allocated to PE0. As shown in Figure 5.1, the final result of the matching operation is obtained from actor 9 of PE0 after three levels of AND operations.

To further examine the number of operations and the time units for this matching operation, consider a hypothetical data-flow machine consisting of 64 processing elements (PEs). Between two neighboring PEs are three facilities: two communication nodes and a link.
Assume that each PE has four facilities connected in a pipe (we shall come back to this in the following section). Let us briefly define the various timing units which we are going to use throughout this paper:

• τ = processing time for each processing facility in the system,
• t_r = token routing time to a neighboring PE,
• t_c = time taken for a comparison operation,
• t_a = time taken for an AND operation,
• t_pe = time taken for a PE to produce a result token after receiving token(s).

Assuming for a moment that there is no token waiting for dyadic (2-input) operations, we have t_c = t_a = t_pe = 4τ and t_r = 3τ. The total time taken to process the matching operation by 5 PEs would then be:

t_micro = t_c + t_r + t_a + t_r + t_a + t_r + t_a = 4 t_pe + 3 t_r = 4(4τ) + 3(3τ) = 25τ.

Figure 5.1: A data-flow graph in micro-actors for the comparison of two lists.

Now, consider a typical match operation, shown in Figure 5.1, which compares two lists, (a1,...,an) and (b1,...,bn). To achieve the maximum parallelism existing in the fine-grain micro approach, the n pairs can be simultaneously compared in n PEs, each of which is connected through a (log n)-dimensional hypercube. Assume that two neighboring PEs must communicate through three facilities (two communication nodes and a link) and that each PE consists of four facilities connected in a pipeline fashion. If each facility takes τ to execute, the total time to process n comparisons on n PEs would be

t_micro,n = t_c + ⌈log2 n⌉ t_a + ⌈log2 n⌉ t_r = 4(1 + ⌈log2 n⌉)τ + 3⌈log2 n⌉τ = (4 + 7⌈log2 n⌉)τ     (5-1)

For the matching operation shown earlier in the section with the two lists P1=[P (a X) (b Y)] and P2=[Q (a 1) (b 2)], we verify from (Eq. 5-1) that t_micro = 25τ by substituting 5 for n. Note that in this simple calculation, it is assumed that no token waits in the matching/store unit of each PE, simply to show how the list matching can proceed in a data-flow environment.
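The timing model of Eq. (5-1) can be checked numerically. The following is an illustrative sketch (not from the dissertation), assuming t_c = t_a = 4τ and t_r = 3τ as above:

```python
import math

# Eq. (5-1): time to compare two n-element lists on n PEs, with the
# comparison and AND actors costing 4*tau and one-step routing 3*tau.

def t_micro(n, tau=1):
    log_n = math.ceil(math.log2(n))          # depth of the AND tree
    return (4 + 7 * log_n) * tau

print(t_micro(5))   # 25, matching the 25-tau result for the example
```

With n = 5 the AND tree has ⌈log2 5⌉ = 3 levels, reproducing the 25τ computed above.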
Furthermore, all the comparison actors are ideally allocated to neighboring PEs such that a token routing can be done in one step, i.e., 3τ (which may not be realizable). If any one of the six tokens happened to arrive at the designated PE late, then the results of the other comparison operations would have become useless. AI processing requires the five condition-wme pairs to be executed in parallel such that the current state is correctly reflected in the knowledge base. If they were executed at different times, the current state of the knowledge would not be correctly reflected and the result of the matching operation would not be guaranteed correct unless some sort of synchronization mechanism for the knowledge base were provided. In such a case the AI problem-solving system under consideration will not yield desirable solutions.

5.2.2 A macro-actor/token perspective

The macro approach provides a solution to the above problem of the comparison operation. In the macro approach, the basic unit of the data token operating on the data-flow graph can be made in the form of a list. It preserves the semantics of wmes and correctly reflects the change of wmes to the knowledge base, since both the wme and the condition element are compared as a whole. The number of PEs used in this operation is one, and therefore there is no communication overhead. The total time taken to process the matching operation in the macro approach is simply proportional to the number of elements in the condition element. In general, the total time to compare two lists with n elements each on one PE would be:

t_macro,1 = n t_c + (n−1) t_a = 4(2n−1)τ     (5-2)

The ratio of the time taken for macro actors with 1 PE to micro actors with n PEs is

R = t_macro,1 / t_micro,n = 4(2n−1) / (4 + 7⌈log2 n⌉) = O(n / log2 n).     (5-3)

Note that in the micro-actor approach, we assumed that the token routing is done in one step, i.e., 3τ.
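Equations (5-2) and (5-3) can likewise be checked numerically. This illustrative sketch (not from the dissertation) repeats the micro-time helper of Eq. (5-1) so the fragment is self-contained:

```python
import math

# Eq. (5-1) vs. Eq. (5-2): micro time on n PEs against macro time on
# one PE, and their ratio R from Eq. (5-3), with t_c = t_a = 4*tau.

def t_micro(n, tau=1):
    return (4 + 7 * math.ceil(math.log2(n))) * tau

def t_macro(n, tau=1):
    """Eq. (5-2): n comparisons plus n-1 ANDs, all at 4*tau each."""
    return 4 * (2 * n - 1) * tau

n = 5
print(t_macro(n))                         # 36 -> macro loses 11 tau
print(t_micro(n))                         # 25    under one-step routing
print(round(t_macro(n) / t_micro(n), 2))  # R = 1.44 for n = 5
```

The sketch reproduces the 36τ vs. 25τ comparison discussed next; the macro advantage appears once one-step routing is no longer assumed.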
In general, such one-step routing for this kind of matching operation is impractical for a 6-dimensional hypercube topology. Because a 6-dimensional hypercube interconnection network provides no more than 6 PEs within one-step routing distance, having more than 6 comparisons and routings performed in parallel requires very sophisticated allocation and routing policies.

For the match operation in Figure 5.1, where n=5, we obtain t_macro = 36τ from (Eq. 5-2). Comparing with t_micro = 25τ of the micro-actor approach, we lose 11τ. Considering that the average number of elements in a list is five for production systems, the ratio becomes R = 36/25 ≈ 1.4, as can be verified from (Eq. 5-3). The macro actors will outperform in practice, since there is no communication overhead involved in macro actors.

The analysis described above is intended to illustrate the macro approach and is by no means a complete analysis. More detailed simulation results and their performance evaluation are presented shortly.

5.3 Formation of Macro Actors

In the previous section, we discussed why macros are preferred over micros for processing AI problems. We shall now discuss in this section how we can obtain well-formed macros from micros for the PS paradigm. More precisely, we shall construct well-formed macros to implement rules.

5.3.1 Guidelines for well-formed macro actors

A macro actor is a collection of scalar instructions. The objective behind putting many instructions into one is to improve performance by exploiting locality in the instructions. As seen from Figure 5.2, the set of ordered micro actors A={a1,...,an} is converted to a set of ordered macro actors B={b1,...,bm}, where b_i is a partially ordered set of micro actors and m<n. Well-formed macros (wfms) would give a substantial performance improvement, as we have seen in the previous section. However, a question immediately arises as to how to form macros from a set of micros.
Ill-formed macros will not simply fail to give the desired performance enhancement but may potentially degrade the performance.

Figure 5.2: Mapping micro actors into macro actors. A set of macro actors, B={b1,...,bm}, is derived from a set of micro actors, A={a1,...,an}, based on the guidance criteria, g1,...,gm, where m<n.

A criterion must be carefully identified to obtain wfms. There can be many criteria we can possibly apply to a set of micro actors to form wfms. Yet the parallelism existing in the problem domain should be preserved after the formation of macros. Before we discuss the formation of macros, let us give the definitions we will use throughout this discussion.

Let A be a set of micro (or primitive) actors, {a1,...,an}, and T be a set of data tokens, {t1,...}, manipulated by A. Let B be a macro actor derived from A such that B⊆A. Let t1 be the time taken to process a micro actor on a PE. Let tn be the time taken to process A on n PEs. Let T1 be the time taken to process a macro actor on a PE. Let r be the ratio of T1 to tn, i.e., r = T1/tn. A macro actor B is said to be well-formed if r < ε, ε ≤ 2.

The objective behind setting such a ratio lies in the fact that if the processing time of the macro actor is not more than twice the processing time of the corresponding micro actors in an ideal environment, we shall form a macro actor from the micro actors. The phrase "ideal environment" refers to an ideal allocation of micro actors on n PEs and an ideal routing policy for the data tokens. As we discussed earlier in Section 3, achieving such an ideal environment would be impossible. The macro actors would be preferred over micro actors because there is no data token routing, no waiting for the mating data token (for two-operand instructions), etc. In this study, we simply set ε to 2 for macro actor formation.

We now briefly describe the formation of well-formed macro actors. Let I_i be the set of tokens input to, and O_i the set of tokens output from, actor a_i.
We denote the dependence relation for a_i, a_j ∈ A as follows: if O_i ⊆ I_j such that i≠j, then a_i Z a_j for all i and j, where Z is a dependence operator which implies that a_i must be executed before a_j. By applying the dependence relation a_i Z a_j to A, we obtain an ordered set of actors, B={b1,...,bm}, where b_i is the set of actors not depended upon by the actors already placed. The dependence distance for b_i, b_j ∈ B is defined as d(b_i,b_j) = d_i,j = i−j. The maximum dependence distance d_max for B is m−1.

We list below five guidelines for the formation of wfms:

1. (Flow Dependency) Let a_i and a_j be two actors. A macro actor M={a_i,a_j} can be defined if O_i ⊆ I_j and d_i,j = 1, where O_i is an output of a_i, I_j is an input to a_j, and d_i,j is the dependence distance between a_i and a_j.

2. (Encapsulation Effect) Let a be a comparison actor and b be a set of true/false actors {b1,...,bn}. A macro actor M={a,b} can be defined if O_a ⊆ I_b and d_a,b = 1. This guideline eliminates unnecessary true/false actors.

3. (List Processing) Let A={a1,...,an}, and L be a list of m data tokens {t1,...,tm}. Let I = I1∪...∪In and O = O1∪...∪On. A macro actor M={a1,...,an} can be defined if I_i ⊆ I∪O∪L for 1≤i≤n and O_k ⊄ I for k = d_max. This guideline preserves the semantics of a list.

4. (Array Operations) Let A={a1,...,an} and B = {Append, Select, Create, Copy}. If A∩B ≠ ∅, a macro actor M can be formed on A−B. Separating array operations from the macro actors removes the potential bottleneck in array operations.

5. (Interconnection Topology) Let A={a1,...,an} such that d_max = Max{d_i,j} = 1 for all a_i, a_j ∈ A. Let m be the dimension of the hypercube interconnection network. If m<n, a set of macro actors {M1,...,Mk} is defined, where M_i = {a1,...,am}, k = ⌈n/m⌉, and 1≤i≤k.

5.3.2 An example on the conversion process

Using these five guidelines, we shall now write several macro actors to implement a simple rule. The functionality of the rule that is important to implement production systems will be taken into account.
Consider the following OPS5-like rule:

Rule: [A (Y Z)]                ;; CE 1
      [B (c X) (d Y)]          ;; CE 2
      [C (p 1) (q 2) (r X)]    ;; CE 3
      [Modify B (c Y) (d X)]   ;; Action 1

Suppose that we have a Rete condition-dependency network constructed for the above rule [22]. Figure 5.3 shows the conversion process for the first condition element. A micro data-flow graph for the comparison operations on two elements, [A (Y Z)] and [A (B C)], is depicted in Figure 5.3(a) and the corresponding macro actor in Figure 5.3(b).

Figure 5.3: A conversion process for the comparison operation on two lists: (a) a micro-actor data-flow graph, (b) a macro actor.

Guide 3 is applied to this conversion process as follows. Let A be a set of five actors {a1,...,a5} (three comparison actors and two AND actors), and L be a list of six data tokens {A,Y,Z,A,B,C}. Let r_i be the output token of an actor a_i. Applying the dependence distance to the set A, we partition A into three sets A1, A2, and A3, where A1={a1,a2,a3}, A2={a4}, and A3={a5}. We then find d_max=2 because Max{d_A1,A2, d_A1,A3, d_A2,A3} = Max{1,2,1} = 2. We also observe that

I = I1∪...∪I5 = {L, r1,...,r4},  O = O1∪...∪O5 = {r1,...,r5}     (5-4)

From (Eq. 5-4), we have I_i ⊆ I∪O∪L for 1≤i≤5, and O_A3 = r5 ⊄ I. Therefore, Guide 3 is satisfied and a macro actor M = {a1,...,a5} can be formed. After Guide 3 is applied to the data-flow graph for the first condition element of the above rule, we obtain the graph shown in Figure 5.4. Note that those five comparison micro actors of Figure 5.3(a) have been replaced by a single actor #1, comp, in Figure 5.4(a) for simplicity of presentation.
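The set-membership checks of Guide 3 for this five-actor example can be written out as a sketch (illustrative Python; the token sets below are assumptions reconstructed from the example, with each r_i the result token of actor a_i):

```python
# Guide 3 (list processing) check for the five-actor example: three
# comparison actors a1-a3 consume the list tokens, two AND actors
# a4-a5 consume the comparison results; r5 is the macro's only output.

L = {"A", "Y", "Z", "B", "C"}                 # list tokens (as a set)
inputs = {"a1": {"A"}, "a2": {"Y", "B"}, "a3": {"Z", "C"},
          "a4": {"r1", "r2"}, "a5": {"r4", "r3"}}
outputs = {"a1": {"r1"}, "a2": {"r2"}, "a3": {"r3"},
           "a4": {"r4"}, "a5": {"r5"}}

I = set().union(*inputs.values())             # Eq. (5-4), input side
O = set().union(*outputs.values())            # Eq. (5-4), output side

guide3_ok = (all(inp <= I | O | L for inp in inputs.values())
             and not outputs["a5"] <= I)      # r5 leaves the macro
print(guide3_ok)   # True -> M = {a1,...,a5} can be formed
```

Every actor's inputs stay inside I∪O∪L while the final result r5 escapes, which is exactly the condition Guide 3 imposes.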
In Figure 5.4(a), there are two actors related to array operations: 'append' and 'select.' Guide 4 states that if there exists an actor a_v in A such that a_v ∈ {append, select, ...}, then a macro actor M is considered on the set A−a_v. Applying Guide 4 to the graph partitions it into three sets of micro actors A1, A2, and A3, where A1={a1,...,a12}, A2={a13}, and A3={a14}. In the first step of the conversion process, however, the set of actors A1 is converted to a macro actor M1. We therefore treat M1 as a micro actor in the following discussion. The partitioning of the graph into three graphs is shown in Figure 5.4(a).

The last step stems from the fact that there are six true/false actors in A1 (Figure 5.4(a)). Guide 2 states that if there is a comparison actor a which immediately affects a set of true/false actors b such that O_a ⊆ I_b and d_a,b = 1, then we should form a macro actor M={a,b}.

The macro comparison actor M just described is for the first condition element of the rule shown earlier. The same argument applies to the other condition elements and shall not be discussed further. The set of guidelines described above is by no means a complete set. It can, however, serve as a starting point for the formation of wfms.

Figure 5.4: Converting a micro-actor data-flow graph to macro actors: (a) micro actors, (b) macro actors.

5.4 Simulation and Performance Evaluation

A simulation approach has been taken to investigate the performance of the MRN-based match algorithm in a data-flow processing environment.
5.4.1 Simulation results

A simulation approach has been taken to identify the performance of production system processing on the Macro Data-Flow Multiprocessor (MDFM) simulator [70]. The machine contains 64 PEs interconnected by a 6-dimensional hypercube network. The target production system, which we call a 'generic production system,' has 15 rules, all of which are written in micro actors based on the parallel version of the Rete algorithm [59]. A typical OPS5-like rule was shown in the previous section. Each rule has on average 5 condition elements, 2 action elements, and 3 two-input nodes. Each condition element has on average 3 one-input nodes and at least one variable in the value part (see [16] for details). With the guidance criteria we developed, the micro actors for the rules are rewritten as macro actors, each of which contains on average 50 micro actors.

Tables 5.1 through 5.3 show simulation time, network load, and speedup. Table 5.1 lists simulation time units and network load for sequential and parallel distribution of wmes with various numbers of PEs. Table 5.2 derives the ratio of sequential distribution (SD) to parallel distribution (PD). Parallel distribution of wmes yields a maximum speedup of 4.4 and reduces the network load by a maximum factor of 2.5 over sequential distribution of the original Rete algorithm. Regardless of the number of PEs used, parallel distribution provides an average speedup of 2.5 and reduces the network load on average 2.4 times. Table 5.3 shows simulation results on speedup using different numbers of PEs. Various curves for the simulation results are depicted in Figure 5.5.
Number    SD1           PD1           SD2           PD2
of PEs    T      L      T      L      T      L      T      L
 1        23619  0      8955   0      23544  0      8912   0
 2        13269  9510   4589   4017   12161  9432   4840   3792
 4        7239   15738  2614   6901   5873   15078  2895   6406
 8        5047   22491  1701   9643   3296   21389  1545   8935
16        3519   26568  1423   11841  2101   26822  1038   11101
32        3336   33364  1314   14157  1434   31991  763    12874

Table 5.1: Simulation time, T, and network load, L, for a production system with 15 rules executed on MDFM. SD = Sequential Distribution, PD = Parallel Distribution.

Number    SD1/PD1     SD1/PD2     SD2/PD1     SD2/PD2
of PEs    T     L     T     L     T     L     T     L
 1        2.6   N/A   2.6   N/A   2.6   N/A   2.6   N/A
 2        2.9   2.4   2.7   2.5   2.7   2.3   2.5   2.5
 4        2.8   2.3   2.5   2.5   2.2   2.2   2.0   2.4
 8        3.0   2.3   3.3   2.5   1.9   2.2   2.1   2.4
16        2.5   2.2   3.4   2.4   1.5   2.3   2.0   2.4
32        2.5   2.4   4.4   2.6   1.1   2.3   1.9   2.5

Table 5.2: Ratio of sequential distribution to parallel distribution.

No. of PEs   SD1    PD1    SD2    PD2
 1           1.00   1.00   1.00   1.00
 2           1.78   1.95   1.94   1.84
 4           3.26   3.43   4.00   3.08
 8           4.68   5.26   7.14   5.77
16           6.71   6.29   11.21  8.59
32           7.08   6.82   16.42  11.68

Table 5.3: Speedup of a generic production system executed on MDFM.

5.4.2 Performance evaluation

From the simulation results, we verify the following. First, our parallel network with multiple root nodes reported in [21] gives an impressive improvement over the original sequential Rete network: the number of groups among the condition elements of our generic production system is 3, and the simulation time of the sequential Rete network, regardless of the number of PEs used, is almost always three times that of our parallel network, as seen from Table 5.1 and Figure 5.6.

Figure 5.5: Simulation results: (a) execution time, (b) network load.
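The ratios in Table 5.2 and the speedups in Table 5.3 can be recomputed directly from the raw times in Table 5.1; an illustrative sketch (not part of the dissertation's tooling):

```python
# Recomputing the SD1/PD1 execution-time ratios (Table 5.2) and the
# PD1 speedups (Table 5.3) from the simulation times of Table 5.1.

sd1_time = {1: 23619, 2: 13269, 4: 7239, 8: 5047, 16: 3519, 32: 3336}
pd1_time = {1: 8955, 2: 4589, 4: 2614, 8: 1701, 16: 1423, 32: 1314}

for pes in sd1_time:
    ratio = sd1_time[pes] / pd1_time[pes]       # Table 5.2, SD1/PD1 T
    speedup = pd1_time[1] / pd1_time[pes]       # Table 5.3, PD1 column
    print(f"{pes:2d} PEs: SD1/PD1 = {ratio:.1f}, PD1 speedup = {speedup:.2f}")
```

The recomputed values (e.g., 2.6 at 1 PE, speedup 6.82 at 32 PEs) agree with the tabulated entries, confirming the tables' internal consistency.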
Figure 5.6: Ratio of sequential distribution to parallel distribution: (a) execution time, (b) network load.

Second, the data-flow principles of execution can not only efficiently perform nonnumeric computation but also yield an impressive performance improvement over the conventional von Neumann model of execution for production system processing. From the speedup curves of Figure 5.7, we find that the data-flow principles of execution can indeed yield a 17-fold speedup when 32 PEs are used, regardless of the type of matching algorithm.

Figure 5.7: Speedup of sequential distribution vs. parallel distribution.

5.5 Summary

A macro actor approach for AI problems, specifically production systems, has been demonstrated as an efficient implementation tool. Characteristics of production systems relevant to parallel processing have been discussed to suit the macro data-flow multiprocessor environment. A simple example on comparison operations has been explained in detail from the macro perspective. Several guidelines have been demonstrated to form wfms. A condition element of a rule in PS is converted to macro actors. The results of a deterministic simulation with 15 rules and more than 100 condition and action elements on the macro data-flow simulator have revealed that the macro approach is an efficient implementation for AI production systems. Indeed, the macro approach gives a 17-fold speedup on 32 PEs. Furthermore, our parallel matching algorithm with multiple root nodes gives an additional speedup of 3, regardless of the machine used.
Assessments of the data-flow systems on production systems have proven effective, and we are currently investigating issues related to parallel firing of multiple rules toward true parallel production systems.

Chapter 6
Performance Evaluation of the Multiple Root Node (MRN) Approach

An implementation of the MRN-based production system in Common Lisp is presented, and several distinctive features of the MRN-based production system are listed in the first section. The organization of the program as well as the data structures used in the program are briefly presented to give a feeling for how the implementation is carried out. The five benchmark production system programs chosen for this study are introduced in Section 2, along with various statistics gathered at compile time on the five programs. Among the statistics, it is the grouping information that makes the MRN-based production system interpreter different from other production system interpreters.

Section 3 presents various data points obtained at runtime on the benchmark programs. All the statistics are gathered in terms of production cycle numbers. Among the data, an execution time profile of matching one-input nodes is measured in real time in terms of production cycle numbers. To ensure the correctness of the measurement, another important criterion, the number of comparison operations, is measured, again in terms of production cycle numbers.

Performance evaluation of the two approaches, the MRN approach and the Rete-based OPS5, is made in Section 4 in terms of the number of comparison operations and the execution time for the one-input match step. The last section, Section 5, summarizes this chapter.

6.1 Implementation of the MRN-based Interpreter

Implementation details of the MRN-based production system are presented in this section.
Several distinctive features embedded in the implementation are enumerated to show how Lisp systems can be constructed. Data structures as well as the organization of the program are briefly explained to help understand the implementation.

6.1.1 Characteristics of the implementation

The MRN-based production system has been completely implemented in Common Lisp from scratch. It is listed in the Appendix and is currently operational. Its functionality is fully equivalent to that of the Rete-based OPS5 production system interpreter. The size of the program is approximately 3,000 lines of Common Lisp. The main features of the implementation are:

• It is free of global variables, except a single one which traces the number of wmes generated during the lifetime of a particular production system program,
• Over 90% of the functions are written in tail recursion,
• A simple data structure using defstruct of Lisp is used.

A major reason to avoid using global variables is that the program should be easily ported to various multiprocessor environments without having to change much of its source code. By not using global variables, the potential communication and synchronization overhead between processes would be reduced when ported to a multiprocessor environment. Furthermore, encapsulating the scope of variables within a function allows us to analyze the data dependency, if any, between functions, thereby resulting in easy program partitioning. The ultimate goal of parallel processing, extracting and exploiting more potential parallelism from given code, would then come within reach. This claim has yet to be substantiated and is left as a future research topic.

Much effort has been spent on writing the program in tail recursion. One reason to do so was also due partly to portability to various multiprocessor environments.
When functions are written in tail recursion, it is much easier to understand their behavior, since program tracing is automatic. This ease of understanding the behavior of a program translates directly into an easy conversion to iterations. Vectorizing or parallelizing compilers can then be readily used to convert the Lisp programs into a language suitable for vector machines or multiprocessors.

The third feature, the simple data structure defstruct, would not necessarily be considered a good feature for parallel environments. The main reason for employing defstruct is that it simplifies the implementation process due to its structuredness. This structured approach shields the data dependency between data, i.e., the dynamically changing memories in the network. However, this dynamic data structure consumes more memory space than other data structures such as lists. There is certainly a trade-off between the runtime memory space and the ease of programming and debugging.

6.1.2 Descriptions of data structures

The data structures used in the implementation are illustrated in Figure 6.1. An array is used to implement production memory, where each element is a pointer to a production. Each production is implemented in a structure using the Lisp construct defstruct. There are seven entries in the structure of each production (the last one is used for debugging purposes):

• name is the name of the production specified in the production system program.
• no is a unique number assigned to a production. It is for internal use only and is not related to the actual production number specified in the production system program.
• cond is a list of condition elements, each of which is implemented in a structure.
Figure 6.1: Data structure used in the MRN implementation.

More details are presented below.

• act is a list of actions, each of which is implemented in a structure. More details are presented below.
• vars is a list of variables appearing in the LHS of the production.
• negated is a list of flags, where each flag indicates the appearance of a negated condition element.
• tt-last-fired is a time tag indicating the time, i.e., the production cycle number at which the production last fired.
• last-instantiation is a list of instantiations, where each instantiation is a list of wmes based on which the production fired (not shown in Figure 6.1).

Each condition element is implemented in a structure with 16 entries. The first entry, no-in-rule (or no in Figure 6.1), is the ordinal number (or position) of the condition element within the LHS. For example, if a LHS has 3 CEs, the first CE is assigned 1, the second CE 2, and so on. This information is useful in deciding which condition element to access instead of searching through the whole list of CEs. It is found that this information helps save processing time at runtime. The second entry, name, is simply the name of the CE, if there is any. The third entry, evar-name (or evar in Figure 6.1), is the name of the element variable assigned to the CE, if any. If no element variable name is assigned to the CE, the ordinal number, no-in-rule, will be inserted into the slot.
The fourth entry, no-oins, indicates the number of Attribute Value Pairs in the CE. The information no-oins might be considered redundant, since it can be readily obtained by using the Lisp function length. However, in order to reduce the execution time, as much information is extracted at compile time as possible.

The fifth entry, type, indicates whether the CE is a positive or negated condition element. The sixth entry, oins, is a list of one-input nodes. The seventh entry, no-vars, shows how many variables appear in the CE. The eighth entry, vars, is a list of the variables appearing in the CE.

The ninth entry, other-atr (or other in Figure 6.1), is a list of attributes extracted from other condition elements. This information is designed exclusively for two-input nodes, and the length of this entry indicates the number of tests required when checking the consistency of variable bindings is initiated in the two-input node. For example, suppose that the current CE is the second one in the LHS and has a variable X. Suppose further that the variable X also appears in the first CE of the LHS. The entry other-atr will then have an attribute of the variable X of the first CE. Extracting this kind of information at compile time reduces a substantial amount of runtime, since it does not require any search time.

The tenth entry, mem-changed (or flag in Figure 6.1), is a flag indicating whether there is any change in the memory attached to the CE. The main purpose is again to reduce runtime. The memory will be accessed only when this flag is set to true. The value of the flag is also propagated up to the production level to signal whether this rule should initiate the two-input nodes assigned to the production, if any.

The eleventh entry, var-to-cmp (or v-t-c in Figure 6.1), is a list of variables to be tested when the two-input node initiates consistency checking. The length of this list is the number of tests for the two-input node.
The twelfth and thirteenth entries serve the same purpose. The twelfth entry, atr-to-cmp (or a-t-c in Figure 6.1), contains a list of attributes, each of which corresponds to a variable in the eleventh entry, var-to-cmp. The thirteenth entry, avp-to-cmp (or avp-t-c in Figure 6.1), is nothing but a list of Attribute Value Pairs, where each pair is one which contains a variable. The fourteenth entry, op-to-cmp, is a list of operators. When a condition element contains an Attribute Value Pair such as (attr <> <A>), the not-equal operator, <>, is extracted from the pair and inserted in the op-to-cmp list.

The last two entries are two memories, omem and tmem. The entry omem is a list of wmes, each of which matches the current CE. The last entry, tmem, contains a list of merged wmes. Each merged wme is a list of wmes. If the current CE is the very first one in the LHS, tmem will contain a list of merged wmes, where a merged wme is a list of one wme. If the current CE is the second one in the LHS, tmem will contain a list of merged wmes, where a merged wme is a list of two wmes: one from omem of the current CE and the other propagated down from tmem of the first CE. For the third CE, tmem will have a list of merged wmes, where a merged wme is a list of three wmes: one from omem of the current CE, while the other two are from tmem of the second CE.

The data structure used for actions is quite simple compared to the one used for condition elements. An action is implemented in a structure with five entries, as illustrated in Figure 6.1. The first entry, op, indicates whether the action is make, modify, or remove. The second entry, name, is the name of the action if there is any. The third entry, evar-name, is the element variable name which connects an action of the RHS to a condition element of the LHS. If there is no element variable name assigned to an action, the ordinal number of the corresponding CE is inserted at compile time.
The fourth entry, no-in-rule (or no in Figure 6.1), is the ordinal number of the corresponding CE. The last entry, avps, is a list of Attribute Value Pairs.

The data structure used for wmes is a structure which contains five entries. The first entry, tag, is a time tag indicating at which production cycle it was generated. The second entry, no, is the number in which it was created throughout the lifetime of production cycles. This reflects the recency of a wme and is used in the rule selection step. The third entry, type, indicates whether it is to be added to or deleted from working memory. The fourth entry, name, is the working memory element name. The last one, avps, is a list of attribute value pairs.

6.1.3 An example on data structures

To give a better understanding of how the data structures are organized, an example is presented below. The rule shown below is taken from Monkey and Banana:

(p mb.11
  ((goal ^status active ^type holds ^object-name <o>) <goal>)
  ((phys-object ^name <o> ^weight light ^at <p> ^on <> ceiling) <object>)
  ((monkey ^at <p> ^on floor ^holds null) <monkey>)
  - (phys-object ^on <o>)
  -->
  (modify <object> ^on null)
  (modify <monkey> ^holds <o>)
  (modify <goal> ^status satisfied))

After the above rule is compiled, we have the following structure, with all the slots filled with the information we mentioned earlier:

#s(rule name mb.11 no 11
   cond (#s(ce no-in-rule 1 name goal evar-name <goal> no-oins 3 type +
            no-vars 1 vars (<o>) op-to-cmp (equal) other-atr nil
            mem-changed nil var-to-cmp (<o>) atr-to-cmp (object-name)
            avp-to-cmp ((object-name <o>))
            oins ((status active) (type holds) (object-name <o>))
            omem nil tmem nil)
         #s(ce no-in-rule 2 name phys-object evar-name <object> no-oins 4
            type + no-vars 2 vars (<o> <p>) op-to-cmp (equal)
            other-atr ((1 object-name)) mem-changed nil var-to-cmp (<o>)
            atr-to-cmp (name) avp-to-cmp ((name <o>))
            oins ((name <o>) (weight light) (at <p>) (on <> ceiling))
            omem nil tmem nil)
         #s(ce no-in-rule 3 name monkey evar-name <monkey> no-oins 3
            type + no-vars 1 vars (<p>) op-to-cmp (equal)
            other-atr ((2 at)) mem-changed nil var-to-cmp (<p>)
            atr-to-cmp (at) avp-to-cmp ((at <p>))
            oins ((at <p>) (on floor) (holds null))
            omem nil tmem nil)
         #s(ce no-in-rule 4 name phys-object evar-name nil no-oins 1
            type - no-vars 1 vars (<o>) op-to-cmp (equal)
            other-atr ((1 object-name)) mem-changed nil var-to-cmp (<o>)
            atr-to-cmp (on) avp-to-cmp ((on <o>))
            oins ((on <o>))
            omem nil tmem nil))
   act (#s(ae op modify name phys-object evar-name <object> no-in-rule 2
           avps ((on null)))
        #s(ae op modify name monkey evar-name <monkey> no-in-rule 3
           avps ((holds var (1 object-name))))
        #s(ae op modify name goal evar-name <goal> no-in-rule 1
           avps ((status satisfied))))
   vars ((<o>) (<o> <p>) (<p>) (<o>))
   negated (1 (+ + + -))
   tt-last-fired 0
   last-inst nil)

Since all the names are self-explanatory, as we described earlier, we shall not go through the structure here.

6.1.4 Organization of the program

The MRN-based production system interpreter consists of three major parts: pre-processing, production-cycle, and post-processing, as depicted in Figure 6.2 (also found in the program listing attached in the Appendix).

Figure 6.2: Organization of the MRN-based production system interpreter.

The first major part, pre-processing, takes in the production memory and the initial working memory, and compiles them into the data structures described above. After the rules are compiled, information regarding groups for the MRN network is extracted from the compiled rules. Note that the function names shown in Figure 6.2 are the actual ones used in the program and should be self-explanatory.
The second major part, production-cycle, consists of three functions: matching-step, selection-step, and action-step. The function matching-step calls two functions: match-oins and match-tins. The function match-oins matches wmes against condition elements and returns a modified production memory, since the matched wmes are stored in memories attached to each condition element. After match-oins completes, the function match-tins is called to find whether any wmes can be merged and propagated down the network. The second function of production-cycle, selection-step, in turn calls two functions: collect-matched-rules and select-a-rule. The third function of production-cycle, action-step, consists of four major parts: prepare-wmes-to-fire-rule, print-info-on-a-selected-rule, update-selected-rule, and reset-flags.

The third major part, post-processing, is designed exclusively for the collection of the statistics which are measured and stored at runtime in the network. There are many functions written for performance analysis but not listed due to space constraints. The function names used in the program were carefully selected to be self-explanatory, so we will not discuss them further. The complete source code is found in the Appendix.

6.2 Measurements at Compile Time

Five production system programs were chosen as benchmark programs. They were all executed on a sequential uniprocessor, a Sun 4/90. Both the Rete-based OPS5 and the MRN-based interpreters were tested to measure their performance. Both interpreters produce exactly the same results in terms of the number of rule firings, the rule firing sequence, the number of wmes generated, the wme generation sequence, etc. Some facts about the five programs are measured and listed in this section, along with the group information which is central to the MRN network.
6.2.1 Benchmark production system programs

There are several reasons for using benchmark systems; among them, they allow us to compare the relative performance of different production system interpreters. Stating the results of a particular benchmark on two different systems (or approaches) usually causes people to believe that a blanket statement ranking the systems in question is being made. The proper role of benchmarking is to measure various dimensions of production system performance and to order the systems along each of these dimensions. At that point, informed users will be able to choose a system that meets their requirements or will be able to tune their programming style to the performance profile.

The first problem associated with benchmarking is knowing what is being used. The five programs chosen for performance analysis are well known and widely used among researchers in the production system community. One should note here that the size of a production system is not central to its performance evaluation. Indeed, Gupta has commented in his thesis work that (1) we should not expect smaller production systems (in terms of number of productions) to run faster than larger ones, and (2) there is no reason to expect that larger production systems will necessarily exhibit more speedup from parallelism [24]. The five programs used in this study are briefly described below:

1. Brick Sorting is a simple program for a robot to pick bricks from a pool of bricks and place them in ascending or descending order of size.

2. Monkey and Banana (MAB) is a program where a hungry monkey grabs a banana hanging from the ceiling, given a couch and a ladder in the room.

3. N Monkeys and M Bananas (NMAB) is an extension to MAB with n monkeys and m bananas, where each monkey is to find a banana for itself, given a similar room environment with more bananas.

4.
Waltz Labeling is a modified version of the original Waltz labeling algorithm in that it assigns a label (chosen from a finite set of possible labels) to each edge of a line drawing, where each label gives semantic information about the nature of the edge and the regions on each of its sides.

5. N-Queen is a classical problem which places n queens on an n x n board such that each row, column, and diagonal contains no more than one queen.

The following information, collected from the above five production system programs, characterizes various aspects of the benchmark programs.

Production   No of  No of  No of  OINs      No of  No of   Avg      Rule     WMEs
System       Rules  CEs    Acts   Executed  TINs   Groups  CEs/Grp  Firings  Generated
Brick            7    16     15       2336      2       4        4       20         60
MAB             25    70     43       8409     59       5       14       16         58
5MAB            23    60     43      45195     39       5       12      113        278
Waltz           48   198    100     174891     90       5       40      245        297
8-Queen         19    68     71     151985     36       6       11     1044       3866

Table 6.1: Characteristics of benchmark production systems.

Note from Table 6.1 that the size of each program is not on the order of hundreds but on the order of tens. However, the problem size is unimportant when analyzing the parallelism in production system programs, as Gupta has concluded in his thesis [24]. Our purpose is to measure the relative performance of the MRN approach in terms of execution time along production cycles. What is important in our performance evaluation is the information on the groups of a production system program. Indeed, we find that even a production system program as small as the Brick Sorting problem suffices, as will be discussed shortly.

6.2.2 Measurements on grouping

Grouping the condition elements (CEs) based on the number of Attribute Value Pairs (AVPs) is central to the MRN approach. This allows us to partition the production systems into many pieces, each of which can be processed independent of the incoming newly generated wmes.
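The grouping step can be sketched in a few lines. The following Python fragment is illustrative only (the actual interpreter is written in Common Lisp), and it assumes a hypothetical representation of a condition element as a (name, avp-list) pair:

```python
# Sketch of compile-time grouping for the MRN network: condition elements
# are partitioned by their number of attribute-value pairs (AVPs), so a
# wme carrying n AVPs only needs to be matched against group n.

from collections import defaultdict

def group_condition_elements(ces):
    """Partition CEs by AVP count: group n holds every CE with n AVPs."""
    groups = defaultdict(list)
    for name, avps in ces:
        groups[len(avps)].append((name, avps))
    return dict(groups)

def group_distribution(groups):
    """Percentage of all CEs falling into each group (cf. Figure 6.3)."""
    total = sum(len(members) for members in groups.values())
    return {n: 100.0 * len(members) / total
            for n, members in sorted(groups.items())}
```

For instance, three CEs with 3, 3, and 1 AVPs fall into group 3 (two thirds of the CEs) and group 1 (one third), which is the kind of distribution plotted in Figure 6.3.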
Measuring the distribution of condition elements over groups at compile time, we can predict the potential parallelism in the given production systems. Figure 6.3 depicts the distribution curves, where the x-axis shows group numbers and the y-axis the percentage of each group in a particular production system program.

Figure 6.3: Distribution of condition elements over groups measured at compile time.

In the Brick Sorting problem (Figure 6.3), there are four different groups, where a group-n contains condition elements each of which has n Attribute Value Pairs. For example, the first group, Group 2, occupies slightly above 30% whereas Group 5 occupies 6% of the total number of condition elements. To be more specific, Group 2 has 5 CEs whereas Group 5 has only one CE, where the total number of CEs is 16.

As we can see from Figure 6.3, the condition elements of Monkey and Banana are relatively equally distributed over four groups, compared to those of Waltz Labeling where one group is dominant over the others. This dominance of one group over another is not desirable and does not yield good performance. We shall come back to this analysis after we run the programs on both approaches.

6.3 Runtime Measurements

Information measured at runtime is classified into three different categories: the execution time of a match step, the number of comparison operations for one-input nodes, and the distribution of wmes. All the measurements are made against production cycle numbers. A window of 20 production cycles is applied to each program, which suffices for our purpose.

6.3.1 Execution time on one-input nodes

Figure 6.4 shows the execution time of matching one-input nodes measured at each production cycle.
In the figure there are several production cycles whose execution times run off the boundary. They are intentionally left out; such points are unimportant for our purpose since we wish to show the relative performance. For Brick Sorting and Waltz, both the MRN and OPS5 curves show rather regular behavior while maintaining a reasonably constant distance between them along the x-axis. For example, in Waltz, the differences between the two execution time curves for cycle numbers 5 to 16 are relatively constant, except at cycle numbers 3, 4, and 17. A similar behavior is also observed in Brick. This kind of proportional distance between two curves is important in predicting the possible outcome of the MRN approach.

Figure 6.4: Execution time profile of matching one-input nodes.

MAB and NMAB, however, exhibit slightly different behavior compared to Brick and Waltz. For example, the MRN curve in MAB shows an amplification factor higher than the one for Brick or Waltz. This irregular behavior is due partly to the memory management policy (garbage collection) in Lisp, which contributes to inaccurate performance measurements. We shall give a more accurate measurement shortly. Overall, it is obvious that the MRN outperforms the OPS5 in any of the four problems.

6.3.2 Number of comparison operations on one-input nodes

Another statistic measured at runtime is the number of comparison operations.
The comparison here is not meant to be the number of one-input nodes or two-input nodes but an actual comparison operation in Lisp such as (equal x y), where x and y are atoms. To give a better understanding of the criterion used, consider the following simple Lisp function member, which checks whether an atom 'a' is a member of the list 'l':

(defun member (a l)
  (cond ((null l) nil)
        ((equal a (car l)))
        (t (member a (cdr l)))))

Now suppose that the function is called with the following values for the parameters: (member 1 '(2 6 4 7 1)). It is clear that the function member will be called five times and therefore the number of comparison operations will be five.

As we noted earlier, measuring the execution time of one-input nodes in real time does not correctly reflect the real execution time, due to the system load, the number of users, etc. We therefore introduce another criterion, the number of actual comparison operations, which serves as a better indicator for our performance measurement. This assumption has been confirmed in the program runs, as will be discussed shortly.

Figure 6.5 shows the number of comparison operations for the four production system programs. When we considered the real execution time, we observed that the behaviors of the four programs were rather irregular. In other words, the MRN curve of MAB in Figure 6.4 (the execution time curve) gave an amplification factor higher than the one for Brick or Waltz. However, that irregular behavior no longer persists, as observed in Figure 6.5. This consistent behavior is due mostly to the new criterion, the number of actual comparison operations, which does not take into account the memory management policy, the system load, etc., thereby revealing the more accurate behavior of the programs. It is again clear from the four plots in Figure 6.5 that the MRN outperforms the OPS5 for all four problems.
Figure 6.5: Number of comparison operations on one-input nodes.

6.3.3 Distribution of groups

Figure 6.6 shows the runtime distribution of wmes and condition elements for the four problems. Take Brick Sorting, for example. At runtime, there is no wme generated for group 2, group 4, or group 6. In other words, no wme with 2 AVPs, 4 AVPs, or 6 AVPs is created by any rule firing. The wmes generated for Brick at runtime fall into either group 3 or group 5. As we can observe from Figure 6.6, there is a considerable amount of discrepancy between the runtime wme distribution and the compile-time CE distribution.

Figure 6.6: Runtime distribution of wmes.

For MAB, however, the situation is different. As we can observe from Figure 6.6, the discrepancy for MAB becomes much smaller compared to that for Brick. MAB and NMAB show a relatively low discrepancy whereas Brick and Waltz show a rather high discrepancy in terms of the wme distribution and the CE distribution.

The bar chart shown in Figure 6.7 gives a global view of the runtime distribution of wmes over groups for the four problems.
Contrary to the compile-time distribution of condition elements, most of the wmes generated at runtime fall into a few distinctive groups. As we observe from the bar chart, all four problems have basically two groups actively working at runtime. However, all these distribution curves are problem dependent and there is no single rule which can predict the behavior of the runtime distribution of wmes. A simple conclusion we can draw from these discrepancy plots is that the more discrepancy there is, the more speedup there will be. We shall come back to this when we discuss the ratio of MRN to Rete.

6.4 Performance Evaluation

Based on the foregoing three different types of observations, i.e., one-input match time, number of comparison operations, and distribution of wmes, we shall analyze below the performance of both approaches.

6.4.1 Comparison of MRN and OPS5

Figure 6.8 shows the ratio (or speedup) of MRN to OPS5 on one-input match time for the four programs. Here, the ratio means simply the comparison of the two match time units for one-input nodes. Again, the x-axis is plotted against the production cycle numbers whereas the y-axis indicates the ratio of the two approaches. For Brick Sorting, for example, the one-input match time of OPS5 at production cycle number 13 is about 8 times more than that of MRN. For NMAB, the one-input match time of OPS5 at cycle 13 is about 17 times more than that of MRN.

Figure 6.7: Summary of the runtime distribution of wmes.

Even though evaluating the two approaches based on the execution time in real time shown in Figure 6.8 may not be accurate due to the system load, the number of users, the number of garbage collections, etc., it is clear from Figure 6.8 that there is a substantial speedup, ranging from 2 to over 20, depending on the program and the production cycle number.

Figure 6.8: Ratio of one-input match time.

Figure 6.9 gives a more accurate performance measure. It uses the number of comparison operations at each production cycle. The x-axis is plotted against the production cycle number while the y-axis is again the ratio (or speedup) of the MRN-based match to the Rete-based match. To closely examine the speedup curve, consider again production cycle number 13 as we did for Figure 6.8. For Brick Sorting in Figure 6.9, the number of comparison operations performed by the Rete-based match at cycle 13 is about 8 times more than that by the MRN-based match. This speedup of 8 is exactly the same as the one we obtained from Figure 6.8.

Figure 6.9: Ratio of number of comparison operations.

Now consider NMAB in Figure 6.9. The speedup, however, differs from what we would expect. Closely examining the NMAB curve of Figure 6.9 at cycle 13, we find that the speedup is 8! Recall that the speedup we obtained from Figure 6.8 for NMAB at cycle 13 was 17. This is not surprising because measuring real time can be affected by the many factors we have iterated several times. Nevertheless, it is clear from the two speedup curves plotted in Figures 6.8 and 6.9 that the MRN-based match algorithm outperforms the Rete-based match algorithm. Since the objective here is to compare the performance of the two match algorithms, the two figures suffice for the stated objective.

Figure 6.10 summarizes the performance of the two approaches.
The thin solid line with small circles indicates the average ratio of the two approaches based on execution time. The thin dotted line with hollow rectangles shows the average ratio of the two approaches based on the number of comparison operations. The thick solid line is the average of the two thin average curves. When the thick line is summed and averaged along the production cycle numbers of the x-axis, it gives an eventual speedup of six. In other words, the average speedup of the MRN approach over OPS5 on one-input match time reaches six-fold for the four production programs considered in this study.

Figure 6.10: Average ratio on four production systems.

6.4.2 Discrepancy in the distribution of wmes and condition elements

It is interesting to note the discrepancy between the compile-time distribution of condition elements and the runtime distribution of wmes. By finding the discrepancy between them, we can more accurately locate the behavior of each production system, thereby identifying the potential speedup for a given production system program.

Figure 6.11 displays all the discrepancies for the four programs. Note the discrepancy curves in Figure 6.11 and the speedup curves of Figures 6.8 and 6.9. Among the four discrepancy curves, the Brick curve has a high and regular behavior, which can in turn translate to a high speedup. The curve for Brick in Figure 6.11 verifies this relation, in which the speedup is high when the fluctuation is low.
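The notion of discrepancy can be made concrete. The dissertation does not state the exact formula behind the per-group discrepancy it plots, so the Python sketch below assumes one plausible reading: the per-group absolute difference between the compile-time CE percentages and the runtime wme percentages.

```python
# Assumed reading of the per-group "discrepancy" between the compile-time
# CE distribution and the runtime wme distribution; both inputs map a
# group number to its percentage of the respective distribution.

def discrepancy(ce_pct, wme_pct):
    """Per-group |CE% - wme%|, taken over the union of group numbers.
    A group absent from one distribution contributes its full percentage
    from the other."""
    groups = sorted(set(ce_pct) | set(wme_pct))
    return {g: abs(ce_pct.get(g, 0.0) - wme_pct.get(g, 0.0)) for g in groups}
```

Under this reading, a program whose runtime wmes concentrate in groups that held few CEs at compile time (as Brick does) yields large per-group values, i.e., high discrepancy.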
Figure 6.11: Discrepancy between CE and wme distribution.

However, the above statement on the relation between the discrepancy and the speedup has to be further substantiated by many more experimental results. It is difficult to relate the discrepancy curve and the speedup curve directly. Most problems are runtime dependent, and a simple prediction rule would be problematic. Based on our observations, we conclude that if there is more discrepancy between the compile-time distribution of CEs and the runtime distribution of wmes, the production system program will have more potential parallelism.

6.5 Summary

The main purpose of this chapter was the evaluation of the performance of the multiple root node (MRN) approach to production systems. It has been verified that the MRN approach
A complete execution time of production cycles is illustrated in Figure 6.12 to help give an overall view of the two approaches. Again, the x-axis is plotted in production cycle 3.0 Brick Sorting MRN Match MRN Select MRN Act O PS5 Match O PS5 Select O PS5 Act 2.5 f f i £ 2.0 I — c o 1.5 3 o < D X LU 1.0 0.5 0.0 20 Production C ycle N um ber Figure 6.12: A complete execution time for two approaches. 109 numbers but the y-axis at this time is plotted in the total production cycle time. The total matching time is high compared to the selection time or action time as Forgy has indicated [16]. In any case, the MRN-based production system interpreter outperformed the Rete- I based production systems. 110 Chapter 7 Conclusions and Future Research Advances this thesis has made in the production system paradigm are summarized. The two approaches, the software approach in algorithm level and the hardware approach in implementation level, are summarized in an attempt to reduce the matching time of an in ference cycle. Bottlenecks of the most widely used match algorithm have been identified, jased on which a new algorithm, the MRN match algorithm, was developed in algorithm evel. In implementation level, data-flow principles of execution have been employed as architectural models. Those topics that are beyond the scope of this thesis but are important to advance the technology in parallel adaptive processing of production systems are listed as future re search topics. The first topic, implementing the MRN-based production system interpreter on Intel iPSC/2, is suggested to further verify the performance of the MRN-based inter preter on message-passing multicomputers. The second topic, a SISAL implementation of ;he MRN-based interpreter, is to identify and extract the complete parallelism profile in production systems. 
The third topic, hierarchical representation of production systems, is intended to depart from the conventional technology of production system processing and to open a new technology in the direction of adaptive processing of production systems.

7.1 Conclusions

The work presented in this thesis has advanced the technology in parallel processing of AI production systems in two respects: First, it demonstrated the potential of data-flow multiprocessor systems for the parallel implementation of production systems. Second, it introduced a new match algorithm for AI production systems.

To achieve the stated objectives, the data-flow principles of execution were chosen as a computational model among various execution principles, since they provide a facility to exploit the maximum potential parallelism in the problem domain. Among the data-flow principles of execution, two principles were selected as architectural models: the dynamic micro-actor principle and the dynamic macro-actor principle. Modifications to the implementation of the basic data-flow principles of execution were identified before these could be used in an AI computation environment.

The production system (or rule-based expert system) paradigm was chosen as a benchmark of AI problems because it is one of the most widely used AI applications. As a particular instance of the production system paradigm, an OPS5-like production system interpreter was adopted, again due mostly to its wide use. Among the three steps of an inference cycle of production systems, the match step was selected since it has the most potential parallelism and, more importantly, is the most time-consuming step.

To demonstrate the applicability of data-flow principles of execution to symbolic computations, the Rete algorithm was chosen as a benchmark of symbolic computations since it is known as the most efficient match algorithm for production systems.
Inefficiencies of the Rete match algorithm when implemented on parallel machines have been identified, based on which a new match algorithm, the MRN algorithm, was introduced to better suit multiprocessor environments.

Issues related to mapping the MRN algorithm to multiprocessors were identified, and possible strategies were developed for our data-flow multiprocessor environment. Allocation of condition elements and of the O(n) iterations existing in two-input nodes to different PEs has proven effective in delivering the parallelism inherent in both the Rete algorithm and the MRN algorithm. Simultaneous distribution of many wmes to many PEs at the same time gives a further speedup in a given configuration of our data-flow architecture.

The MIT Tagged Token Data-flow Computer was chosen as our micro-actor simulation model. The allocation and distribution strategies we developed were exercised in our simulation. The complete graph for a rule was created to execute in the simulator model. The MRN and Rete algorithms were successfully implemented in a data-flow processing environment.

To detect and estimate the different levels of parallelism in the production matching step, various simulations were performed. Condition elements of the rule were executed in parallel. Simulation results indicated that our micro data-flow multiprocessor can fire at a rate of 1000 rules per second in the absence of a conflict resolution implementation. Although conflict resolution is not taken into account in implementing a production system here, the results we obtained reveal that symbolic computations on a data-flow multiprocessor computer can indeed be processed efficiently. Comparison with conventional computers has shown that a high speedup can be obtained from this approach.

A macro-actor approach for AI production systems has been demonstrated as another efficient implementation tool.
Characteristics of production systems relevant to parallel processing have been identified to suit the macro data-flow multiprocessor environment. A simple example on comparison operations has been presented in detail from the macro perspective. Several guidelines have been developed to form well-formed macros. To help clarify the macro approach, a condition element of a rule was converted to macro actors. The results of a deterministic simulation with a production system of 15 rules on the macro data-flow simulator have revealed that the macro approach is an efficient implementation for AI production systems. Indeed, the macro approach gave a 17-fold speedup out of the 32 PEs used. Furthermore, our MRN-based parallel matching algorithm gave an additional speedup of 3, regardless of the type of machine used.

To evaluate the algorithmic-level performance of the multiple root node (MRN) approach, a complete production system interpreter using the MRN algorithm was implemented in Common Lisp from scratch. Several benchmark production system programs were chosen to evaluate the performance of the MRN match algorithm. Various experiments using the benchmark production system programs were performed on a sequential machine, a Sun 4/90, in real time, and various statistics were collected both at compile time and at runtime. To ensure the correctness of the performance figures, an important criterion, the number of comparison operations, was measured against production cycles. Performance evaluation of the two approaches, the MRN approach and the Rete-based OPS5, was made in terms of the number of comparison operations and execution time. Its performance on the sequential machine, the Sun 4/90, has verified that the MRN approach can give a multiplicative effect on Rete-based production systems.
The two criteria used in this study, one-input match time and number of comparison operations in a window of 20 production cycles, have shown that the MRN approach can indeed give a six-fold speedup on the average. All these results were based on real-time execution and were not related to the number of processing elements, as they were in the aforementioned data-flow multiprocessor environments. The MRN approach gave a multiplicative effect on Rete-based production systems when run on a sequential uniprocessor Sun 4/90, and would give a further multiplicative effect when many processing elements are used. In any case, the MRN-based production system interpreter outperformed the Rete-based production system interpreter.

7.2 Future Research: Implementation of Production Systems on the Intel iPSC/2 Multicomputer

The advent of the second-generation message-passing multicomputers has drawn much attention to processing AI problems, due to their low cost and fast message-passing capability. According to simulation results reported recently, message-passing multicomputers can effectively attack the large matching time inherent in production system processing [1, 27]. These promising simulation results are due partly to the fact that recent technological advances in computer architecture can substantially reduce the large communication overhead incurred in message-passing multicomputers.

As we have seen in the previous chapters, the MRN-based production system interpreter has been developed to suit parallel processors. Its algorithm-level improvement yielded a substantial speedup even when a single, sequential processor is used. It would be interesting to further explore the performance of the MRN-based interpreter on message-passing multicomputers. Our choice for the message-passing computer is an Intel Hypercube iPSC/2 (Intel Personal Super Computer) since it is available and can be readily used.
The Intel iPSC was originally designed for numerical supercomputing. Recently, a version of Concurrent Lisp has become available for the iPSC that attempts to exploit the parallelism offered by the hypercube topology in parallel/distributed AI processing.

By porting the MRN-based production system interpreter to the Intel Hypercube, the following two important objectives can be achieved: (1) a parallelism profiling of the MRN-based production system interpreter and the Rete-based OPS5, and (2) verification of the claim derived from the simulation results reported by Gupta et al. [1]. When production systems are executed in a real multiprocessor (or multicomputer) environment, the runtime behavior of nondeterministic AI production systems can be better understood, thereby identifying the sources of parallelism or the bottlenecks of the given problem under consideration. Doing so would clarify the relative performance of the MRN-based production system interpreter, which in turn would advance the technology in parallel/distributed processing of production systems.

The second objective is similar to the first in nature. However, it differs in that experiences gained from an actual implementation of an AI system on a message-passing multicomputer (not a simulator) would help guide the right direction for prospective AI implementors who consider implementing AI systems on message-passing computers. Considering that an implementation of a large and practical AI system is costly and often requires much time and effort, it would be advantageous to predict the possible outcome so as to minimize the effort involved.

7.3 Future Research: Implementation of Production Systems in SISAL

The second topic, a SISAL (Streams and Iterations in Single Assignment Language) implementation of the MRN-based interpreter, is intended to identify and extract the complete parallelism profile in production systems.
The high-level language SISAL, developed at Lawrence Livermore National Laboratory in cooperation with other institutions (Colorado State University, University of East Anglia, University of Manchester, Digital Equipment Corporation), is one of the high-level applicative languages, intended to serve as a side-effect-free, parallel language for multiprocessor systems [44]. In SISAL, six basic scalar types are defined: boolean, integer, real, double real, null, and character. Data structures in SISAL may be records, unions, arrays, or streams. Each basic data type has its associated set of operations, while record, union, array, and stream types are treated as mathematical sets of values, just as the basic scalar types are. In particular, under the forall construct, these types can be used to support the identification of concurrency for execution on highly parallel processors. It is this identification of concurrency that is central to the SISAL implementation of production systems.

To further clarify the objective stated above, consider a classical example, the Tower of Hanoi. Since most AI search problems are nondeterministic and I/O bound, i.e., involve more memory accesses than arithmetic and logic computations, it is difficult to identify the parallelism in them. However, SISAL allows us to identify the potential parallelism to a certain accuracy. The Tower of Hanoi can be implemented in SISAL with double recursion as follows:

    define hanoi
    type twodim = array [array [character]];
    function hanoi (n: integer; from, to, use: character returns twodim)
      if n = 0 then array twodim []
      else hanoi (n-1, from, use, to) ||
           array [1: array [1: from, to]] ||
           hanoi (n-1, use, to, from)
      end if
    end function

Using one of the capabilities SISAL provides, the parallelism profile can be identified, as plotted in Figure 7.1. The top plot shows the parallelism profile of the Tower of Hanoi using 64 processing elements.
The bottom plot assumes an infinite number of processing elements with 10 disks. The actual execution time for the bottom one is 3191, more than six times what is shown in the figure. The main purpose of showing such parallelism profiles is to demonstrate that AI search problems exhibit very little parallelism. As we can observe from the figure, there is a significantly small amount of parallelism in the Tower of Hanoi. Regardless of the problem size and the number of processing elements used, the profiles always exhibit a similar parallelism distribution, an impulse or burst type.

The main reason for such an impulse type of parallelism is that the problem itself is sequential in nature. Unlike numeric computations, moving from one state to another when searching the state space requires completion of the current state. In other words, moving to the next state can be done only when the current state is completed. A move to a next state based on a partial (or incomplete) state would likely result in a conflicting or incorrect state, and such a conflicting state in the search space is obviously meaningless.

[Figure 7.1: A parallelism profile of the Tower of Hanoi with 10 disks. Top: Tower of Hanoi with 64 PEs for 3 to 6 disks; bottom: Tower of Hanoi with infinite PEs for 10 disks. Both plots show parallelism versus time.]

The burst type of parallelism profile demonstrated in Figure 7.1 clearly indicates that something must be done for AI search problems. This is where parallel processing techniques can take part to improve the parallelism profile. By implementing production system interpreters in SISAL, not only can we identify the source of potential parallelism such as the one depicted in Figure 7.1 but, more importantly, we can make effective use of the parallelism information.
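The burst-shaped profiles discussed above can be approximated with a small scheduling sketch (a hedged illustration of the task-level model, not the SISAL tooling used in this chapter): treat every recursive hanoi call as a unit-time task whose two sub-calls become ready in the next step, and count how many processing elements are busy at each step under greedy scheduling. The function name and the unit-time assumption are ours.

```python
# Hedged sketch: approximate the parallelism profile of the doubly
# recursive Tower of Hanoi. Each call is a unit-time task; its two
# sub-calls become ready in the following time step. Greedy scheduling
# on n_pes processing elements. (Illustrative model, not the SISAL runtime.)
def hanoi_profile(n_disks, n_pes):
    ready = [n_disks]      # each pending task is tagged by its sub-problem size
    profile = []           # profile[t] = number of busy PEs at step t
    while ready:
        width = min(len(ready), n_pes)
        profile.append(width)
        running, ready = ready[:width], ready[width:]
        for k in running:
            if k > 1:      # spawn the two recursive sub-calls
                ready.extend([k - 1, k - 1])
    return profile

# Total work is always 2^n - 1 calls; with unbounded PEs the width
# doubles each step (1, 2, 4, ...): the impulse/burst shape of Figure 7.1.
```

With 10 disks, the total area under the profile is 1023 calls for any number of PEs, while the burst peak grows from the PE count (when PEs are scarce) up to 512 with unbounded PEs, mirroring the two panels of Figure 7.1.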
A further understanding of the problem under consideration would allow us to develop a more suitable and/or faster matching algorithm for production systems. More efficient and faster processing of production systems would in turn give us an opportunity to scale up the size of production systems, which would hopefully solve real-world problems.

7.4 Future Research: Connectionist Production Systems

A new representation technique, called hierarchical representation, is introduced as one of the possible topics for future research. An objective behind developing this new representation technique is to help develop connectionist adaptive production systems. As we have seen in the previous chapters, symbolic processing of production systems requires much processing power in terms of both hardware and software. By using a dramatically different approach like the one we introduce in this chapter, symbolic production systems can be transformed into conventional numeric systems, where the search-and-match problem no longer persists. This transformation of symbolic systems to numeric systems would then allow us to map production systems onto connectionist architectures relatively easily, thereby exploiting the advantages connectionist architectures have to offer.

7.4.1 Feature hierarchy in production systems

At one end of the spectrum of representation techniques used in artificial neural networks (ANNs) is the local representation [17,55], where a concept is assigned to a neuron. While it is quite straightforward to implement and its mapping is easy to understand, the local representation reveals several major drawbacks, such as difficulty in capturing information regarding variable bindings, weakness in fault-tolerant computing, a varying number of neurons for different problem sizes, and so on.
At the other end of the spectrum is the distributed representation, which assigns a concept to a pattern of activity over many neurons [29]. This method, which is often said to resemble the way the brain works, provides graceful degradation for faulty neurons, thereby giving fault-tolerant computing. However, it is very difficult to visualize the mapping and to understand how it works. Without a good learning algorithm, it is almost impossible for a large problem to be appropriately trained for practical applications [14].

To overcome the difficulties of local and distributed representations, we introduce a hierarchical representation which attempts to combine the two techniques at different levels of an information hierarchy. At the high level of the information hierarchy that we obtain from the problem domain, the local representation is used to partition the problem, while at the low level, the distributed representation is used to implement the mapping of neurons. A set of features we derive from the production system domain is listed below in order of decreasing importance:

• f1: the number of AVPs in a pattern
• f2: similarity in the attribute parts of patterns
• f3: similarity in the value parts of patterns

The value parts in the third feature include not only the constants assigned to attributes but also variables. The three features above are used as criteria to partition the patterns into groups of patterns. A tree for each feature fi is constructed from the groups of patterns. Merging the three trees into one, we obtain a partitioned pattern tree (PPT).
Upon constructing the PPT, we build a 3-dimensional space R, called the feature space, where a pattern p_i of the production system can then be uniquely defined as a pattern vector p_i(f1, f2, f3).

7.4.2 Representing PM and WM in feature space

Let p_i = [(a_1 v_1), ..., (a_n v_n)] be a pattern (condition pattern, action pattern, or wme) in a production system, where a_i is an attribute and v_i is a fixed value or a variable. Suppose that a production system has m patterns, P = {p_1, ..., p_m}. We shall define the three features as follows: (1) Given a pattern p_i = [(a_1 v_1), ..., (a_n v_n)], it has n AVPs, denoted by eta(p_i) = n. (2) Two patterns p_i = [(a_1 v_1), ..., (a_n v_n)] and p_j = [(a_1' v_1'), ..., (a_n' v_n')] are said to be attribute similar, denoted by p_i =a p_j, if for all i, a_i = a_i'. (3) p_i and p_j are said to be value similar, denoted by p_i =v p_j, if for all i, v_i = v_i'. From (2) and (3), we define that p_i is equivalent to p_j, denoted by p_i = p_j, if p_i =a p_j and p_i =v p_j.

The first feature f1 partitions P into n subsets P_1, ..., P_n such that for p_i in P_k, eta(p_i) = k for all i. After partitioning, each pattern p_i in P_k will have k AVPs. The second feature f2 partitions P into m subsets P_1, ..., P_m such that for p_i, p_j in P_k, p_i =a p_j for all i, j, i /= j. After partitioning, patterns in P_k will be similar in attributes. The third feature f3, which uses the value similarity measure, partitions P into r subsets P_1, ..., P_r such that for p_i, p_j in P_k, p_i =v p_j for all i, j, i /= j. After partitioning, patterns in P_k will be similar in values.

Consider the two rules listed below:

    Rule 1: p1: [(a 1) (b 2)]          Rule 2: p2: [(a 1) (b 3)]
            p3: [(p 1) (q 2) (r 3)]            p4: [(p 2) (q 3) (r 3)]
            p5: [(a 3) (q 2)]                  p6: [(t 3)]
            [Make (c 1) (d 2)]                 [Remove 1st pattern]

Let P = {p1, ..., p6} be the six condition patterns. The first feature f1 partitions P into {p6}, {p1, p2, p5}, and {p3, p4}, since eta(p6) = 1, eta(p1) = eta(p2) = eta(p5) = 2, and eta(p3) = eta(p4) = 3, as depicted in Figure 7.2(a).
Using f2, we obtain {p6}, {p1, p2}, {p5}, and {p3, p4}, as shown in Figure 7.2(b), since p1 =a p2 and p3 =a p4. The third feature f3 partitions P into {p1}, {p2}, {p3}, {p4}, {p5}, and {p6}, as shown in Figure 7.2(c). Numbers on arcs indicate the group a pattern belongs to. Merging (a), (b), and (c) of Figure 7.2 yields a partitioned pattern tree, as depicted in Figure 7.2(d).

Once the partitioned pattern tree is built, the assignment of each pattern to the feature space is automatic. The numbers next to arcs in Figure 7.2 are assigned as coordinate values (f1 f2 f3) of the feature space. From Figure 7.2(d), we obtain the following coordinates: p1(2,2,2), p2(2,2,3), p3(3,4,5), p4(3,4,6), p5(2,3,4), and p6(1,1,1). Figure 7.3 shows the mapping of the six patterns to corresponding points in the 2-dimensional feature space R^2 (the third feature f3 is not shown for simplicity of presentation). It is interesting to note in Figure 7.3 that patterns such as p1 and p2, which have the same number of AVPs and are similar in attributes, form a cluster in the 2-dimensional feature space. It is this cluster that we use to match patterns against wmes.

[Figure 7.2: Partitioned Pattern Tree (PPT) for PM. (a) f1 partitions P into three groups. (b) f2 partitions P into four groups. (c) f3 partitions P into six groups. (d) All three subtrees in (a), (b), (c) are merged to form a partitioned pattern tree. Numbers next to arcs uniquely define each pattern in the feature space.]

[Figure 7.3: A production memory in 2-dimensional feature space.]

We are not going to implement this hierarchical representation in the proposed architecture. Instead, we shall briefly outline the way to implement the pattern matching procedure. Suppose that all the condition patterns are partitioned into c clusters, C_1, ..., C_c, in the 2-dimensional feature space R^2.
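The partitioning and coordinate assignment just described can be sketched in a few lines. This is a hedged illustration: the dissertation derives the coordinates from the merged trees of Figure 7.2, while here the sorted-order numbering of groups at each feature level is our assumption, chosen because it reproduces the coordinates listed above.

```python
# Hedged sketch of the three-feature partitioning: f1 = number of AVPs,
# f2 = attribute similarity, f3 = value similarity. Groups at each level
# are numbered in sorted order (our assumption for illustration).
def assign_coordinates(patterns):
    # key prefixes per pattern: (f1), (f1, attrs), (f1, attrs, vals)
    keys = [(len(p), tuple(a for a, _ in p), tuple(v for _, v in p))
            for p in patterns]
    coords = []
    for level in (1, 2, 3):
        prefixes = sorted(set(k[:level] for k in keys))
        number = {pre: i + 1 for i, pre in enumerate(prefixes)}
        coords.append([number[k[:level]] for k in keys])
    return [tuple(c) for c in zip(*coords)]

# The six condition patterns of Rule 1 and Rule 2:
patterns = [
    [("a", 1), ("b", 2)],              # p1
    [("a", 1), ("b", 3)],              # p2
    [("p", 1), ("q", 2), ("r", 3)],    # p3
    [("p", 2), ("q", 3), ("r", 3)],    # p4
    [("a", 3), ("q", 2)],              # p5
    [("t", 3)],                        # p6
]
```

Running assign_coordinates on these six patterns yields p1(2,2,2), p2(2,2,3), p3(3,4,5), p4(3,4,6), p5(2,3,4), and p6(1,1,1); patterns sharing the first two coordinates (here p1 and p2) are exactly the clusters used for matching.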
A cluster C_i, which represents a set of particular pattern vectors, is assigned to a unit network N_i consisting of m neurons n_1, ..., n_m. The coordinate value (f1 f2 f3) for C_i can then be trained in a unit network N_i such that when a wme w_j is presented to N_i, the corresponding cluster can be activated by N_i. By building a set of N_i, i = 1, ..., c, and training them, we will be able to achieve O(1) pattern matching time. Furthermore, by doing this the heuristics which enable the good or optimal selection of choice points in the search space can be efficiently represented in terms of an energy function. Selection of a choice point in the context of a search tree is basically the same as an optimization process in the context of artificial neural networks.

Appendix
Complete Lisp Codes for the MRN-based Production System Interpreter

;;;;*********************************************************************************
;;;; The MRN-based Production System Interpreter, written by Andrew Sohn, June 1991.
;;;; Minor differences between MRN-based and OPS5:
;;;; 1. Productions are not compiled when loaded, i.e., the function "p" is not called.
;;;; 2. {} in a program file is replaced by ().
;;;; 3. Space is inserted between ^ and attribute: ^atr --> ^ atr
;;;; There is one and only one global variable: *number* to trace wme numbers generated.
;;;; Notes on the documentation:
;;;; 1. **** signals the beginning of a major function: compile pm, match, select, act.
;;;; 2. ** signals the beginning of minor functions within a major function.
;;;; 3. Source codes are in courier, comments are in roman.
;;;; 4. Functions related to performance analysis are not included due to space constraint.
;;;;*********************************************************************************
(in-package :mrn)

;;;;*********************************************************************************
;;;; DATA STRUCTURES for rules, condition elements, actions, and working memory elements.
;;;;*********************************************************************************
(defstruct rule
  name no cond act vars negated tt-last-fired last-inst)
(defstruct ce
  no-in-rule name evar-name no-oins type oins no-vars vars other-atr
  mem-changed var-to-cmp atr-to-cmp avp-to-cmp op-to-cmp omem tmem)
(defstruct ae
  op name evar-name no-in-rule avps)
(defstruct wme
  tag no type name avps)                    ;no=wme no in the order created in time

(setq *number* 1)                           ;global variable to trace wme numbers

;;;;*********************************************************************************
;;;; THE SYSTEM BEGINS HERE.
;;;;*********************************************************************************
(defun production-systems (wm pm)
  (setq l (pre-processing wm pm))           ;l=w-p-ga-gl
  (setq pm-tt (production-cycle (first l) (second l) (third l)))
  (post-processing (first pm-tt) (second pm-tt) (fourth l)))

;;PRE-PROCESSING: compile an initial wm, pm, and extract necessary information from pm, etc.
(defun pre-processing (wm pm)
  (setq wm (compile-working-memory wm))     ;Compile an initial wm in struct
  (setq no-of-rules (length pm))
  (setq pm-array (compile-production-memory no-of-rules pm))
  (setq index (- no-of-rules 1))            ;index starts from 0
  (setq group-list (group-ces-using-length index pm-array))
  (setq group-array (rearrange-groups group-list))  ;Returns an array
  (setq pm-array (reset-flags pm-array))    ;Reset ce-mem-changed to NIL
  (list wm pm-array group-array group-list))

;;PRODUCTION CYCLE: main cycle begins here.
(defun production-cycle (wm pm group)
  (prod-cycle wm () pm group 1))

(defun prod-cycle (wm cs pm groups tt)
  (let ((wm-cs-pm (prod-cycle-1 wm cs pm groups tt)))   ;tt=time tag for wmes
    (let ((wm (first wm-cs-pm))
          (cs (second wm-cs-pm))
          (pm (third wm-cs-pm)))
      (cond ((null cs) (list pm tt))
            ((= tt 20) (list pm tt))        ;Stop for collecting data and debugging
            (t (prod-cycle wm cs pm groups (+ tt 1)))))))

(defun prod-cycle-1 (wm cs pm groups tt)
  (setq pm (matching-step wm groups pm))
  (setq sr-cs (selection-step pm))          ;sr=selected-rule
  (setq nwm-pm (action-step (car sr-cs) (second sr-cs) pm tt))
  (list (first nwm-pm) (second sr-cs) (second nwm-pm)))

(defun matching-step (wm groups pm)
  (print-group-on-new-wm wm)
  (distribute-wmes wm groups pm))

(defun selection-step (pm)
  (setq cs (collect-matched-rules pm))
  (setq selected-rule (select-a-rule-from-cs cs))
  (list selected-rule cs))

(defun action-step (selected-rule cs pm tt)
  (setq new-wm (prepare-wmes-to-fire-rule selected-rule tt))
  (print-info-on-a-selected-rule selected-rule tt)
  ;(print-wm-cs-selected-rule-new-wm pm cs selected-rule new-wm tt) ;dbx
  (setq pm (update-selected-rule selected-rule pm tt))
  (setq pm (reset-flags pm))
  (list new-wm pm))

;;POST PROCESSING: printing out some statistics, ...
(defun post-processing (pm-array tt group-list)
  (format t "~%end -- no production true~%")
  (print-statistics group-list pm-array tt))

;;;;*********************************************************************************
;;;;MATCHING STEP: match oins and storing are done in this step.
;;;;For functions related to matching, the parameters are in the following order: (wme ce)
;;;;wmes are stored in the order received, ((most recent) ... (least recent))
;;;;match-oins sets ce-mem-changed, match-tins sets the flag INSTD if all matched
(defun distribute-wmes (wm groups pm)       ;Match oins, then match tins
  (match-tins (match-oins wm groups pm 'mine))
  pm)

;;*********************************************************************************
;; MATCH ONE-INPUT NODES
;;*********************************************************************************
(defun match-oins (wm g-ary pm f)           ;g-ary=group array, f=flag
  (cond ((null wm) pm)
        (t (match-oins (cdr wm) g-ary
                       (send-a-wme-to (car wm) g-ary pm f) f))))

;;array index and wme-avps have the same indexing scheme.
;;Groups[0]=(g1), [1]=(g1 g2), [2]=(g1 g2 g3), g*=((rn cn) ...)
(defun send-a-wme-to (wme groups pm flag)
  (cond ((equal flag 'mine)                 ;Use my distribution strategy
         (send-to wme (aref groups (length (wme-avps wme))) pm flag))
        (t (send-to wme (aref groups (- (array-total-size groups) 1)) pm flag))))

;;Send a wme to a particular ce of a particular rule, designated in g.
(defun send-to (w g pm flag)                ;g=((1 2) (2 4) ...), (rule-no ce-no)
  (cond ((null g) pm)
        (t (send-to w (cdr g)
                    (send-to-a-rule w (caar g) (second (car g)) pm flag)
                    flag))))

;;rn=0...n-1 (an array index), cn=1...m (an element position in the list)
(defun send-to-a-rule (w rn cn pm flag)
  (setq new-ces (send-to-a-rule-1 w (rule-cond (aref pm rn)) cn flag))
  (setf (rule-cond (aref pm rn)) new-ces)
  pm)                                       ;Return a modified pm

(defun send-to-a-rule-1 (w ces cn flag)
  (cond ((equal (wme-type w) '+) (add-wme-to-ces 1 w ces () cn flag))
        (t (remove-wme-from-ces 1 w ces () cn flag))))

;;*********************************************************************************
;;Match and remove wmes. Given (ce1 ce2 ce3 ... ce-n) and i=3, it starts removing wme
;;from ce3 to ce-n since ce4 ... ce-n might also have wme in the tmem.
;;*********************************************************************************
(defun remove-wme-from-ces (i wme ces new pos flag)
  (cond ((equal flag 'mine) (remove-wme-from-ces-mine i wme ces pos new))
        (t (remove-wme-from-ces-ops5 wme ces 1 new))))

;;i=predefined ce position. When it hits the ce, it starts removing from omem.
;;It scans rest of the ces in the rule to remove from tmem.
(defun remove-wme-from-ces-mine (i w ces pos new)
  (cond ((null ces) (reverse new))
        ((= i pos)
         (remove-wme-from-ces-mine (+ i 1) w (cdr ces) pos
           (cons (remove-wme-from-ce w (car ces) pos) new)))
        ((> i pos)
         (remove-wme-from-ces-mine (+ i 1) w (cdr ces) pos
           (cons (remove-wme-from-tmem w (car ces) pos) new)))
        (t (remove-wme-from-ces-mine (+ i 1) w (cdr ces) pos
             (cons (car ces) new)))))

(defun remove-wme-from-ces-ops5 (wme ces n new)
  (let ((ces (remove-wme-from-ces-ops5-1 wme ces n ())))
    (cond ((null ces) (reverse new))
          (t (remove-wme-from-ces-ops5
( s e t f (ce-om em c e ) ( rem ove-w m e-from -om em -1 w (ce-om em c e ) ())) ( s e t q le n g 2 ( l e n g t h ( ce-om em c e ))) (an d (> l e n g l le n g 2 ) ( s e t f (ce-m em -ch a n g ed c e ) ft)) ce) ;retum ce (d e fu n rem o v e-w m e-f rom -om em -1 (w omem new) (co n d ( ( n u l l omem) new) ( (tw o-w m es-sam e w ( c a r omem)) (rem ove-w m e-from -om em -1 w (c d r omem) new)) ( t ( rem ove-w m e-f rom -om em -1 w (c d r omem) (c o n s ( c a r omem) new))))) ,;(setq tmem ’((((a 1) (b 1» ((a 1) (b 3))) (((a 1) (b 3)) ((a 1) (b 2))))) ;;(remove-merged ’((a 1) (b 2)) tmem 2 0 ) — > ((((A 1) (B 1)) ((A 1) (B 3)))) (defun rem ove-w m e- from -tm em (w c e n) ( s e t f (ce-tm em c e ) (rem ove-w m e- from -tm em -1 w (ce -tm e m c e ) n ())) c e ) (d e fu n rem ove-w m e- from -tm em -1 (w tmem n new) (c o n d ( ( n u l l tmem) new) ; (reverse new)) 6/27 ( ( w - is - in - t m e m w (car tmem) n) ( rem ove-w m e- fro m -tm em -1 w (c d r tmem) n new) ) ( t (rem ove-w m e-from -tm em -1 w (c d r tmem) n (c o n s ( c a r tmem) new))))) (defun w - is - in - t m e m (w m erged-wm e n ) ;Retums T of NIL (tw o-w m es-sam e w ( g e t - n t h - e l e m e n t n m erg ed -w m e))) (d e fu n tw o-w m es-sam e (w l w 2 ) ;AVPs can be in any order. 
  (and (equal (wme-name w1) (wme-name w2))
       (two-avp-lists-same (wme-avps w1) (wme-avps w2))))

;;(two-avp-lists-same '((a 1) (c 3) (b 2)) '((b 2) (a 1) (c 3))) --> T
(defun two-avp-lists-same (w1 w2)
  (cond ((and (null w1) (null w2)))
        ((not (member (car w1) w2 :test 'equal)) nil)
        (t (two-avp-lists-same (remove-avp (car w1) w1)
                               (remove-avp (car w1) w2)))))

;;(remove-avp '(c 3) '((a 1) (b 2) (c 3) (d 4))) --> ((A 1) (B 2) (D 4))
(defun remove-avp (a l) (remove-1 a l ()))

(defun remove-1 (a l l2)
  (cond ((null l) l2)
        ((equal a (car l)) (append (reverse l2) (cdr l)))
        (t (remove-1 a (cdr l) (cons (car l) l2)))))

;;*********************************************************************************
;;Match a wme to a ce and then store in omem if matched.
;;*********************************************************************************
(defun add-wme-to-ces (i wme ces new n flag)
  (cond ((equal flag 'mine) (add-wme-to-ces-mine i wme ces new n))
        (t (add-wme-to-ces-ops5 i wme ces new n))))

(defun add-wme-to-ces-mine (i wme ces new n)
  (cond ((null ces) (reverse new))
        ((and (= i n) (compare-ce-wme (car ces) wme))
         (append (reverse new) (cons (store-wm wme (car ces)) (cdr ces))))
        (t (add-wme-to-ces-mine (+ i 1) wme (cdr ces) (cons (car ces) new) n))))

(defun add-wme-to-ces-ops5 (i wme ces new n)
  (cond ((null ces) (reverse new))
        ((compare-ce-wme (car ces) wme)
         (add-wme-to-ces-ops5 (+ i 1) wme (cdr ces)
                              (cons (store-wm wme (car ces)) new) n))
        (t (add-wme-to-ces-ops5 (+ i 1) wme (cdr ces) (cons (car ces) new) n))))

(defun store-wm (w c)
  (setf (ce-omem c) (cons w (ce-omem c)))
  (setf (ce-mem-changed c) 't)              ;mem modified
  (and (= (ce-no-in-rule c) 1)              ;For only first ce, i.e., CE1 only
       (setf (ce-tmem c) (cons (list w) (ce-tmem c))))
  c)                                        ;Return the modified ce.

;;The match (ce,wme) is done w.r.t. a CE, NOT a wme.
(defun compare-ce-wme (ce wme)
  (cond ((equal (ce-name ce) (wme-name wme))
         (compare-ce-wme-1 (ce-oins ce) (wme-avps wme)))
        (t nil)))

(defun compare-ce-wme-1 (oins wavps)
  (cond ((null oins))                       ;If ce is exhausted, all MATCHED.
        ((not (compare-ceavp-wavps (car oins) wavps)) nil)
        (t (compare-ce-wme-1 (cdr oins) wavps))))

;;(compare-ceavp-wavps '(a1 null) '((a2 v2) (a3 v3))) --> T, '(a1 null) '((a1 v1) (a2 v2))) --> NIL
(defun compare-ceavp-wavps (ce-avp wavps)
  (cond ((null wavps) nil)                  ;If wavps is exhausted, NO match
        ((member (second ce-avp) '(null nil))   ;If ce-val is null or nil,
         (ce-attr-not-in-wme ce-avp wavps))
        ((compare-ceavp-wavp ce-avp (car wavps)))
        (t (compare-ceavp-wavps ce-avp (cdr wavps)))))

;;(ce-attr-not-in-wme '(a3 nil) '((a1 v1) (a2 v2) (a3 nil))) --> T
(defun ce-attr-not-in-wme (ce-avp wavps)
  (cond ((null wavps))                      ;Exhausted, i.e. does not exist.
        ((equal (car ce-avp) (caar wavps))  ;If exists, then val are the same?
         (member (second (car wavps)) '(null nil)))
        (t (ce-attr-not-in-wme ce-avp (cdr wavps)))))

;;Compare a wme-avp and a ce-avp. (compare-avp '(a 1) '(a <var>)) --> T
(compare-avp ’(a 1) ’(a <var>)) — > T '(d efu n co m p a re -c ea v p -w a v p ( c e - a v p w m e-avp) (c o n d ( ( c o m p a r e -a t r -p a r t ( c a r w m e-avp) ( c a r c e - a v p ) ) ( c o m p a r e -v a l- p a r t (c d r w m e-avp) (c d r c e - a v p ) ) ) ( t n i l ) ) ) 129 (d e fu n c o m p a r e - a tr - p a r t ( a l a 2 ) (e q u a l a l a 2 ) ) (d e fu n c o m p a r e - v a l- p a r t (wv c v ) (c o n d ( ( l i s t p ( c a r c v ) ) ( c o m p a r e -v a l— p a r t - 1 wv ( c a r c v ) ( le n g t h ( c a r c v ) ))) (t ( c o m p a r e - v a l- p a r t - 1 wv c v ( le n g t h cv))))) (d e fu n c o m p a r e - v a l- p a r t - 1 (wv c v n) (c o n d ( ( = n 1) ( e a s e l ( c a r wv) ( f i r s t c v ) )) ((= n 2) ( c a s e 2 ( c a r wv) ( f i r s t c v ) (s e c o n d cv))) ((= n 3) ( c a s e 3 ( c a r wv) ( f i r s t c v ) (s e c o n d c v ) ( t h i r d c v ) )) (t ( c a s e 2 - r e p e a t ( c a r wv) cv)))) (d e fu n e a s e l (w c ) ;w=val-part of wme, c=val-part of ce (c o n d ( ( v a r i a b l e c ) ( l i s t w c ) ) ;(holds <ol>), no comparison necessary, ( ( e q u a l w c ) w) ; Constant, simply compare ( t n i l ) )) ;;Compare (holds banana) and (holds <> apple) or (holds <> <obj>) (d e fu n c a s e 2 (w op c ) ;w=from wme, op=operator from ce, (co n d ( ( v a r i a b l e c ) w) ;Retumsw ( t ( c o m p a r e -w ith -o p w op c )))) (d e fu n c a s e 2 - r e p e a t (w c - l i s t ) ;(case2-repeat ’banana ’(<> null <> <ol>)) (c o n d ( ( n u l l c - l i s t ) w) ;w satisfies (....) 
        ((not (compare-with-op w (first c-list) (second c-list))) nil)
        (t (case2-repeat w (cddr c-list)))))

;;(holds <o1> <> <o2>) (holds <v1> > 1)
(defun case3 (w c1 op c2)                   ;w=from wme, op=operator from ce,
  (cond ((and (variable c1) (variable c2)))
        ((and (variable c1) (not (variable c2))) (compare-with-op w op c2))
        (t (compare-with-op w op c1))))     ;Returns w

;;(compare-with-op 3 '>= 4) --> NIL, (compare-with-op 'this '<> 'this) --> NIL
(defun compare-with-op (w op c)
  (cond ((equal op '<>) (not (equal w c)))
        (t (funcall op w c))))

(defun case1-list (w c-list)
  (cond ((null c-list) w)                   ;w satisfies (....)
        ((compare-with-op w (first c-list) (second c-list))
         (case1-list w (cddr c-list)))
        (t nil)))

;;Returns T if an atom is in the form of "<***>"
(defun variable (a)                         ;Atom: possibly in <var> form.
  (cond ((numberp a) nil)
        ((and (search "<" (string a)) (search ">" (string a)))
         (> (length (string a)) 2))))       ;T if it has more than 2 chars.
;;*********************************************************************************
;;MATCH TWO-INPUT NODES
;;*********************************************************************************
(defun match-tins (pm) (match-tins-1 (array-total-size pm) pm))

(defun match-tins-1 (i pm)
  (cond ((= i 0) pm)
        (t (match-tins-1 (- i 1) (match-tins-for-a-rule (- i 1) pm)))))

(defun match-tins-for-a-rule (i pm)
  (setf (aref pm i) (match-tins-for-a-rule-1 (aref pm i)))
  pm)

(defun match-tins-for-a-rule-1 (p)
  (setq left (car (rule-cond p)))
  (setq cond (scan-ces left (cdr (rule-cond p)) (list left)))
  (setf (rule-cond p) cond)
  p)                                        ;Returns a production, p.

;;Returns new cond elements. Skip if no wmes matched this CE and type is +.
;;Process if type is -. For - ces, merge-all-comb even if there are no wmes in RM.
(defun scan-ces (left ces new)
  (cond ((null ces) (reverse new))
        ((or (ce-mem-changed left) (ce-mem-changed (car ces)))
         (scan-ces (car ces) (cdr ces) (cons (scan-ces1 left (car ces)) new)))
        (t (scan-ces (car ces) (cdr ces) (cons (car ces) new)))))

;;Modifies the tmem of this rule.
(defun scan-ces1 (prev curr)                ;prev>CE1, curr>prev in order
  (with-one-ce (ce-tmem prev) curr))

(defun with-one-ce (lm curr)
  (setq wmes (ce-omem curr))
( s e t q l a s ( c e - o t h e r - a t r c u r r ) ) ( s e t q r a s ( c e - a t r - t o - c m p c u r r ) ) ( s e t q o p s ( c e - o p - t o - c m p c u r r ) ) ( s e t q v a r s ( c e - v a r - t o - c m p c u r r ) ) ;This is for merge-all-comb ONLY! ( s e t q n ( - ( c e - n o - i n - r u l e c u r r ) 1 ) ) ( s e t q t y p e ( c e - t y p e c u r r )) ( s e t q m - l i s t ( p r e p a r e - a - lis t - o f - m e r g e d - w m e s t y p e c u r r lm wmes v a r s l a s r a s o p s ) ) ( s e t f (ce-m em -ch a n g ed c u r r ) (m o d ify -m e m -c h a n g e d -fla g m - l i s t ) ) ( s e t f (ce-tm em c u r r ) (m o d ify -tm em t y p e m - l i s t (ce-tm em c u r r ) ) ) c u r r ) ;Retums a right CE ;;If m-list is NOT empty, then set the mem-changed flag to TRUE. (d e fu n m o d ify -m e m -c h a n g e d -fla g ( m - l i s t ) (co n d ( m - l i s t t ) ( t n i l ) )) (d e fu n m o d ify -tm em ( t y p e m - l i s t tmem) (c o n d ( ( e q u a l t y p e '+ ) (ap p en d m - l i s t tm em )) ( t m - l i s t ) )) ;;Match tins and return a list of merged wmes if any. 
(defun prepare-a-list-of-merged-wmes (type curr lm wmes vars las ras ops)
  (cond ((equal type '+) (pos-ce lm wmes vars las ras ops))
        (t (neg-ce lm wmes vars las ras ops))))

(defun pos-ce (lm wmes vars las ras ops)
  (cond ((null vars) (merge-all-comb lm wmes ()))
        (t (remove 'nil (comp-lm-wmes lm wmes las ras ops ())))))

(defun neg-ce (lm wmes vars las ras ops)
  (cond ((and (null vars) (null wmes)) (merge-all-comb lm (list (make-wme)) ()))
        ((and (null vars) wmes) nil)
        ((and vars (null wmes)) (merge-all-comb lm (list (make-wme)) ()))
        (t (neg-ce-1 lm wmes las ras ops))))

(defun neg-ce-1 (lm wmes las ras ops)
  (let ((lm (comp-lm-wmes-for-neg-ce lm wmes las ras ops)))
    (cond ((null lm) nil)
          (t (merge-all-comb lm wmes ())))))

(defun comp-lm-wmes-for-neg-ce (lm wmes las ras ops)
  (cond ((null wmes) lm)
        (t (comp-lm-wmes-for-neg-ce
             (comp-lm-wme-for-neg-ce lm (car wmes) las ras ops ())
             (cdr wmes) las ras ops))))

;;Returns a list of modified-lm and w (or nil) depending on the match condition.
(defun comp-lm-wme-for-neg-ce (lm w las ras ops new)
  (cond ((null lm) new)
        ((comp-lw-rw (car lm) w las ras ops)
         (comp-lm-wme-for-neg-ce (cdr lm) w las ras ops new))
        (t (comp-lm-wme-for-neg-ce (cdr lm) w las ras ops (cons (car lm) new)))))

(defun merge-all-comb (lm wmes new)
  (cond ((null wmes) new)
        (t (merge-all-comb lm (cdr wmes) (append (m-a-c lm (car wmes) ()) new)))))

;;(m-a-c '((w1 w2) (w3 w4 w5)) 'w6 ()) --> ((W1 W2 W6) (W3 W4 W5 W6))
(defun m-a-c (lm wme new)
  (cond ((null lm) new)
        (t (m-a-c (cdr lm) wme (cons (append (car lm) (list wme)) new)))))

;;Return those wmes matched in the IMMEDIATE previous ps cycle.
(defun get-n-wmes (omem n new)    ;(get-n-wmes '(1 2 3 4 5) 2 ()) --> (1 2)
  (cond ((null omem) (reverse new))
        ((= n 0) (reverse new))
        (t (get-n-wmes (cdr omem) (- n 1) (cons (car omem) new)))))

;;This is for positive ces. Compare left mem and newly matched right wmes.
(defun comp-lm-wmes (lm wmes las ras ops new)
  (cond ((null wmes) new)
        (t (comp-lm-wmes lm (cdr wmes) las ras ops
                         (append (comp-lm-wme lm (car wmes) las ras ops ()) new)))))

(defun comp-lm-wme (lm w las ras ops new)
  (cond ((null lm) new)    ;(reverse new)) 6/27
        ((comp-lw-rw (car lm) w las ras ops)    ;no NOT here!!
         (comp-lm-wme (cdr lm) w las ras ops (cons (merge-wmes (car lm) w) new)))
        (t (comp-lm-wme (cdr lm) w las ras ops new))))

;;Compare left wme and right wme with ops operators. Find val1 for atr from lw.
;;Find val2 for atr from rw. Get an op from ops. Apply op (<> or =) to val1 and val2.
;;If all atrs are exhausted, then all var bindings are correct.
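The `merge-all-comb`/`m-a-c` pair above forms the cross product of the left memory (partial matches, or tokens) with the newly matched wmes. The same join step as a Python sketch, assuming tokens are plain lists; names are illustrative and token order is not guaranteed to match the Lisp version, which builds its result with `cons`:

```python
def m_a_c(lm, wme):
    """Append one wme to every token (partial match) in the left memory."""
    return [token + [wme] for token in lm]

def merge_all_comb(lm, wmes):
    """Cross-product join: extend every token in lm with every wme in wmes."""
    out = []
    for w in wmes:
        out = m_a_c(lm, w) + out
    return out

# mirrors the source comment (token order aside):
# (m-a-c '((w1 w2) (w3 w4 w5)) 'w6 ()) --> ((W1 W2 W6) (W3 W4 W5 W6))
print(m_a_c([["w1", "w2"], ["w3", "w4", "w5"]], "w6"))
```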
;;lw is a MERGED wme, i.e., (wme1) or (wme1 wme2) or (wme1 wme2 wme3)... rw=a wme.
(defun comp-lw-rw (lw rw las ras ops)    ;n=number of wmes in lw
  (cond ((null las))    ;Return TRUE
        ((comp-lw-rw1 lw rw (car las) (car ras) (car ops))
         (comp-lw-rw lw rw (cdr las) (cdr ras) (cdr ops)))
        (t nil)))    ;TIN var comparison failed.

(defun comp-lw-rw1 (lw rwme la ra op)    ;la=(no atr), Returns T or NIL
  (setq lwme (get-nth-element (car la) lw))
  (setq lvalue (get-val-for-atr (wme-avps lwme) (cadr la)))
  (setq rvalue (get-val-for-atr (wme-avps rwme) ra))
  (funcall op rvalue lvalue))    ;wrt the right value, not left value

(defun get-nth-element (n l)    ;(get-nth-element 3 '(a b c d e)) --> c
  (cond ((= n 1) (car l))
        (t (get-nth-element (- n 1) (cdr l)))))

;;wme can be a wme or a MERGED wme. Return a VALUE to the atr.
(defun get-val-for-atr (oins atr)    ;(get-val-for-atr '(((a 1) (b 2)) ((c 3) (d 4))) 'c)
  (cond ((null oins) nil)
        ((listp (car oins)) (get-val-for-atr (append (car oins) (cdr oins)) atr))
        ((equal (car oins) atr) (second oins))    ;Be careful about cddr. It assumes
        (t (get-val-for-atr (cddr oins) atr))))   ;a wme is tuple, i.e., (a v)

(defun merge-wmes (lw rw)
  (cond ((listp lw) (append lw (list rw)))    ;Merge (WME1 WME2) and WME3
        (t (list lw rw))))

;;;;COLLECT all the rules that have their LHS matched. Done by rule numbers.
;;;;Returns a list of rules.
(defun collect-matched-rules (pm)
  (collect-matched-rules-1 (array-total-size pm) pm ()))

(defun collect-matched-rules-1 (i pm cs)
  (cond ((= i 0) cs)
        ((ce-tmem (car (last (rule-cond (aref pm (- i 1))))))
         (collect-matched-rules-1 (- i 1) pm (cons (aref pm (- i 1)) cs)))
        (t (collect-matched-rules-1 (- i 1) pm cs))))

;;;;SELECT A RULE. Goes thru 3 steps if necessary. If cannot, select randomly.
;;;;***********************************************************************************
(defun select-a-rule-from-cs (cs)
  (cond ((null cs) nil)
        (t (select-a-rule-from-cs-1 cs))))

;;(rn instance-position-in-tmem-of-last-ce, a list of wme-nos-of-the-instance)
;;Instance position is needed because a certain rule can have many occurrences.
;;position of the selected instance in the tmem of the last ce
(defun select-a-rule-from-cs-1 (cs)
  (setq sel-info (select-a-rule (get-all-instantiations cs ())))
  (setq sel-rule (get-selected-rule (car sel-info) cs))
  (put-selected-instance-in-front-of-tmem (second sel-info) sel-rule))

(defun put-selected-instance-in-front-of-tmem (pos r)
  (cond ((= pos 1) r)    ;It is already in front. Do nothing!
        (t (put-selected-instance-in-front pos r))))

(defun put-selected-instance-in-front (pos r)
  (setq old-tmem (ce-tmem (car (last (rule-cond r)))))
  (setq new-tmem (put-nth-elem-in-front pos old-tmem))
  (setf (ce-tmem (car (last (rule-cond r)))) new-tmem)
  r)

;;(put-nth-elem-in-front 5 '(8 7 9 2 3 4)) --> (3 8 7 9 2 4)
(defun put-nth-elem-in-front (pos tmem)
  (setq l-r (split-wrt-position tmem pos))    ;l-r=(l-list r-list), r-list has elem
  (setq l-list (first l-r))                   ;(left)
  (setq e (car (second l-r)))                 ;desired element
  (setq r-list (cdr (second l-r)))            ;(right)
  (append (cons e l-list) r-list))

;;(split-wrt-position '(8 7 9 2 3 4) 5) --> ((8 7 9 2) (3 4))
(defun split-wrt-position (l n)
  (split-wrt-position-1 () l n))

(defun split-wrt-position-1 (front l n)
  (cond ((= n 1) (list (reverse front) l))
        (t (split-wrt-position-1 (cons (car l) front) (cdr l) (- n 1)))))

(defun get-selected-rule (rn cs)
  (cond ((null cs) 'Error-in-get-selected-rule)
        ((= rn (rule-no (car cs))) (car cs))
        (t (get-selected-rule rn (cdr cs)))))

;;Returns a list of all inst'ns from all rules in the cs.
;;Each inst is in the form of (rn inst-pos wme-nos), wme-nos are in desc order.
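The `split-wrt-position`/`put-nth-elem-in-front` pair above promotes the selected instantiation to the head of the tmem list of the last condition element. The same reordering as a short Python sketch (function names are illustrative, positions 1-based as in the Lisp):

```python
def split_wrt_position(lst, n):
    """Split lst before the n-th (1-based) element: ([before], [n-th and after])."""
    return lst[:n - 1], lst[n - 1:]

def put_nth_elem_in_front(pos, lst):
    """Move the element at 1-based position pos to the front of the list."""
    left, right = split_wrt_position(lst, pos)
    return [right[0]] + left + right[1:]

# matches the source comment:
# (put-nth-elem-in-front 5 '(8 7 9 2 3 4)) --> (3 8 7 9 2 4)
print(put_nth_elem_in_front(5, [8, 7, 9, 2, 3, 4]))  # [3, 8, 7, 9, 2, 4]
```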
(defun get-all-instantiations (cs new)
  (cond ((null cs) new)
        (t (get-all-instantiations (cdr cs) (append (get-insts (car cs)) new)))))

(defun get-insts (p)    ;Get all instantiations from the rule
  (let ((tmem (ce-tmem (car (last (rule-cond p)))))
        (rule-no (rule-no p))
        (no-of-neg-ces (- (length (rule-cond p)) (car (rule-negated p)))))
    (list-instantiations tmem 1 rule-no no-of-neg-ces ())))

(defun list-instantiations (tmem i rn nn new)
  (cond ((null tmem) new)
        (t (list-instantiations (cdr tmem) (+ i 1) rn nn
                                (cons (list-1 (car tmem) rn i nn) new)))))

;;Returns (Rule-no, Position of the inst in many inst's, A rule instantiation)
(defun list-1 (merged rn i n)    ;rn=rule no, n=no of +ces in the rule
  (let ((wmes (get-n-elements n merged ())))
    (let ((wme-nos (extract-wme-nos wmes ())))
      (let ((rating (compute-rating wmes 0)))
        (list rn i wme-nos rating)))))

;;Merged-wme contains ONLY wmes from +ces. wmes from -ces should not be here.
(defun extract-wme-nos (merged new)
  (cond ((null merged) (sort new '>))    ;Built-in sort func --> (9 6 5 2 ...)
        (t (extract-wme-nos (cdr merged) (cons (wme-no (car merged)) new)))))

(defun compute-rating (wmes rating) 0)

;;Select based on (1) wme-nos (value, then length), (2) rating, (3) random
(defun select-a-rule (cs)
  (select-the-best-rule (car cs) (cdr cs)))

(defun select-the-best-rule (best remainder)
  (cond ((null remainder) best)
        ((which-is-better best (car remainder) 0 1)
         (select-the-best-rule best (cdr remainder)))
        (t (select-the-best-rule (car remainder) (cdr remainder)))))

;;index, i, indicates which strategy currently to use.
;;flag indicates which one is better, -1 for left, 1 for right, 0 for tie.
;;a=b=(rn pos wme-nos rating)
(defun which-is-better (a b flag n)
  (cond ((= flag -1) t)    ;currently best is better than new one
        ((= flag 1) nil)   ;new one is better
        ((= n 1) (which-is-better a b (compare-two-lists a b) (+ n 1)))
        ((= n 2) (which-is-better a b (compare-ratings a b) (+ n 1)))
        (t t)))            ;random selection, select the first one

(defun compare-two-lists (a b)
  (compare-two-lists-1 (third a) (third b)))

(defun compare-two-lists-1 (l1 l2)
  (cond ((and (null l1) (null l2)) 0)    ;tie, try other strategy
        ((null l2) -1)                   ;a is better
        ((null l1) 1)                    ;b is better
        ((> (car l1) (car l2)) -1)       ;a is better
        ((< (car l1) (car l2)) 1)        ;b is better
        (t (compare-two-lists-1 (cdr l1) (cdr l2)))))

(defun compare-ratings (a b)
  (compare-ratings-1 (car (last a)) (car (last b))))

(defun compare-ratings-1 (v1 v2)
  (cond ((> v1 v2) -1)    ;a is better
        ((< v1 v2) 1)     ;b is better
        (t 0)))           ;tie

;;;;***********************************************************************************
;;;;RULE FIRING: prepares wmes based on the info contained in the selected
;;;;rule. Returns a list of new wmes to be distributed to the network.
;;;;***********************************************************************************
(defun prepare-wmes-to-fire-rule (r tt)
  (cond ((null r) nil)
        (t (setq merged-wme (car (ce-tmem (car (last (rule-cond r))))))
           (prepare-wmes (rule-cond r) (rule-act r) merged-wme () tt))))

(defun prepare-wmes (ces aes merged new tt)
  (cond ((null aes) (reverse new))
        (t (prepare-wmes ces (cdr aes) merged
                         (append (action (car aes) ces merged tt) new) tt))))

(defun action (ae ces merged tt)
  (action-1 (ae-op ae) ae ces merged tt))

(defun action-1 (op ae ces merged tt)
  (cond ((equal op 'make) (list (act-make ae merged tt)))
        ((equal op 'modify) (act-modify ae ces merged tt))    ;Returns (+ -)
        ((equal op 'remove) (list (act-remove ae ces merged)))
        (t 'Other-actions-to-be-implemented)))

;;Retrieve from wme avps that appear only in ce-avps. Set the type to -.
;;Returning a wme will not remove all those related wmes.
(defun act-remove (ae ces merged-wmes)
  (setq tmp-wme (get-nth-element (ae-no-in-rule ae) merged-wmes))
  (setq deleted (make-wme))                     ;Create a new structure
  (setf (wme-name deleted) (wme-name tmp-wme))  ;Set the name of wme using ae-name
  (setf (wme-type deleted) '-)
  (setf (wme-tag deleted) (wme-tag tmp-wme))
  (setf (wme-no deleted) *NUMBER*)
  (setq *NUMBER* (+ 1 *NUMBER*))
  (setf (wme-avps deleted) (wme-avps tmp-wme))
  deleted)

;;Modify an existing wme. Make a new one and delete the old one.
;;Returns (to-be-added to-be-deleted).
(defun act-modify (ae ces merged tt)
  (setq deleted (act-remove ae ces merged))    ;Prepare a wme to be removed
  (setq added (make-wme))                      ;Create a new structure
  (setf (wme-name added) (wme-name deleted))   ;Set the name of wme using ae-name
  (setf (wme-type added) '+)
  (setf (wme-tag added) tt)
  (setf (wme-avps added) (modify-values (ae-avps ae) (wme-avps deleted) merged))
  (setf (wme-no added) *NUMBER*)
  (setq *NUMBER* (+ 1 *NUMBER*))
  (list added deleted))    ;Return (added deleted)

;;For <var>, find a value to <var> from MERGED wme, using info (ae-nos-to-cmp ae).
;;atcs can not be used since the location of attr is not given at compile time.
;;Used only to tell how many vars are in the ae-avps. aavp=ae-avp, wavps=wme-avps
;;(setq merged '(((a 1) (b 2)) ((c 3) (d 4) (e 5))))
;;(modify-values '((c 8) (d <v>)) '((1 b)) merged '((c 3) (d 4) (e 5))) --> ((C 8) (D 2) (E 5))
(defun modify-values (aavps wavps merged)
  (replace-aavps (set-val aavps merged ()) wavps))

;;(replace-aavps '((a 1) (b 2)) '((a 4) (b 5) (c 3))) --> ((a 1) (b 2) (c 3))
(defun replace-aavps (aavps wavps)
  (cond ((null aavps) wavps)
        (t (replace-aavps (cdr aavps) (replace-aavp (car aavps) wavps ())))))

(defun replace-aavp (aavp wavps front)
  (cond ((null wavps) (cons aavp (reverse front)))    ;aavp may not be in wavps.
        ((equal (car aavp) (caar wavps))
         (append (reverse front) (cons aavp (cdr wavps))))
        (t (replace-aavp aavp (cdr wavps) (cons (car wavps) front)))))

;;Tested individually and working. But not tested with everything together.
;;Create a NEW wme. Make can also have a <VAR> in it, i.e., must use merged wme.
(defun act-make (ae merged tt)
  (setq sym (make-wme))                ;Create a new structure
  (setf (wme-name sym) (ae-name ae))   ;Set the name of wme using ae-name
  (setf (wme-tag sym) tt)
  (setf (wme-type sym) '+)
  (setf (wme-avps sym) (set-val (ae-avps ae) merged ()))
  (setf (wme-no sym) *NUMBER*)
  (setq *NUMBER* (+ 1 *NUMBER*))
  sym)

;;avp=(attr compute ((ce-no1 attr1) + (ce-no2 attr2))) OR (attr var (ce-no1 attr1))
(defun set-val (avps merged new)
  (cond ((null avps) new)
        ((equal (second (car avps)) 'compute)
         (set-val (cdr avps) merged (cons (set-compute (car avps) merged) new)))
        ((equal (second (car avps)) 'var)
         (set-val (cdr avps) merged (cons (set-var (car avps) merged) new)))
        (t (set-val (cdr avps) merged (cons (car avps) new)))))

(defun set-compute (avp merged)
  (setq value-part (car (last avp)))
  (setq op1 (get-value (first value-part) merged))    ;operand 1
  (setq op2 (get-value (third value-part) merged))    ;operand 2
  (list (car avp) (compute-final-value op1 (second value-part) op2)))

(defun get-value (op merged)    ;op=(ce-no attr)
  (cond ((numberp op) op)    ;op is not a list. Return the value.
        (t (get-val-from-merged (car op) (second op) merged))))

;;(get-val-from-merged 2 'a4 '(((a1 v1) (a2 v8)) ((a4 v4) (a5 v5)))) --> v4
(defun get-val-from-merged (ce-no attr merged)
  (get-val-from-merged-1 attr (wme-avps (get-nth-element ce-no merged))))

(defun get-val-from-merged-1 (attr wme)
  (cond ((null wme) 'Error-in-get-val-from-merged-1)
        ((equal attr (caar wme)) (cadar wme))
        (t (get-val-from-merged-1 attr (cdr wme)))))

(defun compute-final-value (op1 op-code op2)
  (cond ((and (numberp op1) (numberp op2) (member op-code '(+ - * /)))
         (funcall op-code op1 op2))
        (t 'Error-in-compute-final-value)))

;;avp=(attr 'var (ce-no attr2)), returns (attr val), val is from the merged wme.
(defun set-var (avp merged)
  (list (car avp)
        (get-val-from-merged (car (third avp)) (second (third avp)) merged)))

;;************************************************************************************
;;Print information on the selected rule: name, those instantiated wme-nos, ...
;;************************************************************************************
(defun print-info-on-a-selected-rule (r cycle-no)
  (cond ((not (null r))
         (format t "~2d. ~s" cycle-no (rule-name r))
         (print-wme-nos r))))

(defun print-wme-nos (r)
  (and (ce-tmem (car (last (rule-cond r))))
       (print-wme-no (car (ce-tmem (car (last (rule-cond r)))))))
  (format t "~%"))

(defun print-wme-no (wmes)
  (cond ((null wmes))
        (t (format t " ~s" (wme-no (car wmes)))
           (print-wme-no (cdr wmes)))))

;;Classify wmes in terms of no of AVPs. Returns a list of counters.
(defun classify-wm (wm g)
  (cond ((null wm) g)
        (t (classify-wm (cdr wm) (classify-wme (car wm) g)))))

(defun classify-wme (w g)
  (classify-wme-1 (length (wme-avps w)) g ()))

(defun classify-wme-1 (n g new)
  (cond ((= n 0) (append (reverse new) (cons (+ (car g) 1) (cdr g))))
        (t (classify-wme-1 (- n 1) (cdr g) (cons (car g) new)))))

;;(update-total-wm-group '(1 2 3 4) '(1 1 1 1) ()) --> (2 3 4 5)
(defun update-total-wm-group (g tmp-g new-g)
  (cond ((null g) (reverse new-g))
        (t (update-total-wm-group (cdr g) (cdr tmp-g)
                                  (cons (+ (car g) (car tmp-g)) new-g)))))

;;Reset ce-mem-changed to NIL. Returns a MODIFIED PM.
(defun reset-flags (pm)
  (reset-flags-1 (array-total-size pm) pm))

(defun reset-flags-1 (i pm)
  (cond ((= i 0) pm)
        (t (reset-flags-1 (- i 1) (reset-rule-flags (- i 1) pm)))))

(defun reset-rule-flags (i pm)
  (setf (rule-cond (aref pm i)) (reset-ce-flags (rule-cond (aref pm i)) ()))
  pm)

(defun reset-ce-flags (ces new)
  (cond ((null ces) (reverse new))
        (t (reset-ce-flags (cdr ces) (cons (reset-ce-flag (car ces)) new)))))

(defun reset-ce-flag (ce)
  (setf (ce-mem-changed ce) 'nil)
  ce)

;;Delete the latest instantiation of the selected rule from tin-mem of the rule.
(defun update-selected-rule (p pm tt)
  (and p (setf (aref pm (rule-no p)) (update-rule-info p tt)))
  pm)

(defun update-rule-info (p tt)
  (setq last-ce (car (last (rule-cond p))))
  (setq tmem (ce-tmem last-ce))
  (setf (ce-tmem last-ce) (cdr tmem))
  (setf (rule-tt-last-fired p) tt)
  p)    ;Returns a new updated prod, p.

;;;;BUILD NETWORK: Compile production memory (PM) and build the Rete network.
;;;;***********************************************************************************
(defun compile-production-memory (n pm)    ;n=the number of rules
  (setq compiled-pm (compile-pm pm 0 ()))
  (format t "~%")
  (put-in-array compiled-pm (make-array n) 0))

(defun put-in-array (pm arr i)
  (cond ((null pm) arr)    ;Returns the pm in array
        (t (put-in-array (cdr pm) (put-a-p-in-array (car pm) arr i) (+ i 1)))))

(defun put-a-p-in-array (p arr i)
  (setf (aref arr i) p)
  arr)

(defun compile-pm (l n pm)
  (cond ((null l) (reverse pm))
        (t (compile-pm (cdr l) (+ n 1) (cons (compile-a-prod (car l) n) pm)))))

(defun compile-a-prod (l n)
  (compile-a-tin (compile-a-prod-1 l n)))

;;************************************************************************************
;;Compile one input nodes and set all the fields, except tins.
;;************************************************************************************
(defun compile-a-prod-1 (l n)    ;Order for setf is IMPORTANT!
  (setq lhs-rhs (compile-a-rule l))
  (setq r (make-rule))
  (setf (rule-name r) (cadr l))
  (setf (rule-no r) n)
  (setf (rule-cond r) (first lhs-rhs))
  (setf (rule-act r) (second lhs-rhs))
  (setf (rule-tt-last-fired r) 0)
  (setf (rule-negated r) (count-no-of-negated-ces (rule-cond r) () 0))
  (format t "*")    ;Write * for each rule compiled.
  r)    ;Return a compiled rule

(defun count-no-of-negated-ces (ces l n)
  (cond ((null ces) (list n (reverse l)))
        ((equal (ce-type (car ces)) '-)
         (count-no-of-negated-ces (cdr ces) (cons '- l) (+ n 1)))
        (t (count-no-of-negated-ces (cdr ces) (cons '+ l) n))))

(defun compile-a-rule (l)
  (setq lhs-rhs (split (cddr l) ()))    ;Split a rule. Returns (LHS RHS)
  (setq lhs (car lhs-rhs))
  (setq rhs (cadr lhs-rhs))
  (setq compiled-lhs (cmp-lhs lhs nil 1))
  (setq compiled-rhs (cmp-rhs rhs compiled-lhs ()))    ;To find wme name for aes
  (setq compiled-lhs-rhs (list compiled-lhs compiled-rhs))
  compiled-lhs-rhs)

(defun split (l lhs)    ;(split '(1 2 3 --> 4 5) ()) --> ((1 2 3) (4 5))
  (cond ((eq (car l) '-->) (list (reverse lhs) (cdr l)))
        (t (split (cdr l) (cons (car l) lhs)))))

(defun cmp-lhs (l result n)    ;Compile LHS of a rule.
  (cond ((null l) (reverse result))    ;n=ordinal position of the ce in the rule
        ((equal (car l) '-)
         (cmp-lhs (cddr l) (cons (cmp-ce (cadr l) '- n) result) (+ n 1)))
        (t (cmp-lhs (cdr l) (cons (cmp-ce (car l) '+ n) result) (+ n 1)))))

;;l=((goal ^a <> 1 ^b 2) <evar>), sym=data-structure associated with l.
;;(cmp-ce '((goal ^a <> 1 ^b 2) <evar>) (gensym))
(defun cmp-ce (l type n)    ;Returns a symbol (struct) set to values
  (cond ((= (length l) 2)    ;Contains ele-var
         (cmp-ce1 (caar l) (cdar l) (cadr l) type n))
        (t (cmp-ce1 (car l) (cdr l) nil type n))))    ;No ele-var

(defun cmp-ce1 (name oins evar-name ty n)    ;Returns sym (struct) set to values
  (setq sym (make-ce))    ;Assign SYM to a CE structure
  (setf (ce-no-in-rule sym) n)
  (setf (ce-name sym) name)
  (setf (ce-oins sym) (list-avpairs oins ()))
  (setf (ce-evar-name sym) evar-name)
  (setf (ce-no-oins sym) (length (ce-oins sym)))
  (setf (ce-type sym) ty)
  (setf (ce-vars sym) (get-vars-in-ce (ce-oins sym) ()))
  (setf (ce-no-vars sym) (length (ce-vars sym)))
  (setf (ce-mem-changed sym) 'nil)    ;Initialize flag
  sym)

(defun list-avpairs (l pairs)    ;(list-avpairs '(^a <> 1 ^b 2) ()) --> ((a <> 1) (b 2))
  (cond ((null l) (reverse pairs))
        ((equal (car l) '^) (list-avpairs (cdr l) (cons (cmp-avpair (cdr l) ()) pairs)))
        (t (list-avpairs (cdr l) pairs))))

(defun cmp-avpair (l avpair)    ;(cmp-avpair '(a <> 1 ^b 2) ()) --> (a <> 1)
  (cond ((null l) (reverse avpair))
        ((equal (car l) '^) (reverse avpair))
        (t (cmp-avpair (cdr l) (cons (car l) avpair)))))

;;(find-variable '((AT <P1>) (ON FLOOR) (ON <F>)) ()) --> ((AT <P1>) (ON <F>))
(defun get-vars-in-ce (oins vars)    ;((AT <P1>) (ON FLOOR))
  (cond ((null oins) (reverse (remove nil vars)))
        (t (get-vars-in-ce (cdr oins) (cons (contain-var (car oins)) vars)))))

;;(contain-var '(<> <P1>)) (contain-var '(obj (<> <P1>)))
(defun contain-var (val-part)    ;val-part=(<> <P1>)
  (cond ((null val-part) nil)
        ((listp (car val-part)) (contain-var (car val-part)))
        ((variable (car val-part)) (car val-part))
        (t (contain-var (cdr val-part)))))

;;Compile RHS of rules. Returns a list of symbols, each of which is for an AE.
(defun cmp-rhs (rhs ces new)
  (cond ((null rhs) (reverse new))
        (t (cmp-rhs (cdr rhs) ces (cons (cmp-ae (car rhs) ces) new)))))

(defun cmp-ae (ae ces)    ;Returns a symbol (struct) set to values
  (cond ((equal (car ae) 'make)
         (cmp-ae1 (car ae) (second ae) nil (cddr ae) ces))
        ((equal (car ae) 'modify)
         (cmp-ae1 (car ae) (find-ae-name ae ces) (second ae) (cddr ae) ces))
        ((equal (car ae) 'remove)    ;can have MANY evars to remove
         (cmp-ae1 (car ae) (find-ae-name ae ces) (second ae) nil ces))
        (t 'error-in-cmp-ae)))    ;Call error routine

;;Order of the following stmts is important.
(defun cmp-ae1 (op name evar-name oins ces)
  (setq sym (make-ae))    ;Assign SYM to an AE structure
  (setf (ae-op sym) op)
  (setf (ae-name sym) name)
  (setf (ae-evar-name sym) evar-name)
  (setf (ae-avps sym) (list-ae-avps oins ces))
  (setq nos-vars (find-avps-with-var (ae-avps sym) () () 1))
  (setf (ae-no-in-rule sym) (related-to-which-ce evar-name ces))
  sym)

(defun list-ae-avps (oins ces)    ;'(^a <> 1 ^b 2) () --> ((a <> 1) (b 2))
  (add-ce-no-attr (list-avps oins ()) ces ()))

(defun list-avps (l pairs)
  (cond ((null l) (reverse pairs))
        ((equal (car l) '^) (list-avps (cdr l) (cons (list-avp (cdr l) ()) pairs)))
        (t (list-avps (cdr l) pairs))))

(defun list-avp (l avpair)    ;(list-avp '(a <> 1 ^b 2) ()) --> (a <> 1)
  (cond ((null l) (reverse avpair))
        ((equal (car l) '^) (reverse avpair))
        (t (list-avp (cdr l) (cons (car l) avpair)))))

;;Given avp=(attr (compute (<x> + <y>))) OR (attr val),
;;returns (attr ((c ce-no1 attr1) + (c ce-no2 attr2))) OR (attr val)
(defun add-ce-no-attr (avps ces new)
  (cond ((null avps) (reverse new))
        ((member 'compute (flatten (car avps)))
         (add-ce-no-attr (cdr avps) ces (cons (for-compute (car avps) ces) new)))
        ((var-in (car avps))
         (add-ce-no-attr (cdr avps) ces (cons (for-var (car avps) ces) new)))
        (t (add-ce-no-attr (cdr avps) ces (cons (car avps) new)))))

;;ces=(((a1 <x>) (a2 v8)) ((a4 <v2>) (a5 <y>)))
;;(for-compute '(a7 (compute (<x> + <y>))) ces) --> (a7 compute ((1 a1) + (2 a5)))
(defun for-compute (avp ces)
  (setq compute-part (second (second avp)))    ;(<x> + <y>)
  (setq op1 (find-ref-for-operand (first compute-part) ces))
  (setq op2 (find-ref-for-operand (third compute-part) ces))
  (list (car avp) 'compute (list op1 (second compute-part) op2)))

;;Returns (ce-no atr) to compare for a <var> if op is <var>
;;(find-ref-for-operand '<z> '(((a1 <x>) (a2 v8)) ((a4 <z>) (a5 v5)))) --> (2 a4)
(defun find-ref-for-operand (op ces)
  (cond ((numberp op) op)    ;if op is <var>, then find corresponding
        (t (find-attr op ces 1))))    ;attr to <var> from ces

;;Given avp=(attr <VAR>), returns (attr var (ce-no attr2))
;;(for-var '(a7 <v2>) '(((a1 v1) (a2 v8)) ((a4 <v2>) (a5 v5)))) --> (a7 var (2 a4))
(defun for-var (avp ces)
  (list (car avp) 'var (find-attr (var-in avp) ces 1)))

;;(find-avps-with-var '((1 v1) (2 <v2>) (3 <v3>)) () () 1) --> ((2 3) (<V2> <V3>))
;;Return a list of vars and location index for quick access.
(defun find-avps-with-var (oins vars nos n)
  (cond ((null oins) (list (reverse nos) (reverse vars)))
        ((var-in (car oins))
         (find-avps-with-var (cdr oins) (cons (cadar oins) vars) (cons n nos) (+ n 1)))
        (t (find-avps-with-var (cdr oins) vars nos (+ n 1)))))

(defun find-attrs (vars ces new)
  (cond ((null vars) (reverse new))
        (t (find-attrs (cdr vars) ces (cons (find-attr (car vars) ces 1) new)))))

;;Find attr for the given <var> from ces. Returns (ce-no atr) to compare for a <var>.
(defun find-attr (var ces n)
  (cond ((null ces) 'ERROR-IN-FIND-ATTR)    ;Error. Must have VAR in CEs
        ((member var (ce-vars (car ces)))    ;Use ALREADY compiled information.
         (find-attr-1 var (ce-oins (car ces)) n 1))    ;n=ce no, 1=avp no in a ce
        (t (find-attr var (cdr ces) (+ n 1)))))

(defun find-attr-1 (var oins n m)
  (cond ((null oins) nil)    ;Error.
Must have VAR in CE ((and (e q u a l ( v a r - i n ( c a r oins)) v a r ) ( n o - n o t - e q u a l - o p - i n v a r () ( c a r oins))) ( l i s t n ( c a a r oins))) ;Retums (CE-NO ATTR). Use m if needed ( t ( f i n d - a t t r - 1 v a r (c d r o i n s ) n (+ m 1))))) ;;(no-not-equal-op-in ’<01> 0 ’(HOLDS (<> NULL <> <01>))) --> NIL ;;Test if <var> is proceeded by not-equal-op <>, i.e., (<> <var>)? Then, NIL. (d e fu n n o - n o t - e q u a l - o p - i n (v a r p r e v o in ) (c o n d ((null oin)) ( ( n u l l p r e v ) ( n o - n o t - e q u a l - o p - i n v a r ( c a r o in ) (c d r oin))) ( ( l is t p ( c a r oin)) ( n o - n o t - e q u a l - o p - i n v a r p r e v (ap p en d ( c a r o in ) (c d r oin)))) 141 { ( e q u a l ( c a r o in ) v a r ) ( n o t (e q u a l p r e v '<>))) (t ( n o - n o t - e q u a l - o p - i n v a r (c a r o in ) (c d r oin ))) )) ;;Those aes which have EVAR do not have NAME in ae. Find NAME using EVAR-NAME. ;;ae=(modify 1) or (modify <obj>), return ae-name, i.e., wme-name '(d efu n fin d -a e -n a m e (a e c e s ) (c o n d ( ( n u l l c e s ) 'E r r o r - in - f in d - a e - n a m e ) ( ( numberp ( s e c o n d ae)) ( ce-n a m e ( g e t - n t h - e l e m e n t ( s e c o n d a e ) c e s ))) ( ( e q u a l (s e c o n d a e ) (c e -e v a r -n a m e ( c a r ces))) (ce-n a m e ( c a r ces))) (t (fin d - a e -n a m e a e (c d r ces))))) (d e fu n c o n t a in - e v a r (a ) ;(contain-evar’<var>) ~> T j ( e q u a l (c h a r ( s t r i n g a ) 0) (c h a r "<" 0))) ;;Finds a no-in-rule which relates ce and ae through evar-name. Some ops prgms do not use evar-name. ;;They instead use 1,2,3,... evar-name could be 1,2,... or <obj>. Returns a number. 
(defun related-to-which-ce (evar-name ces)
  (cond ((null ces) 'Error-in-related-to-which-ce)
        ((numberp evar-name) evar-name)     ;(modify 1) (remove 2)
        ((equal (ce-evar-name (car ces)) evar-name) (ce-no-in-rule (car ces)))
        (t (related-to-which-ce evar-name (cdr ces)))))

;;***********************************************************************************
;; Compile two-input nodes.
;;***********************************************************************************
(defun compile-a-tin (p)                ;Get VARs from a rule
  (setf (rule-vars p) (collect-vars (rule-cond p) ()))
  (and (> (length (rule-cond p)) 1)
       (setf (rule-cond p) (set-avp-to-cmp () (rule-cond p) ()))
       (setf (rule-cond p) (set-avp-to-cmp-for-ce1 (rule-cond p))))
  (setf (rule-cond p) (set-important (rule-cond p) () (rule-cond p)))
  p)                                    ;Return a prod with TINs attached.

(defun set-important (ces new ces2)     ;ces2 is not modified.
  (cond ((null ces) (reverse new))
        (t (set-important (cdr ces) (cons (s-t-v-op (car ces) ces2) new) ces2))))

(defun s-t-v-op (ce ces)
  (setf (ce-op-to-cmp ce) (find-op-for-var (ce-avp-to-cmp ce) ()))
  (cond ((> (ce-no-in-rule ce) 1)       ;First CE does not need this info.
         (setf (ce-other-atr ce) (find-other-attrs ce ces))))
  ce)

(defun find-op-for-var (l new)
  (cond ((null l) (reverse new))
        (t (find-op-for-var (cdr l) (cons (f-o-f-v () (cdar l)) new)))))

;;(f-o-f-v () '(<> <var>)), (f-o-f-v () '((<> <var>))), (f-o-f-v () '(<= var <> <var>)) --> <>
;;(f-o-f-v () '(<var>)) --> NIL
(defun f-o-f-v (prev val-part)
  (let ((op (f-o-f-v-1 prev val-part)))
    (cond ((not (member op '(<> = nil > < <= >=))) 'Error-in-f-o-f-v)
          ((equal op 'nil) 'equal)      ;Means no particular operator.
          ((equal op '=) 'equal)
          ((equal op '<>) 'not-equal)
          (t op))))                     ;There is a special operator, like

(defun f-o-f-v-1 (prev val-part)        ;value-part contains a var <var>
  (cond ((null val-part) '=)            ;No <> in front of <VAR>, meaning = test
        ((listp (car val-part)) (f-o-f-v-1 () (car val-part)))   ;val-pt=((<> <P1>))
        ((variable (car val-part)) prev)                         ;for val-part=(<P1>)
        (t (f-o-f-v-1 (car val-part) (cdr val-part)))))

(defun not-equal (a b) (not (equal a b)))

;;Returns ((ce-no atr) ...) to compare for <var>'s
;;(CE2 ... (attr3 <var>) ...) ce=(CE5 ... (attr2 <var>) ...) --> (2 attr3)
(defun find-other-attrs (ce ces)
  (setq n (ce-no-in-rule ce))
  (find-other-attrs-1 (ce-var-to-cmp ce) (get-n-elements (- n 1) ces ()) ()))

(defun get-n-elements (n l new)         ;(get-n-elements 3 '(1 2 3 4 5) ())
  (cond ((= n 0) (reverse new))
        (t (get-n-elements (- n 1) (cdr l) (cons (car l) new)))))

(defun find-other-attrs-1 (vars ces new)
  (cond ((null vars) (reverse new))
        (t (find-other-attrs-1 (cdr vars) ces (cons (f-o-a-1 (car vars) ces) new)))))

(defun f-o-a-1 (var ces)
  (cond ((null ces) 'Error-in-f-o-a-1)  ;Error
        ((member var (ce-vars (car ces))) (get-no-attr var (car ces)))
        (t (f-o-a-1 var (cdr ces)))))

;;(get-attr 'v3 (v1 v2 v3) (a1 a2 a3) 2) --> (2 a3), means attr a3 of 2nd CE.
(defun get-no-attr (v other-ce)
  (list (ce-no-in-rule other-ce) (get-from-oins v (ce-oins other-ce))))

(defun get-from-oins (v oins)
  (cond ((equal (var-in (car oins)) v) (caar oins))   ;Return (CE-NO ATTR)
        (t (get-from-oins v (cdr oins)))))

;;Works on a rule level. Collect VARs of CEs of a RULE and attach to a RULE.
(defun collect-vars (p vars)            ;p=(ce1 ce2 ce3 ...) is in symbol form
  (cond ((null p) (reverse vars))
        (t (collect-vars (cdr p) (cons (ce-vars (car p)) vars)))))

;;FOR CE1 ONLY!! Generate tins and attach to the first CE.
;;Since set-avp-to-cmp does not set values to ce-atr-to-cmp and ce-var-to-cmp
;;for FIRST ce, I simply copy the second CE into the first CE.
(defun set-avp-to-cmp-for-ce1 (ces)
  (setf (ce-var-to-cmp (first ces)) (ce-var-to-cmp (second ces)))
  (setf (ce-avp-to-cmp (first ces)) (find-avp-for-var (first ces)))
  (setf (ce-atr-to-cmp (first ces)) (find-atr-only (ce-avp-to-cmp (car ces)) ()))
  ces)

;;Returns CEs (except CE1) with ce-atr-to-cmp and ce-var-to-cmp set to values.
;;It goes (nil & CE1)->nil, (CE1 & CE2)->CE2, (CE1 CE2 & CE3)->CE3, ...
(defun set-avp-to-cmp (l1 l2 l3)
  (cond ((null l2) (reverse l3))
        (t (set-avp-to-cmp (append l1 (list (car l2))) (cdr l2)
                           (cons (s-a-t-c l1 (car l2)) l3)))))

(defun s-a-t-c (ces c)
  (setq vars (get-all-vars ces ()))     ;Get all vars in CE or CEs
  (setq comvars (remove-duplicates (get-com-vars vars (ce-vars c) ())))
  (setf (ce-var-to-cmp c) comvars)
  (setf (ce-avp-to-cmp c) (find-avp-for-var c))
  (setf (ce-atr-to-cmp c) (find-atr-only (ce-avp-to-cmp c) ()))
  c)

(defun find-atr-only (avps new)
  (cond ((null avps) (reverse new))
        (t (find-atr-only (cdr avps) (cons (caar avps) new)))))

(defun get-all-vars (l l2)
  (cond ((null l) l2)
        (t (get-all-vars (cdr l) (append (ce-vars (car l)) l2)))))

(defun get-com-vars (l1 l2 new)         ;Find common vars in both ces
  (cond ((null l1) (reverse new))
        ((member (car l1) l2) (get-com-vars (cdr l1) l2 (cons (car l1) new)))
        (t (get-com-vars (cdr l1) l2 new))))

;;Return ATR of the VAR.
;;(f-a-f-c '((ATR1 2 <V1>) (ATR2 4 <V2>)) '<V2>) --> ATR2
(defun find-avp-for-var (c)
  (f-a-f-c (ce-oins c) (ce-var-to-cmp c) ()))

(defun f-a-f-c (oins vars new)
  (cond ((null vars) (reverse new))
        (t (f-a-f-c oins (cdr vars) (cons (f-a-f-c1 oins (car vars)) new)))))

(defun f-a-f-c1 (oins var)
  (cond ((null var) nil)
        ((null oins) nil)
        ((equal (var-in (car oins)) var) (car oins))   ;(caar oins)
        (t (f-a-f-c1 (cdr oins) var))))

;;Check to see if <var> is contained. l is the val-part of a wme.
;;(var-in '((1 2) (3 4) (5 <var>))) --> <var>
(defun var-in (l)                       ;l can be (<VAR>) or ((<> <VAR>))
  (cond ((null l) nil)
        ((listp (car l)) (var-in (append (car l) (cdr l))))   ;l=((<> <P1>))
        ((variable (car l)) (car l))                          ;for l=(<P1>)
        (t (var-in (cdr l)))))

;;;;***********************************************************************************
;;;;INITIALIZE WM: Compile working memory (WM) into structure
;;;;***********************************************************************************
(defun compile-working-memory (wm) (compile-wm wm () 0))

(defun compile-wm (wm new tt)           ;tt=time tag of wmes in this inf. cycle
  (cond ((null wm) (reverse new))
        (t (compile-wm (cdr wm) (cons (compile-a-wme (car wm) tt) new) tt))))

(defun compile-a-wme (wme tt)
  (setq sym (make-wme))
  (setf (wme-type sym) '+)
  (setf (wme-name sym) (car wme))
  (setf (wme-avps sym) (list-avpairs (cdr wme) ()))
  (setf (wme-tag sym) tt)
  (setf (wme-no sym) *NUMBER*)
  (setq *NUMBER* (+ 1 *NUMBER*))
  sym)

;;;;***********************************************************************************
;;;;GROUP CEs based on the number of avps in each CE.
;;;;Return rule-no and ce-no in groups: (((2 1) (25 2)) ((1 2) (5 3)) ...)
;;;;***********************************************************************************
(defun group-ces-using-length (n pm)    ;n=(no-of-rules - 1)
  (group-ces n pm (generate-n-elements (+ (find-max-oins n pm 0) 1) ())))

(defun group-ces (i pm g)
  (cond ((< i 0) (reverse-each-group g ()))
        (t (group-ces (- i 1) pm
                      (group-ces-1 (rule-cond (aref pm i)) (rule-no (aref pm i)) g)))))

(defun group-ces-1 (ces n g)
  (cond ((null ces) g)
        (t (group-ces-1 (cdr ces) n (group-ces-2 n (car ces) g)))))

(defun group-ces-2 (rn ce g)
  (add-to-nth-group (list rn (ce-no-in-rule ce)) (ce-no-oins ce) g ()))

;;l=(rule-no ce-no-in-rule), n=ce-no-oins=group-no
;;(add-to-nth-group '(5 7) 3 '((1) (2) () (4)) ()) --> ((1) (2) ((5 7)) (4))
(defun add-to-nth-group (l n g front)
  (cond ((= n 0) (append (reverse front) (cons (cons l (car g)) (cdr g))))
        (t (add-to-nth-group l (- n 1) (cdr g) (cons (car g) front)))))

(defun reverse-each-group (g new)
  (cond ((null g) (reverse new))
        (t (reverse-each-group (cdr g) (cons (reverse (car g)) new)))))

(defun generate-n-elements (n l)
  (cond ((= n 0) l)
        (t (generate-n-elements (- n 1) (cons nil l)))))

;;Find the max number of oins in CEs. Will be the max number of groups
(defun find-max-oins (i pm max)         ;i=pm index
  (cond ((< i 0) max)
        (t (find-max-oins (- i 1) pm (find-max-oins-1 (rule-cond (aref pm i)) max)))))

(defun find-max-oins-1 (ces max)
  (cond ((null ces) max)
        ((> (ce-no-oins (car ces)) max)
         (find-max-oins-1 (cdr ces) (ce-no-oins (car ces))))
        (t (find-max-oins-1 (cdr ces) max))))

;;Group array[0] contains group1, [1]=(g1 g2), [2]=(g1 g2 g3)
;;SubGroup5=((2 1) (25 2) ...), where ce1 of rule2 has length 5 (1 name + 4 avps)
(defun rearrange-groups (groups)
  (rearrange-groups-1 (length groups) groups (make-array (length groups))))

(defun rearrange-groups-1 (i g array)
  (cond ((= i 0) array)
        (t (rearrange-groups-1 (- i 1) g (form-a-group i g array)))))

;;Get n groups from the whole group based on index
(defun form-a-group (i groups array)
  (setf (aref array (- i 1)) (form-a-group-1 i groups ()))
  array)

(defun form-a-group-1 (n g new)         ;n=no of avps including wme-name
  (cond ((= n 0) new)
        (t (form-a-group-1 (- n 1) (cdr g) (append (car g) new)))))

;;;;***********************************************************************************
;;;;UTILITY FUNCTIONS for gathering statistics, debugging, etc.
;;;;***********************************************************************************
(defun print-wm-cs-selected-rule-new-wm (pm cs sel-rule new-wm tt)
  (get-and-print-working-memory pm)
  (print-info-on-conflict-set cs)
  (print-info-on-a-selected-rule sel-rule tt)
  (format t "~%")
  (print-info-on-wm new-wm)
  (format t "=============~%"))

(defun get-and-print-working-memory (pm)
  (setq wm (get-all-wmes pm))
  (setq wm (remove-same-wmes wm))
  (setq wm (separate-wmes-by-name wm ()))
  (setq wm (flatten (sort-by-name (sort-each-group wm ()))))
  (print-info-on-wm wm))

(defun print-merged-group-array (ary)   ;print info on the rearranged groups
  (print-merged-group-array-1 (array-total-size ary) ary))

(defun print-merged-group-array-1 (i ary)
  (cond ((= i 0))
        (t (format t "~s~%" (aref ary (- i 1)))
           (print-merged-group-array-1 (- i 1) ary))))

(defun print-statistics (groups pm tt)
  (format t "~% Statistics-----~%")
  (setq no-groups (count-groups 0 groups))   ;why not length? It may contain NIL.
  (setq no-ces-per-group-list (get-no-of-ces-per-group groups ()))
  (setq no-ces (sum no-ces-per-group-list 0))
  (setq ce-distribution-in-percent (in-percent no-ces-per-group-list no-ces ()))
  (format t "CE-distribution ~s~%" no-ces-per-group-list)
  (format t "CE-distribution-in-percent ~s~%" ce-distribution-in-percent)
  (format t "No-of-rules ~s~%" (array-total-size pm))
  (format t "No-of-CEs ~s~%" no-ces)
  (format t "No-of-groups ~s~%" no-groups)
  (setq no-of-rules/group (how-many-rules-per-group groups ()))
  (format t "No-of-rules/group ~s~%" no-of-rules/group)
  (format t "No-of-rules/group-in-percent ~s~%"
          (in-percent no-of-rules/group (sum no-of-rules/group 0) ()))
  (format t "Average-no-of-CEs/group ~d~%" (round (/ no-ces no-groups)))
  (format t "WMEs-generated ~s~%" (- *number* 1))
  (format t "Total-oins ~s~%" *no-oin-test*)
  (format t "Total-rule-firings ~s~%" (- tt 1))
  (setq no-tests/tin/ce-list (count-no-tests-per-tin (array-total-size pm) pm ()))
  (format t "No-tests/tin/ce-list ~s~%" no-tests/tin/ce-list)
  (setq no-tests/tin/ce-ordered-list
        (count-occurrence no-tests/tin/ce-list
                          (generate-n-zero (find-max no-tests/tin/ce-list 0) ())))
  (format t "No-tests/tin/ce-ordered-list ~s~%" no-tests/tin/ce-ordered-list)
  (setq total-tin-tests (sum no-tests/tin/ce-ordered-list 0))
  (format t "No-tests/tin/ce-in-percent ~s~%"
          (in-percent no-tests/tin/ce-ordered-list total-tin-tests ()))
  (format t "Total-tin-tests ~s~%" total-tin-tests)
  (setq no-ces/rule-list (find-no-ces-per-p (array-total-size pm) pm ()))
  (format t "No-ces/rule ~s~%" no-ces/rule-list)
  (setq no-ces/rule-ordered-list
        (count-occurrence no-ces/rule-list
                          (generate-n-zero (find-max no-ces/rule-list 0) ())))
  (format t "No-ces/rule-ordered-list ~s~%" no-ces/rule-ordered-list)
  (format t "No-ces/rule-in-percent ~s~%"
          (in-percent no-ces/rule-ordered-list no-ces ()))
  (setq no-aes/rule-list (find-no-aes-per-p (array-total-size pm) pm ()))
  (setq no-aes (sum no-aes/rule-list 0))
  (format t "no-aes ~s~%" no-aes)
  (format t "no-aes/rule ~s~%" no-aes/rule-list)
  (setq no-aes/rule-ordered-list
        (count-occurrence no-aes/rule-list
                          (generate-n-zero (find-max no-aes/rule-list 0) ())))
  (format t "no-aes/rule-ordered-list ~s~%" no-aes/rule-ordered-list)
  (format t "no-aes/rule-in-percent ~s~%"
          (in-percent no-aes/rule-ordered-list no-aes ()))
  (setq no-nces/rule-list (find-no-nces-per-p (array-total-size pm) pm ()))
  (setq no-nces (sum no-nces/rule-list 0))
  (format t "no-nces ~s~%" no-nces)
  (format t "no-nces/rule ~s~%" no-nces/rule-list)
  (setq no-nces/rule-ordered-list
        (count-occurrence no-nces/rule-list
                          (generate-n-zero (find-max no-nces/rule-list 0) ())))
  (format t "no-nces/rule-ordered-list ~s~%" no-nces/rule-ordered-list)
  (format t "no-nces/rule-in-percent ~s~%"
          (in-percent no-nces/rule-ordered-list no-nces ())))

(defun generate-n-zero (n new)
  (cond ((= n 0) new)
        (t (generate-n-zero (- n 1) (cons 0 new)))))

(defun count-occurrence (l ctr)
  (cond ((null l) ctr)
        ((= (car l) 0) (count-occurrence (cdr l) ctr))
        (t (count-occurrence (cdr l) (update-counter (car l) ctr ())))))

(defun update-counter (n l front)
  (cond ((= n 1) (append (reverse front) (cons (+ (car l) 1) (cdr l))))
        (t (update-counter (- n 1) (cdr l) (cons (car l) front)))))

(defun find-max (l max)                 ;(find-max '(1 2 6 4 3 8 4) 0)
  (cond ((null l) max)
        ((> (car l) max) (find-max (cdr l) (car l)))
        (t (find-max (cdr l) max))))

(defun find-no-ces-per-p (i pm new)
  (cond ((= i 0) new)
        (t (find-no-ces-per-p (- i 1) pm
                              (cons (length (rule-cond (aref pm (- i 1)))) new)))))

(defun find-no-nces-per-p (i pm new)
  (cond ((= i 0) new)
        (t (find-no-nces-per-p (- i 1) pm
                               (cons (count-nces (rule-cond (aref pm (- i 1))) 0) new)))))

(defun count-nces (ces n)
  (cond ((null ces) n)
        ((equal (ce-type (car ces)) '-) (count-nces (cdr ces) (+ n 1)))
        (t (count-nces (cdr ces) n))))

(defun find-no-aes-per-p (i pm new)
  (cond ((= i 0) new)
        (t (find-no-aes-per-p (- i 1) pm
                              (cons (length (rule-act (aref pm (- i 1)))) new)))))
(defun in-percent (l n new)
  (cond ((null l) (reverse new))
        (t (in-percent (cdr l) n (cons (round (* (/ (car l) n) 100)) new)))))

(defun sum (l n)
  (cond ((null l) n)
        (t (sum (cdr l) (+ n (car l))))))

(defun count-no-tests-per-tin (i pm l)
  (cond ((= i 0) l)
        (t (count-no-tests-per-tin (- i 1) pm
             (append (tests-per-tin-1 (rule-cond (aref pm (- i 1))) ()) l)))))

(defun tests-per-tin-1 (ces l)
  (cond ((null ces) l)
        ((> (length (ce-var-to-cmp (car ces))) 0)   ;if no tin tests>0, then count.
         (tests-per-tin-1 (cdr ces) (cons (length (ce-var-to-cmp (car ces))) l)))
        (t (tests-per-tin-1 (cdr ces) l))))

(defun get-no-of-ces-per-group (g l)
  (cond ((null g) (reverse l))
        (t (get-no-of-ces-per-group (cdr g) (cons (length (car g)) l)))))

;;Count those not null subgroups. (count-groups 0 '(nil nil (1) (2))) --> 2
(defun count-groups (n g)
  (cond ((null g) n)
        ((null (car g)) (count-groups n (cdr g)))
        (t (count-groups (+ n 1) (cdr g)))))

(defun print-groups (g n total)
  (cond ((null g) total)
        (t (format t "Group ~d (~d): ~s~%" n (length (car g)) (car g))
           (print-groups (cdr g) (+ n 1) (+ (length (car g)) total)))))

(defun how-many-rules-per-group (g new)
  (cond ((null g) (reverse new))
        (t (how-many-rules-per-group (cdr g) (cons (how-many-rules (car g) ()) new)))))

(defun how-many-rules (g new)
  (cond ((null g) (length (remove-duplicates new)))
        (t (how-many-rules (cdr g) (cons (caar g) new)))))

(defun sort-by-name (l)         ;Sort groups by wme-name, ((ldia) (line) (rdia) (test) ...)
  (cond ((null l) nil)
        (t (sort-by-name-splice-in (car l) (sort-by-name (cdr l))))))

(defun sort-by-name-splice-in (e l)     ;l=a list of groups, e=a group of wmes
  (cond ((null l) (list e))
        ((string-lessp (wme-name (car e)) (wme-name (caar l))) (cons e l))
        (t (cons (car l) (sort-by-name-splice-in e (cdr l))))))

(defun sort-each-group (wm new)         ;Sort each group of wmes by wme-no.
  (cond ((null wm) (reverse new))
        (t (sort-each-group (cdr wm) (cons (sort-by-wme-no (car wm) '<) new)))))

(defun sort-by-wme-no (l op)    ;(sort-by-wme-no '(1 3 2 5 9 4 8 6) '<) --> (1 2 3 4 5 6 8 9)
  (cond ((null l) nil)
        (t (splice-in (car l) (sort-by-wme-no (cdr l) op) op))))

(defun splice-in (e l op)
  (cond ((null l) (list e))
        ((funcall op (wme-no e) (wme-no (car l))) (cons e l))
        (t (cons (car l) (splice-in e (cdr l) op)))))

;;Separate wmes by wme-name. Return ((w w ...) (w w ...) (w w ...) ...)
(defun separate-wmes-by-name (wm new)
  (cond ((null wm) new)
        (t (separate-wmes-by-name (cdr wm) (separate-wmes (car wm) new ())))))

(defun separate-wmes (wme new front)
  (cond ((null new) (cons (list wme) front))
        ((equal (wme-name (caar new)) (wme-name wme))
         (append (reverse front) (cons (cons wme (car new)) (cdr new))))
        (t (separate-wmes wme (cdr new) (cons (car new) front)))))

(defun update-wm (updated new)  ;Add new +wmes to and remove new -wmes from wm.
  (cond ((null new) updated)
        ((equal (wme-type (car new)) '-)
         (update-wm (remove-wme (car new) updated ()) (cdr new)))
        (t (update-wm (cons (car new) updated) (cdr new)))))

(defun remove-wme (w wmes new)
  (cond ((null wmes) (reverse new))
        ((two-wmes-same w (car wmes)) (remove-wme w (cdr wmes) new))
        (t (remove-wme w (cdr wmes) (cons (car wmes) new)))))

;;Retrieve all wmes from the network.
;;Note that those wmes not matched to the network are NOT included.
(defun get-all-wmes (pm)
  (get-all-wmes-1 (array-total-size pm) pm ()))

(defun get-all-wmes-1 (i pm new)
  (cond ((= i 0) new)
        (t (get-all-wmes-1 (- i 1) pm
             (append (get-all-wmes-2 (rule-cond (aref pm (- i 1))) ()) new)))))

(defun get-all-wmes-2 (ces new)
  (cond ((null ces) (reverse new))
        (t (get-all-wmes-2 (cdr ces) (append (ce-omem (car ces)) new)))))

(defun remove-same-wmes (wm)
  (pick-one-wme (remove-duplicates (get-wme-nos wm ())) wm ()))

(defun pick-one-wme (set wm new)
  (cond ((null set) (reverse new))
        (t (pick-one-wme (cdr set) wm (cons (pick-wme (car set) wm) new)))))

(defun pick-wme (no wm)
  (cond ((= no (wme-no (car wm))) (car wm))
        (t (pick-wme no (cdr wm)))))

(defun get-wme-nos (wm new)
  (cond ((null wm) (reverse new))
        (t (get-wme-nos (cdr wm) (cons (wme-no (car wm)) new)))))

(defun remove-no (no set front)
  (cond ((null set) (reverse front))
        ((= no (car set)) (append (reverse front) (cdr set)))
        (t (remove-no no (cdr set) (cons (car set) front)))))

;;Print information on the conflict set, rule name, no of occurrences, ...
(defun print-info-on-conflict-set (cs)
  (cond ((null cs))
        (t (print-info-on-a-rule (car cs))
           (print-info-on-conflict-set (cdr cs)))))

(defun print-info-on-a-rule (r)
  (let ((name (rule-name r))
        (no (length (ce-tmem (car (last (rule-cond r)))))))
    (cond ((> no 1) (format t "~s (~s occurrences)~%" name no))
          (t (format t "~s~%" name)))))

(defun print-info-on-wm (wm)
  (cond ((null wm))
        (t (print-info-on-a-wme (car wm))
           (print-info-on-wm (cdr wm)))))

(defun print-info-on-a-wme (w)
  (setq tmp (list (wme-no w) (wme-tag w) (wme-type w) (wme-name w)))
  (setq tmp2 (append tmp (wme-avps w)))
  (format t "~s~%" tmp2))

(defun print-pm (pm) (print-pm-1 (array-total-size pm) pm))

(defun print-pm-1 (i pm)
  (cond ((= i 0))
        (t (pprint (aref pm (- i 1)))
           (print-pm-1 (- i 1) pm))))

;;PM is a list of RULEs, not a list of CEs. FLATTEN will return a list of CEs.
;;(flatten '(1 (2 (3 4) 5) ((6)))) --> (1 2 3 4 5 6)
(defun flatten (l) (flatten-1 l ()))

(defun flatten-1 (l new)
  (cond ((null l) (reverse new))
        ((listp (car l)) (flatten-1 (append (car l) (cdr l)) new))
        (t (flatten-1 (cdr l) (cons (car l) new)))))

;;;;***********************************************************************************
;;;;End of file: appendix.lsp
;;;;***********************************************************************************
60] Sohn, A., and Gaudiot, J-L., “Connectionist Production Systems in Local and Hier 155 archical Representation,” in Applications of Learning and Planning Methods, N. Bourbakis (Ed.), World Scientific Publishing Co., Teaneck, NJ, 1990, pp. 165-180. 61] Sohn, A., and Gaudiot, J-L., “A Survey on the State-of-the-Art Research in Parallel Processing of Production Systems” International Journal o f Artificial Intelligence Tools 1, January 1992. 62] Srini, V.P., “An Architectural Comparison of Data-flow Systems,” Computer, March 1986, pp.68-87. 63] Stolfo, S.J., “Initial Performance of the DAD02 Prototype,” Computer, January 1987, pp.75-83. 64] Stolfo, S.J., and Miranker, D.P., “The DADO Production System Machine,” Journal o f Parallel and Distributed Computing 3, Academic Press Inc., 1986, pp.269-296. 65] Tenorio, M.F.M., and Moldovan, D.I., “Mapping Production Systems into Multipro cessors,” in Proceedings o f the International Conference on Parallel Processing, St. Charles, IL, August 1985, pp.56-62. 66] Touretzky, D.S. and Hinton, G.E., “Symbols Among the Neurons: Details of a Con nectionist Inference Architecture,” in Proceedings o f the International Joint Confer ence on Artificial Intelligence, August 1985, pp.238-243. 67] Touretzky, D.S. and Hinton, G.E., “A Distributed Connectionist Production Sys tem,” Cognitive Science 12, 1988, pp.423-466. 68] Veen, A.H, “Data-flow Machine Architecture,” AC M Computing Surveys 18, De cember 1986, pp.365-396. 69] Wah, B.W., Lowrie, M.B., and Li, G.-J., “Computers for Symbolic Processing,” Proceedings o f the IEEE 77, April 1989, pp.509-540. 70] Yoo, N., and Gaudiot, J-L., “A Macro Data-flow Simulator,” Technical Report, CENG-89-27, Dept, of E.E.-Systems, University of Southern California, October 1989. 156