QUERY BY COMMITTEE
by
Thomas Henry Hinke
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Science)
August 1991
Copyright 1991 Thomas Henry Hinke
UMI Number: DP22818
All rights reserved
INFORMATION TO ALL USERS
The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.
UMI
Dissertation Publishing
UMI DP22818
Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author.
Microform Edition © ProQuest LLC.
All rights reserved. This work is protected against
unauthorized copying under Title 17, United States Code
ProQuest LLC
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106-1346
UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90007
This dissertation, written by
Thomas Henry Hinke

under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of

DOCTOR OF PHILOSOPHY

Dean of Graduate Studies

Date: July 3, 1991

DISSERTATION COMMITTEE
Chairperson
Dedication
This dissertation is dedicated to my wife Kathryn and my daughter Laura Anne, who have offered support and encouragement during the course of this work and along the way have had to endure a considerable amount of sacrifice and aggravation.
Acknowledgements
I would like to thank my advisor, Les Gasser, for introducing me to the fascinating world of distributed AI and for his guidance and patience throughout the years that I have pursued this research. I would also like to acknowledge the assistance of the other members of my dissertation committee, Drs. Dennis McLeod and Daniel O'Leary. In addition, I would like to thank Dr. Rich Hull for his time and suggestions during the early phases of this research. I would also like to acknowledge the assistance provided by Dr. Deborah Estrin during the early stage of this research.

I would like to acknowledge the support provided by Clark Weissman, Harvey Gold and Ben Walker of System Development Corporation through very flexible work hours and significant amounts of tuition assistance as I pursued the Ph.D. degree. In addition, I would like to acknowledge the contribution made by TRW for tuition support and very flexible working hours throughout the latter stages of the Ph.D. process, with special recognition to Jim Edmundson, Lois Warshaw, Cornell Smith and Cristi Garvey.

I would also like to acknowledge the support provided by Dr. Carl Davis of the University of Alabama in Huntsville, who provided the low teaching load necessary for me to be able to finish this dissertation.

Finally, I would like to thank my wife Kathryn and daughter Laura Anne for their constant support throughout the long trek to the Ph.D. Without their support and encouragement, this would not have been possible. I would also like to acknowledge the patience and support of my parents, Shirley and Henry Hinke, as they endured family events with one member absent pursuing research. In addition, I would also like to acknowledge the support of my wife's parents, Curt and Barbara Guernsey, who provided support when needed to get us over some of the financially tight periods of this effort.
Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract

1 Statement of the Problem
  1.1 Informal Example of QBC Objective
  1.2 Background
  1.3 Purpose of the Research
  1.4 Questions and/or Hypothesis
  1.5 Assumptions, Limitations, Delimitations

2 Literature Review
  2.1 Data Access Research
  2.2 EIMG Systems
    2.2.1 Discovery Systems
    2.2.2 Design Systems

3 Approach
  3.1 Basic Concepts
  3.2 Elaboration Model
  3.3 Direct Query Elaboration
  3.4 Indirect Query Elaboration
  3.5 Presentation
  3.6 Overview of Dissertation

4 Direct Query Elaboration
  4.1 Entity Discovery
    4.1.1 Entity Specification
  4.2 Hypothesis Generation
    4.2.1 Entity Materialization
    4.2.2 Specification Reformulation
  4.3 Attribute Discovery

5 Indirect Query Elaboration
  5.1 Type One Indirect (P-RF/DU)
    5.1.1 Activity Description
    5.1.2 Dominance-based Knowledge Factoring
  5.2 Type Two Indirect (P-RP/DU)
    5.2.1 Partially Specified Entity Centered AC
    5.2.2 Partially Specified Activity Centered AC
      5.2.2.1 Partially Specified Compound Clusters
      5.2.2.2 Partially Specified AP
    5.2.3 Problem Reformulation
  5.3 Type Three Indirect (P-RU/DU, S-RF/DU)
  5.4 Type Four Indirect (P-RF/DU, S-RF/DU)

6 Presentation
  6.1 Filtering
  6.2 Lattice-based Clustering
    6.2.1 Attribute Clustering
    6.2.2 Lattice Cluster Options
    6.2.3 Upper Bound for LBC
    6.2.4 Distributed Cluster Formation

7 Implementation and Experimental Results
  7.1 QBC Implementation
    7.1.1 QBC Primitives
    7.1.2 Distributed Elaboration Control
    7.1.3 Polytyped Term
      7.1.3.1 Syn
      7.1.3.2 Hypothesis List
      7.1.3.3 Requisition List
  7.2 QBC Experimental Results
    7.2.1 Data World Elaboration of Entity
    7.2.2 Real-World Elaboration of AP
    7.2.3 Specification World to Data World

8 Conclusions
  8.1 Contributions
    8.1.1 Types of Query Elaboration
    8.1.2 Framework for Query Elaboration
    8.1.3 Data Presentation
    8.1.4 Distributed AI
  8.2 Limitations
  8.3 Future Research

Bibliography

Appendix A: QBC Database
Appendix B: Terms
Appendix C: QBC Code
Appendix D: Prototype Results

List of Tables

6.1 Galaxy Database
6.2 Galaxy Database
6.3 Totally Ordered Galaxy Database
6.4 Galaxy Database
6.5 Galaxy Display Clusters
6.6 Display Upper Bound
7.1 Galaxy Identification Data
A.1 Nebula Database 1
A.2 Nebula Database 2
A.3 Galaxy Database 1
A.4 Galaxy Database 2
A.5 Double Star Database 1
A.6 Double Star Database 2

List of Figures

3.1 Direct Query Elaboration P-RF/DU
3.2 Indirect Query Elaboration - P-RF/DU
3.3 Indirect Query Elaboration P-RU/DU, S-RF/DU
6.1 Four cluster, two attribute lattice
6.2 Two cluster, two attribute lattice
Abstract

This dissertation presents the results of research into the problem of query elaboration by multiple agents. While computers have been used extensively to solve problems, their application to the formulation of problems has not been recognized as an area of research. Problem formulation has historically fallen on the human, not the computer. This research investigates problem formulation automation within the context of multiple agents, each representing different databases of domain knowledge, collaborating to formulate a query for scientific data.

A significant problem in query elaboration is how to provide a means by which the domain-naive user can describe his data needs, when it is assumed that he is not knowledgeable about the nature of the data in the database. A key result of this research is the development of a construct similar to the unknown mathematical variable "X", but which is applicable to both the schema of the database and the real-world context in which the data is used. This construct is called the polytyped term, and provides a means for agents to reason about the unknown and, through various elaboration processes, transform the unknown into results that are consistent with the initial description of the unknown provided by the user and the available data.

This research has developed a taxonomy of query elaboration types, and presents these in terms of a model that unifies data with its context of use in support of real-world activities. The model is expressed in terms of worlds, where each world has a real dimension and a data dimension, with the data dimension representing that data which conceptually should be requisitioned from a database to support the activities expressed in the real dimension. In addition, worlds can be related to other worlds through relationships. The nature of the elaboration is dependent upon the nature of the unknowns that exist in each world at the onset of query elaboration.

To present the results of query elaboration to the user, this research has developed a presentation approach called lattice-based clustering. Under lattice-based clustering, the user is presented with a series of options and rationales, stated in terms of data clusters that represent the "best" data in the database that satisfies the elaborated query and is consistent with the constraints. This research presents an upper bound on the cardinality of data clusters that could be presented. This upper bound is stated in terms of binomial coefficients and is shown to be of small cardinality for queries involving a reasonable number of attributes.

A system called QBC has been implemented as a proof of concept to validate that the major findings of the research are realizable. This system and the results from applying it to the elaboration of queries targeting an astronomical database are presented.
Chapter 1
Statement of the Problem
While computers have been used extensively to solve problems, they have not been applied, to a great extent, to the formulation of problems. This task has historically fallen on the human, not the computer. While the computer has helped mankind solve problems through its superior computational speed and ability to search rapidly through large databases to find needed data, the formulation of the problem has been left to the user. Unfortunately, if the user is unfamiliar with the problem domain, he may not even know the right questions to ask. Even if he is familiar with the field, the amount of data that could be applied to the problem formulation process may greatly exceed the user's ability to process it. Thus the user may be forced either to formulate the problem from a position of relative ignorance, or to consult one or more experts to facilitate the initial problem formulation process. It is the intent of this research to focus on a solution to the problem of problem formulation.

The specific domain of investigation selected for this research is that of problem formulation for the selection of data from a database. Because data comes from many sources and is packaged within distinct and possibly disjoint domains of expertise, this research takes a distributed perspective. The research will consider a system in which various knowledgeable computer-based agents collaborate in the elaboration of a database query.
The research vehicle to investigate query formulation is an experimental system called "Query by Committee" (QBC), which conceptually consists of a set of agents representing database and problem-solving knowledge gathered around a blackboard structure (Erman 1980). The person with the problem to be formulated is called the problem owner or QBC user. To initiate the elaboration process, the user places a vague representation of the desired data on the blackboard.
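The committee-around-a-blackboard arrangement just described can be sketched in a few lines. This is a minimal illustration only, not the actual QBC implementation; the class names, the query-kernel representation, and the agents' contributions are assumptions made for the example.

```python
class Blackboard:
    """Shared structure holding the evolving query."""
    def __init__(self, kernel):
        self.query = dict(kernel)  # vague query kernel posted by the user

    def post(self, additions):
        self.query.update(additions)


class DatabaseAgent:
    """An agent that elaborates the query from its own database's knowledge."""
    def __init__(self, name, knowledge):
        self.name = name
        self.knowledge = knowledge  # attribute -> value this agent can supply

    def elaborate(self, board):
        # Contribute only attributes the query does not yet constrain.
        additions = {k: v for k, v in self.knowledge.items()
                     if k not in board.query}
        board.post(additions)


# The user posts a vague kernel; each committee member elaborates in turn.
board = Blackboard({"object_type": "galaxy"})
agents = [
    DatabaseAgent("galaxy_db", {"catalog": "NGC", "magnitude_limit": 10}),
    DatabaseAgent("site_db", {"min_altitude_deg": 20}),
]
for agent in agents:
    agent.elaborate(board)

print(board.query)
```

After the round of elaboration the query carries constraints the user never stated; a real committee would also retrieve the matching data, as the surrounding text describes.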
To provide a focus for the questions and the QBC research, a single application domain was selected. The requirements for this domain were that it should encompass a reasonable number of query types; have a number of different databases, some of which are potentially available on-line; and, finally, provide a realistic domain in which the QBC system could satisfy a legitimate need. The domain selected is the databases associated with astronomy, with the amateur astronomer as a possible user, although it is representative of any scientific domain in which scientists from other domains may need to access data in the domain.

In this domain, the amateur astronomer is confronted with an extremely large number of astronomical objects, which are usually listed in a database that encompasses the various types of astronomical objects such as planets, galaxies, nebulae, stars, double stars, variable stars, quasars and others. There is also an interrelationship among the data that provides a rich test environment for multiagent interaction. For example, the viewability of objects is dependent upon the time of day and the time of year, the aperture of the telescope, the phase of the moon (e.g., moonlight reduces the magnitude of stars that are viewable), the current location of the planets, the nature of the audience and any restrictions inherent at the viewing site such as trees or buildings. Details of the databases and agents associated with this domain are presented in Appendix A.
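To make the kind of interdependency listed above concrete, a site or telescope agent might encode viewability as a filter over the object database. The sketch below is a hedged illustration only: the limiting-magnitude rule of thumb, the penalty for moonlight, and all numeric values are assumptions for the example, not values taken from this dissertation or its databases.

```python
import math

def limiting_magnitude(aperture_mm, moonlight):
    # Rough amateur-astronomy rule of thumb (an assumption here): larger
    # apertures reach fainter magnitudes; moonlight brightens the sky and
    # costs roughly two magnitudes.
    base = 2.7 + 5 * math.log10(aperture_mm)
    return base - (2.0 if moonlight else 0.0)

def viewable(objects, aperture_mm, moonlight):
    """Keep only objects at least as bright as the limiting magnitude
    (smaller magnitude = brighter)."""
    limit = limiting_magnitude(aperture_mm, moonlight)
    return [o for o in objects if o["magnitude"] <= limit]

# Illustrative objects; magnitudes are approximate published values.
objects = [
    {"name": "M31", "magnitude": 3.4},
    {"name": "M51", "magnitude": 8.4},
    {"name": "NGC 4565", "magnitude": 10.4},
]
print(viewable(objects, aperture_mm=50, moonlight=True))
```

Under these assumptions a small 50 mm telescope on a moonlit night loses the faintest object, which is exactly the kind of constraint a QBC site agent could contribute during elaboration.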
Another focus of the research seeks to understand how the data stored within various databases can be harnessed to assist in the query formulation process. For this research, it is assumed that the user begins with only a portion of a query that points toward the data desired, but lacks the complete detail required to gather all of the relevant information.
It has been observed that there are two polar views about the problem of supporting man-machine interaction: the computer as tool, and the computer as intelligent assistant (Robertson, McCracken and Newell 1981). This work, while looking at previous work in the computer-as-tool arena, has the goal of using the large body of existing data in databases, salted liberally with intelligent agents, to push forward in the direction of computer as intelligent assistant.
Each agent will represent a single database. The user prepares the query kernel and submits it to the QBC "query committee". Each of the database agents that comprises the QBC system reads the query and, if possible, elaborates on it using knowledge that it has stored in its respective database. The result of this process is both the elaboration of the query and the retrieval of the desired information from the database.

To help in understanding the objectives of this research, an informal example will be presented.
1.1 Informal Example of QBC Objective
To illustrate the goals of this research and contrast them with current practice in the database domain, we look to human society for a relevant example. The retail store provides a set of examples that can be related to the level of assistance that database management systems now provide, or could provide in the future. More specifically, we consider the range of stores that handle audio equipment.

At the most primitive end of the spectrum (with respect to user support) are the mail order stores, which advertise in the popular audio magazines. These provide little product information, limited in many cases to only a product listing in a magazine. They cater to the more sophisticated consumer who knows exactly what products he needs. This store is analogous to the current commercial database management system. In such systems, the user can get the information he wants if he can specify the correct query.
The next most sophisticated level of user support is the more familiar self-service store that provides goods displayed on shelves. In contrast to the mail order store, this store permits the consumer to browse through the products and assess the different attributes provided by each. A customer may enter this store with no idea about what he wants to purchase, and may browse until something "catches his eye." At the low end, these stores will be staffed with sales personnel who have little in-depth knowledge of the products sold in the store; hence, the task of formulating what to buy from that store is left mostly to the customer. This store is analogous to those database management systems that permit various types of user browsing, which will be discussed later. One problem with this type of store is that a user unfamiliar with the field may not understand the difference between a stereo receiver and a stereo tuner. He may not realize that if he purchases a tuner, he will also have to acquire an amplifier, while such is not the case with a receiver.
The most sophisticated store is the audio specialty store, staffed by sales consultants who are experts on audio products. A customer can enter such a store with only the vaguest of ideas as to his needs, and the sales staff can provide the necessary expertise to formulate the list of products that will satisfy the customer's needs. This store is analogous to a database that will support query formulation, as embodied in the QBC system investigated in this research. In such a store, if the customer selects a tuner (possibly because it is being sold at a steep discount), the audio store personnel will recognize that he also needs an amplifier and will so inform him. The store personnel recognize this fact based on their knowledge that amplifiers are used with tuners.

To carry this example one step further, some specialty stores in the clothing industry will tailor the product to the specific needs of the customer. This type of response will be seen in the proposed QBC approach, in which the result can be viewed as a report that has been tailored to the specific needs of the user.
1.2 Background
While this research has its roots in the artificial intelligence area of problem solving, it looks at an aspect of problem solving that has not been addressed to the extent proposed here. To understand where this research fits within the problem-solving continuum, I have developed a classification scheme that encompasses much of the artificial intelligence problem-solving research. This classification scheme, and the point within it where the proposed research falls, will now be presented.
From a state space perspective, problem solving can be viewed in terms of states connected by links. The problem-solving process begins at some initial state and then transitions from state to state until a goal state is reached. A transition from state "A" to state "B" can occur if there exists an operator that can cause a state "A" to state "B" transition. The problem-solving process is considered successful if the initial state can be connected to the goal state. This can be accomplished through either forward reasoning (going from initial state to goal state) or backward reasoning (going from goal state to initial state) (Barr and Feigenbaum 1981a).
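The forward-reasoning case described above can be sketched as a breadth-first search from an explicit initial state toward an explicit goal state. The integer states and the two operators below are toy assumptions chosen purely to illustrate the state-transition model, not anything from the QBC system.

```python
from collections import deque

def forward_search(initial, goal, operators):
    """Return a shortest list of states linking initial to goal, or None.

    Each operator maps a state to a successor state (or None if it
    does not apply), matching the state-link model described above.
    """
    frontier = deque([[initial]])
    visited = {initial}
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if state == goal:          # goal state reached: success
            return path
        for op in operators:
            nxt = op(state)
            if nxt is not None and nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None                    # frontier exhausted: no connection

# Toy operators on integer states: "add one" and "double".
ops = [lambda s: s + 1, lambda s: s * 2]
print(forward_search(1, 10, ops))
```

Backward reasoning is the same procedure run from the goal state using inverted operators; it is omitted here for brevity.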
While the initial state can usually, if not always, be explicitly stated, such is not always the case with the goal state. This observation forms the basis of the classification scheme proposed by this research. For some systems, the goals cannot be stated as explicit states, but instead are "metagoals."

The concept of metagoal, as developed by this research, is a goal that is part of the essence of something, but is not stated as an explicit goal state. For QBC, the metagoal represents the ill-defined intent of the user. Cohen and Levesque, discussing the work of Bratman, observe that rational behavior includes beliefs, desires and intent. Intent captures the idea of a user settling "...on one state of affairs for which to aim" and then establishing "...a limited form of commitment" to achieve the target (Cohen and Levesque 1987:4). The metagoal represents QBC's attempt to infer the intent of the user from the potentially ill-defined kernel of a query that the user initially presents to the QBC system, the advice of the QBC committee of experts, and feedback provided by the user in the course of the query elaboration process. In many ways the QBC system is analogous to a lawyer. The client states some vague and possibly ill-defined intent, and the lawyer, using his knowledge of the law, elaborates this ill-defined intent into a well-defined statement of intent that satisfies the client. To do this, the lawyer draws on an extensive toolkit of methods (e.g., questions and forms) that help him transform the client's metagoal into an explicit goal. In a similar way, QBC will apply various methods to assist in the transformation of the user's metagoal into an explicit goal.
This raises another aspect of the QBC metagoal: it changes in the course of the query elaboration process. As the elaboration process continues, the QBC agent, based on information provided by other QBC agents as well as the user himself, gains a more concrete view of the user's intent. If the metagoal represents a snapshot of the user's perceived intent at any point in time, then the metagoal changes through the course of the problem solving as the intent is clarified to both the user and the QBC agents.
As will be shown in the literature survey, while some problem-solving programs have explicit goal states, others do not. The programs that lack explicit goal states do, however, have some similarity with those programs that have them. For example, they normally begin execution at some explicitly stated initial state, undergo a series of state transitions and reach completion in some final explicit state. This final explicit state was not stated as an initial goal state. However, if the program is to be considered successful, these final states are consistent with the overall purpose of the program as it was intended by the human who created it. In recognition that these programs are not goal-less, that they lack only explicit goals, this research has used the term "metagoal" to describe these non-explicit goals. Here, "meta" connotes a "description of" the intent of the user rather than an explicitly stated goal in the program.
With the concepts of explicit initial state and either explicit or metagoal state, a classification scheme can be presented. This will then be used to classify both the proposed and previous research.

Systems can be grossly classified as EIEG systems or EIMG systems. An EIEG system is an Explicit Initial, Explicit Goal state system. An EIMG system refers to an Explicit Initial, Meta Goal state system.
Within the artificial intelligence literature, systems such as general problem solvers, planners, natural language understanding systems, speech understanding systems, theorem provers and language parsers can all be viewed as starting with some explicit initial state and attempting to reach some explicit goal state. These systems are all examples of EIEG systems. These systems and justification for the EIEG classification will be provided as part of the literature survey.

EIMG systems also exist within the artificial intelligence literature. Computer programs that compose music or generate Haiku poetry are examples of EIMG programs. They all have explicit initial states consisting of musical notes or English words. Encoded in the programs are heuristics to transform this initial state into a final state satisfying the metagoal of music or Haiku poetry (Hofstadter 1979). Other examples of EIMG programs are AM, Eurisko and Dendral. AM has the metagoal of discovering new mathematical concepts (Barr and Feigenbaum 1981). Eurisko has the metagoal of discovering new heuristics that could be used by a program such as AM in the discovery of new mathematical concepts (Lenat 1983). Heuristic Dendral has the goal of finding molecular structures that account for a particular spectroscopic analysis, while Meta Dendral has the goal of finding heuristics that could be used by Heuristic Dendral in its search. Other examples of EIMG programs found in the literature are story generation, engineering design, automatic programming and network resource discovery systems.

In all of these EIMG systems, the initial state is given, and the program incorporates a metagoal of the author. The program then generates output whose intent is to satisfy the author's metagoal.
The QBC system proposed for this research is a type of EIMG system applied to the retrieval of data from databases. It differs from other EIMG systems by virtue of its application to a database; its attempt to use the data in the database as the source of both metagoal and elaboration control; its incorporation of multiple, collaborating and possibly conflicting metagoals; its open system architecture; and its attempt to guide the elaboration process dynamically via multiple, collaborating knowledge sources.

Because of its incorporation of multiple metagoal sources, the QBC system would be classified as an EIMG* system, where the "*" indicates that zero or more metagoals could be applicable. The zero stems from the open system nature of QBC; hence, it is possible that no metagoals exist for some execution of QBC, although normally this is not the case.
1.3 Purpose of the Research
This research has three purposes: to show how EIMG systems can be effectively applied to databases, to demonstrate that data can be harnessed to assist users in the formulation of queries, and to show that a multiagent, open system, distributed architecture provides a viable architecture for a DBMS-based EIMG system.

The first purpose seeks to extend EIMG systems into database applications. The exploding volume of knowledge means that intelligent queries must rely on legions of experts rather than the single research librarian of the past. Systems such as NASA's proposed Earth Observing System Data and Information System (EOSDIS) will be amassing large amounts of remotely sensed Earth science data (Dozier 1990). This data will be made available to scientists around the world. Since this data will cut across numerous Earth science disciplines, a system such as QBC could be useful in assisting scientists involved in multidisciplinary research in formulating queries for data from unfamiliar disciplines.
By extending EIMG systems into database retrieval, machine-based expertise can be applied to the formulation of the user query. While the focus of this research could have been on information retrieval (IR) systems (those systems whose purpose is to retrieve documents or document abstracts), the focus was limited to database systems. The reason for this is that the database system contains data that has been categorized and stored based on the semantics of the data. Such is not the case with IR systems, which may deal with natural language documents. The structured nature of the data is supportive of the second purpose of the research.

The second purpose of the research is to investigate how the data itself can be harnessed to provide some of this expert assistance. With the proliferation of networks and CD-ROM-based technology, the data can be made available to the user to assist him in his query formulation. The key is to understand how the data can be harnessed in this way, going beyond its support in answering the user's query.
The third purpose, using a distributed, open-system architecture, is important.
It not only addresses the problem of how to extend systems to provide greater
knowledge, but also foreshadows an ability for systems with their own unique
expertise (either data expertise or the rule-based expertise of expert systems)
to collaborate on the solving of problems. Today and in the future, various
companies may market expert systems with proprietary software and data. By
providing an open-system means for these systems to collaborate, committees of
expert systems (much as committees of human experts today) will be able to
pool their expertise for some common objective. The collaboration of the experts
associated with databases in QBC will provide insight into the solution of this
problem.
1.4 Questions and/or Hypothesis
This research seeks to answer the following three questions:
1. How can a user, who is expert in the use of a database retrieval system
but unfamiliar with the domain of inquiry, communicate his data needs
such that a distributed, intelligent community of computer-based agents
can formulate a query that is useful to the user?
2. How can the data stored within the database be harnessed to assist the
user in the formulation of his query?
3. How can the elaboration process be controlled, such that it provides useful
data but does not overwhelm the user?
The hypothesis to be investigated by the research is that the proposed QBC
system and its associated elaboration control policies provide a viable approach
for answering these questions.
1.5 Assumptions, Limitations, Delimitations
This research has made some decisions to ensure that the focus of the research
remains on the problem of query elaboration, and not on other interesting
problems that would dilute the thrust of this work. These include the following:
1. Universal Type Hierarchy - This work assumes that all data accessible
by the system can be classified in some universal semantic model. When
an agent joins the query committee, it is the responsibility of that agent
to register the semantic description of its data objects, and, if necessary,
conform to the names of previously registered semantic entities.
2. Machine Understandable Input - To avoid the issue of natural language
understanding, it is assumed that all input to the system is provided by
a means that is understandable to the QBC system. The format of this
input will be described as part of the research.
3. Retrieval Only - To limit the scope of this work, only information retrieval
is considered in this research. The creation and update of systems is not
considered.
4. Performance - This work is seeking to understand how a new capability
can be added to a system; hence, high performance (i.e., speed of retrieval)
will not be a design objective.
5. Architecture - While the investigation assumes a distributed set of agents,
the focus of the research is not on multiagent architectures, but rather on
what could be termed the protocol of multiagent query elaboration.
While the research assumes a blackboard-type architecture, agents are
assumed to be identical in processing capability, differing only in the data
that they possess. Hence, a single, round-robin invocation scheme is used
in the implementation, which sequentially applies the agent-processing
algorithms to each agent's database.
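The round-robin scheme just described can be sketched concretely. This is a minimal illustration, not the QBC implementation; representing the query kernel and each agent's data as simple sets of terms is an assumption made purely for exposition.

```python
def elaborate(kernel, agents, max_rounds=5):
    """Sequentially apply each agent's data to the shared query kernel."""
    kernel = set(kernel)
    for _ in range(max_rounds):
        changed = False
        for agent_data in agents:            # fixed round-robin order
            additions = agent_data - kernel  # what this agent can contribute
            if additions:
                kernel |= additions
                changed = True
        if not changed:                      # quiescence: no agent contributed
            break
    return kernel
```

Because the agents are identical in processing capability, the loop body is the same for each; only `agent_data` varies, which mirrors the design decision stated above.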
Chapter 2
Literature Review
This research can be viewed from a number of perspectives, and the literature
survey has been organized in terms of these various perspectives. From the
database perspective, this research is extending database technology on two
related fronts. It is providing assistance to the user in finding the desired data,
and it is activating the database system to become an active participant in the
user's quest for desired data. Therefore, research on database browsers and even
those aspects of information retrieval systems that facilitate the user finding the
desired data are potentially relevant to the QBC research, in that they provide
a foundation on which to build and/or a contrast of current capabilities
with those of QBC. Relevant research in this area will be considered in the "Data
Access Research" section of this chapter.
From the artificial intelligence perspective, this research is extending
EIMG systems into the new area of problem formulation as applied to query
elaboration. From this perspective, previous work in EIMG systems is relevant
and will be considered in the "EIMG Research" section of this chapter.
2.1 Data Access Research
Data access research is concerned with facilitating access by a user to some set
of desired data. Database management systems support query languages such
as the relational query language SQL, by which the user can formulate a query
and the system will attempt to satisfy it (Ullman 1988, Date 1984).
It has been observed elsewhere that there are two polar views of the problem
of supporting man-machine interaction: the computer as tool and the computer
as intelligent assistant (Robertson, McCracken and Newell 1981). One of the
data access research thrusts was research into tools, called browsers, for users
to scan databases for relevant information. In the early work, the tool provided
the means of scanning, and the user provided the intelligence to identify relevant
information. These systems have various levels of intelligence to facilitate a user's
ability to sift through the database in search of desired data. In general, these
are not EIMG systems, since a human user provides the metagoal in real time
and the systems are controlled manually by people. However, in some cases,
the systems provide some EIMG-like capability, which will be pointed out in the
course of the review.
A number of browsers have been described in the literature. While most of
these systems are database browsers, one information retrieval browser has also
been included because of the relevance of its architecture. The systems include
the following:
1. PIE, used to explore networks used for the description of software
(Goldstein and Bobrow 1980);
2. TIMBER, a graphically oriented browser that displays various parts of
relations, but relies on the intelligence of the user for all the search strategy
(Stonebraker and Kalash 1982);
3. RABBIT, which supports the browsing of both the data and a semantic
model that describes the data through the successive reformulation of the
query, based on user critique of the previous query in light of the results
(Tou, et al. 1982);
4. The Active Database, which provides both a level of semantic browsing and
automatic triggering of database updates based on semantic relationships
to user-initiated changes (Morganstern 1984);
5. BAROQUE, which provides for both schema and value browsing, using an
automatically constructed schema and value network (Motro 1986);
6. FLEX, which provides a number of levels of query repair for poorly
formulated queries, or query synthesis based on database schema identifiers
or instance values contained in a hopelessly non-repairable query (Motro
1988);
7. I3R (Intelligent Interface for Information Retrieval), which provides a
blackboard-centered multiagent retrieval system that can support user browsing
at the concept, document or journal level (which incorporates topic-specific
survey articles) (Thompson and Croft 1989).
Browsers, as exemplified by the RABBIT system, were designed to satisfy
the following four objectives (Tou, et al. 1982):
1. Support the user who has incomplete knowledge about the terms required
to create a query.
2. Support the user whose intention is only partially articulated.
3. Do not overwhelm the user with data when presenting him with information
about a particular item.
4. Recognize that “. . . the structure of the database(s) is heterogeneous with
the result that the ‘shape’ of the database changes depending upon where
one is within the database”; hence, the browser must be able to adapt to
the changing shape of that portion of the database with which the user is
currently concerned.
These objectives sound very similar to those of QBC. The major additions
that distinguish the QBC objectives from those of the browsers are that it is to
1. Provide a significant movement from the tool type of browser system, which
assists the user who provides the primary information-exploration direction,
to an intelligent-assistant type of information extraction, which assists
the user in clarifying his partially articulated intentions.
2. Harness the considerable amount of information available in the database
about any given item to assist in both the information exploration and
completing the articulation of the user's intentions.
3. Perform this assistance by harnessing multiple agents to collaboratively
provide the assistance in information exploration through query formulation.
Some of these systems go beyond support for data and schema browsing to
provide more active support in the user's data search. One of the first problems
that must be confronted in supporting the search for unknown data is how to
describe it. If the user cannot formulate a query, then how is he to describe
what he needs? This represents a major problem that had to be addressed in
the QBC research. BAROQUE and the active database address this problem by
allowing the user to describe an example of the data that he wants, and then
requesting that the system find any more data that is like the example. If there
is no data that totally matches the example, BAROQUE returns the number of
matches that exist for each of the characteristics of “Q” taken separately. The
user can then select the subset of the characteristics that he wants to pursue,
with BAROQUE taking the conjunction of the characteristics that comprise the
subset as its new target match conditions. The active database permits the user
to describe the desired data in terms of the differences between the known object
and the desired object.
Motro, with his goal queries (Motro 1986b) and FLEX system, has provided
a means by which the system attempts to provide a result that is close to what is
specified in the user query. A goal query establishes a target retrieval request, but
data that is close to the target will be considered to satisfy the query. A distance
function is defined to measure the distance between values in the database. A
neighborhood radius indicates a standard-size radius around each attribute value.
Since a measure of the distance between two entities may involve a distance
comparison between entities that contain multiple attributes, relative weights can
be used to order the importance of the attributes in determining the distance
between their associated entities. Scaling factors can be used to reduce the
distances obtained for different attributes to values of comparable magnitude.
Three different types of goals are specified by Motro. Neighborhood goals
are satisfied by using a similar-to comparison in place of the equal-to comparison
normally found in queries. The similar-to comparison is satisfied if a value is
within some specified delta distance from the original target. Optimum goals
are used to prune a selection of neighborhood goals into one that is optimum
according to some criteria. The priority goal establishes a priority order in
which the system is to consider the various attributes of a query. In this way the
user can specify how the database is to be culled for entities that satisfy the goal
query.
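The distance machinery behind the neighborhood goal can be sketched as follows. This is an illustration of the idea only, not Motro's implementation; the dictionary representation of entities and the particular attribute names are assumptions made for exposition.

```python
def weighted_distance(entity, target, weights):
    """Combine per-attribute distances, ordered by relative weights."""
    return sum(w * abs(entity[attr] - target[attr])
               for attr, w in weights.items())

def similar_to(entity, target, weights, radius):
    """Neighborhood-goal test: the entity satisfies the goal if it lies
    within the neighborhood radius of the target."""
    return weighted_distance(entity, target, weights) <= radius
```

The weights here play the role of the relative importance ordering described above; scaling factors could be folded into them in the same way.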
FLEX provides an ability to generalize those queries that return null values.
Numeric values are generalized by adding or subtracting (as appropriate) a delta,
while non-numeric values are generalized by setting them to true. Special care
is taken in generalizing queries that involve joins, since removal of the common
attributes may make the resulting generalization meaningless. This query
generalization capability begins to have some of the flavor of an EIMG system, since
there is no explicit final goal state, only a hill-climbing heuristic to generalize the
query until it can be satisfied. QBC generalizes this approach by permitting
the user to provide abstract patterns that provide some of the characteristics of
the data and relying on the query committee to “flesh out” the abstract patterns.
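The FLEX generalization rule just described (widen numeric conditions by a delta; set non-numeric conditions to true) can be sketched in a few lines. The condition representation is a hypothetical one chosen for illustration, not FLEX's own.

```python
def generalize(conditions, delta):
    """Broaden a query that returned null: widen numeric equalities into
    a +/- delta range; set non-numeric conditions to true (drop them)."""
    widened = {}
    for attr, value in conditions.items():
        if isinstance(value, (int, float)):
            widened[attr] = (value - delta, value + delta)
        else:
            widened[attr] = True   # condition no longer constrains the query
    return widened
```

Repeated application of such a step is the hill-climbing behavior noted above: the query keeps broadening until it can be satisfied.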
Conceptually, these approaches can be viewed as performing a reformulation
of the query to broaden its scope, with the objective of finding data that
matches this broader scope. Query modification has also been performed to
repair queries that are not quite correct, as in Codd's 1978 RENDEZVOUS (Barr
and Feigenbaum 1982). When the system discovered ambiguities in the query, it
then communicated problems in the formulation of the query to the user, who
interactively corrected them based on questions from the system. FLEX actually
attempts to repair the query.
For the worst non-repairable queries, FLEX discards the query, but saves
recognized words in the query that relate to the data stored in the database.
FLEX takes these words and attempts to synthesize a new query. Only words that
can be translated, using the thesaurus, into tokens known to represent schema
or instance data are considered. Tokens that represent schema words become
requests, and those that represent data instances are used as qualifiers. To
accomplish this, FLEX attempts to find paths that join the tokens. Again, this
has some of the attributes of an EIMG system; there is no explicit goal state or
even criteria for a final state, except that the query be syntactically correct.
Query reformulation has also been used to optimize system performance.
Under this approach, a user's query is transformed into a semantically equivalent
query that will result in the same answer, but involves less processing on the part
of the database management system. The basis for transformation is integrity
constraints that hold over all possible states of the database. A description of
a system that supports semantic query optimization is provided in King (1980).
However, in contrast to QBC and even the query repair work previously
described, this query modification is based on maintaining the invariance of the
data retrieved. With query repair and QBC's query formulation no such test
exists, since the initial query is not completely specified.
QBC also contrasts with the previously described work in query formulation
and reformulation, in that QBC does not begin with a partially formulated
database query or fully formulated goal query, but allows the user to specify his
QBC query kernel in terms of the entities desired, the real-world activity to be
supported, or data that is specified based on relationships between real-world
activities. Thus the semantic level of the query kernel is higher in QBC than it
is in the work described.
To address the problem that the same data may be known by different names,
the FLEX system, among others, includes a lexicon, which translates “real world”
words to internal tokens that are known to the system (Motro 1988). QBC
extends the concept of the synonym through the use of the polytyped variable,
which consists of a variable that can simultaneously have multiple types.
Motro and others have pointed out one pitfall to which systems such as
BAROQUE as well as QBC may be vulnerable (Motro 1986, Hinke 1988a). This
is what Motro calls the connection problem. The creation of paths based on
value alone could result in a match of value, but a mismatch of semantics (Motro
1986, Hinke 1988a). In BAROQUE, Motro suggests that referential integrity
constraints could be tapped to provide evidence of a strong connection between
the attribute of one relation and the key attribute of another relation. For two
such linked attributes, identical values in each can be assumed to represent the
same value-item. Motro also suggests that McLeod's abstract domains could
provide a solution for this “connection trap” problem (McLeod 1977). I have
previously suggested that strongly typing the data according to a semantic model
will eliminate some of the problems of identical value matches, in which the
semantic types of the data are different (Hinke 1988b). QBC will assume that
all of the data is strongly typed according to some semantic-model-based type
hierarchy.
One of the thrusts of the QBC research is to activate the data with an agent
that represents the database and participates in the query elaboration process.
A system that took an initial step in this direction was the active database work
of Morganstern (1984). The active database has the capability to respond
intelligently to changes in the database. These changes relate to
maintenance of the integrity and consistency of the database (e.g., if a person's
phone number for messages must be the same as the phone number of his
secretary, then this number changes automatically if his secretary or his secretary's
number changes). These are described using condition-action rules, where the
condition represents the state of the database that must exist for the rule to fire,
and the action is the effect of the rule. In contrast to QBC, the active database
accomplishes these changes with a distinct rule base, while much of the QBC
“rule base” is the data itself. Also, while the active database manipulated the
database, QBC's agents manipulate the query.
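The condition-action behavior just described can be illustrated with the secretary-phone example. The dictionary-based database and the rule format below are assumptions made for illustration, not the active database's actual mechanism.

```python
def make_sync_rule(db):
    """Rule: a person's message phone must equal their secretary's phone."""
    def condition(person):
        sec = db[person].get("secretary")
        return sec is not None and db[person]["message_phone"] != db[sec]["phone"]

    def action(person):
        db[person]["message_phone"] = db[db[person]["secretary"]]["phone"]

    return condition, action

def update(db, person, field, value, rules):
    """Apply an update, then fire any rule whose condition now holds."""
    db[person][field] = value
    for condition, action in rules:
        for p in db:
            if condition(p):
                action(p)
```

Note that, in keeping with the contrast drawn above, the rule here manipulates the database itself, whereas QBC's agents would instead manipulate the evolving query.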
An example of a multiagent approach to data access is provided by the I3R
system. The I3R system represents a multiagent, blackboard-based system that
supports user browsing. The information in the system is organized at three
levels. The concept level is the lowest and constitutes the thesaurus level, which
is organized as a semantic network with a number of different types of links.
These include synonym, related, broader, narrower, instance, phrase and nearest
neighbor. The nearest neighbor is determined by statistical similarity, in that the
terms are used together with a high frequency. Like the concept level, the
document level is also interconnected by links, including links based on citations.
The journal level stems from the fact that publications will sometimes have issues
that summarize a field, and thus provide a useful place to begin browsing. This
idea of levels that are both internally linked as well as linked to other levels has
been adopted in QBC.
The power of a DBMS can also be enhanced through the addition of a
deduction engine, which permits the database to derive new facts from the data
existing in the database through various deductions (Gallaire, Minker, Nicolas
1984, Ullman 1988).
A deductive database takes the explicitly stored extensional data (ED) and
applies some rules (R), which permit the retrieval of intensional data (ID). The
total data potentially retrievable by a user consists of TD = ED + ID. However,
if TD were stored explicitly (assuming that it is not infinite) in a database and
the rules eliminated, then one would have a non-deductive database with the
same retrieval capability as the original database. A deductive database thus
serves as an amplifier for the extensional data.
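The ED, R, and ID relationship can be made concrete with a tiny forward-chaining sketch. The relation names below are invented for illustration; facts are tuples, and a rule maps the current fact set to the facts derivable from it.

```python
def derive_id(extensional, rules):
    """Apply rules to the extensional data until a fixed point; return ID."""
    facts = set(extensional)
    while True:
        new = {f for rule in rules for f in rule(facts)} - facts
        if not new:
            break
        facts |= new
    return facts - set(extensional)   # ID: the derived facts only

def ancestor_rule(facts):
    """R: ancestor is the transitive closure of the parent relation."""
    derived = {("ancestor", a, b) for (r, a, b) in facts if r == "parent"}
    derived |= {("ancestor", a, c)
                for (r1, a, b) in facts if r1 == "ancestor"
                for (r2, b2, c) in facts if r2 == "parent" and b2 == b}
    return derived
```

Storing the fixed point explicitly and discarding `ancestor_rule` would yield the equivalent non-deductive database described above; the rule merely amplifies the stored parent facts.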
In contrast, QBC serves as a filter to determine what is germane to the user's
query needs, from among the ED data if only conventional databases are used,
or among the ED + ID if a deductive database is used. Also, QBC actually uses
a form of deduction in the elaboration of some of its states into new states. Just
as a deductive database uses inference rules to extend the path from extensional
data to new facts, so must QBC extend the initial query kernel. This will add
additional relevant rules or data necessary to discover additional germane data,
or restrict the scope of the current data that has been discovered.
While a deductive database can be mined more deeply for information, it
does not relieve the user of the need to formulate a query based on specific
knowledge of the domain encompassed by the database. This differentiates the
goal of deductive databases from that of QBC.
A closely related area is the combination of expert systems with database
technology, resulting in the expert database system (EDS). “. . . an EDS
is defined here as ‘a system for developing applications requiring knowledge-
directed processing of shared information.’” (Smith 86:K-6). In a sense, the QBC
query committee fuses domain experts with data experts to facilitate user access
to data. However, the focus of this research is on how these various experts
can be fused to accomplish the query formulation, not on pushing forward the
state-of-the-art in EDS technology.
2.2 EIMG Systems
There are a number of classes of EIMG systems that will be considered in this
section. The EIMG systems will be organized as discovery systems and design
systems. Although design is really a type of discovery, these two divisions
represent a useful means of partitioning the following discussion.
2.2.1 Discovery Systems
As the name implies, discovery systems seek to discover that which is new. A
prime example of a discovery system is Lenat's AM system, which was designed
to synthesize new mathematical concepts from a set of primitives (Davis and
Lenat 1982; Lenat 1979).
AM begins with a set of primitive concepts and a set of discovery heuristics
that inductively attempt to discover new concepts. Since this forward direction
from primitives to concept discovery has a larger fan-out of plausible paths than
the backward direction from discovered concept to primitives, AM includes
heuristics to assess which paths are the most promising for investigation. In
addition, newly discovered concepts can be used to find additional concepts.
AM organizes its database of concepts in a hierarchy of concepts connected
with links indicating set specialization and generalization or element and example
relationships. The heuristics are placed at the most general level appropriate
and associated with a particular slot of the concept frame. In the course of
processing, the search for an appropriate heuristic moves from the specific to the
general by rippling up the hierarchy. As will be shown, QBC also relies heavily
on specialization and generalization hierarchies for organizing knowledge about
data types and their relationship to other data types.
For control, AM uses a list of tasks, ordered by an importance measure. The
importance measure is used not only for selection, but also to determine how
much time the task is given to process. At the end of the time, the task is placed
back on the agenda with a new importance measure (based on the results of the
processing), and a new task is selected. New tasks are added to the agenda in a
place appropriate to the importance value assigned to them by the AM heuristics,
based on their perceived ability to lead to interesting discoveries. The heuristics
can add new tasks to the agenda, create new concepts or fill in an entry in an
existing concept.
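The agenda discipline just described, in which the most important task runs for a time slice and then returns to the agenda at a revised importance, can be sketched as a priority-queue loop. This is a minimal stand-in for AM's control, not its actual implementation; the `reassess` function stands in for AM's heuristic judgment of the task's results.

```python
import heapq

def run_agenda(tasks, steps, reassess):
    """AM-style control: repeatedly run the most important task, then place
    it back on the agenda with a new importance based on its results."""
    agenda = [(-imp, name) for (imp, name) in tasks]
    heapq.heapify(agenda)                   # highest importance first
    trace = []
    for _ in range(steps):
        neg_imp, name = heapq.heappop(agenda)
        trace.append(name)
        new_imp = reassess(name, -neg_imp)  # stand-in for AM's heuristics
        heapq.heappush(agenda, (-new_imp, name))
    return trace
```

With a decaying reassessment, an initially dominant task eventually yields the agenda to others, which mirrors AM's loss of interest in unfruitful paths.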
AM is limited by its inability to formulate new heuristics for the concepts
that it discovers. The finding of new heuristics was addressed by the subsequent
Eurisko project. Also, AM's judgments about paths to follow are based on an
a priori assignment of an importance value to each newly discovered concept
and on how fruitful an initial path exploration is. If the path does not appear
to be going anywhere, AM loses interest in it. Unfortunately, these decisions
may eliminate interesting avenues of investigation that could be assessed only
after much more work or insight than AM is capable of providing.
Lenat observes that, from the proper perspective, AM can be viewed as a
hill-climbing program, where the most interesting new concepts are uphill from
the primitives. Its goal is to “maximize the interestingness levels of the activities
that it performs” (Davis and Lenat 1982:pg 10). In AM, “interestingness” is
defined based on knowledge of the conditions under which new mathematical
concepts might appear. In QBC, “interestingness” is defined in terms of an
agent's recognition that it has something relevant to contribute to the evolving
query kernel. For participation to occur, there must be an affinity between the
agent's data and the currently posted state of the query kernel.
QBC and AM differ in a number of areas, including domain, architecture
(QBC's is multiagent and AM's is not), and the fact that while the QBC query
kernel is a function of what the user posts, AM always begins with the same set
of primitive concepts.
Another of the seminal discovery programs is Eurisko, which builds on the
AM base, attempting to counter one of the limitations of AM: its inability to
learn new heuristics as the mathematical discovery process continued. All
discovery was based on the heuristics initially stored in the system. The Eurisko
research is concerned with how new heuristics can be discovered (Lenat 1983).
The key element in AM was the concept, to which heuristics were applied to
discover new concepts. In Eurisko, the heuristic is treated as a concept. Each
heuristic is represented as a frame, with a set of condition slots and a set of
action slots.
AM discovered new concepts through specializing or generalizing existing
concepts or finding new concepts through analogy. In a similar way, these are
the three approaches used by Eurisko to find new heuristics. The generalization
of past successes leads to heuristics that indicate directions for future movement,
while the generalization from past failures leads to heuristics that prune
undesirable directions from the problem-solving search path. Since failure is more
common than success, most heuristics deal with pruning.
In one sense, QBC is closer in type to AM than Eurisko, in that the basic
heuristics will be fixed. While the heuristics will be data independent, they will
be the same from one session to the next. Heuristic discovery as addressed in
Eurisko will not be addressed in QBC.
Another type of discovery system found in the literature is the game-playing
program. Game-playing programs represent a transition from explicit-goal
systems to metagoal systems. Information about game-playing programs was
obtained from (Barr and Feigenbaum 1981), although the analysis with respect
to explicit-goal and metagoal systems is mine.
While game-playing programs can explicitly describe the goal in terms of the
set of states that constitute a “win”, these goal states may not be of much use
in selecting the next state. This is due to the extreme fan-out of states that are
possible from any particular state. Since the goal cannot be used in the selection
of the next state-transition operator, these systems are similar to EIMG systems.
Rather than using means-end analysis (comparing the current state with the
goal state and selecting an appropriate operator to reduce this difference), the
game-playing system uses a utility function to pick the best move based on the
current state of the game and some heuristics (Rich 1983).
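The contrast with means-end analysis can be made concrete: the move chooser below scores each successor state with a utility function and never consults a goal state at all. The toy state representation and function names are assumptions made purely for illustration.

```python
def best_move(state, moves, apply_move, utility):
    """Choose the move whose successor state scores highest; note that the
    goal state itself is never consulted, only the utility heuristic."""
    return max(moves, key=lambda m: utility(apply_move(state, m)))
```

This is the essential EIMG-like element: next-state selection is driven entirely by a heuristic evaluation of the current position, not by distance to an explicit goal.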
This necessity to navigate toward a goal without actually using the goal in the
next-state determination has elements of the metagoal systems, of which QBC
is a type. However, it differs from the metagoal systems in that the explicit goal
state or goal-state predicate can be used to determine when the game is over.
This is not possible with QBC, since not even the user knows precisely what
he is searching for.
2.2.2 Design Systems
The proposed QBC research goes beyond just the retrieval of data, in that the
system actually participates in the formulation of the query. In a sense, the
system has become co-equal with the user, as it provides independent guidance
to the data selection process. This can be viewed as a problem of design, since
both the data customer and the system are collaborating on the design of a query
for information. For this reason, it is useful to look at design systems.
ALADIN (Aluminum Alloy Design INventor) is a system that uses multiple
sources of knowledge to design an alloy that will have specified target properties
(Farinacci, et al. 1985). The problem solving begins with a set of constraints
that may over-constrain the solution to the point where there is no solution.
ALADIN uses a generate-and-test approach to generate candidate alloys and
test to see whether they satisfy the constraints.
CADRE is a system for the automatic synthesis of VLSI circuits. It was
designed as a system of cooperative agents interacting via a shared description of
the chip under design. The CADRE design is complete when sufficient constraints
have been presented to “completely and unambiguously define the physical layout
of the chip” (Ackland, et al. 1985).
With design systems, as with QBC, the design is not known at the outset,
and in fact, there may be multiple designs that would satisfy a set of constraints.
However, in contrast with QBC, there exists some test that an adequate design
has been achieved. Hence, these systems are not pure EIMG systems.
Another area of design that has received much attention is that of automatic
programming. The objective of automatic programming is to ease the lot of
the programmer by assisting him in going from a high-level (and possibly
partial) specification for a program into target code. The methods of automatic
programming can be characterized as the following: theorem proving, program
transformations, knowledge engineering, traditional problem solving and
programmer apprentices (Barr and Feigenbaum 1982). In contrast to QBC, these
systems are characterized by a priori constraints that can be used to test the
final program.
From QBC's perspective, the interesting aspect of these programs is how to
transform the initial program specification into a final program. The NLPQ
(Natural Language Program for Queuing Simulation) system was developed by
Heidorn in the late 1960s (Barr and Feigenbaum 1982). It is designed to take
natural language input regarding queuing problems and output GPSS programs.
Its knowledge base consists of information on queuing problems and production
rules to transform the natural language input into an internal network
representation of the problem. The rules are then used to transform this internal
representation into a GPSS program. It can identify missing information and tell
the user what additional information is needed to complete the problem
specification. This ability to identify missing information is based on NLPQ's
understanding of the actions of a queuing system (e.g., arrival, service, departure)
and the nature of information associated with an action (subject, object, location,
duration, frequency and following action). Based on this knowledge of normal
actions and associations, NLPQ is able to identify missing information (since
slots are unfilled) and request it.
The PSI system was developed by Cordell Green and associated students at
Stanford in the late 1970s and early 1980s (Barr and Feigenbaum 1982). Work
was continued on a follow-on to PSI, called CHI, at Kestrel Institute in Palo Alto.
PSI is a knowledge-based system that uses a domain expert with knowledge of
the application area to help other experts, such as the PARSER/INTERPRETER
and EXAMPLE/TRACE experts. These fill in missing information in the
evolving specification.
The SAFE system takes a program specification that may initially be
ambiguous or incomplete and transforms it into a well-formed program that satisfies
various constraints (e.g., information must be processed before it is output). In
the process, it attempts to remove the ambiguity and resolve the incompleteness
(Barr and Feigenbaum 1982). In terms of problem solvers, the constraints
provide a goal, and the initial program specification provides an initial state that
must be transformed into the goal by using the knowledge base of a well-formed
program provided by the constraints. The completed program is then
symbolically executed to provide a database of relationships between input and the
variables of the program. This program is analogous to the query repair systems
previously described.
The concept of a programmer’s apprentice is being investigated at both MIT
and USC/Information Sciences Institute. The MIT research was performed by
Charles Rich, Howard Shrobe and Richard Waters; information on the
Programmer’s Apprentice was gathered from (Rich and Waters 1988b; Barr and
Feigenbaum 1982). Related work is also being conducted under the Knowledge-
Based Specification Assistant project at USC/Information Sciences Institute by
W. Lewis Johnson, Don Cohen, Martin Feather, Dan Kogan, Jay Myers, Kai
Yue and Robert Balzer (Johnson et al. 1988). Aspects of each of these projects
that are applicable to the QBC work will be discussed in the following paragraphs.
The Programmer’s Apprentice work will be described first. As stated by Rich
and Waters (1988b):
“The long-term goal of the Programmer’s Apprentice project is to develop a
theory of how expert programmers analyze, synthesize, modify, explain, specify,
verify and document programs. . . . Because it will be a long time before we can
fully duplicate human programming abilities, the near-term goal of the project is
to develop a system - the Programmer’s Apprentice - that will provide intelligent
assistance in all phases of the programming task. . . . The Programmer’s
Apprentice focuses on the use of inspection methods to automate programming,
as opposed to more general but harder-to-control methods, such as deductive
synthesis or program transformations.”
Under the inspection approach to synthesis, cliches are recognized in specifications,
and a library of cliches is used to satisfy these specifications. The concept
of a cliche rests on the observation that humans think in terms of combinations
of elements. This is related to the theory of “chunking, whereby a human builds
up the organization of memory by assembling collections of existing chunks into
larger chunks, each chunk being a description of a part of a situation” (Laird,
Rosenbloom and Newell 1986:xiii). By predefining these chunks of programming
expertise, called cliches, the Programmer’s Apprentice can apply them to
the program development process. The human programmer can state the program
specification in terms of cliches, or use a cliche as a base and then specify
how the desired program should differ in specific details. As will be shown in
subsequent chapters, QBC adopts a number of cliche-like constructs (e.g., activity
primitives) for its knowledge base.
A graphical notation, called the plan calculus, is used to represent both cliches
and the evolving program. The plan calculus represents programs in terms of
data and control flows, thus abstracting away the details of how this flow is
to be accomplished. A plan can represent complete, incomplete and abstract
cliches or program plans. The cliche library supports frame inheritance, so it can
represent relationships between cliches, such as specialization; this concept is
used by QBC to elaborate various QBC data structures in terms of specialization.
Constraints can be expressed between frame slots and the slot values. If
constraints are established equating multiple slots, then values stored in one slot
are propagated to the others that have been equated with the filled slot.
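The slot-equating behavior described above can be sketched in a few lines. The frame structure and slot names below are hypothetical illustrations, not the Plan Calculus representation itself:

```python
class Frame:
    """Minimal frame: named slots plus equality constraints between slots."""

    def __init__(self):
        self.slots = {}      # slot name -> value (None while unfilled)
        self.equated = []    # groups of slot names constrained to be equal

    def equate(self, *names):
        for name in names:
            self.slots.setdefault(name, None)
        self.equated.append(set(names))

    def fill(self, name, value):
        self.slots[name] = value
        # Propagate: every slot equated with the filled slot receives its value.
        for group in self.equated:
            if name in group:
                for other in group:
                    self.slots[other] = value

plan = Frame()
plan.equate("producer-output", "consumer-input")
plan.fill("producer-output", "integer-stream")
```

Filling `producer-output` also fills `consumer-input`, mirroring the propagation between equated slots described above.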
Rich and Waters are also developing a Requirements Apprentice (1988b),
which seeks to bridge the gap between the informal (and possibly ambiguous,
incomplete and contradictory) requirements generated by humans and the more
formal requirements that are the focus of much current work in software
requirements tools. A key aspect of the Requirements Apprentice is the codification
of cliches relevant to requirements. They recognize that the potential range of such
cliches is very large, encompassing any part of the real world. This need to capture
such real-world cliches is closely related to the attempt to capture common
knowledge in CYC (Lenat 1986). It is also related to QBC’s need for
a sufficient body of expertise to relate the real-world needs of the data consumer
to the data QBC has available.
In essence, the Programmer’s Apprentice is a transitional system between
EIEG systems and EIMG systems. The human provides the basic outline of the
goal in terms of cliches. The cliches, being frame-like, can then identify missing
data and indicate what additional data needs to be supplied. Hence, the set of
cliches together forms a type of metagoal that describes what needs to be further
elaborated. In the current system, such elaboration is guided by the programmer.
The Knowledge-Based Specification Assistant (KBSA) work of Johnson
addresses the problem of specification development: once a specification has been
elaborated, code can be generated from it. In a sense, QBC is evolving
a specification for a query. However, in contrast to KBSA, it will also present
the user with a set of result options.
One objective of the Specification Assistant is to permit parallel elaboration of
the specification. The approach taken is to perform each elaboration in isolation
and then attempt to combine them. Such a combination of elaboration steps can fail
due to interference or ambiguity. An interference can occur if one elaboration
violates a precondition of the other, violates the goal of the other or changes
a part of the specification that is referenced by the other. Ambiguity exists if
there is no clear method of combining the separate elaborations because changes
to the specification lead each elaboration to interpret portions of it differently.
While the designers would ultimately like to see KBSA play an active role
in the specification-development process, the current work has progressed only
a limited way in that direction. Their proposed direction is to view the
specification-development process in terms of state spaces, progressing from some
initial state to a goal state. Under this concept, it is the user who must decide
whether the goal has been reached. Progression to the goal is achieved by
identifying issues that separate the current specification state from the goal state.
These issues are resolved by establishing a development goal (or a subgoal).
The development goal is satisfied by selecting the proper command to transform
the specification and eliminate the issue. The commands are editor
commands that are applied to the specification. The kinds of commands
that lead to changes in the specification are as follows: add terminology, remove
terminology, revise signatures of existing terminology, add invariants, remove
invariants, revise invariants, extend behavior, restrict behavior, revise behavior,
rephrase specification (leaving behavior and invariants fixed) and move toward
implementation.
Chapter 3
Approach
This chapter presents a model of query elaboration and then introduces the
various levels of query elaboration processing that will be described in detail in
subsequent chapters.
3.1 Basic Concepts
This section develops a model of query elaboration. It begins with the
development of terminology with which to characterize query elaboration and then
extends the model to incorporate the various stages of the elaboration. First,
however, some basic concepts will be presented.
The stimulus for a query is a recognized need for data by a person with a
problem, who will be called the problem owner. The solution to a problem may
involve physical actions, the use of various physical objects and various pieces
of data, some of which may be stored in a computer database. This research is
concerned only with the database data that is required to solve a problem.
The nature of the data required to solve a problem can be identified by
domain experts. If the problem owner consults an expert in the domain of the
problem, then that expert can provide the problem owner with a description of
the data required to solve the problem. (This data may or may not be contained in
a computer database.) For some problem p, the set of data needed to solve the
problem will be called the total problem data TPDp. The subset of TPDp that is
stored in database d is the stored problem data SPDpd. This characterizes the
maximal set of data that is relevant to problem p in database d.
A problem owner will formulate query Qpd to acquire the requested problem
data RPDpd from database d for problem p. If the problem owner were also a
domain expert, then it can be assumed that, barring errors, RPDpd = SPDpd.
However, since a problem owner is not necessarily a domain expert, it is likely
that
(RPDpd ∩ SPDpd) ⊂ SPDpd.
As a minimum to be useful at all, Qpd must satisfy the following two principles:
• P1: RPDpd ∩ TPDp ≠ Null, which states that the data requested by the
query must intersect the set of data required to satisfy the problem.
• P2: RPDpd ≠ Null, which states that the query must actually target data
in the database.
The utility of Qpd to the user increases as the query maximizes both recall
and precision, whose definitions, taken from the field of information retrieval, are
as follows:
Recall measures the proportion of relevant information actually retrieved
in response to a search (that is, the number of relevant items
actually obtained divided by the total number of relevant items
contained in the collection), whereas precision measures the proportion of
retrieved items actually relevant (that is, the number of relevant items
actually obtained divided by the total number of retrieved items).
(Salton and McGill 1983:55)
This leads to the following two additional principles that should be satisfied
by the selected query:
• P3: Maximize (RPDpd ∩ SPDpd)/SPDpd, which maximizes recall.
• P4: Maximize (RPDpd ∩ SPDpd)/RPDpd, which maximizes precision.
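The four principles can be checked mechanically once the data sets are modeled as sets. The item names below are illustrative only, not drawn from an actual QBC database:

```python
# Hypothetical data items for one problem p and one database d.
TPD = {"ra", "dec", "magnitude", "size", "type"}   # total problem data
SPD = {"ra", "dec", "magnitude", "size"}           # stored problem data in d
RPD = {"ra", "dec", "distance"}                    # requested problem data (the query)

assert RPD & TPD        # P1: the request intersects the data the problem needs
assert RPD              # P2: the request actually targets some data

relevant = RPD & SPD
recall = len(relevant) / len(SPD)        # P3: to be maximized
precision = len(relevant) / len(RPD)     # P4: to be maximized
```

Here the query retrieves two of the four relevant stored items (recall 0.5) and two of its three requested items are relevant (precision 2/3), so both P3 and P4 leave room for elaboration to improve the query.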
To provide structure to the data, this research has adopted a variant of the
ER-model (Chen 1976). While Chen’s original model viewed data in terms of
entities, attributes and relationships between entities, this research has dropped
the distinction between the entity-attribute relationship and the entity-entity
relationship. An entity is viewed as having some context, and this context
consists of zero or more entities. This loses some of the semantics provided by the
ER-model. It is equivalent to the ER-model with respect to attribute-entity
relationships, since the ER-model does not type these links other than to note the
attribute relationship, with all attributes having the same relationship. However,
it does lose the knowledge inherent in the entity-entity relationships of the
ER-model.
In defense of this simplification, it is argued that the result is no worse than
that provided by the relational model; at worst it will lead to lower precision,
but no loss of recall. The advantage of this approach is that it permits a uniform
means of handling an entity space and, as will be shown, allows the same
techniques to be used for higher-order semantic spaces as well. It also frees
the problem owner from having to be aware of anything other than entities and
attributes.
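A minimal sketch of this flattened model, in which an entity's context is simply a collection of other entities (the names are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """An entity whose context is zero or more entities; attribute links
    and entity-entity links are deliberately not distinguished."""
    name: str
    context: list = field(default_factory=list)

# "magnitude" and "size" appear as entities in galaxy's context, exactly as
# a related entity would; the ER-model's typed links are collapsed.
galaxy = Entity("galaxy", [Entity("magnitude"), Entity("size"), Entity("type")])
context_names = [e.name for e in galaxy.context]
```

The uniformity is the point: whatever elaborates one entity's context can elaborate any other's, regardless of whether a context member was originally an attribute or a related entity.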
3.2 Elaboration Model
To provide a means of characterizing the various types of query elaboration, this
research has developed the multi-world, two-dimensional (M-World-2D) model.
In this model, a universe is defined as that which incorporates all data
associated with a particular QBC query elaboration session.
A universe contains one or more worlds that encapsulate a particular state
of real-world activity. For example, worlds could be partitioned spatially or
temporally. Thus, one world could indicate the state of real-world activity at
time T1, while another world could indicate the state at some later time T2. Or,
one world could indicate the state of activity at some location, such as Redondo
Beach, while another indicates the state of activity in Huntsville. Lehnert’s
characterization of question types identifies a set of questions in which causality
or effect is being requested (Lehnert 1978). This suggests that one world could
have various causal relationships to another world. For example, one world could
indicate some activity, while another world indicates the activity that caused the
activity in the first world. In general, worlds provide a means of encapsulating a
description of some activity.
Since queries are developed in the context of real-world activity (e.g., data
required to view some astronomical object), the model addresses the relationship
of data and real-world activity. Worlds are divided into two related dimensions.
The data dimension contains data; the real dimension contains descriptions of
activities in the real world, such as the act of viewing a galaxy or photographing
a deep-space object. The data dimension is related to the real dimension in that
the data dimension contains the data required to support the activity
specified in the real dimension.
In general, a query formulation can be characterized in terms of two worlds:
the primary world, consisting of a real dimension and a data dimension, and
a secondary or specification world, which consists of a real dimension. Using the
model that has been developed, a taxonomy of query elaboration was created.
The taxonomy is based on the nature of the initial state of each of the dimensions
of the primary and specification worlds. Each dimension of a world can
be fully specified, partially specified or unspecified.
The description of this initial state takes the form P-RX/DY, S-RW/DZ.
“P-” designates the primary world and “S-” designates the secondary
world. RX designates the real part of the primary world, RW the real part of
the secondary world, DY the data part of the primary world and DZ the data
part of the secondary world. The arguments X, Y, W and Z take the value F
(fully specified), P (partially specified) or U (unspecified).
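The notation can be generated directly from the specification levels of each dimension. This small encoding is my own sketch of the convention, not part of the dissertation's implementation:

```python
def state_description(p_real, p_data, s_real=None, s_data=None):
    """Render an initial state as P-RX/DY[, S-RW/DZ]; each argument is
    'F' (fully specified), 'P' (partially specified) or 'U' (unspecified)."""
    for level in (p_real, p_data):
        assert level in "FPU"
    desc = f"P-R{p_real}/D{p_data}"
    if s_real is not None or s_data is not None:
        # Unstated secondary dimensions default to U (unspecified).
        desc += f", S-R{s_real or 'U'}/D{s_data or 'U'}"
    return desc

direct = state_description("U", "P")                # direct query elaboration
indirect = state_description("U", "U", s_real="F")  # elaboration via a specification world
```

`state_description("U", "P")` yields `"P-RU/DP"`, the direct-elaboration case discussed in the next section.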
If the data requirements were fully specified, this would be represented as
P-RU/DF. This represents the situation required to use today’s commercial
database management systems.
The first type of query elaboration to be presented is represented by the
notation P-RU/DP, which states that the data dimension is only partially specified,
while the real dimension remains unspecified and unused. This type of query
elaboration is called direct query elaboration.
Various types of indirect query elaboration will be described, based on an
unspecified data dimension but a fully or partially specified real dimension of the
primary world, or on the real dimension of the specification world. The next two
sections describe direct and indirect query elaboration.
3.3 Direct Query Elaboration
In support of the need, at least initially, to formulate a query in a problem
domain that may not be realizable in a database, this research has developed
the concept of a solicitation. A solicitation represents a request for data that is
in TPDp; however, the request may not be satisfiable by the database. There
are two primary reasons for this: either the solicited data is not present in the
database, or the solicitation is stated using terms that are not recognized in
the database. The model supports two types of solicitations: entity solicitations
(E-sols) and attribute solicitations (A-sols).
An E-sol represents a request for a particular entity. In essence, it is the
name of a desired entity. An E-sol represents the minimum information that a
problem owner could have about an entity - a name - with no indication at this
point of whether the solicited entity actually exists, or of the nature of its attributes.
An A-sol has the form ({E-sol}, attribute-name, {value range}), where the { }
notation indicates that both the E-sol and the value range are optional. If the E-sol
is absent, then the A-sol is said to be unconstrained. Value range can represent
either a single value or a range of values. An A-sol represents a desired entity-
attribute pair or entity-attribute-value triple; it represents knowledge about a
single attribute of an entity.
[Figure 3.1: Direct Query Elaboration (P-RU/DP) - solicitations in the data
dimension are elaborated into data capabilities]
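Solicitations are easy to model as small records. This sketch uses hypothetical Python classes, not QBC's actual representation:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class ESol:
    """Entity solicitation: only a name for a desired entity is known."""
    name: str

@dataclass(frozen=True)
class ASol:
    """Attribute solicitation: ({E-sol}, attribute-name, {value range})."""
    attribute: str
    e_sol: Optional[ESol] = None           # optional; absent => unconstrained
    value_range: Optional[Tuple] = None    # a single value or a (low, high) pair

    def unconstrained(self):
        return self.e_sol is None

size = ASol("size", value_range=(1.0, 30.0))    # no E-sol: unconstrained
mag = ASol("magnitude", e_sol=ESol("galaxy"))   # tied to a solicited entity
```

An unconstrained A-sol such as `size` carries attribute knowledge without naming an entity, which is exactly the starting point for the entity discovery elaboration described in Chapter 4.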
Having defined the concept of the solicitation, a user query can now be
defined, as illustrated in figure 3.1. The initial formulation of a query is a set of
solicitations. For maximum utility, the solicitations in the query should satisfy
principle P1, targeting data that is required to satisfy the problem. However,
the solicitations need not satisfy principle P2. If the solicitations in the query do
not satisfy P2, one of the tasks of the query committee will be to transform the
set of solicitations into a form that does satisfy principle P2.
A solicitation that can be realized in the target set of databases is called a
capability. A capability represents an ability to retrieve some subset of the data
in a target database. These capabilities are analogous to the capabilities found
in operating systems, in that they provide a token whose possession confers an
ability to acquire the data referenced by the capability. As with operating
system capabilities, the same set of data may have different and distinct
capabilities that all result in retrieval of the same data. The primary difference
between a solicitation and a capability is that a capability represents data that
is known to exist in the database, whereas a solicitation represents data that is
desired but not known to exist in the available database.
Analogous to solicitations, there are two types of capabilities. An entity
capability (E-cap) represents knowledge of the existence of a particular entity. An
attribute capability (A-cap) has the form (E-cap attribute-name {value range}),
where the value range is optional and can represent either a single value or a
range of values. If no value is specified, then the capability has the form (E-cap
attribute-name *), which, analogous to the SQL query language, implies that
all of the values associated with the attribute are designated. If the underlying
databases support the relational model, then an A-cap represents a binary
relation, consisting of the column designated by the attribute-name, with each
row of the column associated with the key column(s) of the relation representing
an entity.
In essence, possession of an E-cap indicates that one knows about the existence
of an entity, and possession of one or more A-caps indicates that one has the
ability to build a relation, using either a natural or outer join, by submitting
the capabilities to the appropriate database.
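Under the relational reading above, each A-cap yields a binary relation from entity key to attribute value, and joining A-caps over shared keys rebuilds a wider relation. A sketch with hypothetical astronomical data:

```python
# Each dict stands for the binary relation an A-cap retrieves: key -> value.
ra_cap = {"NGC224": "00h42m", "NGC598": "01h33m"}
mag_cap = {"NGC224": 3.4, "NGC598": 5.7, "NGC1976": 4.0}

def natural_join(*relations):
    """Keep only entities present in every relation (a natural join);
    an outer join would instead keep all keys and pad missing values."""
    shared = set.intersection(*(set(r) for r in relations))
    return {key: tuple(r[key] for r in relations) for key in sorted(shared)}

joined = natural_join(ra_cap, mag_cap)
```

Here NGC1976 drops out of the natural join because the `ra_cap` relation lacks it; choosing an outer join instead would retain it with a null right ascension.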
The application of E-caps and A-caps can result in the retrieval of data that
can be used to build new E-caps and A-caps. Hence, there is not a clean
separation between the data and the query.
One additional primitive of the model plays a role in the query
elaboration process in the data dimension. In addition to solicitations and
capabilities, the model contains the concept of a filter. A filter is analogous to a
capability and a solicitation, except that it restricts (-filter) or enhances (+filter)
the data that can ultimately be retrieved with E-caps and A-caps. As will be
shown, filters (like solicitations and capabilities) can be elaborated. Also, filters
can propagate from more general solicitations and capabilities to their associated
specializations.
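A rough sketch of the two filter kinds as set operations; the semantics here are my reading of "restricts" and "enhances", not a specification of QBC's filters:

```python
def minus_filter(retrieved, predicate):
    """-filter: restrict the retrievable data to items passing the predicate."""
    return {item for item in retrieved if predicate(item)}

def plus_filter(retrieved, additions):
    """+filter: enhance the retrievable data with additional items."""
    return retrieved | additions

# Hypothetical entity names retrieved by some set of capabilities.
data = {"NGC224", "NGC598", "M13"}
restricted = minus_filter(data, lambda name: name.startswith("NGC"))
enhanced = plus_filter(restricted, {"NGC1976"})
```

Because filters compose with capabilities rather than replacing them, they give the committee a way to narrow or widen a result without reformulating the underlying solicitations.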
Having defined solicitations, capabilities and filters, we can now define direct
query elaboration in terms of figure 3.1. With direct query elaboration, the user
begins with some initial statement of what is required, called a query kernel.
The query kernel is expressed in terms of solicitations. The objective of the QBC
query committee is to transform the solicitations of the query kernel into
capabilities that satisfy principles P3 and P4 to the greatest extent possible. The use
of solicitations in the query kernel decouples the desire for data from the
availability of data. It also permits the query committee to modify a solicitation,
as necessary, in order to bring it into line with what is actually available in the
databases (i.e., SPDpd).
What characterizes direct query elaboration is that the problem owner is
able to partially describe the desired data. In the entity discovery type of direct
query elaboration, the problem owner is able to characterize the desired data
in terms of unconstrained A-sols. The query committee then has the task of
determining the nature of the A-caps that are consistent with the set of attributes
proposed in the query kernel. For example, a problem owner might describe an
unspecified entity as having a size (diameter) and a magnitude. The result of the
elaboration against a set of astronomical data would include a galaxy or a
planetary nebula.
In the attribute discovery type of direct query elaboration, the user proposes
an E-sol or E-cap, and the query committee attempts to find A-caps that are
associated with the entity. Thus, a user might wish to discover the attributes
associated with a galaxy; these would include NGC designation, right ascension,
declination, magnitude, size and type.
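Both forms of direct elaboration amount to matching against committee knowledge of entities and their attributes. A toy sketch follows; the schema fragment is hypothetical, though the galaxy and planetary nebula examples follow the text:

```python
# Hypothetical fragment of a committee member's knowledge of the database.
known_attributes = {
    "galaxy": {"NGC designation", "right ascension", "declination",
               "magnitude", "size", "type"},
    "planetary nebula": {"NGC designation", "magnitude", "size"},
}

def entity_discovery(attribute_names):
    """Entities consistent with every proposed unconstrained A-sol."""
    return sorted(entity for entity, attrs in known_attributes.items()
                  if set(attribute_names) <= attrs)

def attribute_discovery(entity_name):
    """Attributes (candidate A-caps) associated with a proposed entity."""
    return sorted(known_attributes.get(entity_name, set()))

candidates = entity_discovery(["size", "magnitude"])
galaxy_attrs = attribute_discovery("galaxy")
```

Asking for an entity with a size and a magnitude yields both a galaxy and a planetary nebula, matching the elaboration result described above; the real committee's knowledge is, of course, far richer than one lookup table.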
In direct query elaboration, filters are used to limit the data presented
to the user, based on interactions among A-caps in the resulting elaborated query.
More will be said about this in the presentation chapter.
While direct query elaboration is useful, it requires that the user have some
idea of the nature of the SPDpd data, so that he can at least compose a partial
description of the data in terms of solicitations or capabilities. However, the user
may not be able to do this. To understand how query elaboration can occur
when the user cannot begin to describe the desired data directly, this research
has developed the concept of indirect query elaboration.
3.4 Indirect Query Elaboration
Indirect query elaboration is a common experience in life. The need to find tools
to support some job provides a good example of this type of elaboration. A novice
astronomer may want to photograph deep-space objects but not know what
is required in the form of cold cameras or optical filters. Also, the nature of the
tools changes depending upon whether the novice wants to photograph the moon
or some deep-space object such as a galaxy. To discover what is needed, the novice
will not attempt to vaguely describe the tools, but will describe the activity in
which he wants to engage, relying upon a domain expert to suggest the required
tools.
Real-world activities are described in terms of activity primitives (APs). Each
AP consists of one activity associated with one entity. An example AP is (view
deep-space-object), in which the activity view is associated with the entity deep-
space-object. An activity is a concept that captures the English-language verb.
This includes not only action but also being, as in (exists observatory), which
states that an observatory exists in the specific real world in which this is asserted.
Activity primitives can be clustered into a set of related activities under an
Activity Cluster (AC).
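APs and ACs have a simple shape, sketched below with hypothetical names:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AP:
    """Activity primitive: one activity associated with one entity."""
    activity: str
    entity: str

@dataclass
class ActivityCluster:
    """A set of related activity primitives."""
    name: str
    primitives: list = field(default_factory=list)

session = ActivityCluster("observing-session", [
    AP("view", "deep-space-object"),
    AP("exists", "observatory"),   # 'being', not just action, is an activity
])
```

Clustering matters because a whole session of related APs, not a single primitive, typically determines what data requisition the real dimension implies.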
The data dimension of a world represents a data requisition for the activity
primitives specified in the real dimension of the world. A data requisition is the data
that a human would find necessary to participate in the activities specified in the
real dimension. As an example, to support the AP (view deep-space-object), one
needs to know the coordinates of the object, specified in terms of right ascension
(RA) and declination (dec), in order to find it with a telescope. Thus the RA and
dec would be contained in the requisition that exists in the data dimension of the
world that contained (view deep-space-object) in its real dimension.
Figure 3.2 illustrates the first type of indirect query elaboration. This
involves the specification of the real dimension of the world through one or more
APs or ACs. It is represented as P-RF/DU, with no secondary world being
specified. In this case, the real dimension is fully specified, but the associated
data dimension is unspecified. It is the task of the query committee to identify
the components of the data dimension of the primary world. The activity
primitives in the real dimension will result in a data requisition in the data dimension.
The data requisition that results from this indirect elaboration could be in the
form of solicitations that must undergo further, direct query elaboration.
The second type of indirect query elaboration involves the partial specification
of the real dimension of the primary world, indicated by P-RP/DU. In
this type of elaboration, it is the task of the query committee to complete the
specification of the real dimension by identifying related state predicates that
should be included, or by completing the specification of the state predicates that
are included.
The third type of indirect query elaboration involves the indirect specification
of an unknown world. This is accomplished by specifying a known world, the
specification world, and the relationship that this known world has with the
unknown, or primary, world. This is indicated by P-RU/DU, S-RF/DU and is
illustrated in figure 3.3.
A world can be related to another world by world-relationships (w-relationships).
A w-relationship relates the activity primitives of one world to the
activity primitives of another. One world can serve as a specification world for
another, with the w-relationship being a realization policy for transforming
the specification of one world into realizations of that specification. For example,
one could describe a specification world consisting of the AP (view deep-space-object)
and then specify that this world should have a tools-for relationship to the unknown
world. The primary world would then contain the tools necessary to support the
activities in the specification world, with its data dimension containing the
information necessary to select a telescope, namely, telescope selection information.
Or one could describe a tool world, apply the same tools-for relationship, and
materialize a world containing the activity primitives that the tool would support,
together with the particular data in the database relevant to this activity.
[Figure 3.2: Indirect Query Elaboration - P-RF/DU. Activity primitives and
clusters in the real dimension of the primary world yield a data requisition of
solicitations and capabilities in its data dimension]
[Figure 3.3: Indirect Query Elaboration P-RU/DU, S-RF/DU. A w-relationship
links the real dimension of the secondary world to the primary world, whose
data dimension is then elaborated]
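A w-relationship can be sketched as a mapping from the APs of the specification world to the APs of the materialized primary world. The tools-for policy below is a hypothetical stand-in, not QBC's actual realization machinery:

```python
# Hypothetical realization policy: tools-for maps a specification-world AP
# to the APs (tool selections) of the primary world it materializes.
tools_for = {
    ("view", "deep-space-object"): [("select", "telescope")],
    ("photograph", "deep-space-object"): [("select", "cold-camera"),
                                          ("select", "optical-filter")],
}

def realize(spec_aps, policy):
    """Materialize the primary world's activity primitives from the
    specification world's APs under the given realization policy."""
    primary = []
    for ap in spec_aps:
        primary.extend(policy.get(ap, []))
    return primary

primary_aps = realize([("view", "deep-space-object")], tools_for)
```

Specifying (view deep-space-object) under tools-for materializes a primary world about telescope selection; the same policy applied in the other direction would recover the activities a given tool supports.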
In the fourth type of indirect query elaboration, both a secondary and a
primary world are specified; it is the relationship between them that is of interest.
Having characterized the overall elaboration process, the final step in
presenting the approach is to look more closely at the characteristics of the query
that results from this elaboration process. This is addressed in the next section.
3.5 Presentation
The following are the approaches that could be taken with respect to the nature
of the resultant query:
1. The results of running the formulated query. This would invoke the
capabilities of the queries in Q to retrieve the actual data.
2. The formulated query itself. This simply presents the user with the queries
in Q.
3. A set of query options. As will be shown, this provides some organization
and rationale to the presentation of the query options.
Option 1 lacks credibility. At the outset, the user lacked sufficient expertise
in the domain of inquiry to formulate a desired query, and thus began with an
abstract description of the desired information, formulated in terms of solicitations
that had to be transformed into capabilities. That the QBC system could
transform this abstract description into a specific data result of any real use
to the user stretches one’s credulity. By a similar argument,
having QBC provide a simple formulated query in some DBMS query language
would also tend to be of little practical value to the user.
Option 3, however, suggests the type of output that a corporate executive or
military leader might expect from his or her staff: a staff report in which the
decision options are presented along with the advantages and disadvantages of
each option. These options could even be prioritized according to some specified
criteria, but the actual selection decisions would be left to the executive. With
this form of output, the QBC query committee would serve a role similar
to that of a staff. It would distill the data under its jurisdiction into a set of
query options intended to cover the specified desires of the problem owner under
the constraints of both the available data and the inherent knowledge of the
combined query committee. The user would then be free to select the option of
interest.
Having presented a model of query elaboration and described the various
types of elaboration, the next section outlines how the remainder of the
dissertation elaborates on what has been presented here.
3.6 Overview of Dissertation
Chapter 4 will present the details of direct query elaboration. That chapter
presents a key development of this research, the polytype variable. Chapter 5
presents the details of indirect query elaboration. Chapter 6 then presents details
of the approach taken for presenting data to the QBC user, including how
the cardinality of the presented data is reduced through both filters and lattice-
based techniques developed as part of this research.
In Chapter 7, the QBC implementation is described, along with the results
of experiments run using the QBC system. Finally, Chapter 8 presents the
conclusions of the research and identifies possible areas for future research.
Chapter 4
Direct Query Elaboration
This chapter describes the QBC approach to direct query elaboration. It
describes the process by which unknown entities are elaborated, based on some
initial guidance provided by the problem owner, who is called the QBC user
(or simply the user). The user begins this process by presenting QBC with a query
kernel. The query kernel is elaborated by the query committee to materialize
a query that is satisfiable by the database. Two forms of direct entity elaboration
exist: entity discovery and attribute discovery.
With entity discovery, the user - although ignorant of the entity desired -
knows at least some of the attributes of the entity (A-sols or A-caps). He
utilizes the query committee to try to discover an entity that possesses the desired
attributes. With attribute discovery, the user knows the entity of interest and
solicits from the query committee the identity of any additional attributes that
are associated with that entity.
It should be noted that while this chapter describes processes that are used
in direct query elaboration of the data dimension of the primary world, the
algorithms described will also be used in other dimensions to elaborate partially
specified AP's and w-relationships. However, to keep the discussion uncluttered,
it will be presented in terms of entities and attributes. The next chapter and
Chapter 7 will show how these same algorithms are used for indirect query
elaboration.
The presentation will begin with a discussion of the entity discovery type of
direct query elaboration. It will conclude with the attribute discovery type.
4.1 Entity Discovery
The entity discovery type of direct query elaboration (abbreviated as entity
elaboration) begins with a user's presenting the query committee with a query kernel
consisting of a description of the desired entity in terms of the attributes it should
possess. From this initial entity specification, the query committee “brainstorms”
to generate hypotheses that satisfy at least one of the attribute contexts. Following
the brainstorming session, those hypotheses that satisfy their context are
selected as consistent entity candidates (CECs). Each of these CECs represents
a materialization of an entity that satisfies the original attribute constraints. If
no CEC can be found, the problem must be reformulated to one that is either
semantically consistent or semantically near the original formulation. It must
also be satisfiable by the database.
The following are the stages of entity elaboration in QBC:
• Entity specification
• Hypothesis generation
• Entity materialization
• Specification reformulation
Since entity elaboration is an iterative process, these stages may each be
visited a number of times in the course of an elaboration. The remainder of this
section will describe each of these four stages of entity elaboration in detail.
4.1.1 Entity Specification
One of the challenges of this research has been the investigation of how a user who
is a domain novice could describe the unknown, since by definition he had
insufficient domain knowledge to formulate his own query. But the query committee
must begin with some idea of what the user wants.
One key to being able to describe the unknown is being able to name it. Thus
in mathematics, a variable serves this purpose (e.g., X). With a named variable,
one has the ability to describe various characteristics that this variable should
satisfy (e.g., X > 5).
An analogous approach has been used in QBC. QBC has the need for a
variable-like construct, but one that operates - at least initially - on types, rather
than data. As part of the query elaboration process, the query committee must
arrive at an understanding of the type of data that will help the user solve his
problem. Since data in QBC is described in terms of entities and its associated
attributes, the type variable will represent an entity and its associated attributes.
To represent the fact that the type of the entity may be unknown, the type
of the variable can be unspecified. The unknown data can then be specified in
terms of an unspecified entity. However, while the type of the entity may be
unknown, the context of attributes in which this entity may be discovered may
be known. Hence, this unspecified entity may be defined in terms of a context
of specified attributes. For example, a user may be looking for something that
has both magnitude and size (i.e., diameter). Thus, he can define an unspecified
entity variable that has a specified context of magnitude and size. Based on query
committee elaboration, such an unspecified entity variable could be instantiated
with entities such as planets, moons, galaxies and nebulae.
While this concept of an unspecified type variable provides the basic capability
required for direct query elaboration, there are two additional requirements
that are of use in the distributed QBC environment. The first of these is that
there may be a desire to place some limits on the types that can be ultimately
considered to satisfy an unspecified type. However, since by definition the type
of the unspecified type is unknown, one cannot enumerate the set of types that
can potentially satisfy the unspecified type. However, one can constrain the
types that can potentially satisfy the unspecified type to those that have some
semantic relationship, such as those that are specializations of some initial type,
generalizations of the initial type, or in some sense semantically near the initial
type. Thus, an unspecified entity variable that is constrained to be related to
deep space would be satisfied with entities such as galaxies and planetary nebulas,
but not planets and moons, since these are not considered specializations of
deep-space entities.
The second additional requirement stems from the fact that the user may
initially define the constrained unspecified entity or the specified attributes of
this entity in terms that are not generally known to those agents having the
desired data. In a sense, the user has specified the query kernel in terms of
E-sols and A-sols, and there is a need to transform these initial solicitations into
capabilities that are known to the agents holding the data. This requires a
generation of new E-sols and A-sols that may be known to a wider group of
agents until ultimately the E-sols and A-sols can be generated into E-caps and
A-caps.
To accomplish this, the type variable should be able to be referred to by a
number of different names. Each of these names is the name of a type or the
name of a specific entity and its associated attributes. In terms of the model,
these entity names are E-sols until they refer to an entity that is actually stored
by a database. The attributes of an entity are also considered to be entities
in their own right, and thus are identified with E-sols or E-caps. However, the
combination of an attribute with a particular entity is termed an A-sol or A-cap,
depending upon whether it is retrieved from a database.
To support the ability to name the unknown and then to describe desired
characteristics of this unknown, this research has developed the polytyped term.
The polytyped term can be used to represent type variables and type constants.
The polytyped variable provides a means not only to name the unknown, but
also to accumulate multiple related names (e.g., E-sols and E-caps) for it in the
course of the elaboration process. The polytyped term provides a means by
which one can associate a context (e.g., attributes) with the polytyped term.
These attributes are also represented with polytyped terms, and hence can also
be known by different names as the elaboration process proceeds. A polytyped
term can initially be unspecified, or it can be unspecified but constrained, in
which case its initial E-sol can be specialized, generalized, or subjected to a
transformation to generate semantically near E-sols.
The multiple, related names of a polytyped term are called syns (for synonym).
Thus a polytyped term is known by its synlist. For example, given some
polytyped term X, with initial type deep-sky object, a synonym transformation
might convert this to deep-space object. A specialization transformation might
transform this to galaxy. The result is that the synlist of this polytyped term
would consist of deep-sky, deep-space and galaxy. All of these are, in a sense,
synonyms (using a very broad definition of synonyms) of the initial type deep-sky
object. Each of these synonyms represents E-sols that are related, in this case,
through the specialization transformation.
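For illustration, the polytyped term and its synlist can be sketched as a small data structure. This is a minimal sketch only; the class and transformation names are illustrative, not the dissertation's actual implementation.

```python
class PolytypedTerm:
    """A variable-like construct over types: rather than holding a
    single fixed type, it accumulates related type names (syns)."""

    def __init__(self, initial_type=None):
        # An unspecified term starts with an empty synlist; a
        # constrained term starts from its initial type (an E-sol).
        self.synlist = [initial_type] if initial_type else []

    def add_syn(self, name):
        # Record a related name produced by a synonym,
        # specialization, or generalization transformation.
        if name not in self.synlist:
            self.synlist.append(name)

# The deep-sky example from the text:
x = PolytypedTerm("deep-sky object")
x.add_syn("deep-space object")   # synonym transformation
x.add_syn("galaxy")              # specialization transformation
print(x.synlist)   # ['deep-sky object', 'deep-space object', 'galaxy']
```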
In summary, one important contribution of the polytyped variable is that it
provides a means of talking about an unknown type. The other important
contribution is that it provides a means to constrain the types that can be assumed by
the polytyped variable. This has the effect of constraining the unknown, without
the user having to know what is present in this unknown search space.
Having provided a means of naming the unknown entity, and even placing
some constraints on the types that are acceptable for satisfying it, the next
step in specifying an unknown entity is to describe whatever characteristics are
known about it. This is the context in which the unknown type will be found.
Since entities are characterized by their attributes, the user can characterize
the named polytyped variable representing an unknown entity with the set of
attributes that the desired entity should have. These attributes are called the
context of the entity, and during the next stage of processing will be used by the
agents to identify the unknown entity.
This concept of using the context to describe an entity was also used by
Morganstern in the active database (Morganstern 1984). He defined a descriptor
that contained a partial list of the attributes and the relationships that the
desired entity should have with other entities. However, in contrast to the QBC
approach, the entity type was specified in the active database; QBC allows the
entity to be unspecified as to type.
Similar to the entity itself, the attributes of the entity are also represented
by polytyped variables. This will facilitate query reformulation, to be described
shortly. Each of the type values for the attributes is represented by an E-sol,
which in combination with the synlist of its associated entity is called an A-sol.
It should be noted that what we have described is an attribute, which
represents some known type; hence, the set of attributes provides a known context
for the unknown entity. However, the attribute has been represented with a
polytyped variable, whose initial value is this known attribute type. What this
means is that this fixed attribute type can itself assume the various values of a
constrained polytyped variable, subject to the initial type value and the
transformations that can be applied. Thus, the attribute provides a type of variable
constant that can be reformulated as needed - in controlled ways - to adapt itself
to the available data in the database.
4.2 Hypothesis Generation
The polytyped entity variable provides a focus point for the next stage of query
elaboration. This consists of agents “brainstorming” to suggest entity values
that are consistent with both the polytyped entity variable and at least some of the
attribute context of the entity variable. These suggested entities are termed
hypotheses, since they must undergo further processing before they can be adopted
as suitable type instantiations for the entity variable.
Each of the query committee members that represents a database maintains a
metadatabase (or intensional database) that describes the extensional database
in terms of the entities and attributes represented in the database in an E-R
model-type approach (Chen 76). To determine whether it can participate in
the hypothesis generation process by posting a hypothesis, an agent searches its
intensional database for an attribute that is polytype-equal to a characteristic
attribute in the query kernel. Polytype-equal means that there exists a non-null
intersection between the synlist of the attributes of the unspecified or constrained
entity and the synlist of an attribute of the agent's database. If there is an
attribute that is polytype-equal, then the agent posts the entity name (E-sol)
with which the attribute is associated as a hypothesis for the unspecified entity.
It posts the hypothesis irrespective of whether the query kernel's unspecified
entity is constrained or not. Any validation of hypothesis type against the type
of the constrained, unspecified entity will wait until all of the knowledge is posted,
since later knowledge may be able to relate a posting to the original type of the
constrained entity.
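The polytype-equal test and the resulting posting behavior might be sketched as follows. The function names and the toy intensional database are hypothetical; the sketch only mirrors the synlist-intersection definition given above.

```python
def polytype_equal(synlist_a, synlist_b):
    """Two polytyped terms match if their synlists intersect."""
    return bool(set(synlist_a) & set(synlist_b))

def post_hypotheses(kernel_attr_syns, intensional_db):
    """An agent scans its intensional database and posts the entity
    name (E-sol) of every attribute polytype-equal to the query-kernel
    attribute, regardless of any constraint on the unspecified entity."""
    hypotheses = []
    for entity, attrs in intensional_db.items():
        for attr_syns in attrs:
            if polytype_equal(kernel_attr_syns, attr_syns):
                hypotheses.append(entity)
                break   # one match suffices to post this entity
    return hypotheses

# A toy intensional database: entity -> list of attribute synlists.
db = {"star": [["magnitude", "absolute-magnitude"], ["distance"]],
      "observatory": [["location"]]}
print(post_hypotheses(["magnitude"], db))   # ['star']
```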
Another important aspect of hypothesis generation is extending the synlist
of the query kernel entity. This is called solicitation elaboration, and has the
effect of widening the names under which a particular polyvalued term is known.
This will facilitate the hypothesis match process. This process of solicitation
elaboration and attribute hypothesis generation progresses until no agent has
any additional hypotheses to post or solicitations to elaborate.
4.2.1 Entity Materialization
After a brainstorming session in human problem solving, the suggested ideas are
sifted to find those that both satisfy the original constraints and are realizable.
In this case, satisfying the initial constraints means that there exists an entity
hypothesis that is consistent with each of the characteristic context attributes.
In this case, the hypothesis is said to be a cover for its context. Also, if the
unspecified target entity is constrained, then an acceptable hypothesis must be
related to the initial type of the constrained polytyped variable representing the
entity.
A consistent cover exists if an identical hypothesis has been posted with
respect to each of the attributes associated with the unspecified E-sol. This
means that the hypothesis is consistent with the conjunction of its context. This
is the strongest cover that can be formed, since there is positive affirmation that
the hypothesis has actual database support for each of its context attributes.
This is termed a strong cover.
It is interesting to note that one is not concerned whether each agent, or
even a particular set of agents, proposed the hypothesis. The context of
the hypothesis is all that matters. What this says is that if there exists local
unanimity with respect to a hypothesis, then the hypothesis is taken as a valid
hypothesis. It is said to be consistent with its context.
Note that a strong cover can only occur if all of the attributes associated with
the unspecified E-sol have been validated with A-caps. If such a hypothesis can
be found, the hypothesis constitutes a consistent entity candidate (CEC).
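The strong-cover test reduces to a set intersection over the postings. The following sketch assumes postings are recorded per context attribute; the representation is illustrative, not taken from the QBC implementation.

```python
def strong_covers(postings):
    """postings maps each context attribute of the unspecified entity
    to the set of hypotheses posted against it.  A hypothesis is a
    strong cover (and thus a CEC) only if it was posted for every
    attribute, i.e. it is consistent with the conjunction of its
    context."""
    attribute_sets = list(postings.values())
    if not attribute_sets:
        return set()
    covers = set(attribute_sets[0])
    for s in attribute_sets[1:]:
        covers &= set(s)   # keep only hypotheses posted everywhere
    return covers

# The magnitude/size example: star lacks a posting for size.
postings = {"magnitude": {"galaxy", "nebula", "star"},
            "size":      {"galaxy", "nebula"}}
print(sorted(strong_covers(postings)))   # ['galaxy', 'nebula']
```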
If the query kernel contains multiple unspecified entities, then it is desirable
to build a consistent cover consisting of one CEC (which is an E-sol) from each
entity, such that together they are self-consistent. In this case, two E-sols are
self-consistent if they would normally be used together. This can be determined
through consultation of higher-level constructs in QBC (such as activity
primitives) to see if the terms are used together. Or, if a semantic model exists for the
general application area, then two terms are self-consistent if they both exist in
the semantic model and are joined with a relation. The other approach, currently
taken in QBC, is to let the user decide whether the terms are self-consistent. This
has the advantage of not discarding potentially useful results simply because the
knowledge base of the query committee is not yet sufficiently rich.
4.2.2 Specification Reformulation
The mechanism for specification reformulation is included in the concept of the
polytyped variable and polytyped constant. As noted, the polytyped term
consists of synlists of multiple types that are related to each other through
specialization, generalization and near relationships. The most desirable relationship
between one E-sol and another is that of specialization. If a user is looking for
an unspecified entity with attribute A and finds an entity with an attribute B
that is a specialization of A, that should be totally acceptable, since B is an A
(i.e., satisfies the isa relationship). For example, a user might have formulated a
query kernel describing an unspecified entity with attributes of magnitude and
distance. One of the databases available to the query committee is that for stars,
which includes, among other attributes, absolute-magnitude and distance. While
this will not initially lead to a CEC (since there is no match between magnitude
and absolute-magnitude), an agent with the knowledge that absolute-magnitude
is a specialization of magnitude adds absolute-magnitude to the synlist of the
unspecified type. Now star will result in a CEC, since the initial query kernel
attributes of magnitude and distance will be polytype-equal to absolute-magnitude
and distance.
However, while this is technically a reformulation of the problem, it does
represent a minor one. Since one of the objectives of QBC is to bring the query
kernel into conformance with the database, such that a query can successfully
be formulated, a more drastic reformulation may be required. In this case, the
attribute may have to be generalized to attain a match. In the most drastic
situation, synonyms that semantically approach the original would be generated
to facilitate the making of a match.
If a strong cover cannot be formed, there are then two strategies to form one.
First of all, this means that the posted hypotheses do not match all of the required
attribute contexts. In this case, the strategy is to relax the requirement that the
hypothesis be consistent with all original attributes in the query kernel, and
select the hypotheses that are consistent with the greatest number of attributes.
This is posted as the CEC. A variation of this would be for the user to assign
weights of importance to each of the context attributes for the unspecified entity,
and allow the query committee to select those hypotheses with the maximum
weight for the CEC. Of course, if this strategy is pursued, then the attribute not
matched would be dropped from the polyvalued term representing the entity.
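The relaxation strategy, including the user-assigned-weight variation, might be sketched as follows. The dissertation gives no code for this step; the scoring scheme below is one plausible reading of it.

```python
def best_partial_cover(postings, weights=None):
    """When no hypothesis covers every attribute, select the
    hypotheses consistent with the greatest number (or, with weights,
    the greatest total weight) of context attributes.  Unmatched
    attributes would then be dropped from the entity's polyvalued
    term."""
    weights = weights or {attr: 1 for attr in postings}
    score = {}
    for attr, hyps in postings.items():
        for h in hyps:
            score[h] = score.get(h, 0) + weights[attr]
    best = max(score.values())
    return sorted(h for h, s in score.items() if s == best)

postings = {"magnitude": {"galaxy", "star"},
            "size":      {"galaxy"},
            "distance":  {"star"}}
print(best_partial_cover(postings))   # ['galaxy', 'star'] - a tie at 2 attributes
# Weighting size heavily breaks the tie in favor of galaxy:
print(best_partial_cover(postings, {"magnitude": 1, "size": 5, "distance": 1}))
```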
4.3 Attribute Discovery
Once the desired entity is identified by the query committee, the next task is to
find all of the attributes of that entity that might be relevant in making an access
decision. For this stage of query committee processing, the agents attempt to
answer the question, “Given the proposed entity, what additional information in
the form of attributes is available for this entity?”
For example, if a QBC user wanted an entity that had both magnitude
and size, the current database would propose the galaxy and the nebula as CECs.
These entities would then be presented to the query committee to suggest
additional attributes that are associated with these entities. Having defined a galaxy
as one of the CECs, this current stage of processing would seek to discover
additional attributes known to the query committee for the galaxy. To accomplish
this, each member of the query committee looks at the E-cap, which is galaxy,
and checks to see if they know of any attributes (beyond those of magnitude and
size) associated with galaxy. These additional attributes are posted as additional
context for the polytyped term representing galaxy. In this case, the additional
attributes that would be posted include NGC number, right ascension,
declination, surface brightness, distance and shape.
This represents another brainstorming stage and is analogous to attribute-based
entity elaboration. The difference is that during this stage, the agents key
off of the entities and supply attributes, while in the previous entity elaboration,
the agents keyed off of the attributes and supplied entity hypotheses.
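Attribute discovery, being the dual of hypothesis generation, reduces to keying off the entity and returning unseen attributes. The sketch below uses a toy database drawn from the galaxy example; the function name is illustrative.

```python
def discover_attributes(entity, known_attrs, intensional_db):
    """Each committee member checks whether it knows attributes of
    the given E-cap beyond those already in its context, and posts
    any it finds as additional context."""
    posted = []
    for attr in intensional_db.get(entity, []):
        if attr not in known_attrs and attr not in posted:
            posted.append(attr)
    return posted

# Toy intensional database for one agent.
db = {"galaxy": ["magnitude", "size", "NGC number", "right ascension",
                 "declination", "surface brightness", "distance", "shape"]}
print(discover_attributes("galaxy", ["magnitude", "size"], db))
```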
Chapter 5
Indirect Query Elaboration
The previous chapter considered the case in which the entity was unknown, but
could be described in terms of its characteristic attributes. However, it may
well be the case that the user cannot begin to describe directly, even abstractly,
the data that he desires. In this case, the user indirectly describes the data in
terms of real-world activity that the data is intended to support, or relationships
between real-world activity, where activity connotes both action and being.
The remainder of this chapter will describe the four types of indirect query
elaboration that have been identified by this research.
5.1 Type One Indirect (P-RF/DU)
The goal of the P-RF/DU type of indirect entity elaboration is for the user to
be able to make a declarative description of some desired state. Based on the
nature of this desired state, the query committee will propose the data that is
required to support the state. There are two problems that have to be resolved
to accomplish this. The first is to provide a means by which the user can describe
the activity for which the data is needed. The second is to provide a means to
process this activity description in order to derive the data requirements of the
activity. Each of these problems will be considered in turn.
5.1.1 Activity Description
In human interaction, activities are described in natural language. However, since
natural language understanding is not a focus of this research and not particularly
germane to the QBC research, this was not considered a viable option.
Numerous machine-processable means exist for describing real-world activities.
These include various languages, including conceptual dependency graphs
(Schank 77), predicate calculus, which has been used widely in the formal
specification of software (Scheid 1989), and scripts (Rich 1983), to name some of the
main representations.
While any of these approaches could have been adopted for describing activities
to the query committee, they were not particularly supportive of the need to
derive required data from the description of an activity. Since this data derivation
represents a new focus, the approach taken was to begin with the simplest
activity description possible in order to understand the issues involved and then,
if necessary, enhance it to provide the needed capability.
If one looks at the English sentence structure for guidance, one of the simplest
sentences is the declarative one consisting of subject, verb and direct object.
The following are examples of activities that can be expressed with a simple
declarative sentence:
1. I view Mars.
2. He purchases telescope.
3. He visits observatory.
4. I attend meeting.
These simple sentence examples lack adjectives, adverbs and indirect objects,
but they are capable of expressing a great number of activities. However, these
sentences can be simplified even more if the verb is associated with just a single
noun. This results in the following statements, which (while not commonly
spoken English) are on a par with what a baby might utter, and babies are very
accomplished at getting their needs satisfied:
1. I view.
2. View Mars.
3. He purchases.
4. Purchase telescope.
5. He visits.
6. Visit observatory.
7. I attend.
8. A ttend meeting.
Each of these binary relationships between a verb and a noun is called an
activity primitive (AP). The noun part of the AP is represented by an entity.
The attributes of this entity are analogous to adjectives that modify the noun.
The verb is represented by an activity.
The following are examples of APs:
1. (I view)
2. (View Mars)
3. (He purchases)
4. (Purchase telescope)
5. (He visits)
6. (Visit observatory)
7. (I attend)
8. (Attend meeting)
As can be noted, there are two forms of the activity primitive: (verb entity)
and (entity verb). In the (verb entity) activity primitive, the action of the verb is
applied to the entity; thus the entity has the role of direct object. In the (entity
verb) activity primitive, the entity is applying the verb. Thus the entity has
the role of the subject in an English sentence. What appears to be missing is
the indirect object - represented by Kate in the sentence “Laura gives Kate a
telescope.” The activity primitives can represent (Laura gives) and (gives
telescope). The fact that Kate receives the telescope can be represented by (receives
Kate). Since a particular real world is assumed to be specified only by the state
predicate existing in that world, there is no possibility that the telescope is given
to anyone other than Kate or that Kate receives something other than the
telescope, since no other receiver or object is specified. If I want to say that Chuck
gives Marcia a telescope and Chuck gives Tom binoculars, then this would have
to be expressed in separate worlds.
Although this notation does not have an indirect object, an indirect object
can be represented by the second statement “Receives Kate.” All of these AP's
can be bound into a single state predicate within a world. The complete state
predicate would be represented by [(Laura gives), (gives telescope), (receives
Kate)]. This state predicate would be similar in meaning to [(Laura gives),
(receives telescope), (receives Kate)]. This could be represented by showing the
AP's (receives telescope) and (gives telescope) as specializations of a generalized
AP called (transfer telescope) or even a more general AP (transfer thing).
This state predicate could also be represented by the state predicate [(Laura
gives), (gives ball), (Inv(gives) Kate)], where Inv represents the inverse of gives.
This is useful if a particular verb does not have an inverse in English.
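The AP notation maps directly onto ordered pairs, with a state predicate as a list of APs. The following is a representational sketch only, under the assumption that the set of verbs is known so the two AP forms can be told apart.

```python
# An activity primitive is a (verb entity) or (entity verb) pair;
# a state predicate is the conjunction of the APs it contains.
# The telescope example from the text:
state_predicate = [("Laura", "gives"),      # (entity verb): Laura is the subject
                   ("gives", "telescope"),  # (verb entity): telescope is the object
                   ("receives", "Kate")]    # indirect object via a second AP

def entities_in(predicate, verbs):
    """Collect the entity side of each AP, given the known verbs."""
    out = []
    for a, b in predicate:
        out.append(b if a in verbs else a)
    return out

print(entities_in(state_predicate, {"gives", "receives"}))
# ['Laura', 'telescope', 'Kate']
```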
If there is no specified subject, the query committee is assumed to be the
default subject. Thus, one could have a predicate [(View Mars)], where the
implicit viewer is the query committee. This (verb entity) activity primitive has
been found to have sufficient expressive power for the astronomical database;
hence it alone is used.
Since the approach being described straddles both data and real-world
activities, a notation is provided to make explicit what is being talked about. If
there is any doubt, real-world objects are indicated by (rw object). Thus, (rw
star) indicates an actual star. The notation (dw star) represents data about a
star. While (visit (dw star)) is possible, (visit (rw star)) is not. When an entity
is referenced, the default meaning is assumed to be to the real-world object
represented by the entity.
As an example, assume that a user describes a desired state as (view (rw
deep-space-object)). This represents a predicate that the user would like to be
true in the user's desired state. Members of the query committee would post
solicitations or capabilities required in a state in which the (view (rw deep-space-object))
was true. This posting of data occurs until each agent has contributed
all the solicitations or capabilities of which it was aware to support the real-world
activity of viewing a deep-space object. While this does not guarantee that all
necessary data has been contributed, it does guarantee that all solicitations or
capabilities of which the query committee is aware have been contributed.
Having provided a notation with which the user can specify the desired
activity for which data is required, the next section will look at how this state
predicate is transformed into a set of solicitations, using an approach called
dominance-based knowledge factoring.
5.1.2 Dominance-based Knowledge Factoring
The first issue that needs to be addressed is the connection between an activity
and data. If an activity is described as indicated above in some state predicate,
this describes action, and not data. To understand the connection between an
activity and data, we will use a function called Data-needed-for. This function
performs a mapping from the set of activity primitives to the set of data entities
needed to support the activity.
D = Data-needed-for (AP)
The Data-needed-for function returns solicitations or capabilities for all of the
data needed to support the activity primitive. While the agents are incapable of
determining whether all of the required data is present, a particular agent can
assess whether - from its perspective - it possesses any data in support of the
AP. If so, it will then provide the missing data.
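The Data-needed-for mapping, with each agent contributing only what it knows, might look like the following. The agent interface shown here is hypothetical; in this sketch each agent is simply a mapping from an AP to the data it can supply.

```python
def data_needed_for(ap, agents):
    """D = Data-needed-for(AP): pool every agent's solicitations or
    capabilities for the activity primitive.  No single agent can
    judge completeness; each contributes only the data it knows of."""
    needed = []
    for agent in agents:
        for item in agent.get(ap, []):
            if item not in needed:   # avoid duplicate contributions
                needed.append(item)
    return needed

# Toy agents, each a mapping from an AP to the data it can supply.
agent1 = {("view", "deep-space-object"): ["coordinates", "magnitude"]}
agent2 = {("view", "deep-space-object"): ["magnitude", "rise time"]}
print(data_needed_for(("view", "deep-space-object"), [agent1, agent2]))
# ['coordinates', 'magnitude', 'rise time']
```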
To support the transformation of a state predicate into a set of solicitations1
requires knowledge of the data that should be associated with a particular
activity and entity. The data structure that holds this knowledge is called a data
requisition, since it represents the data that must be requisitioned from some
data repository in order to support the specified activity and object.
The simplest type of requisition consists of a fixed set of solicitations. This
set-structured requisition can be identified with an absolute naming convention
of the form activity-object. This simple requisition is called a fixed activity/fixed
object/fixed solicitation set (FA/FO/FS) requisition. When the user specifies
the activity and the entity, the activity name and entity name are formed into a
requisition name, and the requisition set is searched for one that has the desired
name. The fixed set of solicitations in the requisition is then placed on the
blackboard for the query committee to transform into capabilities. If successful,
the set of capabilities will represent the data that is required to support the
specified activity.
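An FA/FO/FS requisition store reduces to a lookup table keyed by the activity-object name. This is an illustrative sketch; the requisition contents are invented for the example.

```python
# Fixed activity / fixed object / fixed solicitation set: the
# requisition is found by exact activity-object name and yields a
# fixed set of solicitations for the committee to transform.
requisitions = {
    "view-deep-space-object": ["coordinates", "magnitude", "size"],
}

def lookup_requisition(activity, entity):
    """Form the absolute activity-object name and look it up."""
    return requisitions.get(activity + "-" + entity, [])

print(lookup_requisition("view", "deep-space-object"))
# ['coordinates', 'magnitude', 'size']
```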
While the FA/FO/FS requisition is effective, it lacks generality, requiring that
a new requisition be defined for each activity/object pair. To provide a more
general requisition structure means that the requisition should be applicable to
1 State predicates can also be transformed into capabilities, but the more general case is
to transform them into solicitations. These are then transformed into capabilities, using the
approaches described for entity-based query elaboration discussed in the previous chapter.
more cases than just the single activity/object pair. This can be handled in a
number of ways.
The first level of generalization comes with the realization that if a requisition
is representative of the data required for an activity applied to some entity E,
then it is also applicable to the activity applied to the specialization of E, S(E),
since S(E) is still an E. This can be addressed by representing the object portion
of the requisition with a polytyped name, with the synonyms restricted to types
that are specializations of the initial object. As appropriate, this specialization
constraint can be relaxed to include synonyms that are semantically near the
initial object. This first-level requisition generalization using the specialization and
possibly the near-based synonyms is called a fixed activity/variable object/fixed
set (FA/VO/FS) requisition.
In a similar way, the activity name can also be represented by a polytyped
activity name. The various synonyms for the activity name are generated using
transformations that operate on activities to generate synonym activities that
are specializations of the argument or near the activity argument to which the
transformation is applied. A requisition that has a polytyped activity name as
well as polytyped object names is called a variable activity/variable object/fixed
set (VA/VO/FS) requisition.
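With polytyped activity and object names, requisition lookup becomes synlist intersection on both components. The sketch below represents synlists as plain lists and invents the example synonyms; it is not the dissertation's implementation.

```python
def matches(requisition, activity_syns, object_syns):
    """VA/VO/FS: a requisition applies if its polytyped activity name
    and its polytyped object name each intersect the corresponding
    synlist of the user's AP."""
    return bool(set(requisition["activity_syns"]) & set(activity_syns)
                and set(requisition["object_syns"]) & set(object_syns))

# A requisition whose names have been broadened by specialization
# and near-based synonym transformations.
req = {"activity_syns": ["view", "observe"],
       "object_syns": ["deep-space-object", "galaxy", "nebula"],
       "solicitations": ["coordinates", "magnitude"]}

# The user's AP (observe galaxy) matches via the broadened names.
print(matches(req, ["observe"], ["galaxy"]))    # True
print(matches(req, ["purchase"], ["galaxy"]))   # False
```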
Up to this point, the requisition naming techniques have provided a method
of broadening the name space to which the requisition responds by generating
synonyms that are either specializations of, or in some sense, semantically near
the name under which the requisition was initially specified. However, in all of
this, the definition of the requisition itself has not changed. It is the same fixed
set of solicitations, whose applicability has just been broadened to a larger set of
activities and objects.
However, while the requisition will be applicable to AP's whose activities
or entities are specializations of the AP for which it was defined, the static
requisition does not take into account additional data that may be required
for the more specific activity and object. This can be accomplished through
an inheritance technique which has been named Dominance-based Knowledge
Factoring (DKF).
A classification hierarchy can be established for entities. This indicates
whether a particular entity is a specialization of another entity according to
an isa relationship, or whether one entity is a part of another, according to an
aggregation relationship (Hull 1987). In a similar way, one can establish a
classification hierarchy for activities. These classification hierarchies can be viewed
as existing on two planes: an activity plane and an entity plane. A directed
line connecting an activity on the activity plane to an entity on the entity plane
represents a (verb entity) AP. Likewise, a directed line from the entity plane to
the activity plane represents an (entity verb) AP. As noted, since a requisition
can be associated with an AP, the requisition can be associated with the directed
line connecting the activity on the activity plane with the entity on the entity
plane.
The goal of DKF is to provide a structure to accommodate distributed
knowledge about the nature of the data required for a particular requisition. DKF
provides a means by which agents can know if they have some fragmentary part
to contribute to a requisition. In line with this, we will define the concept of a
req-fragment (r-frag), which represents a portion of a requisition that, together
with other r-frags, will comprise as complete a requisition as the query committee
is capable of producing.
The following discussion will first present a structure by
which requisitions can be stored in terms of r-frags. It will then explain how
this structure facilitates the collaboration of multiple agents in the pooling
of their r-frags to produce a complete requisition.
As noted, a requisition can be associated with each AP line. However, if there
are n activities and m entities, then this will constitute m × n AP's. While m ×
n complete requisitions could be stored, one for each AP, many of these requisitions
would be identical, since they would cover various specializations of activities or
entities for which there were identical requisitions.
To avoid this duplication, the knowledge that comprises a requisition will be
factored, with each factor that constitutes an r-frag stored only once. Then,
when a complete requisition is required for a given AP, the requisition will be
built from the appropriate r-frags.
The key to this factorization and re-assembly is the recognition that the AP's form
a lattice, and that the AP's can be compared using a dominance relationship.
When there is no need to distinguish between an activity or entity in an AP, these
will be referred to as AP terms. We can then define the dominance relationship
on AP terms. An AP term Ti dominates another term Tj if Ti is either equal to
or more general than term Tj.
We can also talk about whether one AP dominates another. For Ai and Aj,
which are AP's, Ai totally dominates Aj if the entity in Ai dominates the entity
in Aj, and the activity in Ai dominates the activity in Aj. Ai activity dominates
Aj if the activity in Ai dominates the activity in Aj, but the entity in Ai does not
dominate the entity in Aj. Ai entity dominates Aj if the entity in Ai dominates
the entity in Aj, but the activity in Ai does not dominate the activity in Aj.
If Ai both activity dominates and entity dominates Aj, then Ai totally
dominates Aj. The case could also exist in which Ai activity dominates Aj, and Aj
entity dominates Ai. In this case Ai is said to cross dominate Aj, and Aj is said
to cross dominate Ai.
Propagation of an r-frag from Ai to Aj can occur if Ai totally dominates Aj.
However, if Ai activity dominates, entity dominates or cross dominates Aj, then
one could also propagate r-frags with the recognition that, while some additional
useful data may be requisitioned, some irrelevant data will also be requested.
Thus, additional recall is achieved at the cost of some lost precision.
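The dominance tests above can be sketched over a toy isa hierarchy. The hierarchy and the AP names below are hypothetical illustrations; only the definitions of total, activity and entity dominance follow the text.

```python
# Sketch of AP dominance over a small isa hierarchy (child -> parent).
# "view" generalizes "photograph"; "deep-space-object" generalizes "galaxy".
ISA = {"photograph": "view", "galaxy": "deep-space-object"}

def dominates(t1, t2):
    # t1 dominates t2 if t2 equals t1 or is a (transitive) specialization.
    while t2 is not None:
        if t1 == t2:
            return True
        t2 = ISA.get(t2)
    return False

def ap_dominance(a1, a2):
    # An AP is an (activity, entity) pair.
    act = dominates(a1[0], a2[0])
    ent = dominates(a1[1], a2[1])
    if act and ent:
        return "total"      # r-frags propagate safely
    if act:
        return "activity"   # propagation trades precision for recall
    if ent:
        return "entity"
    return None

general = ("view", "deep-space-object")
specific = ("photograph", "galaxy")
```

Under this sketch, `ap_dominance(general, specific)` reports total dominance, so an r-frag attached to the general AP would be inherited by the specific one.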
An activity cluster (AC) is defined as a data structure that contains a number
of AP's. While an AP represents a primitive activity, the AC represents a set of
AP's that have become related to form a non-primitive activity.
The AP's in an AC can be either joinable or disjoint. If they are joinable, this
means they have terms in common such that they can be connected, based on the
common term. If they are disjoint, this means they have no common terms.
An AC formed from joinable AP's can be formed with a join based on the
entity or the activity. Multiple entity-connected AP's, where each AP has the
same entity, are called an entity-centered AC. This represents different activities
associated with the same entity.
Similarly, multiple activity-connected AP's, where each AP has the same
activity, are called an activity-centered AC. This represents multiple objects that
are all subjected to the same activity.
Finally, an AC can contain other AC's. This is called a compound AC. An
AC that contains only AP's is called a simple AC.
An activity cluster Ci one-to-one dominates another cluster Cj if there is a
one-to-one, onto mapping from the AP's in cluster Ci to the AP's in cluster Cj,
and each AP in cluster Ci dominates the AP in cluster Cj to which it is mapped.
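One-to-one cluster dominance amounts to finding a bijection under which every pairing is a total dominance. A brute-force sketch (the hierarchy and cluster contents are hypothetical; real clusters would need a less exhaustive matcher):

```python
from itertools import permutations

# Toy isa hierarchy, child -> parent (hypothetical names).
ISA = {"photograph": "view", "galaxy": "deep-space-object"}

def term_dom(t1, t2):
    while t2 is not None:
        if t1 == t2:
            return True
        t2 = ISA.get(t2)
    return False

def ap_total_dom(a1, a2):
    # (activity, entity) pair: both components must dominate.
    return term_dom(a1[0], a2[0]) and term_dom(a1[1], a2[1])

def cluster_dominates(c1, c2):
    # Ci one-to-one dominates Cj if some bijection pairs every AP in Ci
    # with an AP in Cj that it totally dominates.
    if len(c1) != len(c2):
        return False
    return any(all(ap_total_dom(a, b) for a, b in zip(c1, perm))
               for perm in permutations(c2))

c_general = [("view", "deep-space-object"), ("view", "galaxy")]
c_specific = [("photograph", "galaxy"), ("view", "galaxy")]
```

Here `c_general` one-to-one dominates `c_specific`, but not the other way around, so r-frags would flow only from the general cluster to the specific one.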
A second, more general definition is not used, due to some undesirable side
effects. This second definition states that a cluster Ci superset dominates another
cluster Cj if for each AP in cluster Cj there exists an AP in cluster Ci, and the AP
in cluster Ci dominates the AP in cluster Cj. Under this definition, the dominant
cluster may have AP's that have no counterpart in the superset-dominated AC.
The problem with using this definition will be explained shortly.
The objective of this development is to be able to factor the knowledge
regarding the composition of a requisition. To this end, r-frags can be associated
with both AP's and AC's.
To provide the greatest coverage and thus reduce the number of explicitly
stored r-frags, the r-frag should be associated with the most general AP possible.
Then, all of the AP's which are dominated by this most general AP will inherit
this r-frag, as well as all other r-frags from AP's that dominate them. It will be
noted that the propagation of r-frags differs considerably, depending on whether
one-to-one or superset dominance is used. The problem with superset dominance
is that if an AC has even one term in common with another AC, then it will
inherit all of its r-frags, even though the AC's may have totally different
objectives.
The r-frags associated with an AC are themselves factored. Since an AC
consists of AP's, each AP either has r-frags directly associated with it or r-frags
inherited from more general AP's. In addition, the AC may have r-frags that
cannot be associated with a particular AP (or component AC), but must be
associated with the AC itself or with more general versions of the AC. These are
termed cluster-specific r-frags. Unless an r-frag applies to all components of the
AC, it should be associated with the appropriate AP, rather than the AC. Those
r-frags that are associated with an AP - either directly or by inheritance - are
called cluster-independent r-frags, since they exist based on the nature of the AP,
rather than on the nature of the cluster in which they find themselves.
By keeping the r-frag associated with the AP rather than with the AC, the
r-frag will automatically be included in all AC's in which the AP finds itself. If
the r-frag had been associated with the AC, then each new AC that required the
same r-frag (by virtue of the fact that it includes a particular AP) would have
to duplicate this r-frag. While the discussion to this point has described the
approach in terms of r-frags associated with AP's and AC's, in reality the AP's
and AC's provide a space in which the various agents can reason about their
r-frag contribution. As a structure of AP's and AC's relevant to the current query
elaboration is constructed on the blackboard, each agent will use this AP and AC
structure to determine when it can contribute an appropriate r-frag. Thus, for
example, if an agent recognizes that it holds information about a generalization
of some AP that has been posted on the blackboard, then it knows that it can
contribute the relevant r-frag.
However, an agent may not initially recognize that it can make a contribution,
since it may not know that the AP about which it possesses knowledge
is a generalization of the current AP. This fact can be provided as agents post
knowledge to generalize the current AP or AC's, since the generalization
potentially broadens the audience of agents that can make an r-frag contribution. This
ability to broaden the name space is supported by representing AP's and AC's
as polytyped variables, with the additional names, as they are contributed by
agents, being posted to the synlist of the appropriate AP or AC.
5.2 Type Two Indirect (P-RP/DU)
This section will look at the case in which the real dimension of the primary
world is only partially specified initially. In this situation, the query committee
has the task of completing the specification of the real dimension. Once this is
complete, then Type One indirect processing (i.e., P-RF/DU) can be performed.
A useful means of approaching Type Two indirect elaboration is to
recognize that the real dimension is comprised of four different types of constructs:
the simple entity-centered AC, the simple activity-centered AC, the compound AC
and, finally, just the AP. Each of these constructs can be partially specified
initially and thus form the target for Type Two indirect elaboration.
The next four sections will discuss the different strategies used for the
elaboration of the partial specification of each of these constructs. Following this
will be a discussion of the problem reformulation that occurs if one of the four
elaboration strategies cannot be applied initially.
5.2.1 Partially Specified Entity Centered AC
Just as entities could be recognized from the attributes in entity-based
elaboration, AC's can also potentially be recognized from their AP patterns. The AP's
associated with an AC form a pattern that identifies the AC.
Recall that an entity-centered AC consists of two or more AP's joined at a
common entity. Thus, the form of this cluster is a set of activities surrounding
a common entity. This represents a number of activities in which a particular
entity may participate. An incomplete entity-centered AC could then involve
one of two cases: either one or more of the activities is missing, or the entity is
missing.
If the entity is present and one or more of the activities is missing from the
partial pattern, such that there is not a complete match with any AC held by
an agent, then the entity becomes the key for the query committee to search
for an entity-centered AC that is based on the same entity and includes the
activities of the partial pattern as a subset. If a match is found, then the AC on
the blackboard can be enhanced with the activities present in the agent's AC.
Then the r-frags associated with the newly matched AC can be posted to the
blackboard AC.
If the entity is missing, then in effect there exists no AP. There exists only a
set of activities that forms an initial activity set. This activity set provides the
basis for the query committee to search for AC's for which the initial activity set
is a subset of the activities associated with the AC. These AC's are then posted
as hypotheses that satisfy the constraints.
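Both cases reduce to subset tests against the AC's an agent knows. A sketch with hypothetical agent data (the AC records and names are illustrative assumptions):

```python
# Hypothetical AC's held by an agent: an entity plus its activity set.
KNOWN_ACS = [
    {"entity": "deep-space-object",
     "activities": {"view", "photograph", "catalog"}},
    {"entity": "planet",
     "activities": {"view", "track"}},
]

def complete_entity_ac(entity, partial_activities):
    # Entity present: find an AC on the same entity whose activities
    # include the partial pattern's activities as a subset.
    for ac in KNOWN_ACS:
        if ac["entity"] == entity and partial_activities <= ac["activities"]:
            return ac
    return None

def acs_for_activity_set(activities):
    # Entity missing: any AC whose activity set contains the initial set
    # becomes a candidate hypothesis.
    return [ac for ac in KNOWN_ACS if activities <= ac["activities"]]

match = complete_entity_ac("deep-space-object", {"view", "photograph"})
```

The match returns the full AC, so the blackboard AC can be enhanced with the activity it was missing (`catalog` in this toy data).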
5.2.2 Partially Specified Activity Centered AC
An activity-centered AC consists of two or more AP's joined with a common
activity. The form of this cluster is a set of entities surrounding a common
activity. This represents a number of entities that are involved in a particular
activity. As with the entity-centered AC, there are also two cases with the
activity-centered one: either one or more of the entities is missing, or the activity
is missing.
If the activity is present, but the entities form a subset of the entities
associated with known AC's, then the missing entities can be added by the query
committee in the form of hypotheses. If the activity is missing, then there exist
no AP's, but only a set of entities. In this case, the objective of the query
committee is to search for one or more AC's that contain the initial set of entities as
a subset of the entities associated with the AC. If such an AC exists, then it is
posted as a hypothesis.
5.2.2.1 Partially Specified Compound Clusters
The compound AC is an AC that is comprised of other AC's. As such, a partially
specified compound AC is an AC whose components do not completely match any
compound AC known to the query committee. The goal of the query committee is
to discover if any compound AC exists for which the components of the partially
specified AC represent a subset. If such an AC is found, it is posted as a hypothesis,
and the partial specification can be filled out with the missing components from
the hypothesis.
5.2.2.2 Partially Specified AP
Since an AP consists of an entity and an activity, a partially specified AP consists
of either an entity alone or an activity alone. In this case, the query committee
attempts to find AP's known to it that match on either the
entity or the activity that is present. For example, the initial query kernel could
include an activity primitive that included only view. The query committee
will suggest view-deep-space and photograph-deep-space. The latter shows up
because view dominates photograph.
If desired, since the AP can form the basis of an activity cluster, the query
committee can attempt to build an AC structure around the AP. In this way, a
more complete real-world context will be materialized for the AP, providing the
basis for the generation of more complete requisitions.
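The view example above can be sketched as a match on a one-sided AP. The known AP's and the tiny hierarchy are hypothetical; the dominance of view over photograph is the relationship stated in the text.

```python
# Toy isa hierarchy for activities (child -> parent).
ISA = {"photograph": "view"}

def dominated_by(general, term):
    # True if term equals general or is a specialization of it.
    while term is not None:
        if term == general:
            return True
        term = ISA.get(term)
    return False

KNOWN_APS = [("view", "deep-space"), ("photograph", "deep-space"),
             ("track", "planet")]

def match_partial_ap(activity=None, entity=None):
    # Match known AP's on whichever half of the AP is present.
    hits = []
    for act, ent in KNOWN_APS:
        if activity is not None and dominated_by(activity, act):
            hits.append((act, ent))
        elif entity is not None and ent == entity:
            hits.append((act, ent))
    return hits
```

A kernel containing only `view` would surface both view-deep-space and photograph-deep-space, as in the example.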
5.2.3 Problem Reformulation
If no match can be found, then a strategy is needed to decide what type of
processing to pursue next. As was the case with entity-based elaboration, if no
match is found, then the query committee attempts to modify the initial partial
pattern to one for which a match can be made. To this end, polytyped
variables will be used for the AP's of the partial pattern. We will now consider
the ways in which a partial pattern might not match.
If the partial pattern is in the form of an AC, it is either entity centered or
activity centered. In either case, to look at this in the most general way possible,
one has a central term and one or more context terms. If no match can be found,
this could be due to two reasons. The first is that the query committee knows
of no AC that the central term matches. In this case, the strategy is to attempt
to modify the central term so that a match can be made. By representing the
central term as a polytyped variable, it can then serve to collect synonyms,
including terms that are specializations, semantically near, or related through
some generalization.
The second reason for not making a match is that the query
committee knows of no AC such that the user-supplied context terms form a
subset of the context terms of the known AC. Representing the context terms
as polytyped variables provides a basic mechanism to see if there is any
possibility of a match based on specializations, nearness or generalizations. If
this does not produce a match, then attempts will be made to see if a match
can be made by dropping some of the context terms. The query committee
will identify the AC that matches the central term and best covers the context.
That AC will then be proposed as a hypothesis. The objective is for the query
committee to find the best result based on the data available.
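The best-cover fallback can be sketched as follows. The AC data is hypothetical; only the strategy of matching the central term and maximizing context coverage comes from the text.

```python
# Sketch of the fallback: when no known AC covers all context terms,
# pick the AC matching the central term that covers the most of them.

def best_cover(central, context, known_acs):
    # known_acs: list of (central_term, set_of_context_terms)
    candidates = [(len(context & ctx), central_term, ctx)
                  for central_term, ctx in known_acs
                  if central_term == central]
    if not candidates:
        return None
    score, term, ctx = max(candidates, key=lambda c: c[0])
    return term, ctx

known = [("deep-space-object", {"view", "photograph"}),
         ("deep-space-object", {"view", "catalog", "track"})]
hypothesis = best_cover("deep-space-object", {"view", "catalog"}, known)
```

With this toy data, the second AC covers two of the two supplied context terms and is proposed as the hypothesis.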
5.3 Type Three Indirect (P-RU/DU, S-RF/DU)
In Type Three elaboration, the user specifies the real dimension of the primary
world in terms of a secondary world and a relationship - a w-relationship - that the
secondary world should have with the primary world. The objective is for the
secondary world to serve as a specification for the primary world. Thus, the S in
the P-RU/DU, S-RF/DU notation could be read as secondary, or specification,
world. The w-relationship represents a realization policy for translating the
specifications of the specification world into the state predicates of the realization
world.
The world that realizes this specification world is called the primary world.
This world is empty initially, and it is the task of the query committee to populate
this world based on the contents of the specification world and the nature of the
realization policy that is in effect.
The realization policy is the relationship that the specification world should have
with respect to the primary realization world. The realization policy itself can
be elaborated through the identification of specializations of that policy or other
policies that are near the initial realization policy. The realization policy is
represented by a polytyped variable that can have numerous synonyms related
to the initial policy through specialization or near relationships.
The function of a w-relationship is to map a set of activity primitives or
clusters from the specification world into a set of activity primitives and clusters in
the primary world, where the primary world AP's and AC's have a relationship
specified by the w-relationship. In effect, the w-relationship is indexed by a
specification world AC, which contains the AC's and AP's of the specification world,
and a realization policy that should govern the transformation of the specification
world to the primary world.
Polytyped terms are used to represent both the AC and the realization policy
and thus, analogous to the AP, provide a binary context to the polytyped term
representing the w-relationship. Thus the w-relationship can be dealt with in
ways analogous to those used for AP's in the real dimension of the primary
world, or entities in the data dimension of the primary world. As with the other
polytyped term constructs (e.g., AP's, AC's and entities), the w-relationship also
contains a requisition. However, in this case the requisition is for the transformations
required to change the specification world into the primary world.
In the simplest case, which is used in QBC, the transform is explicitly stored
by storing in the w-relationship the requisition for the primary world that is
associated with the specification world and the realization policy. However, the
transformation could also be stored in terms of a program, such as an expert
system, whose purpose would be to diagnose the specification world in terms of a
particular type of relationship to the primary world. It would then configure the
primary world with AP's and AC's. This is analogous to the type of activity that
other configurators, such as R1 (Hayes-Roth, Waterman and Lenat 1983), perform. An
example of a w-relationship is "use-to-instrument", which maps a particular use,
such as view-deep-space object, to a set of astronomical instruments.
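In the explicitly stored case, the w-relationship behaves like a table indexed by the specification-world AC and the realization policy. A minimal sketch, patterned on the use-to-instrument example; the table entries are hypothetical illustrations, not QBC's stored data.

```python
# Hypothetical w-relationship table: (specification-world AC,
# realization policy) -> stored primary-world requisition.
W_RELATIONSHIPS = {
    (("view", "deep-space-object"), "use-to-instrument"):
        ["telescope", "binoculars"],
}

def realize(spec_ac, policy):
    # Look up the stored transform; None means the query committee
    # would have to elaborate the policy or AC further.
    return W_RELATIONSHIPS.get((spec_ac, policy))
```

A lookup miss corresponds to the case where the polytyped AC or policy must first be broadened with synonyms before a transform can be found.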
Analogous to the AP's, the w-relationships form a lattice based on their
specification world AC and the realization policy. We can then define the dominance
relationship on w-relationships. A w-relationship Wi dominates w-relationship
Wj if the AC and realization policy of Wi both dominate the AC and realization
policy of Wj. In the case of the AC, the dominance should be one-to-one. The issue
with respect to realization policies is somewhat counter-intuitive.
In this research, the basis for dominance is the complete inheritance of the
properties of the dominating term by the dominated term. Thus, if term A
dominates term B, then B inherits all of the attributes and requisitions associated with
term A. In the case of a realization policy, one realization policy Ri dominates
another realization policy Rj if the set of primary world realizations generated
by Rj contains the set generated by Ri. Under this definition, the dominant
case is the one producing data that is a subset of the dominated case. While this
might seem backwards, it is consistent with the fact that, under this research,
specializations (which are dominated by generalizations) inherit all the properties
of the generalization, plus those properties that may be unique to the
specialization alone. Thus, the set of information associated with a specialization
is greater than that from which it inherits. The same principle is used here.
As with AP's and AC's, the knowledge is stored as requisition fragments, which
are then assembled into complete requisitions based on the inheritance of r-frags
from dominant w-relationships. A w-relationship inherits r-frags from all of the
w-relationships that dominate it.
5.4 Type Four Indirect (P-RF/DU, S-RF/DU)
The Type Four indirect elaboration involves the specification of both the
secondary and the primary world, with the task of the query committee being to
discover the nature of the relationship that joins the two worlds. With the QBC
database, this involves a match of the primary world with the requisition of a
w-relationship polytyped term and the secondary world context. If the match
succeeds, then the relationship associated with the polytyped term representing
the w-relationship is the result. If the requisition of the w-relationship is a
program, such as an expert system, then the analysis must determine whether the
secondary world is derivable from the primary world.
Chapter 6
Presentation
The previous chapters have dealt with determining the entities and associated
attributes that will form the subject of the query. At this point in the query
elaboration, the query state consists of a set of A-caps that designate the entity and
associated attributes that, based on the data available to the query committee,
come as close as possible to satisfying the query kernel initially provided by the
user. As necessary, this initial query kernel has been elaborated - and in some
cases changed - so that it specifies data that is available to the query committee.
The current state of the query at this point is called the final entity state.
Now the focus of the query elaboration turns to the data itself. While the
goal of QBC is not to provide the user with data, but with a query (or, as will
be seen shortly, a set of query options), these options must be based on the
available data. Some of the data that might satisfy the query elaboration may
be eliminated based on domain knowledge, such as the time of the year. In QBC,
this is accomplished through the imposition of filters.
Once the cardinality of the data has been reduced through filtering, there
may still exist a large body of applicable data. As noted, it is not the objective
of QBC to present this data to the user. In the first place, there may be so much
data that the user would be overwhelmed; in the second place, the goal of QBC
is to allow the user to scope out the forest, and not lose him among the trees.
To accomplish this objective, this research has developed a new method of
presenting data to the user called lattice-based clustering (LBC). LBC represents
a clustering method that presents the user with a summary of the most attractive
query options available to him, along with the rationale for those options. In the
remainder of this chapter, the details of filtering and LBC will be presented.
6.1 Filtering
A filter encapsulates knowledge possessed by a member of the query committee
regarding the desirability or undesirability of certain data. For example, the
light-gathering capability of the astronomical instrument used for viewing will
determine the limits, under the best viewing conditions, of what the astronomer
can see. Karkoschka suggests limits that are dependent upon the type of
instrument and the nature of the object to be viewed. For stellar objects (e.g., stars),
the unaided eye can see objects up to magnitude 6¹. Binoculars will raise the
viewable magnitude to 10 to 11, depending upon the size of the binoculars, while
a telescope will increase this to 12 to 13, again depending on the size of the
telescope. These magnitude limits are for stellar objects. For nebulas, these limits
need to be reduced by one. This information can be included in an instrument-
magnitude filter that, based on knowledge of the instrument used, will establish
limits on the magnitude of astronomical objects presented to a user.
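The instrument-magnitude filter can be sketched directly from these numbers. As an illustrative assumption, the upper ends of the quoted ranges (11 for binoculars, 13 for a telescope) are used as single limiting values, and the object records are invented.

```python
# Limiting stellar magnitudes per instrument, taken from the text
# (upper ends of the quoted ranges used as single values).
STELLAR_LIMIT = {"eye": 6, "binoculars": 11, "telescope": 13}

def magnitude_limit(instrument, object_kind):
    limit = STELLAR_LIMIT[instrument]
    if object_kind == "nebula":
        limit -= 1          # nebular limits are one magnitude lower
    return limit

def instrument_magnitude_filter(objects, instrument):
    # Keep only objects bright enough to see (lower magnitude = brighter).
    return [o for o in objects
            if o["magnitude"] <= magnitude_limit(instrument, o["kind"])]

catalog = [{"name": "M31", "kind": "nebula", "magnitude": 4},
           {"name": "faint star", "kind": "star", "magnitude": 12}]
visible = instrument_magnitude_filter(catalog, "eye")
```

For the unaided eye, the magnitude-4 nebula survives the filter while the magnitude-12 star is removed.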
As another example, the viewability of nebulas is a function of the darkness
of the sky. Karkoschka observes that nebulas² with a surface brightness³ of
magnitude 5 to 7 can be observed in moonlight, while nebulas with a magnitude
of 8 to 10 require a dark sky. Those with magnitude 11 to 12 are a challenge
to even an experienced observer (Karkoschka 1990:pg. 11). Thus, if a user is
¹The higher the magnitude number, the dimmer the object, with a difference of 5 magnitudes
representing a factor of 100 in brightness.
²Under nebulas he includes planetary nebulas, diffuse nebulas, open and globular star clusters
and galaxies.
³The brightness of an area of a 5' circle of the nebula.
considering viewing astronomical objects on a particular date, a moon filter can
be imposed that indicates the acceptable magnitude of objects to be presented to
the user if the date is one in which the moon is, for example, between a first and
last quarter. We expand this to stellar objects by increasing the magnitude by
one, which is consistent with the nebula-to-stellar-object magnitude relationship
in the previous paragraph. Thus the moon will limit the magnitude of stellar
objects to 6 to 8, while a dark sky will permit easy viewing of objects in the
range of 9 to 11, with more difficult viewing for objects in the range 12 to 13.
This moon filter will use knowledge of the date of observation to establish limits
on the magnitude of astronomical objects presented to the user.
While the moon filter eliminates one contributor to less-than-ideal viewing
conditions, there are other factors as well, such as light pollution in a city or
poor atmospheric conditions. These conditions will differ from city to city, and
even between locations within a city. This problem could be represented by a site-light
filter that will relate a particular viewing site to the range of astronomical object
magnitudes that can be viewed from that site. To counter the problem of light
pollution, astronomers can use optical filters that block undesirable light. One
such filter is an optical nebula filter, which passes light at the frequency of light
emitted by a nebula (Eicher 1989). In QBC, the effects of this filter will be
represented by an agent-based filter called a nebula filter.
As another example, one of the goals of viewing double stars is to be able
to resolve the two components into their distinct points of light. Resolution is
a function of the aperture of the instrument. The eye, for example, can resolve
5' of arc, or slightly better if the eyes are especially good (Karkoschka 1990:12).
The theoretical limit of telescope resolution is given by the Dawes limit. This
expresses the detectable separation of two objects as 4.56/A arc seconds, where
A is the diameter of the aperture of the telescope lens or mirror expressed in
inches; both objects are assumed to be of equal magnitude; both objects have a
magnitude of 6; and the viewing conditions are perfect, with very still air. Eicher
references a 1914 study by Lewis that represents the arc second separation as
16.5/A for pairs with magnitudes 6.2 and 9.5, with A in inches. Very unequal
pairs of magnitude 4.7 and 10.4 show an arc second resolution of 36/A. This data
could be included in an instrument-resolution filter.
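These formulas are simple enough to state as code. A sketch of what an instrument-resolution filter might compute; the function names and the `resolvable` helper are ours, not the dissertation's.

```python
# Resolution formulas quoted above (aperture A in inches, results in
# arc seconds).

def dawes_limit(aperture_in):
    # Dawes limit: equal-magnitude pair, perfect seeing.
    return 4.56 / aperture_in

def lewis_separation(aperture_in, unequal="moderate"):
    # Lewis (1914): "moderate" = magnitudes 6.2 and 9.5 (16.5/A);
    # "very" = magnitudes 4.7 and 10.4 (36/A).
    factor = 16.5 if unequal == "moderate" else 36.0
    return factor / aperture_in

def resolvable(separation_arcsec, aperture_in, limit=dawes_limit):
    # A double star is resolvable if its separation is at least the
    # instrument's limiting separation.
    return separation_arcsec >= limit(aperture_in)
```

A filter built on these would apply its effect to the separation field of double-star data, as described below.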
Finally, users are normally interested in a particular part of the sky, based
on their location and the time of year and day at which viewing is to occur.
The Northern Hemisphere viewer will not be able to see objects that have too
low a declination. Similar restrictions affect those in the Southern Hemisphere.
Hence, a very important filter is one that is based on the time and location of
viewing, and results in filters being applied to the right ascension and declination
of objects of interest. This will be called the time-location filter.
The concept of a filter is somewhat analogous to that of a view in traditional
database management systems (Ullman 1988), in that it mediates what the user
can see of the actual data. In QBC, all the filters are driven from data that
is contained in the database and, as will be shown, the view itself is a
contributor to the query elaboration process. Also, because QBC is a distributed,
open-architecture system, an appropriate open-system filter protocol had to be
developed to ensure that the intent of the filter is satisfied. In what follows, the
filter will be described from a number of perspectives, including the model, data
structure and operational semantics.
At the model level, a filter specifies a relationship that connects two terms
in the evolving query elaboration space. The domain of the filter is a term T,
which represents the triggering context of the filter and may be a w-relationship,
AP, solicitation or capability - or a component of one of these structures. The
range of the filter is a term S that represents an A-sol or A-cap and some set of
values to which these are restricted. An agent holds a filter in the form of the
filter data structure.
The data structure of the filter consists of three components: the filter trigger
(f-trigger), filter knowledge (f-knowledge) and filter restricted solicitation
(f-solicitation). The f-trigger describes the arguments to the filter. In the case of
the moon's effects on astronomical object viewability, the argument is the date
(year, month, day, time of day) from which the phase of the moon can be derived,
using formulas or tables. The context of the date is that it represents the date
at which viewing or photographing of astronomical objects is to occur. Thus, to
be most accurate, the f-trigger should be specified in the context of view in the
AP view*deep-space object, where the * indicates the connection between the
activity, view, and the entity, deep-space. In this case, the filter would react only
to dates that were specified in the context of such an AP. A less constraining
specification of the f-trigger could be to define the f-trigger as an independent
term that specifies the date. With this less restrictive specification, it is possible
that the filter might react to some date other than that associated with the
viewing of astronomical objects.
The f-knowledge, the second component of the filter's data structure,
represents the knowledge that the filter needs to perform its task. In the case of the
moon filter, the f-knowledge will include data on when the moon is in different
phases and the impact on viewing magnitude that these phases will cause. Such
data can be represented in three ways: as formulas, like the various resolution
formulas; explicitly stored with the filter, such as moon phase data; or as a
capability that will be used to acquire this data from other agents when needed.
While the required data could also be specified in terms of a solicitation (meaning
that the potential source and form of the data is unknown), the data resulting
from the solicitation might not be in a suitable format. The data would have to
be described in some data description protocol that was known to both the agent
responsible for the filter and the agent who held the required f-knowledge. As a
simplification, QBC assumes that the filter agent knows of the specific data that
it needs and its format. Hence the requirement that the f-knowledge be specified
in terms of a capability.
The f-solicitation, the final component of the filter's data structure, specifies
where the effects of the filter are to be applied. These will differ based on the
nature of the filter. Both the instrument-magnitude filters and the moon filters
apply their effects to the magnitude of astronomical objects. The instrument-resolution
filter applies its effects to the separation field of double-star data.
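As a rough illustration of this three-part structure, the following sketch models a filter as a record of f-trigger, f-knowledge and f-solicitation. The Python types and field values are assumptions for illustration only, not the actual QBC representation:

```python
from dataclasses import dataclass

@dataclass
class Filter:
    """Three-part filter data structure (illustrative sketch)."""
    f_trigger: str       # term, or AP-qualified term, whose presence activates the filter
    f_knowledge: dict    # data the filter needs, e.g. moon-phase tables or a capability
    f_solicitation: str  # field to which the filter's limits are to be applied

# A moon filter: triggered by a date in the context of the AP
# view*deep-space object, it limits the magnitude of viewable objects.
# The string naming convention and the knowledge keys are hypothetical.
moon_filter = Filter(
    f_trigger="view*deep-space object.date",
    f_knowledge={"full_moon_window_days": 2, "stellar_magnitude_limit": (0, 6)},
    f_solicitation="astronomical object.magnitude",
)
```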
From the perspective of operational semantics, the filter process is a two-phase
operation involving filter posting and query committee response. The first
question to be addressed is what stimulates the posting of a filter. Since the filter
will be posted in the context of the current state of the elaboration, this question
can be analyzed in terms of the current elaboration state as it relates to the
filter. Under ideal conditions, prior to filter activation, the blackboard contains
data structures that both the f-trigger and the f-solicitation match. In these cases,
the effects of the filter can be registered on the affected solicitation as indicated
by the f-solicitation. This suggests that the application of the filter can be
analyzed in terms of whether the elaboration state (i.e., the current blackboard
state) contains both of these f-trigger and f-solicitation data structures, or only
a possibly null subset. This leads to the following cases:
• Case 1: Elaboration state contains both f-trigger and f-solicitation relevant data structures.
• Case 2: Elaboration state contains only f-trigger relevant data structures.
• Case 3: Elaboration state contains only f-solicitation relevant data structures.
• Case 4: Elaboration state contains neither f-trigger nor f-solicitation relevant data structures.
The operational semantics of the filter will be analyzed for each of these cases.
In Case 1, the agent holding the filter recognizes that the elaboration state contains
data structures that match its f-trigger and f-solicitation data structures.
This means that, based on the contents of the blackboard data structure that
matches the f-trigger, the agent can consult its f-knowledge and impose the required
limits on the f-solicitation. For example, suppose that the elaboration
state contained the AP, view*deep-space object, with a date of May 30, 1991.
This is within two days of the full moon on May 28, and thus satisfies the moon
agent's f-knowledge requirement for invocation of this filter. Recognizing this,
the agent holding this filter would impose limits on the magnitude of astronomical
objects of interest by annotating the limit field of the appropriate solicitations
and capabilities. Assuming the use of a small telescope as the default, for stellar
objects, the permitted value range would be 0⁴ to 6. For non-stellar, nebula-like
objects, the surface brightness magnitude would be limited to the 0 to 5 range.
In addition to posting value range limits, an agent posting a filter also annotates
the filter list of the affected term to indicate that this particular filter
has been activated. As will be shown shortly, this is required for activation of
context-dependent filters.
A second possibility under Case 1 is that while both the f-trigger and f-solicitation
relevant data structures are present in the elaboration state, the
f-trigger relevant data structure in this state has not been instantiated with a
particular value. This could be handled in three ways. One would be to assume
a default, such as a moonless night. A second possibility would be to ask the user
for a value for those fields that serve as triggers of applicable filters. The final
option would be to use the possible values of the various trigger fields as part of
the output clustering to be described in the second section of this chapter. For
QBC, the option selected is a combination of options one and two. If the value is
not present in an f-trigger relevant data structure in the elaboration state, then
the filter agent will post a list of options (e.g., ranges of acceptable values) along
with a request for a default value. If one of the agents has a set of defaults,
then an applicable default will be supplied. If not, then the default agent will
request that such a default be given. Using information
supplied by the filter agent, a set of options will be provided to the user, since
the domain-naive user may not know what is expected.
⁴Actually, there are astronomical objects that have negative magnitude since they are so bright,
but 0 will be taken to mean the brightest object of any magnitude less than zero.
Even if defaults are supplied by a default agent, this agent will still report
to the user that a particular default value was supplied. Hence, the user will be
aware of any default assumptions that are made in the course of the elaboration
process.
Case 2 involves the situation in which only the f-trigger relevant data structure
exists in the elaboration state. This state does not contain the data structure to
which the effects of the filter are to be applied. In essence, this means that the
query committee is not really working on a problem to which the filter is relevant,
although they have included in the elaboration state a data structure that could
trigger a filter for some data in which they have yet to show an interest. For
QBC, existence of f-trigger relevant data alone will not be sufficient to invoke
the filter.
Case 3 involves the situation in which the elaboration state contains only an
f-solicitation relevant data structure, but not an f-trigger relevant data structure.
In this case, the query committee is working in an area for which the effects of a
filter are relevant, but has not yet included the triggering relevant data structure.
From the perspective of query elaboration, the fact that triggering data exists
and is potentially relevant is important information, and thus should be added
to the elaboration state information. In effect, this is a form of indirect query
elaboration, with the relationship being needed-to-trigger. An agent with this
knowledge will append the relevant triggering data structures to the current state
of the query elaboration, along with the value options expected, as in Case 1.
At this point, agents with default knowledge can either annotate this f-trigger
relevant data with the appropriate default values or request that the user supply
them.
In Case 4, neither the f-trigger nor the f-solicitation relevant data structures
are in the current elaboration state. Using the same reasoning as in Case 2, the
agent holding the filter will not participate in the current elaboration state.
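The four cases reduce to a simple dispatch on which filter-relevant structures the elaboration state contains. A minimal sketch, with the predicate arguments and return strings assumed for illustration:

```python
def filter_action(has_trigger: bool, has_solicitation: bool) -> str:
    """Hypothetical dispatch over Cases 1-4, keyed on whether the current
    elaboration (blackboard) state contains f-trigger and f-solicitation
    relevant data structures."""
    if has_trigger and has_solicitation:
        # Case 1: consult f-knowledge and impose limits on the solicitation.
        return "apply filter effects"
    if has_solicitation:
        # Case 3: triggering data is relevant but absent; append it to the
        # elaboration state (indirect elaboration, needed-to-trigger).
        return "post f-trigger structure"
    # Cases 2 and 4: the filter agent does not participate.
    return "do not participate"
```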
Having explored the cases in which filter agents will participate in the query
elaboration and the nature of that participation, the next issue is how the effects
of the filter are to be interpreted by the query committee. If a filter has been
posted prior to the posting of any values by the query committee, the filter
restricts value posting to only those values that are consistent with the filter. As
noted earlier, the filter is really a restricted solicitation or capability. For this
to work, the agent, prior to posting any data values, must check to see if a filter
exists that covers the data and, if so, ensure that the posted data is consistent
with the filter.
However, given the distributed nature of QBC, agents may post data prior to
the activation of a filter in the query elaboration state. Of course, once the filter
is posted, all subsequent postings will be consistent with the filter; however,
the system must be able to handle the case in which non-conforming data exists
in the elaboration state. There are at least two ways that this could be handled.
The first is to “garbage collect” the posted values when a new filter is posted.
However, it may be the case that the filter agent, although it knows about filters,
cannot recognize non-conforming values in the elaboration state. In fact, it may
not be able to recognize values at all.
The second option is to defer this issue to that portion of the presentation
section in which the results of the query elaboration are presented to the user.
This is the option that will be taken, since this places the problem in an area
that is designed to deal with values.
A final issue concerns the interaction of filters. To this point, filters have had
to interact only with agents that had data, and the interaction placed the filter
in the dominant position. The agent was permitted to post only conforming
data. However, it may be the case that two filters interact. For example, both
the moon and instrument-magnitude filters affect the magnitude of astronomical
objects to be viewed. In this case, the result is a filter range that represents the
intersection of the ranges of the two filters.
However, the case may exist in which one filter modifies the results of another.
Thus, the nebula filter will increase the viewable magnitude of nebulas
whose range has been attenuated by light pollution, as indicated by the site-filter.
However, the nebula filter can be activated only in the presence of light
pollution. Thus, for example, one cannot apply a nebula filter under dark-sky
conditions and expect an amplification in the magnitude of astronomical objects
that can be viewed. This is called a context-dependent filter, since it may be
applied only in the context of certain other filters. But this is consistent with the
general definition of the QBC filter. In this case, the f-trigger for the nebula filter
includes the fact that a site-filter has been applied to the magnitude. Hence, the
nebula filter can be applied only following the application of a site-filter. For the
purposes of the QBC system, a nebula filter is assumed to provide viewability of
magnitude 8, which represents the bottom of what Karkoschka views as relatively
easy dark-sky viewing magnitude for nebulas, as previously described.
Filters reduce the data that will ultimately be applicable to an A-cap. But
in this domain, that could still leave a large number of astronomical objects that
may satisfy even the filtered attributes. The next section will look at the means
developed by this research to further reduce the cardinality of the results that
are provided to the user, a method called lattice-based clustering.
6.2 Lattice-based Clustering
To cluster data is to summarize it, and this summary can be designed to provide
the user with a description of the set of data options that are available in the
database. The user can then select the option in which he is interested. A
challenge is to ensure that the number of clusters does not grow so large as to
overwhelm the user. However, the clusters must have sufficient specificity that
the user is given an adequate feel for the nature of the data that is available in
the database.
While a number of clustering approaches have been presented in the literature,
they did not fit the QBC requirement for the clustering to be driven
by a combination of final entity state and available data. Some of the clustering
methods considered and rejected for QBC were conceptual clustering and various
forms of multivariate analysis. Conceptual clustering utilizes object attributes
and different groups of objects, including negative examples, to build rules that
would allow new objects to be placed in the proper group, based on the rules
(Michalski and Stepp 1983; Stepp and Michalski 1986). Variants of these techniques
are used to develop a description of a group of objects. Since each QBC
query formulation was viewed as potentially one-of-a-kind, there were no a priori
examples, nor were the attributes known at the outset. Likewise, the set of
data which could satisfy the final entity state was potentially vast, and was not
partitioned into groups that could then be analyzed for rules to describe it.
Other approaches (such as principal component analysis, a type of multivariate
analysis) have also been used for clustering (Johannes 1984). Techniques
such as principal component analysis are used to cluster attributes relevant to an
observation into a smaller number of orthogonal attributes (Dillon and Goldstein
1984). If the objects are viewed as tuples in a relational DBMS, this is attribute
(or column) clustering.
In contrast, QBC does not start with a fixed group of selectors, nor with a
fixed group of objects, either a single group or different groups for which clustering
or selection rules are to be developed. What is desired for QBC is a means
of characterizing an ad hoc group of objects that happens to be consistent with
the final entity state. The intent is to provide the user with the flavor of the
variety of objects that are available within the various groups delineated by the
completion state.
Another new constraint that had to be addressed in this research was that
the holders of the data consist of multiple agents. Hence the approach must be
amenable to a multiagent environment.
The remainder of this section will present the details of lattice-based clustering
(LBC) in four subsections. The first will show the approach used to cluster the data
in a single attribute. Following this, the lattice of options will be presented, with
the approach used to select the set of most attractive options. The third subsection
will give the proof of an important result of this research, the determination of
an upper bound on the number of clusters that will be presented to the user
using LBC. The final subsection will look at how option validation and selection are
accomplished in a distributed environment.
To provide an example for this discussion, we assume that the user has initiated
his query elaboration session with an unknown entity of type astronomical
object, with attributes diameter and magnitude. The result of QBC processing
is the identification of galaxy and planetary nebula as entities that fit the constraints.
These entities and their associated attributes constitute the final entity
state. For the purpose of illustration, we will focus on the galaxy entity.
The attributes associated with a galaxy, based on the data available to the
query committee, consist of the following: Messier identifier, NGC identifier,
right ascension, declination, magnitude, surface brightness, size, distance and
shape.
The NGC and Messier identifiers are keys that uniquely identify the galaxy.
Right ascension and declination identify the location of the object in the sky.
Magnitude is the observed magnitude of the object, while surface brightness is the
brightness of a 5' circle imposed over the object. For nebulas, some authorities
view surface brightness as a more useful measure of nebula viewability than
magnitude (Karkoschka 1990). Size refers to the observed size in minutes of arc.
Distance is measured in light years.
The galaxy database is shown in Table 6.1. It includes all galaxies that have
Messier numbers and are contained in sky charts NP, N0, N1, N2, N4, N6, N8,
N10, N12, N14, N16, N18, N20, N22, N24 and E0 of Karkoschka (1990). Except
for E0, the charts cover the Northern Sky. It should also be noted that all of the
Northern Sky charts are evenly numbered, so this is a complete set of Northern
Sky galaxies contained in Karkoschka.
For the purposes of clustering, only the non-key attributes will be used. In
addition, it will be assumed that filters are used to eliminate those objects whose
right ascension and declination are outside the current viewing area. Even if this
Table 6.1: Galaxy Database

Messier  NGC   Magnitude  Surface Brightness  Size  Distance  Shape
M31      224   3.7        10                  10'   2.5       E 5
M32      221   8.5        8                   4'    2.5       E 2
M33      598   6          11                  50'   3         Sc 4
M51      5194  8.5        10                  10'   20        Sb 2
M63      5055  9          10                  10'   20        Sb 5
M77      1068  9          8                   4'    50        Sb 2
M81      3031  7          9                   20'   10        Sb 5
M82      3034  8.5        9                   10'   10        Ir 6
M94      4736  8.5        9                   6'    20        Sb 3
M101     5457  8          11                  20'   40        Sc 1
M102     5866  10.5       9                   3'    30        L 6
M106     4258  8.5        10                  12'   20        Sb 6
M108     3556  10         9                   8'    30        Sc 8
M109     3992  10         10                  6'    40        Sb 4
M110     205   9          10                  10'   2.5       E 5
is not the case, the right ascension and declination are, in essence, astronomical
object identifiers, and thus are not viewed as appropriate for LBC. This leaves
magnitude, surface brightness, size, distance and shape as five attributes that
could be used to cluster galaxies. These will form the basis for our example.
6.2.1 Attribute Clustering
To understand how data can be clustered, it is useful to distinguish between the
types of measurement scales on which data can be defined. Stevens, as described
by Dillon and Goldstein, defines the following measurement scales that can be
used for data:

. . . interval-scaled data allows us to say how much more one object
has of an attribute than another, whereas with ratio-scaled data we
can define an origin, which means a zero amount of the attribute in
question, and ratios of scale values are meaningful. In addition to
interval- and ratio-scaled data, there are nominal and ordinal types
of scales. Nominal-scaled data are described in terms of classes; that
is, the numbers assigned simply allow us to place an object in one and
only one of a set [of] mutually exclusive and collectively exhaustive
classes with no implied ordering. Ordinal-scaled data are ranked data,
which means that all we can say is that one object has more or less
or the same amount of an attribute as some other object. (Dillon and
Goldstein 1984, p. 2; Stevens 1946)
If attribute data can be mapped into ordinal scale clusters, then these clusters
can be ordered. Since by definition ordinal scale data is concerned only with
whether some object is better than some other object, but not with the magnitude
of “goodness,” we do not have to be concerned about the different units of
measurement used for different attributes. Thus, ordinal scale clusters provide a
universal way of communicating that is independent of any particular measurement
system, and hence is comparable across the different measurement systems
that each agent may have for each attribute. This is the first step required for
agents to communicate about the relative importance of a particular attribute
value.
For the galaxy database, the attributes of magnitude, surface brightness, size
and distance can be mapped into ordinal scale data. For QBC, each of these
attributes will be considered separately. To form a cluster, the approach is to
sort the values for a single attribute that satisfies the final entity state and any
applicable filters.
The next step is to designate which end of the scale is preferred: the largest
or smallest numeric values. The preference depends upon the purpose to which
the data is to be put. This is accomplished by specifying the attribute cluster
preference (ACP) as HIGH or LOW. For example, if the user is a novice, intent
upon viewing deep space objects through a small telescope, then the ACP for
magnitude, surface brightness and size would be HIGH, to make the objects easier
to find. A more experienced viewer might specify the ACP for magnitude, surface
brightness and size as LOW, to provide greater challenge. The ACP can be
provided either by the user or by the agents. In QBC, the agent responsible for
the attribute is assumed to have a default ACP, but this can be changed by the
user after he views the lattice-clusters presented to him.
These sorted lists are then partitioned into two clusters, representing the high
and low values respectively. If a higher level of granularity is required, then each
of these clusters can be partitioned into two more clusters, representing the
higher and lower values in that cluster. This quad-partition approach was used
for the QBC database. The result is the set of attribute values that represent the
first, second, third and fourth quartiles of data, with the first quartile representing
the ACP. The clusters can then be labeled 1, 2, 3 or 4 depending on the quartile
represented. Since the first quartile represents the ACP, and the clusters have
been formed from sorted lists of attribute values, this says that those values in
the first quartile are preferred over those in the second, which are preferred over
those in the third. The fourth quartile values are the least preferred. If finer granularity
Table 6.2: Galaxy Database Quartiles

Attribute           1Q         2Q         3Q        4Q
Magnitude           3.5 - 8.5  8.5 - 8.5  8.0 - 9   10.5 - 10
Surface Brightness  8 - 9      9 - 10     10 - 10   10 - 11
Size                50 - 12    10 - 10    10 - 6    4 - 3
Distance            50 - 30    30 - 20    20 - 3    2.5 - 2.5
Shape               S          E          L         I
were required, each of the quartiles could be partitioned further. As will be noted,
however, the finer the granularity of partitioning, the larger the potential number
of lattice clusters (to be described shortly) that would have to be presented to
the user. Thus, for the QBC data, quartile partitioning was considered adequate
to present the approach.
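As a concrete sketch of the quad-partition step, the following assumes a plain sorted list split at simple index boundaries; the dissertation does not specify how ties at quartile edges are broken, so that detail is an assumption:

```python
def quartile_clusters(values, acp="LOW"):
    """Partition one attribute's values into four ordered clusters.

    acp="LOW" means lower numeric values are preferred (quartile 1);
    acp="HIGH" means higher values are preferred. The index-based
    boundary choice is illustrative, not taken from QBC.
    """
    ordered = sorted(values, reverse=(acp == "HIGH"))
    n = len(ordered)
    # Split the sorted list into four roughly equal parts (quad-partition).
    bounds = [0, n // 4, n // 2, (3 * n) // 4, n]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(4)]

# Magnitudes from Table 6.1; with ACP LOW, quartile 1 holds the brightest.
magnitudes = [3.7, 8.5, 6, 8.5, 9, 9, 7, 8.5, 8.5, 8, 10.5, 8.5, 10, 10, 9]
clusters = quartile_clusters(magnitudes, acp="LOW")
```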
If the attribute is represented with nominal-scale values, then it should be
mapped onto the ordinal scale if possible. For the galaxy data, the shape can
be spiral (S), elliptical (E), lenticular (L) or irregular (I). Additional levels of
classification could also be used, such as the type of spiral (e.g., a, b, c, d or m)
and the oblateness of the galaxy (Karkoschka 1990). For the QBC experiments,
these finer levels were not considered, since it was believed that the first-level
classification of shape (e.g., S, E, L and I) was sufficient to illustrate the approach.
One approach to mapping shape values onto the ordinal scale is to define
some custom ordering, based on interest or research focus. Thus, one could
hierarchically order the galaxy shapes as S > E > L > I, where S is the ACP. In
this case there are also four values, so it meshes well with the quartiles. However,
there is no reason that each of the attributes has to have the same number of
clusters. Thus, if there were more nominal scale values, then these would all
become ordered clusters according to the custom ordering provided. As a final
note, the custom ordering can come from either the user or one of the agents.
If no order can be provided, the approach is to treat each of the nominal
values as attributes in their own right, with a cluster value of 0 or 1. Cluster
value 1 indicates that the attribute is to be considered, and 0 indicates that the
attribute is to be ignored.
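The two treatments of a nominal-scale attribute such as shape can be sketched side by side; the dictionary and function names here are illustrative assumptions:

```python
# Option 1: a custom ordering is available -- map the nominal values
# onto ordinal cluster ranks (S > E > L > I, with S as the ACP).
SHAPE_RANK = {"S": 1, "E": 2, "L": 3, "I": 4}

# Option 2: no ordering is available -- treat each nominal value as an
# attribute in its own right, with 1 = consider and 0 = ignore.
def one_hot(shape, values=("S", "E", "L", "I")):
    return tuple(1 if shape == v else 0 for v in values)
```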
The galaxy clusters, based on the data in Table 6.1, are shown in Table 6.2.
These clusters assume that shape values have been custom ordered into a hierarchy.
The next section will describe how the individual attribute clusters are
formed into lattice-based clusters.
6.2.2 Lattice Cluster Options
There are a number of approaches to presenting the results of the attribute
clustering just described to the user. One approach is to specify
a set of preferences. In essence, this has been done on an attribute basis, and
with the specification of an ACP, the composite preference could be stated as
first quartile magnitude, surface brightness, size, distance and shape. This has a
number of problems, however.
The first one is that while galaxies exist that individually satisfy the first
quartile of each attribute (since these quartiles were generated from the actual
data), it is not necessarily the case that there exists a galaxy that satisfies all
of the first quartile clustering conditions for each attribute. In this case, such
a galaxy would have to have a magnitude from 3.5 - 8.5, a surface brightness
from 8 - 9, a size from 50 - 12, a distance from 50 - 30 and a shape of S. In this
example, no galaxy in the QBC database satisfies these conditions. While QBC
could incorporate some other set of cluster preferences, this requires a rather
specific set of cluster designations, and has the risk that galaxy instances of such
clusters may not exist in the database. Building these specific cluster preferences
into QBC would leave it very inflexible.
Another approach is for the user to specify specific clusters of interest at the
beginning of the session, or as part of a user profile filter. This is also not very
realistic since at the outset, the user was not even aware of precisely what he was
looking for in an astronomical object; hence he could not be expected to provide
such a precise preference at that time.
At this point the reader might be wondering how this differs from the fact
that an ACP is specified for each attribute. As will be shown, while default
ACPs exist, the user will be able to change the ACP in the context of the data
displayed for him. Thus, he does not have to specify anything in advance.
A third presentation approach is to provide the user with a listing of all
possible clusters and an indication of the population of data in the database of the
query committee for each cluster. If, for example, there were no database entries
that satisfied the cluster which is first quartile magnitude, surface brightness,
size, distance and shape, this would show up as a population of zero. The user
would then be guided away from making such a query. Unfortunately, the number
of potential clusters grows exponentially in the number of attributes.
The upper bound on the number of clusters is the cardinality of the set of clusters, L, formed by
taking the cartesian product of the attribute clusters. If C_i indicates the clusters
for attribute i, and assuming that there are n cluster attributes, then the set L
is given by the following:

L = {C_1 x C_2 x . . . x C_(n-1) x C_n}

For the galaxy example, this results in

L_galaxy = {magnitude x surface brightness x size x distance x shape}

With n cluster attributes and a cardinality of N(C_i) clusters for attribute i,
the cardinality of L, represented by N(L), is given by the following equation:

N(L) = N(C_1) x N(C_2) x . . . x N(C_n)

For the example, with four clusters per attribute and five attributes, this
results in

N(L) = 4^5 = 1024.
As noted, this represents an upper bound, since many of the clusters will not
have instances in the database. However, even if only a small portion of these
clusters is populated with galaxy instances, this will tend to overwhelm the user.
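The upper-bound computation is just a product over the per-attribute cluster counts, which can be written in a few lines:

```python
from math import prod

def cluster_upper_bound(counts):
    """N(L): the product of the number of clusters for each attribute."""
    return prod(counts)

# Five attributes, four quartile clusters each, as in the galaxy example.
n_l = cluster_upper_bound([4, 4, 4, 4, 4])
```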
The problem lies in the fact that data is being presented to the user without
any attempt to prioritize it. While a preferred ordering of attribute values exists
at the attribute level, based on the ACP for each attribute, this has not been
used in the set of elements displayed to the user.
A fourth approach is to build some general preference into QBC, such that
the options could be ordered. Then, for example, only the top M options, where
M is reasonably small, could be presented to the user, out of the total of N(L)
options potentially available. One approach requires that attributes be ordered
with respect to other attributes, in addition to the cluster ordering that has
already been described within an attribute. For those nominal-scale clusters
that cannot be mapped onto the ordinal scale, each of the clusters will be viewed
as a distinct attribute with value 0 or 1.
To provide an ordering, the clusters will be sorted from most significant attribute
to least, and within attributes by cluster, in a type of radix exchange sort
(Knuth 1973). In this way, a total cluster order can be achieved.
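If each cluster index is written as a tuple with its quartile indicators arranged from most significant attribute to least, the radix-style total order is simply lexicographic tuple comparison. A minimal sketch, with the tuple layout assumed:

```python
def total_order(clusters):
    """Totally order cluster indexes, most significant attribute first.

    Each cluster is a tuple of quartile indicators already arranged in
    attribute-significance order; lexicographic sorting then behaves
    like a most-significant-digit radix sort. Lower quartile = preferred.
    """
    return sorted(clusters)

# Three hypothetical cluster indexes (significance order assumed).
ranked = total_order([(2, 2, 2, 2, 1), (1, 4, 1, 3, 1), (2, 1, 4, 4, 2)])
```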
In the galaxy database, one might provide the following attribute ordering,
using ">" to indicate more importance:

surface brightness > magnitude > size > distance > shape

In this case, it will be assumed that the nominal-scale shape data has been
ordered as S > E > L > I, as indicated previously. Within each attribute the
lower value will be the ACP for surface brightness and magnitude, while the
higher value will be the ACP for size and distance. The results of this sort for
the galaxy database are shown in Table 6.3, with the lowest-ranked options at
the top of the table and the highest at the bottom.
Having provided a total ordering for the data, the five highest-ranked options
could be presented to the user. While this approach reduces the number of
options that must be presented, it requires that a rank ordering be provided for
all of the attributes, in addition to the rank ordering required for the attribute
clusters. Such an ordering may not be obvious, and thus the results presented to
Table 6.3: Totally Ordered Galaxy Database

Messier  NGC   Magnitude  Surface Brightness  Size  Distance  Shape
M101     5457  8          11                  20'   40        Sc 1
M33      598   6          11                  50'   3         Sc 4
M109     3992  10         10                  6'    40        Sb 4
M110     205   9          10                  10'   2.5       E 5
M63      5055  9          10                  10'   20        Sb 5
M106     4258  8.5        10                  12'   20        Sb 6
M51      5194  8.5        10                  10'   20        Sb 2
M31      224   3.7        10                  10'   2.5       E 5
M102     5866  10.5       9                   3'    30        L 6
M108     3556  10         9                   8'    30        Sc 8
M94      4736  8.5        9                   6'    20        Sb 3
M82      3034  8.5        9                   10'   10        Ir 6
M81      3031  7          9                   20'   10        Sb 5
M77      1068  9          8                   4'    50        Sb 2
M32      221   8.5        8                   4'    2.5       E 2
the user are somewhat arbitrary, depending on the nature of the rank ordering
provided.
A better approach, and the basis for the LBC approach developed by QBC,
is to recognize that the set L forms a lattice, with the set of attribute clusters
forming a lattice cluster label that can be compared with other lattice cluster
labels. Let A_i be an element of L. The index, i, of A_i can be designated by either
the quartile indicator of each attribute cluster represented by A_i, or by the value
range that this quartile represents. Thus for the galaxy example, assuming that
each attribute is assigned to a particular position in the label, one of the lattice
clusters in L is A_(1,1,1,3,1). This indicates that the lattice cluster consists of first
quartile attribute clusters from magnitude, surface brightness, size and shape,
and a third quartile attribute cluster from distance. In terms of cluster value
ranges, this cluster could also be represented as A_(3.5-8.5, 8-9, 50-12, 20-3, S).
When there is no confusion, the A notation will be dropped and the cluster will
be referred to as (1,1,1,3,1), or (3.5-8.5, 8-9, 50-12, 20-3, S).
The reason that a lattice is useful for presenting data to the user is that it
forms a prioritization of the clusters, based on the dominance relation. Element
A_X of the lattice dominates element A_Y if cluster index X dominates
cluster index Y. The cluster index is comprised of an ordered set of attribute
cluster indicators. Thus, cluster index X is represented as

X = {q_1^X, q_2^X, ..., q_n^X},

where q_i^X represents the quartile rating of lattice cluster X for attribute i,
and there are n cluster attributes per index.
In a similar way, the cluster index Y is represented as

Y = {q_1^Y, q_2^Y, ..., q_n^Y}.

A cluster attribute q_i^X dominates cluster attribute q_i^Y if the quartile represented
by q_i^X is less than or equal to the quartile represented by q_i^Y. A cluster
attribute q_i^X strictly dominates cluster attribute q_i^Y if q_i^X dominates q_i^Y, but is not
equal to it.
Having defined cluster attribute dominance, we can now define lattice-cluster
index dominance. Cluster index X dominates cluster index Y if, for i = 1 to n,
q_i^X dominates q_i^Y. Cluster index X strictly dominates cluster index Y if, for i = 1
to n, q_i^X dominates q_i^Y, and there exists some i, 1 <= i <= n, such that q_i^X is not equal to q_i^Y.
If two clusters are not related by dominance or strict dominance, they are said
to be incomparable. For example, we see that lattice cluster (1,1,1,3,1) strictly
dominates lattice cluster (1,1,1,4,1). However, lattice cluster (1,1,1,3,1) is not
comparable to lattice cluster (3,1,1,1,1).
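The dominance relation translates directly into elementwise comparisons on the quartile indicators. A minimal sketch, using the examples from the text:

```python
def dominates(x, y):
    """Cluster index x dominates y if every quartile indicator in x is
    less than or equal to the corresponding indicator in y
    (quartile 1 is the preferred end of the scale)."""
    return all(qx <= qy for qx, qy in zip(x, y))

def strictly_dominates(x, y):
    # Dominance plus at least one strictly better indicator.
    return dominates(x, y) and x != y

def incomparable(x, y):
    # Neither index dominates the other.
    return not dominates(x, y) and not dominates(y, x)
```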
Since L is formed by taking the cartesian product of the attribute quartiles,
different clusters will either be related through a strict dominance relationship, or
will be incomparable. As previously noted, these clusters are theoretical clusters
only, and all of the clusters in L do not necessarily have instances in the database.
For the QBC database, cluster (1,1,1,3,1) is represented by M81, and cluster
(3,1,4,1,1) is represented by M77. However, there is no representative for cluster
(1,1,1,1,1).
As noted previously, nominal-scale data, such as galaxy shape, can be mapped
to the ordinal scale clustering. If this is not acceptable, then each of the values
is treated as an attribute, with a value of one indicating presence, and zero
indicating absence. If this were done in the galaxy example, we would have
the following attributes: magnitude, surface brightness, size, distance, elliptical
shape, spiral shape, lenticular shape and irregular shape. An example cluster
would be (1,1,1,3,1,0,0,0), which would include all of the instances of the five
attribute cluster (1,1,1,3,1), which has M81 as one of its instances. Note that
under this approach, one would not have any instances of (1,1,1,3,1,1,0,0), since
a galaxy cannot have two types of shapes. Assuming that the nominal-scale data
is mapped to the ordinal scale such that S > E > L > I, the galaxy database
results in the clusters shown in Table 6.4.
Having provided a lattice space of clusters and defined a dominance relationship,
we are now ready to define how the clusters to be displayed to the user are
selected. If a lattice cluster A_j is dominated by A_i, then lattice cluster A_j can be
Table 6.4: Galaxy Database Clusters

Messier  NGC   Mag   Surf Br  Size  Dist  Shape  Cluster
M31      224   3.7   10       10'   2.5   E 5    (1,2,2,4,2)
M32      221   8.5   8        4'    2.5   E 2    (2,1,4,4,2)
M33      598   6     11       50'   3     Sc 4   (1,4,1,3,1)
M51      5194  8.5   10       10'   20    Sb 2   (2,2,2,2,1)
M63      5055  9     10       10'   20    Sb 5   (3,2,2,2,1)
M77      1068  9     8        4'    50    Sb 2   (3,1,4,1,1)
M81      3031  7     9        20'   10    Sb 5   (1,1,1,3,1)
M82      3034  8.5   9        10'   10    Ir 6   (2,1,2,3,4)
M94      4736  8.5   9        6'    20    Sb 3   (2,1,3,2,1)
M101     5457  8     11       20'   40    Sc 1   (1,4,1,1,1)
M102     5866  10.5  9        3'    30    L 6    (4,1,4,1,3)
M106     4258  8.5   10       12'   20    Sb 6   (2,2,1,2,1)
M108     3556  10    9        8'    30    Sc 8   (4,1,3,1,1)
M109     3992  10    10       6'    40    Sb 4   (4,2,3,1,1)
M110     205   9     10       10'   2.5   E 5    (3,2,2,4,2)
eliminated from consideration, since the dominant cluster is better in each of the
clustering attributes than is the dominated cluster.
Those clusters from Table 6.4 that are not dominated by any other cluster
form the display set. The display set for the galaxy database is as follows:
1. (1,1,1,3,1)
2. (2,1,3,2,1)
3. (2,2,1,2,1)
4. (1,4,1,1,1)
5. (3,1,4,1,1)
6. (4,1,3,1,1)
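This elimination can be reproduced mechanically. The sketch below is illustrative only; the cluster tuples are copied from Table 6.4, and every cluster strictly dominated by another is discarded, recovering the six-member display set:

```python
# Galaxy clusters from Table 6.4, keyed by Messier number.
clusters = {
    "M31": (1, 2, 2, 4, 2), "M32": (2, 1, 4, 4, 2), "M33": (1, 4, 1, 3, 1),
    "M51": (2, 2, 2, 2, 1), "M63": (3, 2, 2, 2, 1), "M77": (3, 1, 4, 1, 1),
    "M81": (1, 1, 1, 3, 1), "M82": (2, 1, 2, 3, 4), "M94": (2, 1, 3, 2, 1),
    "M101": (1, 4, 1, 1, 1), "M102": (4, 1, 4, 1, 3), "M106": (2, 2, 1, 2, 1),
    "M108": (4, 1, 3, 1, 1), "M109": (4, 2, 3, 1, 1), "M110": (3, 2, 2, 4, 2),
}

def strictly_dominates(x, y):
    return x != y and all(a <= b for a, b in zip(x, y))

# Keep only the clusters not strictly dominated by any other cluster.
display_set = {name: c for name, c in clusters.items()
               if not any(strictly_dominates(d, c) for d in clusters.values())}
print(sorted(display_set))
# ['M101', 'M106', 'M108', 'M77', 'M81', 'M94']
```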
In essence, this approach selects a set of incomparable clusters that represent
the best of the lattice. If the ACP has been correctly established for each
attribute, there should be no reason for a user to be interested in any lattice cluster
that is not part of the display cluster set. Since the clusters are displayed with
an indication of the ACP, the user can change the defaults, which will result in
a different display cluster set.
The galaxy display cluster set for the five attribute clusters is displayed in
Table 6.5. The clusters represent the selection options open to the user; the
rationale for each option is the range of attribute values associated with the cluster.
In reviewing this display set, it is clear that many of the options are there
because they are first quartile in three to four attributes. The other clusters are
present since, while they may be first quartile in only two attributes, in the other
attributes they are better than any other option.
Thus for our small galaxy database, we have reduced the display from 15
clusters to 6. Also, as will be noted shortly, the maximum number of clusters
that could have been displayed for five attributes and four partitions per attribute
Table 6.5: Galaxy Display Clusters

Cluster      Mag        Surf Br  Size     Dist     Shape  Instance
ACP          LOW        LOW      HIGH     HIGH     S
(1,1,1,3,1)  3.5 - 8.0  8 - 9    6 - 10   12 - 20  S      M81
(2,1,3,2,1)  8.5 - 8.5  8 - 9    6 - 10   20 - 30  S      M94
(2,2,1,2,1)  8.5 - 8.5  9 - 10   12 - 50  20 - 30  S      M106
(1,4,1,1,1)  3.5 - 8.0  10 - 11  12 - 50  30 - 50  S      M101
(3,1,4,1,1)  8.5 - 9    8 - 9    3 - 4    30 - 50  S      M77
(4,1,3,1,1)  10 - 10.5  8 - 9    6 - 10   30 - 50  S      M108
is 35 clusters. (Of course, with only 15 galaxies in the database, at most only 15
of these 35 clusters would have been instantiated. But in this case, significantly
fewer than 15 were included in the display set.)
As will be shown in the next section, this research has developed the following
equation for the upper bound on the number of clusters that could be displayed
to the user. Assuming A attributes to be displayed and N attribute clusters per
attribute, this equation, stated in terms of a binomial coefficient, is as follows:

P_{A,N} = \binom{A+N-2}{A-1}

This equation represents a very significant result, since it limits the upper
bound of the number of clusters that could be displayed for the user to a relatively
small number. Table 6.6 presents this upper bound for a selection of attributes
and clusters per attribute. The first figure in each cell represents the maximum
size of the set of clusters that could be presented to the user using LBC. The figure
following the "/" is the maximum number of clusters if all possible clusters were
populated. We note that as the number of attributes and clusters per attribute
increases, the percentage of the total clusters contained in the cut set decreases.
Thus, even in the worst case, in which a display set the size of the cut set results,
this still represents an extremely efficient way to summarize a large amount of
data.
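A sketch of this bound using Python's math.comb (illustrative; not part of the QBC implementation):

```python
from math import comb

# Equation 6.1: the display set for A attributes and N attribute clusters
# per attribute holds at most C(A+N-2, A-1) clusters, out of N**A
# possible lattice clusters.
def display_upper_bound(a, n):
    return comb(a + n - 2, a - 1)

# A few cells of Table 6.6: upper bound versus total lattice size.
for a, n in [(2, 4), (5, 4), (7, 4), (5, 5)]:
    print(a, n, display_upper_bound(a, n), n ** a)
# 2 4 4 16
# 5 4 35 1024
# 7 4 84 16384
# 5 5 70 3125
```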
Table 6.6: Display Upper Bound

Clus./                            Attributes
Attr.    2       3         4          5           6             7
1       1/1     1/1       1/1        1/1         1/1           1/1
2       2/4     3/8       4/16       5/32        6/64          7/128
3       3/9     6/27      10/81      15/243      21/729        28/2187
4       4/16    10/64     20/256     35/1024     56/4096       84/16384
5       5/25    15/125    35/625     70/3125     126/15625     210/78125
6       6/36    21/216    56/1296    126/7776    252/46656     462/279936
7       7/49    28/343    84/2401    210/16807   462/117649    924/823543
8       8/64    36/512    120/4096   330/32768   792/262144    1716/2097152
This table suggests an interesting strategy that could be pursued by the user.
Initially he could look across a large number of attributes, but call for a binary
partition of each attribute. As will be noted, this results in a very low number
of clusters. Then, as he finds attributes of special interest, he could reduce the
number of attributes that participate in the display, but increase the number of
partitions for each attribute. In this way, he maintains a relatively small number
of clusters with which he must contend.
It should also be emphasized that this upper bound on the number of clusters
to be displayed for the user represents just that: an upper bound on the display
set. The actual set displayed will not necessarily be this large, since the actual
data will determine which clusters form the incomparable, dominant display set.
If, for example, the lattice cluster (1,1,1,1,1) actually existed in the database,
then this would be the only cluster displayed. However, the important point
that this finding makes is that even in the worst case, the upper bound is not
too high.
It should also be noted that the galaxy database represents a small sample
of the visible galaxies. Eicher, in his guide to deep-space objects viewable by the
small telescope, lists 758 galaxies that are visible (1989). Even after filtering, the
actual number of galaxies and associated clusters that could be displayed would
Figure 6.1: Four cluster, two attribute lattice
be considerable. Thus the LBC provides a significant means of picking the best
from this large set of possible objects that could be displayed. The next section
will present the proof of this upper-bound result for LBC.
6.2.3 Upper Bound for LBC
This section presents the proof of the equation for the upper bound of the QBC
display set. It will be noted that a lattice has a least upper bound. For five
attributes the least upper bound of the lattice is the cluster (1,1,1,1,1). A lattice
also has a greatest lower bound, which, for a complete lattice with four partitions
per attribute, is the lattice cluster (4,4,4,4,4).
We are interested in the largest set of clusters that can be selected from the
lattice, such that none of the clusters in the set dominates any of the others, and
any other cluster that can be generated either dominates or is dominated by a
member of this set. We have called this set the cut set.
For two attributes and four clusters per attribute, we have the lattice shown
in Figure 6.1. An arrow from one box to another indicates that the box from
which the arrow emanates dominates the box to which the arrow points. As can
be seen, the lattice comes to a point at the top, bottom and sides. We will call
the points of the lattice on the sides the extreme points. The largest set of clusters
must come from the set that includes the extreme points, since this represents the
place at which the largest number of clusters can be drawn, such that no cluster
in the drawn set dominates any other cluster in that set. By observation, from
Figure 6.1, this largest set consists of the set {(1,4), (2,3), (3,2), (4,1)}. This set
resides on the shortest path from cluster (1,4) to cluster (4,1). This set cuts the
lattice into two pieces, with the members of this cut set dominating all clusters
that lie between the cut set and the greatest lower bound of the lattice. For the
clusters between the cut set and the least upper bound, each cluster dominates
a member of the cut set.
The cut set can also be viewed as an n-dimensional plane of incomparable
clusters. In this case, the plane is one-dimensional and represents a line of clusters
that cuts the lattice, but for higher-dimensional lattices the cut set will be of a
dimension that is one less than that of the lattice. In fact, as will be noted from
Figure 6.1, the lattice has a number of planes containing incomparable clusters. The least
upper bound represents the first plane, which consists of the single cluster that
represents the least upper bound of the lattice. The second plane consists of the
clusters (1,2) and (2,1), and has cardinality 2. The third plane of incomparable
clusters consists of the clusters (1,3), (2,2), (3,1), and has cardinality 3. The
fourth plane is the cut set, which was previously listed, and has cardinality 4.
The planes and their associated cardinality can be labeled as P_{A,L}, where A
indicates the number of attributes in the lattice and L indicates the level; also,
the level must not be greater than the number of clusters per attribute, assuming
that all of the attributes have the same number of clusters. Thus, for the lattice
in Figure 6.1, the first plane, which contains only the least upper bound cluster, is
labeled P_{2,1}. The symbol P_{2,1} designates the plane and represents its cardinality,
which in this case is one. The second plane is labeled P_{2,2} and has the value 2,
representing the plane's cardinality of 2 for this second plane. The cardinality
of the third plane is represented by P_{2,3} and has the value 3. The cardinality of
the fourth plane is P_{2,4}, which has cardinality 4 for the case of two attributes. This
plane represents the cut set, and in general, the cut set is the plane that is at a
level equal to the number of clusters in each attribute. For the example, there
are four clusters per attribute; thus, the cut set is the level 4 plane. If there had
been five clusters per attribute, the cut set would have been the level 5 plane.
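For the two-attribute, four-cluster lattice these planes can be enumerated directly (an illustrative sketch):

```python
from itertools import product

# Planes of incomparable clusters in the A=2, N=4 lattice. The plane at
# level L holds the clusters whose indices sum to A + L - 1; the cut set
# is the plane at level L = N.
A, N = 2, 4
lattice = list(product(range(1, N + 1), repeat=A))

def plane(level):
    return [c for c in lattice if sum(c) == A + level - 1]

for level in range(1, N + 1):
    print(level, plane(level))
# 1 [(1, 1)]
# 2 [(1, 2), (2, 1)]
# 3 [(1, 3), (2, 2), (3, 1)]
# 4 [(1, 4), (2, 3), (3, 2), (4, 1)]
```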
This research has derived an equation for the maximum number of clusters
in a cut set. The result of this derivation is presented as Theorem 1.

Theorem 1: For a lattice comprised of entity clusters consisting of A attributes,
with each attribute partitioned into N clusters, the maximum cardinality
of the cut set is represented by P_{A,N}, which is given by the following binomial
coefficient:

P_{A,N} = \binom{A+N-2}{A-1}    (6.1)
To prove this theorem, we will first prove Theorem 2.

Theorem 2: For a lattice comprised of entity clusters consisting of A attributes,
with each attribute partitioned into N clusters, the maximum cardinality
of a plane at level L, represented by P_{A,L}, is given by the following binomial
coefficient:

P_{A,L} = \binom{A+L-2}{A-1}    (6.2)
We will give a two-part inductive proof of Theorem 2. The first part will
perform an induction on the number of attributes in the cluster, and the second
part will perform the induction based on the number of levels. We will begin
the induction by showing that it holds true for the case of two attributes and
two clusters, the minimum case of interest. The lattice for this case is shown in
        (1,1)
       /     \
   (1,2)     (2,1)
       \     /
        (2,2)

Figure 6.2: Two cluster, two attribute lattice
Figure 6.2. Applying the equation of Theorem 2, we have for the Level 1 plane
that

P_{2,1} = \binom{2+1-2}{2-1} = 1    (6.3)

For the Level 2 plane we have

P_{2,2} = \binom{2+2-2}{2-1} = 2    (6.4)

From observation, this agrees with the lattice in Figure 6.2; thus the first
part of the induction is proven for two attributes and Levels 1 and 2. Also,
Equation 6.4 proves Theorem 1 for this initial case, since from observation the
maximum cut set is two, which is predicted by Theorem 1.
Now we will assume that Equation 6.2 holds true for the case of a attributes,
and prove that Equation 6.2 holds for a + 1 attributes. Substituting a + 1 into
Equation 6.2 yields

P_{a+1,l} = \binom{a+1+l-2}{a+1-1} = \binom{a+l-1}{a}    (6.5)
We will now show that Equation 6.5 holds true. First we extend the concept of
dominance to these layers by defining layer dominance. Layer L_x of incomparable
clusters dominates layer L_y of incomparable clusters if there exists one member
of layer L_x that dominates one member of layer L_y. Using this definition, layer
P_{a,1} dominates layer P_{a,2}, and in general, layer P_{a,i} dominates layer P_{a,j} for i < j.
Each of the layers P_{a,1} ... P_{a,N} can be viewed as a partition of a single
attribute, with P_{a,1} associated with cluster (1), P_{a,2} associated with cluster (2),
and in general, P_{a,i} associated with cluster (i). Consistent with our previous
cluster numbering, cluster (1) dominates cluster (2), and cluster (i) dominates
cluster (i+1).
In general, there will be N such clusters, (1) through (N), where N represents
the number of clusters into which a single attribute has been partitioned. These N
hierarchically ordered clusters can be used to form N additional non-comparable
clusters, where these new clusters have one additional attribute. They thus form
the set of non-comparable planes for the case in which there are a + 1 attributes.
This approach was suggested by an inductive method of constructing Gray codes
presented in (Wakerly 1990).
The most dominant cluster can be formed from a pair, consisting of a one in
the first position, and the most dominant plane of the a attribute case as the
second member of the pair. This results in the cluster (1,1), which forms Plane 1
for the a + 1 attribute case and has the cardinality indicated by P_{a,1}, since this
was the plane used in the construction. This has cardinality one.
The next most dominant plane can be formed by using a "1" or a "2"
in the first position, along with a cluster drawn from the two most dominant
clusters in the a attribute case. Recall that the resulting plane must consist
of non-comparable clusters. This is accomplished by taking the most dominant
first element and associating it in the cluster pair with the least dominant second
element. Thus, if the first pair consists of (1,2), the next element would consist of
(2,1), which has the least dominant first element paired with the most dominant
second element. Recall that the first element is drawn, without replacement,
from the numbers 1 to l, where l corresponds to the layer of the case a + 1 that
we are populating. The second element is drawn, without replacement, from the
l most dominant clusters (which were the layers) of the a attribute case. The
cardinality of this case is just

P_{a+1,2} = \sum_{i=1}^{2} P_{a,i}.
We can build the next set of incomparable clusters by taking the first three
layers of the a attribute case, and building attribute clusters for the a + 1
case by constructing pairs of clusters. The first element is drawn in order of
decreasing dominance from 1 to 3, and the second element is drawn from the
first three layers of the a attribute case, with the layers drawn in increasing
dominance (beginning with layer 3). This results in the clusters (1,3), (2,2) and
(3,1). The cardinality of this layer is given by

P_{a+1,3} = \sum_{i=1}^{3} P_{a,i}.

In general, we have that

P_{a+1,l} = \sum_{i=1}^{l} P_{a,i}    (6.6)
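As a numerical check, the layer sum of Equation 6.6 agrees with the closed form C(a+l-1, a) derived later in this section (an illustrative sketch of the combinatorial "hockey-stick" identity):

```python
from math import comb

# Sum of layer cardinalities of the a-attribute case, P_{a,i} = C(a+i-2, a-1),
# compared with the closed form C(a+l-1, a) for the (a+1)-attribute layer.
for a in range(2, 7):
    for l in range(1, 7):
        lhs = sum(comb(a + i - 2, a - 1) for i in range(1, l + 1))
        assert lhs == comb(a + l - 1, a)
print("layer sums agree with the closed form")
```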
From the definition of P_{a,i} and the binomial coefficient, this is equivalent to

P_{a+1,l} = \binom{a-1}{a-1} + \binom{a}{a-1} + \binom{a+1}{a-1} + \cdots + \binom{a+l-3}{a-1} + \binom{a+l-2}{a-1}    (6.7)

Since

\binom{a-1}{a-1} = 1 = \binom{a}{a}    (6.8)

we can make this substitution into Equation 6.7, with the result that

P_{a+1,l} = \binom{a}{a} + \binom{a}{a-1} + \binom{a+1}{a-1} + \cdots + \binom{a+l-3}{a-1} + \binom{a+l-2}{a-1}    (6.9)
Using a relationship from combinatorics taken from (Bogart 1983), we know
that

\binom{i-1}{k-1} + \binom{i-1}{k} = \binom{i}{k}    (6.10)
Using Equation 6.10 in Equation 6.9, we can combine the first and second
terms to yield:

P_{a+1,l} = \binom{a+1}{a} + \binom{a+1}{a-1} + \binom{a+2}{a-1} + \cdots + \binom{a+l-2}{a-1}    (6.11)
Again applying Equation 6.10 to the first and second terms of Equation 6.11
we have

P_{a+1,l} = \binom{a+2}{a} + \binom{a+2}{a-1} + \cdots + \binom{a+l-2}{a-1}    (6.12)
We can continue to combine the first and second terms until there is just one
term. Since we started with a as the upper term of the binomial coefficient
and l terms can be combined l - 1 times, the result is (a + l - 1) as the upper
term of the binomial coefficient after all combinations are completed. The lower
term is always a. The result is as follows:

P_{a+1,l} = \binom{a+l-1}{a}    (6.13)
This is equivalent to Equation 6.5, which we were trying to prove, so this
completes the attribute induction proof of Equation 6.2. The next part of the
induction proof will perform the induction on the number of clusters per attribute.
The P_{a,l} terms are actually terms in Pascal's triangle. Using Equation 6.10,
we can construct new terms in the triangle using the two terms that are adjacent,
in that they are either lower in attribute number or cluster number.
To get this induction started, we note that all binomial coefficients with a
zero for the lower term are equal to one. This is shown in the following:

\binom{i}{0} = 1    (6.14)

Applying Equation 6.2 to Equation 6.14: since the lower part of the binomial
coefficient is zero, A must be 1. Substituting A = 1 into the upper expression
in the Equation 6.2 binomial coefficient yields

1 + L - 2 = i.

Solving for L, we have L = i + 1. These terms are just the terms P_{1,i+1}.
Knowing this for all i, we can then inductively build the desired terms. For
the induction proof, we assume that Equation 6.2 holds for some level l, and
want to prove that for l + 1 we have the following after substituting l + 1 into
Equation 6.2 and simplifying:

P_{A,l+1} = \binom{A+l-1}{A-1}    (6.15)
To begin, we assume that we have the results of Equation 6.14 and know P_{a,l}
for all a and all levels up to l, but not for l + 1. Using Equation 6.15 for l + 1
and Equation 6.10, we have the following:

P_{a,l+1} = \binom{a+l-1}{a-1} = \binom{a+l-2}{a-2} + \binom{a+l-2}{a-1}    (6.16)
Referring to Equation 6.2, setting the lower part of the binomial coefficient,
A - 1, equal to a - 2 and solving for A, we have that A = a - 1. Equating the upper portion of
the binomial coefficient of Equation 6.2 with the upper portion of the first term
of the binomial coefficient of Equation 6.16 yields

A + L - 2 = a + l - 2.

Now substituting A = a - 1 yields

a - 1 + L - 2 = a + l - 2.

Solving for L, we have L = l + 1.
Thus the first term of Equation 6.16 is equivalent to P_{a-1,l+1}.
For the second term, we equate A - 1 from Equation 6.2 with a - 1 of Equation 6.16,
which yields A = a.
We then equate
A + L - 2 from Equation 6.2 with a + l - 2 and solve for L. This results in
L = l. Thus the second term is equivalent to P_{a,l}.
Combining these two results, we can rewrite Equation 6.16 as

P_{a,l+1} = P_{a-1,l+1} + P_{a,l}    (6.17)

Since the second term to the right of the equals sign in Equation 6.17 is in
terms of l, these values are all available by the assumptions of the induction.
The problem is with the first term, which is in terms of l + 1, which we are
attempting to prove.
The key to solving this is to solve first the equation for a = 2, which, when
substituted into Equation 6.17, yields:
P_{2,l+1} = P_{1,l+1} + P_{2,l}    (6.18)

From the discussion following Equation 6.14, we know that P_{1,l+1} = 1; thus we
can calculate P_{2,l+1}.
Knowing P_{2,l+1} allows us to use Equation 6.17 to calculate P_{3,l+1}, which is
equal to

P_{3,l+1} = P_{2,l+1} + P_{3,l}    (6.19)

With this inductive approach we can calculate all of the P_{a,l+1} terms. This
completes the inductive proof of Equation 6.2.
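The recurrence of Equation 6.17, seeded by P_{1,l} = 1 and P_{a,1} = 1, can be checked against the closed form of Equation 6.2 (an illustrative sketch):

```python
from math import comb

# P(a, l) = P(a-1, l) + P(a, l-1), with P(1, l) = P(a, 1) = 1,
# reproduces the closed form C(a+l-2, a-1) of Equation 6.2.
def plane_cardinality(a, l):
    if a == 1 or l == 1:
        return 1
    return plane_cardinality(a - 1, l) + plane_cardinality(a, l - 1)

for a in range(1, 6):
    for l in range(1, 6):
        assert plane_cardinality(a, l) == comb(a + l - 2, a - 1)
print("recurrence matches the binomial closed form")
```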
We now substitute A attributes with N clusters per attribute into Equation 6.2.
This yields

P_{A,N} = \binom{A+N-2}{A-1}    (6.20)

This is just Equation 6.1. Thus for any set of attributes and clusters per
attribute, this represents the largest incomparable set of clusters that can be
generated, based on the details of the proof of Equation 6.2. Hence, Equation 6.1
is proven.
Having described the overall presentation approach, the next section will look
at how lattice clusters are formed in a distributed environment such as QBC.
6.2.4 Distributed Cluster Formation
Having presented the concept of the entity cluster, this section will consider how
such clusters can be formed in the distributed QBC environment. As noted, the
cluster is formed based on the data instances that comprise the entity set that
the cluster represents. The data available to the query committee for a single
entity set can exist as one of the following three cases:
• A single agent has all of the data for a single entity under its control and
view.
• The entity is split vertically,[5] such that no single agent has control or view
of all attributes of the entity.
• The entity set is split horizontally, such that each agent has values for all
attributes of the entity for the entity instances that it controls, but a single
agent cannot view all instances of a single entity set.
In the first case, the agent responsible for the data can easily generate the
incomparable, dominant display set of clusters and provide these to the blackboard
for consideration by the user. The agent simply discards any cluster that
[5] Vertically, as if a vertical line were drawn through a relation in a relational database, such that
the attributes and associated key were partitioned among two databases. Additional partitions
could be drawn to further subdivide the database.
is strictly dominated by another cluster until an incomparable, dominant display
set results.
In the second case, a single agent cannot view all the attributes, and hence
cannot generate the display set in isolation. It must collaborate with the other
agents of the query committee that control the data for other attributes of the
entity.
Initially, each agent submits to the blackboard the entity key and associated
clusters for each entity instance that has at least one of its attribute values within
the first quartile for some attribute. This is called the agent's first set (AFS). It
is assumed that a canonical entity cluster has been agreed to by the agents, such
that the position of a particular attribute in the cluster for the total entity is
known. Since the entity cluster canonical form is known to each agent, each
of the clusters in the AFS satisfies this canonical form, with those attributes not
supported by a particular agent indicated with an "x". Each agent submits its
AFS.
The combination of AFSs is called the entity first set (EFS). The EFS consists
of a set of partial clusters[6] represented in entity cluster canonical form and the
associated key values. The next phase of cluster formation is to join those partial
clusters that share the same key. The cluster join can be performed by merging
the two clusters, such that an attribute cluster value from one cluster will replace
the "x" that acts as a placeholder in the other partial cluster. If all of the x's are
eliminated from an entity cluster after completing all possible joins, this entity
cluster is called a complete entity cluster.
After all clusters with identical keys are joined, the next step is to try to eliminate
those clusters that are dominated by some complete cluster. To accomplish
this, the x's in all of the partial clusters are replaced with the value "2". This
represents the best possible resolution of these unknown attribute values. Note
[6] Partial in the sense that not all attributes within the cluster have an attribute cluster number
(e.g., quartile number), since some have an "x" that indicates that the agent that posted the
cluster did not support that particular attribute.
that in no case can an "x" be replaced with an attribute cluster value of "1", since
all entity instances with attributes containing attribute cluster "1" have already
been included in the EFS. A cluster is eliminated only if it is strictly dominated
by some complete cluster in the EFS. It is not eliminated if it is dominated by
some partial cluster that has the 2-replacement, since (as noted) this represents
the best possible result for this partial cluster attribute. The actual result might
be much less than the cluster formed with the 2-replacement.
After performing this 2-replacement-based cluster elimination, any remaining
partial clusters are used as capabilities to attract the necessary attribute cluster
values to replace all "x" indicators with the actual attribute cluster values.
Following this, all clusters that are strictly dominated by any other cluster in
the EFS are eliminated. The result is a set of non-comparable dominant clusters
that represents Version 1 of the display set, which is called DS1.
The reason that DS1 is not the final display set is that each agent has only
partial knowledge. Consider the case in which there exists a filter that, based on
magnitude, eliminates from consideration a large group of entity instances. Let
there be two agents, with Agent A supporting a set of attributes and Agent B
supporting a different set. The only common attribute they share is the key to
the entity set. Each agent supports all instances of the entity set, but contains
only the instance values for the attributes that it supports. Agent A has access to
magnitude data and can perform the elimination of all instances that do not have
the proper magnitude. Agent B has access to instances that include attributes
other than magnitude. When it is forming its AFS, it may include in the cluster
formation of its first set instances that would have been eliminated had it known
their magnitude. Thus, in a sense, its first set is flawed, since it includes cluster
values that will not be joinable with any data from Agent A. In the worst case,
the agent's first set includes only data that would have been eliminated had the
magnitude been known.
In the course of the algorithm presented previously, those partial clusters
based on first-set data from Agent B that cannot be supported by Agent A
will never be converted to a complete cluster. In such a case, these incomplete
partial clusters must be eliminated, and Agent B must propose those clusters that
are in the second set (e.g., for second quartile data) for each of its attributes.
The algorithm is then re-executed. If no incomplete partial clusters exist after
processing, then the algorithm terminates successfully. If not, then data from
the third-ranked cluster will have to be added to the blackboard. This process
continues until there are no clusters that cannot be completed. The result is
the display set, which represents a dominant set of non-comparable clusters that
represents all data for this entity available to the query committee. This concludes
the case of vertical partitioning of the entity set.
In the third case, horizontal partitioning, each agent proposes its dominant,
non-comparable set of clusters and posts them to the blackboard. This total set
is then processed to eliminate any clusters that are strictly dominated by another
cluster in the set. The remaining set of dominant, non-comparable clusters
represents the display set.
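The horizontal case thus reduces to pooling and pruning, sketched below (the agent display sets shown are hypothetical):

```python
def strictly_dominates(x, y):
    return x != y and all(a <= b for a, b in zip(x, y))

# Each agent posts its own dominant, non-comparable cluster set.
agent_sets = [
    [(1, 2, 1), (2, 1, 2)],   # hypothetical Agent A display set
    [(1, 1, 3), (2, 2, 1)],   # hypothetical Agent B display set
]

# Pool the posted sets, then prune clusters strictly dominated
# by any other cluster in the pooled set.
pooled = [c for s in agent_sets for c in s]
display = [c for c in pooled
           if not any(strictly_dominates(d, c) for d in pooled)]
print(display)  # [(1, 2, 1), (2, 1, 2), (1, 1, 3)] -- (2, 2, 1) is pruned
```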
Chapter 7
Implementation and Experimental Results

This chapter describes the QBC implementation and the results of applying
the system to a set of representative queries in the astronomy domain. The
implementation seeks to test the main concepts described in the previous sections;
it does not represent a complete implementation of all aspects of the QBC design
that has been described. The discussion will begin with a
description of the overall system components, and then describe those details of
the implementation that are especially significant. Finally, the results of applying
the implementation to a representative set of queries in the astronomy domain
are presented.
7.1 QBC Implementation
This section will consider the operational primitives of the QBC implementation,
the control approach used, and details of the polytyped term, a critical component
of the QBC implementation.
7.1.1 QBC Primitives
The various facets of query elaboration identified in this research can be implemented
as a sequence of the following generalized types of elaboration primitives:
1. Term discovery given context (TD-C)
2. Context discovery given term (CD-T)
3. Dimension instantiation (DI)
4. Filter posting (FP)
5. Lattice-based clustering (LBC)
The TD-C primitive attempts to discover hypotheses for unspecified terms.
In the data dimension, TD-C generates hypotheses for unspecified entities. In
the real dimension of a world, TD-C elaborates activities or entities into activity
primitives. TD-C operates as a two-phase procedure. The first phase generates
hypotheses based on a match on a single context term. The second phase then
selects those hypotheses that are consistent with all of the context provided. If
complete consistency cannot be achieved, then those hypotheses that have the
highest number of matches with the context terms of the unspecified type are
selected.
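A minimal sketch of this two-phase procedure (the KNOWLEDGE table, the term names, and the context sets are hypothetical, not drawn from the QBC knowledge base):

```python
KNOWLEDGE = {  # candidate term -> context terms it is known to satisfy
    "galaxy": {"magnitude", "distance", "shape"},
    "star":   {"magnitude", "distance"},
    "comet":  {"orbit"},
}

def td_c(context):
    # Phase 1: any single-context-term match generates a hypothesis.
    hypotheses = [t for t, ctx in KNOWLEDGE.items() if ctx & context]
    if not hypotheses:
        return []
    # Phase 2: prefer hypotheses consistent with all context terms;
    # otherwise keep those with the highest number of matches.
    best = max(len(KNOWLEDGE[t] & context) for t in hypotheses)
    return [t for t in hypotheses if len(KNOWLEDGE[t] & context) == best]

print(td_c({"magnitude", "distance", "shape"}))  # ['galaxy']
```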
The CD-T primitive attempts to discover additional context given a term.
In the data dimension, CD-T looks at entities and attempts to find additional
attributes that are associated with them. In the real dimension of the primary
world, the CD-T matches on an activity primitive and attempts to find missing
activities or entities that should be associated with the activity. In the case of a
w-relationship, the CD-T context discovered can actually be a high-level representation
of a primary world that is expressed in terms of an activity primitive
or activity cluster.
The TD-C and CD-T primitives are performed by the QBC entity-elaboration
function.
The DI primitive attempts to build requisitions based on the terms and associated
context in the query kernel. In the real dimension of the primary world,
DI builds the data requisition that is required to support the real-world activities
described. This requisition becomes the data dimension of the primary
world, so in essence, DI is instantiating a new dimension (a data dimension)
based on the information provided in the real-world dimension. For a
w-relationship, although this is not implemented in the QBC prototype, DI could
build the transformations required to map a specification world into a primary
world. An alternative approach, implemented in QBC, is to handle the associated
primary world specification with the w-relationship as one of its context terms.
The other context term is the one representing the specification world. These
terms can be activity primitives or activity clusters; thus, they represent a fairly
powerful construct.
The DI primitive is performed by the QBC prd-elaboration function, where
“prd” refers to primary world, real dimension.
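Reduced to its essentials, DI can be viewed as a lookup that carries an activity primitive's stored requisition into a new data dimension. A Python sketch (the prototype is in Lisp; the requisition layout and the view*deep-space entry are assumptions modeled on the scenario in section 7.2.2):

```python
def dimension_instantiate(activity_primitives, requisitions):
    """For each activity primitive in the real dimension, pull in
    its stored data requisition; the collected requisitions seed
    the new data dimension."""
    return [requisitions[ap] for ap in activity_primitives
            if ap in requisitions]

# Hypothetical requisition table for one activity primitive.
reqs = {"view*deep-space": {"unspecified": True,
                            "context": ["right ascension",
                                        "declination"]}}
data_dim = dimension_instantiate(["view*deep-space"], reqs)
print(data_dim)
```

Here the requisition for view*deep-space, an unspecified term with right ascension and declination as context, becomes the starting point for elaboration in the data dimension.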
The FP primitives are used to determine whether the relevant solicitations exist on
which to post the filter. If they do, these primitives check to see if
the proper triggering conditions exist. If so, the filter is posted. If not, a
solicitation for the triggering condition is posted. Once the triggering conditions are
posted, the FP primitive posts the filters on the appropriate attribute.
The FP primitives are implemented with the QBC filteragent for posting the
filter or the solicitation of the trigger, and dateuserinterface, which provides the user
interface for solicitations requiring dates.
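The FP decision flow described above can be sketched as follows. This is a Python illustration; the blackboard layout and field names are assumptions, not the prototype's actual Lisp structures:

```python
def filter_post(blackboard, filt):
    """Post a filter only when its target solicitation exists and
    its triggering data are present; otherwise solicit the
    missing triggers."""
    if filt["target"] not in blackboard["solicitations"]:
        return "no relevant solicitation"
    missing = [t for t in filt["triggers"]
               if t not in blackboard["data"]]
    if missing:
        # Solicit the triggering conditions instead of posting.
        blackboard["solicitations"].update(missing)
        return "solicited: " + ", ".join(missing)
    blackboard["filters"].append(filt["target"])
    return "posted"

bb = {"solicitations": {"right ascension"}, "data": {}, "filters": []}
ra_filter = {"target": "right ascension",
             "triggers": ["date", "time"]}
print(filter_post(bb, ra_filter))   # solicits date and time
bb["data"] = {"date": "1991-03-23", "time": "2100"}
print(filter_post(bb, ra_filter))   # now the filter is posted
```

This mirrors the right-ascension example in section 7.2.1: the filter agent first solicits date and time, and posts the filter only after the user supplies them.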
Lattice-based clustering builds the display set, subject to any filters. This
set contains the non-comparable entity clusters that dominate all other clusters
representing the actual data held by the query committee that is consistent
with any filter that may be active. This is implemented with the QBC
filterpostcluster function.
While each agent in the current QBC implementation is assumed to possess
each of these elaboration primitives, this is not required. Agents that possess
knowledge relevant to only a single dimension would, in general, need only the
TD-C and CD-T primitives. Other agents may be concerned only with the linking
of one dimension to another, as in the generation of a requisition in response to
activity primitives in the real dimension of a world. This type of agent would
require only the DI primitive.
The QBC system can be viewed conceptually as a collection of agents with
homogeneous operational characteristics, each representing a particular database
or knowledge base. However, for the purposes of the prototype, QBC was implemented
as a list of databases and knowledge bases that are used by a single
stack of elaboration primitives to elaborate the query kernel. In essence, the
agent functions are successively applied to each database and knowledge base
to see what effect the application has on the current state of the query kernel.
This elaboration choice was selected because it adequately demonstrated
the distributed elaboration techniques while avoiding the need for a more elaborate
testbed, a research project in its own right. A generalized blackboard
system, GBB, was considered, but its primitives were not particularly useful for
the operations required by QBC (Gallagher, Corkill and Johnson 1988).
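The prototype's serial control strategy can be sketched as a pair of nested loops. This is a Python caricature of the Lisp implementation, with a toy primitive and toy knowledge sources standing in for the real ones:

```python
def elaborate(kernel, knowledge_sources, primitives):
    """Apply each elaboration primitive to each database or
    knowledge base in turn, observing the effect each application
    has on the query state (a single stack of primitives, as in
    the prototype)."""
    state = dict(kernel)
    for prim in primitives:
        for source in knowledge_sources:
            state = prim(state, source)
    return state

# Toy primitive: accumulate the context terms each source knows.
def collect_context(state, source):
    new = dict(state)
    new["context"] = sorted(set(state["context"]) | set(source))
    return new

sources = [["magnitude", "size"], ["declination"]]
print(elaborate({"context": []}, sources, [collect_context]))
```

A true multi-agent testbed would run the agents concurrently against a shared blackboard; the serial sweep above is the simplification the text describes.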
7.1.2 Distributed Elaboration Control
Agents participate in distributed problem solving when they have something to
contribute. In the distributed vehicle monitoring testbed, an agent participated
when the vehicle was within its physical area of responsibility, or when it was
notified by another agent that a vehicle might be in its area of responsibility
(Lesser and Corkill 1983). In Hearsay-II, agents participated when data appeared
at the level of the blackboard that the agent monitored for input (Erman et al.
1980). In QBC, there is no concept of physical proximity or levels of speech
parsing, as in these other systems. In QBC, the basis for agent participation
is affinity with the data contained in the evolving query state. This affinity is
based on an agent's recognition that the current state of the query satisfies two
preconditions:
• There must exist a non-null intersection between terms known to the agent
and terms used to describe the current state of the query elaboration, and
• The agent must possess data associated with this intersection that either
identifies an unspecified term associated with a specified context, provides
additional context for a specified term, or provides data required to move
into a new dimension.
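The two preconditions reduce to a simple predicate. A minimal Python sketch, with the agent and term names assumed for illustration:

```python
def should_participate(agent_terms, query_terms, agent_data):
    """An agent joins the elaboration only when (1) its known
    terms intersect the current query state and (2) it holds
    data associated with that intersection."""
    shared = set(agent_terms) & set(query_terms)
    return bool(shared) and any(t in agent_data for t in shared)

galaxy_agent = {"terms": ["galaxy", "magnitude", "size"],
                "data": {"magnitude": [8.4, 9.3]}}
print(should_participate(galaxy_agent["terms"],
                         ["magnitude", "distance"],
                         galaxy_agent["data"]))  # True
```

An agent whose vocabulary is disjoint from the query state, or that knows the terms but holds no data for them, simply stays silent.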
One of the concerns in query elaboration is that the elaboration must not
grow in an uncontrollable manner. In QBC, this growth control is provided by
recognizing layers in the query elaboration process and restricting processing
to adjacent layers. The innermost layer of query elaboration is elaborating the
unspecified term. In the data dimension, this results in the generation of entities
that satisfy the context provided. The next layer of processing is to elaborate the
context, based on the terms contained on the blackboard. For example, in the
data world only entities are elaborated with additional attributes. Any existing
attributes of an entity are not elaborated; thus, growth is controlled. In the
real dimension, the next layer is to accumulate any relevant requisitions. The
final layer of processing is to add the filters and clusters to the context terms
on the blackboard. Thus, the elaboration begins at the central terms (e.g., the
entity in the data dimension, the AP in the real dimension), moves to the context
terms, then to the requisitions, and finally to value restrictions and display in
the final phase of the elaboration. With this four-phase approach, growth is
controlled, while the application of data held by the query committee to the
query elaboration is carefully orchestrated.
If the user is particularly interested in the elaboration of some newly discovered
context, then the user can decide to use this context term as the focus of a
new query kernel. By placing the user in the loop to indicate when an interesting
context should be elaborated, the query committee is prevented from indulging
in uncontrolled growth.
7.1.3 Polytyped Term
A critical component of the QBC implementation is the polytyped term. The
polytyped term supports the concept of the unspecified term, as well as the
concept of terms having a set of synonym names that characterize the type of
data represented by the term. This ability to be known by multiple names is the
reason for calling the term polytyped. This section will describe the polytyped
term.
Ullman characterizes the relational system as value oriented, since the identity
of tuples is based on the values of the attributes of the tuple, not on some location
in storage (1988). Because of its distributed nature, QBC also takes a value-oriented
approach. Matches between data held by the agents and data on the
blackboard are based on value matches. In contrast to the relational model,
these values may represent the values of actual data held by the agents, data
type names, AP names or w-relationship names, or associated components of
any of these structures. Using matches based on values, agents transfer relevant
knowledge or data from their databases to the blackboard.
The main data structure used for QBC is the polytyped term. The polytyped
term is used to represent both the data that describes the agent's data (the metadata),
as well as the evolving state of the query elaboration on the blackboard.
The following Lisp structure represents the main components of the term along
with their default values:
(defstruct term
  (ttype nil)
  (unspecified nil)
  (synl nil)        ; populated with structures of type syn
  (hypothesis nil)  ; populated with structures of type hypoth
  (req nil))        ; the requisition
The ttype slot represents the type of the term, which can be an entity, activity,
activity primitive, activity cluster or w-relationship.
If the term is unspecified, the unspecified slot is set to true, which signals
to the members of the query committee that hypotheses should be generated
and resolved to determine the type of this term.
For an unspecified term, the hypothesis slot holds the list of hypotheses that
the agents have generated, but which have not yet been transformed into consistent
entity candidates (CECs).
The syn list (synl) contains the list of names that are, in some sense, synonyms
for this term. When a hypothesis is transformed into a CEC, this CEC is added
to the syn list as a name for the term.
The requisition (req) holds the mapping between this term and terms in
another dimension. If the term is an activity primitive, the requisition represents
A-sols and A-caps for data required to support the particular activity primitive.
If the term is a w-relationship, then (although not in the current implementation)
the requisition could hold that which is necessary to transform one world into
another.
The remainder of this section will describe pertinent aspects of the syn, hypothesis
and requisition.
7.1.3.1 Syn
As noted, the syn list is populated with syns. The Lisp structure for the main
components of a syn is as follows:
(defstruct syn
  (name)
  (context nil)
  (inherit nil)
  (propagate nil))
The syn name slot holds one of the external names by which this term is
referenced. Since a term can be used to represent an entity, attribute of an
entity, activity, activity primitive or w-relationship, this name represents one of
the names under which this structure is known. If this structure can be known
by other names, these other names will appear as the names of other syns in the
synlist for this term.
While some of these other names may be synonyms in the strict sense of the
word (meaning identical objects), other members of the syn list are synonyms
in the sense that each name is closely related to other names on the list as a
specialization or generalization.
In QBC, a specialization is defined as a term that inherits all of the characteristics
of its generalizations. Similarly, a generalization propagates its characteristics
to all its specializations. The inherit slot of the syn indicates those syns
from which the characteristics of this term are to be inherited. The propagate
slot of a term indicates the terms to which the characteristics of this slot are
to be propagated. If term A propagates to term B, in QBC this indicates that
term A is a generalization of term B. It is also true that if term A propagates
to term B, then term B must inherit from term A. While both of these are true,
these relationships need only be partially specified, since the relationships are
symmetrical, and one can be filled in if the other is known.
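Because propagate and inherit are converses, the unspecified half of the relation can be filled in mechanically. A Python sketch of that completion step (the term layout and the galaxy example are assumptions, not the prototype's structures):

```python
def complete_links(terms):
    """Fill in the symmetric half of the relation: if A propagates
    to B then B inherits from A, and vice versa."""
    for a, t in terms.items():
        for b in list(t["propagate"]):
            terms[b]["inherit"].add(a)
        for b in list(t["inherit"]):
            terms[b]["propagate"].add(a)
    return terms

terms = {"galaxy":        {"inherit": set(),
                           "propagate": {"spiral galaxy"}},
         "spiral galaxy": {"inherit": set(), "propagate": set()}}
complete_links(terms)
print(terms["spiral galaxy"]["inherit"])  # {'galaxy'}
```

Here only galaxy's propagate link was specified, yet after completion spiral galaxy correctly inherits from galaxy.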
Any specialization or generalization relationships are indicated implicitly by
the inherit and propagate lists. If two syns are truly synonyms in the English
sense of the word, then the syn names representing each of the synonyms would
propagate to and inherit from each other.
The context slot holds a list of the context terms associated with this particular
syn.
7.1.3.2 Hypothesis List
Each of the hypotheses is represented with a hypothesis structure of the following
form:
(defstruct hypoth
  (name)          ; populated with the hypothesis name
  (context nil))  ; populated with structures of type term
The name slot indicates the name of this hypothesis. Thus, in the example
given earlier, if the user specifies a query kernel that is looking for an unspecified
entity with size and magnitude, one of the entity hypotheses is galaxy. In this
case, galaxy would be the name assigned to the name slot.
The context represents that aspect of the query kernel that triggered the agent
to propose the hypothesis. To understand how this works, we again consider the
galaxy and variable star databases. Since the context of galaxy includes both
size and magnitude, the agent representing the galaxy database can propose
galaxy as a hypothesis with magnitude and size as the listed context. An agent
representing variable stars would propose variable star as a hypothesis
with only the magnitude context, since a stellar object looks like a point light
source with no size. At this point, we have two hypotheses: galaxy with context
magnitude and size, and variable star with context magnitude. Since the context
required for the unspecified term is a subset of the context of galaxy, galaxy will
be selected as a consistent entity candidate (CEC). However, since magnitude and
size are not a subset of the context of variable star, variable star will not be selected as
a CEC.
If no hypothesis completely satisfied all of the context, then the best hypothesis
would be selected as the CEC, where the best is the hypothesis with the highest
number of matches on the target context.
7.1.3.3 Requisition List
While the hypotheses, syns and associated context represent descriptive data
that is, in some sense, at the same level of abstraction, the requisition represents
data that is at a different, finer level of abstraction. The requisition holds the
mappings from one dimension to the next that have been elaborated by the query
committee. The requisition contributed by the agent forms the basis for a higher-specificity
dimension. For example, a requisition in the real dimension represents
the data required to support the activity indicated by the activity clusters and
primitives that contain the requisition. This requisition forms the data dimension
that is used for the next layer of QBC processing. In the data dimension, the
requisition holds solicitations that can be used to elaborate clusters that are
consistent with any filters that are in effect. These layers are analogous to the
levels in the Hearsay-II blackboard architecture (Erman et al. 1980).
7.2 QBC Experimental Results
This section will consider a number of scenarios of the QBC implementation.
These scenarios begin in various dimensions. All were run on Austin Kyoto
Common Lisp Version 1.485 on a Sun SPARCstation IPC.
7.2.1 Data World Elaboration of Entity
The first scenario represents an example of direct entity elaboration taken from
scenario 10 in the QBC listing of appendix C. The results are presented in simplified
form for ease of reading. Appendix D includes the complete listing of this
example. This scenario takes place in the data dimension. The query kernel
consists of an unspecified term of type entity with a context of magnitude
and size. For this example, the query committee consisted of agents representing
galaxy, nebula, variable star and double star databases.
While some elaborations may be able to elaborate the initial syns provided,
such is not the case with this example, since the syn is unspecified. This is shown
in the listing as no change after syn elaboration.
The results of hypothesis generation consisted of the following hypothesis
terms and associated context terms:
1. variable star with context: magnitude
2. double star with context: magnitude
3. nebula with context: size and magnitude
4. galaxy with context: size and magnitude
Since nebula and galaxy completely satisfy the constraint of having a context
of magnitude and size, they are selected as the consistent entity candidates. If
no entity satisfied these constraints, then QBC would have selected those entities
that came the closest.
The next phase of processing seeks to discover additional attributes that are
associated with galaxy and nebula. The results of this phase for the
galaxy term are as follows:
1. size
2. magnitude
3. shape
4. distance
5. surface-brightness, which is a syn to magnitude
6. declination in minutes
7. declination in degrees, which is a syn to declination
8. right ascension in minutes
9. right ascension in hours, which is a syn to right ascension
10. NGC number
11. Messier number
Analogous results were also obtained for the nebula.
A galaxy filter agent, recognizing that galaxy exists as a syn and has right
ascension hour as an attribute, looks for the triggering conditions that would
Table 7.1: Galaxy Identification Data
Galaxy Clusters
Messier NGC RA-hr RA-min Dec-deg Dec-min Cluster
M51 5194 13 29.89 47 12 (1 2 2 3 1)
M63 5055 13 15.8 42 2 (3 2 2 3 1)
M94 4736 12 50.89 41 7 (1 1 3 3 1)
M101 5457 14 3.5 54 21 (1 4 1 1 1)
M102 5866 15 6.5 55 48 (4 1 4 2 3)
M106 4258 12 19.0 47 18 (1 2 1 3 1)
M108 3556 11 11.6 55 40 (3 1 3 2 1)
M109 3992 11 57.7 53 22 (3 2 3 1 1)
invoke this filter. These triggering conditions are the date and time. Not finding
these, it posts a solicitation for date and time.
A user interface agent responds to requests for date and time. It observes this
solicitation and requests the date and time from the QBC user. The user provides
March 23, 1991 at 2100 hours. When these are posted, the filter agent imposes
the right ascension filter based on the date and time supplied. This translates
into a right ascension range from approximately 9 to 15 hours. It filters out all
right ascensions that are not contained within this six-hour period.
The data satisfying this filter are then presented as the display set of dominant,
non-comparable lattice-based clusters. Note that the clusters are stated
relative to the data that satisfies the filter, not the entire data in the galaxy
database. Hence, the quartile values will be different from those shown in chapter 6,
which included the entire database. For the purposes of this example, the
identifying characteristics for all data satisfying the filter are shown in table 7.1.
The attributes on which the clusters are formed are the following:
1. MAGNITUDE
2. SURFACE-BRIGHTNESS
3. SIZE
4. DISTANCE
5. SHAPE
The following represents the dominant, non-comparable set of clusters:
• (3 2 3 1 1)
• (3 1 3 2 1)
• (1 2 1 3 1)
• (1 4 1 1 1)
• (1 1 3 3 1)
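The dominant, non-comparable set is the set of maximal clusters under componentwise dominance on the quartile indices. A minimal Python sketch, using the cluster codes from table 7.1 (the dominance direction, lower quartile index dominates, is inferred from this example rather than stated here):

```python
def dominates(a, b):
    """a dominates b when a's quartile index is no higher in every
    attribute and strictly lower in at least one."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def dominant_noncomparable(clusters):
    """Keep only clusters that no other cluster dominates; the
    survivors are pairwise non-comparable."""
    return [c for c in clusters
            if not any(dominates(d, c) for d in clusters)]

# Cluster codes for M51, M63, M94, M101, M102, M106, M108, M109.
table = [(1, 2, 2, 3, 1), (3, 2, 2, 3, 1), (1, 1, 3, 3, 1),
         (1, 4, 1, 1, 1), (4, 1, 4, 2, 3), (1, 2, 1, 3, 1),
         (3, 1, 3, 2, 1), (3, 2, 3, 1, 1)]
print(dominant_noncomparable(table))
```

Running this on the table 7.1 codes reproduces exactly the five clusters listed above; the three eliminated clusters, such as (1 2 2 3 1), are each dominated by one of the survivors.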
The quartile values associated with each attribute include the following, where
each pair of numbers represents the range, with the first quartile range listed first:
1. MAGNITUDE (8 8.5) (8.5 8.5) (9 10) (10 10.5)
2. SURFACE-BRIGHTNESS (9 9) (9 10) (10 10) (10 11)
3. SIZE (20 12) (10 10) (8 6) (6 3)
4. DISTANCE (40 40) (30 30) (20 20) (20 20)
5. SHAPE (S E L I)
The cluster cut-set is also presented in terms of the quartile value ranges,
where each line represents a different cluster. The first range shown is for magnitude,
the second for surface-brightness, etc. For the GALAXY clusters, the
following represents the cut-set presented in terms of the value ranges for each
of the attributes:
1. ((9 . 10) (9 . 10) (8 . 6) (40 . 40) S)
2. ((9 . 10) (9 . 9) (8 . 6) (30 . 30) S)
3. ((8 . 8.5) (9 . 10) (20 . 12) (20 . 20) S)
4. ((8 . 8.5) (10 . 11) (20 . 12) (40 . 40) S)
5. ((8 . 8.5) (9 . 9) (8 . 6) (20 . 20) S)
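The cut-set display is a direct translation of each cluster's quartile indices into the value ranges listed above. A Python sketch using those quartile tables (the dictionary layout is an illustration, not the prototype's Lisp structures):

```python
# Quartile value ranges per attribute, first quartile listed first.
quartiles = {
    "magnitude":          [(8, 8.5), (8.5, 8.5), (9, 10), (10, 10.5)],
    "surface-brightness": [(9, 9), (9, 10), (10, 10), (10, 11)],
    "size":               [(20, 12), (10, 10), (8, 6), (6, 3)],
    "distance":           [(40, 40), (30, 30), (20, 20), (20, 20)],
    "shape":              ["S", "E", "L", "I"],
}

def cut_set(cluster):
    """Translate one cluster's 1-based quartile indices into the
    value ranges shown to the user, attribute by attribute."""
    return tuple(ranges[q - 1]
                 for ranges, q in zip(quartiles.values(), cluster))

print(cut_set((3, 2, 3, 1, 1)))
# ((9, 10), (9, 10), (8, 6), (40, 40), 'S')
```

Applying this to each cluster in the dominant set reproduces the five cut-set lines above, e.g. (3 2 3 1 1) yields the first line.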
7.2.2 Real-World Elaboration of AP
In the next example, taken from scenario 4 in the QBC listing, the query kernel
initially contains an unspecified term of type activity primitive with the context
view. The goal of this elaboration is to find an activity primitive and the associated
data requisition, and then perform any required elaboration on it. The activity
primitive hypotheses that are generated consist of the following:
1. view*deep-sky with context: view
2. view*deep-space with context: view
Since both of these activity primitives satisfy the initially specified context,
both are accepted. Since activity primitives have only binary context, additional
context discovery provides no additional information.
The next step is to acquire the requisition. In this case, we will follow through
on the view*deep-space activity primitive. The requisition associated with the
view*deep-space activity primitive consists of an unspecified term with the context
of right ascension and declination. What this says is that, irrespective of
the deep-space object, in order to view it one must have the data necessary to
locate the object; this data is the right ascension and declination.
This requisition provides the basis for further elaboration in the data dimension.
In this dimension, the hypotheses generated for the unspecified term and
the associated context are as follows:
1. star with context: declination and right ascension
2. variable star with context: declination and right ascension
3. double star with context: declination and right ascension
4. nebula with context: declination and right ascension
5. galaxy with context: declination and right ascension
Since all of these terms satisfy the requirement of attributes of declination
and right ascension, all will become consistent entity candidates.
In the next phase of processing, additional attributes for each of these entities
will be elaborated. For the galaxy, these include the attributes indicated in
the previous example. The attributes displayed for the other entities are those
attributes indicated in the database for those entities, and will not be repeated
here.
7.2.3 Specification World to Data World
In this example the query kernel is defined in the specification world. This query
kernel consists of the entity astronomical instrument and the w-relationship use-to-instrument.
In this particular case, the use-to-instrument relationship does not need to
have a requisition of transformations, since the activity primitive that is discovered
as part of the w-relationship elaboration with astronomical instrument is
sufficient to characterize the primary world.
Elaboration of this query kernel discovers that the primary world associated
with this w-relationship consists of the activity primitive view*deep-space.
Elaboration of this activity primitive results in the unspecified term with right
ascension and declination as context, as described in the previous example. As in
the previous example, the ultimate result is the discovery of the various deep-space
objects, represented by the agents participating in this particular elaboration, that
possess a right ascension and declination.
Chapter 8
Conclusions
This chapter will identify the contributions that this research has made to the
field of computer science, indicate the limits of the results of the research and
identify areas for future research.
8.1 Contributions
This research has opened up a new dimension of database access by allowing
the data, as interpreted by intelligent agents, to assist the domain-naive user in
formulating his query. Just as the window icons of the modern personal computer and
workstation have helped provide a means for the naive user to discover what can
be done in an unfamiliar environment, query elaboration will allow the system
to present a set of options to the user based on very little initial user input.
One of the attractive features of QBC's query elaboration is that the system will
respond in depth to the initial query kernel provided by the user. Depending on
the nature of the user's initial query kernel, this response will include real-world
context in terms of activity clusters and primitives, metadata from the data
world expressed in terms of entities and entity context, and lattice-based data
clusters that summarize the data in a small set of options and rationale.
The user then learns from this response and is free to interact with any level
of this in-depth response to refine the query kernel, so the query committee can
better serve the data needs of the user. Just as the icons and associated menus
on a personal computer or workstation permit the user to try things just to see if
they have the desired result, it is speculated that the levels of response provided
by QBC will provide a deep context by which the user can understand what data
is available. He will also have a context that he can alter to bring the results
closer to what he believes he requires.
It is also important to recognize that a new research area has been successfully
opened in the context of a multiagent system. This distributed aspect is
important, since databases come as distinct systems that are managed by separate
entities. In recognition of this fact, the QBC research has been performed
under the assumption of distinct and nearly autonomous agents. The "nearly"
qualification is important, since it is assumed that the agents have consented
to participate in the query committee and, as such, they are assumed to have
reconciled their various semantic models to some universal type network. Thus,
agents can ensure that they are talking about the same thing if it exists at the
same point in the type network.
The overall conclusion from the research is that query elaboration is a very
useful capability to have in a database and, based on the QBC research and
prototype, query elaboration is realizable in real databases. In addition to opening
up the new research direction of query elaboration, the QBC research has
made a number of specific contributions to database and distributed artificial
intelligence:
1. Identified various types of query elaboration.
2. Provided a framework to describe and implement query elaboration.
3. Developed an innovative data presentation method called lattice-based clustering.
The contributions in each of these areas will be discussed in the following
sections.
8.1.1 Types of Query Elaboration
One of the difficult problems that had to be addressed by this research was how
to think about query elaboration. On the most simplistic level, if a query is
represented by a set of program statements in some query language such as SQL,
then query elaboration is simply building that query. But if the user does not
know how to write such a query in the first place, then how can he give the
query committee sufficient information to formulate the query for him? This
is the problem of how to describe the unknown, and it represented a fundamental
problem in the QBC research. Database browsers do not have this problem since,
in general, users direct the browser through those portions of the database that
appear most interesting. Thus, the user never has to describe the unknown.
While other systems provide a query repair capability, they assume that the user
begins with something that can be coerced into a query.
With QBC an entirely different approach was desired. The goal was to turn
the data, as represented by an intelligent agent, into something that would be
analogous to the advisors to whom people turn for help on their taxes and legal
affairs, or even the "guru" who is consulted by those trying to find the Unix1
command necessary to perform some function. In all of these cases, the user is
not an expert in the domain, but supplies the domain expert with information
that the expert can use to provide the necessary information to the user. In a sense,
all of these experts are performing query elaboration by taking the user's vague
statement of need and turning it into the desired information.
The first major contribution that QBC has made to this area is the concept
of the unspecified type. This permits the database user to specify an unknown
and then begin to describe some of the characteristics of this unknown. While
conventional database management systems permit the user to describe desired,
but initially unknown, data values for a particular attribute in terms of the values
of other attributes, the user has no way to specify an unknown attribute. In QBC
the user can specify an unknown attribute in terms of an unspecified type and
the associated, desired context. For implementing the unspecified type, QBC has
introduced the polytyped variable. The polytyped term serves as the unknown,
as well as a collection point for query committee hypotheses regarding the data
types that may satisfy the context of the unspecified type.
1 Trademark of AT&T.
In addition to serving as the unknown, the polytyped variable is also used
to represent the desired context and, as the elaboration proceeds, the relevant
information that is made known to the query committee as a whole. A major
contribution is the use of the polytyped variable to build relevant fragments of
the semantic model in which the types used for the context are described. This
enhances the query committee's ability to make a contribution by widening the
target provided by the context to include specializations of terms. When these are
not available, generalizations are supplied. In this way, if the query committee
cannot provide a result that is consistent with the initial context, that initial
context can be altered to one that, although close to the initial context, may be
sufficiently broader to allow a result to be generated.
While the ability to name an unknown and describe its context represents one
important means by which the user can describe the unknown, it does not go far
enough. Looking again to human interaction, we find that in many cases, users
cannot begin to describe directly the data that they need. However, they can
describe it indirectly in a number of different ways. Another significant
contribution of this research is the identification of various means by which data
could be indirectly described. QBC has identified two such ways.
The first allows the user to describe some activity consisting of a noun-like
construct, the entity, and a verb-like construct, the activity. These form activity
primitives, which can further be combined into activity clusters. While similar,
but lower-level, constructs can be seen in Schank's conceptual dependency graphs
(Schank and Abelson 1977), the significant contribution of this research is the
recognition that the connection between real-world activity, as described by activity
clusters and primitives, and data is the data requisition. Just as expeditions
must requisition supplies for a trip, so data must, in a sense, be requisitioned
from an appropriate data repository in order to support that activity. The data
requisition idea represents a new paradigm by which data can be associated with
real-world activity. In addition to identifying this new paradigm, the research also
looked carefully at how the requisition could be built from fragments of knowledge
about portions of the requisition that may be held by various members of
the query committee. The research presents this approach as dominance-based
knowledge factoring, by which requisition fragments flow to those polytyped
variables that they dominate, in the sense of being generalizations.
While these two methods of talking about the unknown might seem sufficient,
there was one additional approach, suggested by Lehnert in her description of
various types of questions for question-answering systems (Lehnert 1978).
Her cause-and-effect type of question was distinct from either the specification of
the unknown through an unspecified type or through the requisition for some real-world
activity. This cause/effect type of question is characterized by describing
some real-world activity, and then describing the relationships that this activity
has to some other real-world activity. To support this type of question, this
research used the concept of a specification world, which the user describes, and
then a specified world-to-world relationship. With these, the query committee
can then attempt to characterize the new world, called the primary world. Once
that is accomplished, the appropriate data requisitions can be identified and
data options presented.
With these three types of query elaboration, QBC provides a very powerful
means by which the user can describe the unknown. While it is left for future
research to determine whether this list includes all possible types of query
elaboration, the initial list is felt to represent a significant contribution to this
emerging research area.
8.1.2 Framework for Query Elaboration
As a framework for describing distributed query elaboration, the multiworld,
two-dimensional model developed by this research represents a significant extension
to database models. In effect, this model extends the database into the
"real world" by providing a means of ascribing a context to data that can facilitate
query elaboration. While semantic models can be used to describe data
in a database, and semantic nets allow objects to be described in terms of their
relationships to other objects, none of these models provides a sufficient context
for determining what data is needed to support some real-world activity. The
QBC model has, in effect, merged the existing models into a uniform model
that transcends data and real-world activity. The semantic model (in this case a
modification to the ER model (Chen 1976)) is used by the query committee to distinguish
the unknown entity from its context, using the knowledge inherent in a
semantic model. This represents activity in the data dimension of a world. In the
associated real dimension, the semantic net is used to describe the relationship
of one entity to another.
With this model, synonyms of an entity can be identified by following
specialization links in a semantic net. However, in addition to requiring a semantic
net to describe noun-like entities, this research has also proposed that the semantic
net concept be extended to verb-like activities. These activity-entity pairs,
which are called activity primitives, are related to the data world through requisitions.
If a particular piece of data is needed to perform the activity primitive,
then it would be included in the activity primitive requisition or could be assembled
into such a requisition during run-time. Such an assembly process would
use fragments of the requisition that may be scattered among dominant activity
primitives or groups of activity primitives that have been assembled into a cluster
of activity primitives. To extend the database further, the QBC model includes
the concept of the world-to-world relationship (w-relationship) that can be used to
construct new worlds from specification worlds.
All of this has been unified in one model and, in the QBC implementation, has
all been implemented in terms of polytyped terms. Hence, there are uniform
functions that operate on polytyped terms irrespective of whether a term
represents data, activity primitives or w-relationships.
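The uniformity claim can be sketched as follows; the representation and the subsumption test below are invented for illustration and should not be read as the actual QBC data structures.

```python
# Illustrative sketch: one term representation tagged with a `kind`,
# and a function that never branches on what the term denotes.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class PolytypedTerm:
    name: str
    kind: str                      # "data", "activity", or "w-relationship"
    features: frozenset = field(default_factory=frozenset)

def subsumes(general, specific):
    """One subsumption test for all kinds of terms: a term subsumes
    another when its features are a subset of the other's."""
    return general.features <= specific.features

nebula  = PolytypedTerm("nebula", "data", frozenset({"RA", "Dec"}))
pnebula = PolytypedTerm("planetary-nebula", "data",
                        frozenset({"RA", "Dec", "shape"}))
observe = PolytypedTerm("observe", "activity", frozenset({"object"}))
```

The point of the sketch is that `subsumes` works identically on `nebula` and `observe` even though one denotes data and the other an activity primitive.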
The contribution that this research has made includes not only the individual
means by which the data, its use and its various real-world relationships can be
described, but the fact that they are all represented in the same model. While
the model has been extended up from the intensional or meta data into the real
world, it has also incorporated the data itself. The means of this incorporation
is the concept of the filter and the lattice-based cluster, which will be discussed
in the next section.
8.1.3 Data Presentation
An extremely important result from this research is the development of the
lattice-based clustering approach, by which the actual characteristics of the data
can be described to the user. The clusters are tailored, in real time, to the
specific query kernel provided by the user. They represent a means of describing
the “best” that can be retrieved, based on an ordering of the attribute values.
Just as the spreadsheet has provided a powerful means by which users can ask
“What if” questions, the clusters presented to the user provide a means by which
the user can ask “What if” questions of the database. The user can then change
the displayed clusters by changing whether the high or low value of a particular
attribute is to be considered the most desirable. This simple change will
significantly change the clusters that are displayed to the user. In addition, by
changing the number of clusters per attribute, the user has a means of limiting
the cardinality of the set of clusters displayed. This provides a variable-resolution
approach by which the user can interact with a very high-level representation
of whatever represents the “best” options from the actual data. He could begin
with a large number of attributes and coarse resolution per attribute (i.e., a
small number of clusters per attribute), and as he finds areas of special interest,
concentrate on a smaller number of attributes and a higher resolution per
cluster. In this way, he can keep the cardinality of the set of displayed clusters to a
reasonable limit.
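The “What if” behavior described above can be sketched in a few lines. This is not the QBC implementation; the partitioning scheme and names are invented, but the maximal-cluster computation illustrates how flipping a single preference changes the displayed set.

```python
# Sketch of lattice-based clustering (all names illustrative): each
# attribute range is cut into a fixed number of partitions, tuples map
# to cluster vectors of partition indices, and only the mutually
# incomparable, collectively dominant clusters are displayed.

def cluster_of(tup, partitions, prefer_high):
    """Map a tuple of values in [0, 1) to a vector of partition indices,
    where a larger index always means 'more preferred'."""
    vec = []
    for value, high in zip(tup, prefer_high):
        idx = min(int(value * partitions), partitions - 1)
        vec.append(idx if high else partitions - 1 - idx)
    return tuple(vec)

def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and a != b

def best_clusters(tuples, partitions, prefer_high):
    clusters = {cluster_of(t, partitions, prefer_high) for t in tuples}
    # Keep the maximal elements: mutually incomparable, and together
    # they dominate every other occupied cluster.
    return {c for c in clusters
            if not any(dominates(d, c) for d in clusters)}

tuples = [(0.9, 0.1), (0.1, 0.9), (0.2, 0.2)]
```

With both attributes preferring high values, two incomparable clusters are displayed; flipping the preference on the second attribute collapses the display to a single dominant cluster, illustrating how one small change repositions the query.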
Another significant result of this research is the determination of the upper
bound for the set of clusters displayed to the user in the lattice-based cluster
approach, based on the number of attributes of interest and the number of
partitions used for each attribute. The upper bound is represented by a very simple,
but not intuitive, binomial coefficient. This provides a reasonably small upper
bound, considering the number of clusters that it represents. Also, based on the
actual data in the database, the actual set of mutually incomparable and
collectively dominant clusters displayed to the user may be significantly smaller than this
upper bound. It should also be emphasized that each of these clusters may
represent multiple tuples of information in the query committee database. Thus
the cluster concept itself reduces the cardinality of the data with which the user
must deal, and lattice-based clustering reduces this cardinality even more.
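The exact binomial-coefficient bound is derived earlier in the dissertation and is not reproduced in this excerpt. As an independent illustration of why such a bound exists: the displayed clusters form an antichain in a product of chains (one chain of partition indices per attribute), and products of chains have the Sperner property, so the largest antichain is the largest rank level. With two partitions per attribute this is the central binomial coefficient. The function name below is invented.

```python
from itertools import product
from math import comb

def largest_antichain_bound(attrs, parts):
    """Size of the largest rank level of {0..parts-1}^attrs, which by
    the Sperner property of products of chains bounds any set of
    mutually incomparable cluster vectors."""
    target = (attrs * (parts - 1)) // 2        # middle coordinate sum
    return sum(1 for v in product(range(parts), repeat=attrs)
               if sum(v) == target)

# With two partitions per attribute the bound is C(a, a // 2).
bounds = [largest_antichain_bound(a, 2) for a in range(1, 8)]
```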
A very attractive feature of lattice-based clustering is that it provides a set
of best options with very little in the way of hidden default values. While each
attribute value range has a default preferred value, that preference can be readily
observed by the user in the clusters presented to him, and could easily be changed
to make the opposite end of the scale the preferred end. This provides a very
powerful way for the user to reposition the query with a minimal amount of
work. An important aspect of this presentation method is that it provides
the user with concrete representations of the data for his interaction.
8.1.4 Distributed AI
While focusing on databases and query elaboration, this research resulted from
a desire to extend distributed artificial intelligence (DAI) into a new area. DAI
has thus been a strong focus of this research. An important contribution of this
research is that it has pushed DAI into the database domain at a very important
time in the history of database technology. Work is underway to make large
databases available on-line over various wide-area networks. One example of
this is the Earth Observing System Data Information System (EOSDIS), which
will support the various EOS satellites that are to be launched during the latter
part of this decade (Dozier 1990). The EOSDIS system will serve not only as
a repository for EOS data, but also as a directory for earth science data that
is held by other agencies. EOSDIS and other databases that exist today will
be accessible over various networks by scientists operating from workstations
around the world (Lederberg and Uncapher 1989). In the past, much of this
sort of data has been buried in the labs of individual scientists, accessible only to
those working on research projects in collaboration with the principal investigator
responsible for the data or those who personally request the data from the
principal investigator. Now, however, it will be placed in publicly accessible
repositories, from which it can be accessed by those who not only do not know
the PI, but may not even be familiar with the nature or even the existence
of the data. This raises a significant problem that can be solved by systems such
as QBC. If an investigator is not familiar with data, then he cannot use it, even
if it is relevant. Query elaboration provides an approach by which intelligent
agents will be able to assist scientists working across multiple domains to find
the data they need. QBC offers an initial step toward providing such a capability.
8.2 Limitations
This chapter has highlighted the important contributions of the QBC research.
Just as it is important to indicate the accomplishments of research, it is also
important to delineate the limits of that research. This establishes the boundaries
for application of the technology and identifies a point at which future research
can be initiated.
One means of assessing the limitations of the QBC research is to assess the
nature of the QBC query options output to the user, relative to the type of query
that a person could write if he could specify a general query in terms of an
ER-structure. The basic QBC result is specified in terms of one or more entities that
serve as the requisition for some real-world activity and its associated context.
This provides the means for the user to discover both entities and attributes of
entities. However, because of the nature of query elaboration and the concern that
the growth of the elaboration be controlled, the query elaboration approaches
described will address multiple-entity discovery, but not joined-entity discovery.
Multiple-entity discovery is supported, since the requisition of an activity
primitive can contain multiple solicitations that will lead to the realization of
multiple entities in the data dimension. This is supported by the current research.
What is not supported is the case in which there are multiple unspecified entities
joined by some common context.
The simplest joined-entity discovery is the binary relationship that relates two
unspecified variables with some shared context. As an example, suppose that the
user’s initial query kernel consisted of two unspecified types joined by the shared
entity “size”. Assuming a more extensive query committee than the current
QBC implementation, this could result in agents posting galaxies and planetary
nebulas, as we have seen, but also countries, states, cities, and colleges. As can
be noted from this example, while all of these entities share the attribute size, the
relationships between these various entities vary from relatively close to very
distant.
A more extreme example of joined-entity discovery is unconstrained, transitive
discovery. Thus, a user might begin with an unspecified entity with the context
astronomy. This could lead to the discovery of planetariums that are located
in cities that have universities that have football teams, which have schedules.
Thus, we end up with astronomy providing the context for the discovery of a
football schedule. The problem with any type of joined discovery is that the
sequence of joins must be both individually and collectively meaningful. This
represents an example of what Motro calls the connection trap, and was one of
the concerns in the development of query elaboration (Motro 1986). Because of
this concern, QBC was placed on a very short “leash” so it would not embark
upon a meaningless transitive discovery.
This short leash consists of a single unspecified type th at forms the target of
the elaboration, or a specified type for which additional context is sought. If, in
the course of context discovery, an interesting context is discovered th at the user
wishes to pursue, he can then use this as the basis for a new query kernel, with
the interesting context as the sole context for a new unspecified type. In this
way, under user control, QBC can be used to explore transitive relationships.
In terms of a relational database query, this limitation means that while joins
involving keys related to a single entity are supported (this would be context
discovery), a general join between attributes of different relations would not be
supported.
Another area in which QBC will not discover a query involves relationships
among actual data values. For example, a person could formulate a query to
find all of the double stars that are within 10 degrees of a galaxy. This is not
the type of query that QBC was designed to address. QBC could help a novice
discover double stars and galaxies and their attributes, including their
right ascension and declination. At this point, the user has sufficient information
to formulate a query to search the database for all double stars that have
a right ascension and declination within 10 degrees of those for a galaxy. It is
not clear how any query elaboration system could arrive at such a highly specific
query from the myriad of such queries that could be written involving relationships
between specific data values for the attributes of various instances in the
database. One of the strong focuses of the QBC research has been to provide
results that are usable and credible. To provide such a specific query stretches
one’s credulity.
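Using the Appendix A schema (RA recorded in hours, Dec in degrees), the final query the user would formulate himself might look like the following sketch. The rows and the crude separation test are invented for illustration; a real query would use actual catalog coordinates and proper spherical geometry.

```python
# Illustrative rows only -- not real catalog coordinates.
double_stars = [("star-a", 1.0, 40.0), ("star-b", 12.0, -10.0)]
galaxies     = [("gal-1", 1.2, 45.0), ("gal-2", 13.0, 47.0)]

def near(ra1, dec1, ra2, dec2, limit=10.0):
    # RA is in hours (15 degrees per hour); a crude box test, not true
    # angular distance on the sphere.
    return abs(ra1 - ra2) * 15.0 <= limit and abs(dec1 - dec2) <= limit

pairs = [(s, g)
         for s, ra_s, dec_s in double_stars
         for g, ra_g, dec_g in galaxies
         if near(ra_s, dec_s, ra_g, dec_g)]
```

The point of the example is that once QBC has surfaced the entities and their RA/Dec attributes, this value-level join is straightforward for the user to write, but it is not something QBC attempts to discover.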
8.3 Future Research
One of the important areas for future research is to extend query elaboration
to a geographically distributed, network environment. The initial research uses
the very convenient blackboard architecture for thinking about and implementing
the QBC system. A blackboard provides a convenient basis for query elaboration
since the whole query committee is assumed to surround the blackboard. Thus,
one does not have to address the problem of how to involve the necessary agents.
While the use of a blackboard for query elaboration represents an important
first step, and provides the basis for addressing some of the critical questions in
distributed query elaboration, it is only a first step. With the advent of networks
and distributed databases, the next step is to extend this work to a network. This
extension of the research would be concerned with issues such as how the relevant
agents are to be discovered and involved in the query elaboration. Broadcasting
a solicitation to all members of the network, although used in the contract net
(Smith 1980), may not be realistic for a worldwide network of autonomous
systems. The first challenge is to find means by which agents can be discovered,
possibly via recommendations by other agents, much as one finds specialists in
unfamiliar medical fields via recommendations from family doctors.
The second challenge is to investigate means by which the amount of data
exchanged can be kept to a minimum. In the QBC research, all of the agents
view all the data on the blackboard. While this still might have to be the case
for a geographically distributed environment, efforts should be applied to find
ways to reduce the amount of data required. One possible approach would be
for an agent, in a sense, to send a representative to a query committee meeting.
That representative would possess enough basic knowledge to decide what
had to be sent back to the main agent for resolution.
Another area for future research suggested by the QBC research is the general
problem of collaboration among expert systems. While QBC has addressed this
to some extent, since its agents involve both data and knowledge, the general
problem of how agents with different types of expertise can collaborate to apply
their various expertise to the solution of a single problem has not been addressed.
As companies develop proprietary expert systems, and again with the network
capability for tying these together, the whole problem of higher-level knowledge
collaboration protocols represents a fruitful area for future research.
Bibliography
Ackland, Bryan, Alex Dickinson, et al. 1985. CADRE - A System of Cooperating
VLSI Design Experts. Proceedings of the ICCD.
Bancilhon, Francois and Raghu Ramakrishnan. 1988. An Amateur's Introduction
to Recursive Query Processing Strategies. Readings in Artificial
Intelligence & Databases. John Mylopoulos and Michael L. Brodie (eds).
Morgan Kaufmann Publishers, Inc., San Mateo, California.
Barr, Avron and Edward Feigenbaum. 1981. The Handbook of Artificial Intelligence,
Vol. I. Addison-Wesley Publishing Company.
Barr, Avron and Edward Feigenbaum. 1982. The Handbook of Artificial Intelligence,
Vol. II. Addison-Wesley Publishing Company.
Blanford, Ronald P. 1987. Adaptive Progressive Refinement. Technical Report
No. 87-07-05. Department of Computer Science, FR-35, University of
Washington, Seattle, Washington.
Bogart, Kenneth P. 1983. Introductory Combinatorics. Pitman Publishing Inc.,
Marshfield, Massachusetts.
Brachman, R.J. and James G. Schmolze. 1985. An Overview of the KL-ONE
Knowledge Representation System. Cognitive Science 9:171-216.
Brodie, Michael L. and Frank Manola. 1989. Database Management: A Survey.
Readings in Artificial Intelligence & Databases. John Mylopoulos
and Michael Brodie (eds). Morgan Kaufmann Publishers, Inc., San Mateo,
California.
Chang, Shi-Kuo and Jyh-Sheng Ke. 1987. Translation of Fuzzy Queries for
Relational Database Systems. IEEE Transactions on Pattern Analysis and
Machine Intelligence, Vol. PAMI-1, No. 3, July.
Chen, Peter Pin-Shan. 1976. The Entity-Relationship Model - Toward a Unified
View of Data. ACM Transactions on Database Systems, Vol. 1, No. 1,
March.
Cohen, Paul R. and Edward Feigenbaum. 1982. The Handbook of Artificial
Intelligence, Vol. III. Addison-Wesley Publishing Company.
Cohen, Philip R. and Hector J. Levesque. 1987. Persistence, Intention, and
Commitment. SRI International Technical Note 415.
Conklin, Jeff. 1987. Hypertext: An Introduction and Survey. Computer,
(September), Vol. 20, No. 9, pp. 17-42.
Corkill, Daniel D., Kevin Q. Gallagher, and Kelly E. Murray. 1986. GBB: A
Generic Blackboard Development System. Department of Computer and
Information Science, University of Massachusetts, Amherst.
Crawley, Peter and Robert P. Dilworth. 1973. Algebraic Theory of Lattices.
Prentice-Hall, Inc., Englewood Cliffs, New Jersey.
Date, C. J. 1984. An Introduction to Database Systems, Volume II. Addison-
Wesley Publishing Company, Menlo Park, CA.
Davis, Randall and Douglas B. Lenat. 1982. Knowledge-Based Systems in
Artificial Intelligence. McGraw-Hill International Book Company.
Dillon, William R. and Matthew Goldstein. 1984. Multivariate Analysis: Methods
and Applications. John Wiley & Sons, New York.
Dozier, Jeff. 1990. Looking Ahead to EOS: The Earth Observing System.
Computers in Physics. May/June.
Eicher, David J. 1989. Deep-Sky Observing With Small Telescopes. Enslow
Publishers, Inc., Hillside, New Jersey.
Erman, Lee D., Frederick Hayes-Roth, Victor Lesser and D. Raj Reddy. 1980.
The Hearsay-II Speech Understanding System: Integrating Knowledge to
Resolve Uncertainty. ACM Computing Surveys, (June). Association for
Computing Machinery.
Ernst, George W. and Allen Newell. 1969. GPS: A Case Study in Generality
and Problem Solving. Academic Press.
Farinacci, Martha, Mark S. Fox, Ingemar Hulthage and Michael D. Rychener.
1985. The Development of ALADIN - An Expert System for Aluminum
Alloy Design. Robotics Institute, Carnegie-Mellon University, Pittsburgh,
PA.
Fikes, Richard E., Peter E. Hart and Nils J. Nilsson. 1972. Learning and
Executing Generalized Robot Plans. Artificial Intelligence 3:251-288.
Gallagher, Kevin Q., Daniel D. Corkill and Philip M. Johnson. 1988. GBB
Reference Manual - GBB Version 1.2. COINS Technical Report 88-66.
Gallaire, Hervé, Jack Minker and Jean-Marie Nicolas. 1984. Logic and Databases:
A Deductive Approach. ACM Computing Surveys, 16:2 (June).
Genesereth, M. 1983. An Overview of Meta-Level Architecture. In Proceedings
of the Third Annual National Conference on Artificial Intelligence, AAAI.
Gross, Daniel. 1988. Applications of semantic networks in database search &
retrieval. Expert Systems, Vol. 5, No. 2, (May).
Goldstein, P. and D.G. Bobrow. 1980. Descriptions for a Programming Environment.
Proceedings of the First Annual National Conference on Artificial
Intelligence. The American Association for Artificial Intelligence (AAAI-80).
Stanford, CA, August 1980.
Hayes-Roth, Frederick, Donald A. Waterman and Douglas B. Lenat. 1983.
Building Expert Systems. Addison-Wesley Publishing Company, Inc.
Hinke, Thomas H. 1988a. Inference Aggregation Detection in Database Management
Systems. Proceedings, 1988 IEEE Symposium on Security and
Privacy. The Computer Society of the IEEE, (Oakland).
Hinke, Thomas H. 1988b. Database Inference Engine Design Approach. Proceedings
of The 1988 Workshop on Database Security. IFIP Working Group
11.3 (Database Security), Kingston, Ontario, Canada (October).
Hofstadter, Douglas R. 1979. Gödel, Escher, Bach: An Eternal Golden Braid.
Vintage Books, New York.
Hull, Richard and Roger King. 1987. Semantic Database Modeling: Survey,
Applications, and Research Issues. ACM Computing Surveys. (September).
Association for Computing Machinery.
Imielinski, Tomas. 1988. Intelligent Query Answering in Rule Based Systems.
In Foundations of Deductive Databases and Logic Programming. Jack
Minker (ed.). Morgan Kaufmann Publishers, Inc., Los Altos, California.
Johannes, James D. 1984. Judgmental-Knowledge Bases: Problem Solving and
Expert Systems. International Conference on Data Engineering. Los Angeles,
CA.
Johnson, W. Lewis, et al. 1988. The Knowledge-Based Specification Assistant:
Final Report. USC/Information Sciences Institute, Marina del Rey,
September.
Karkoschka, E. 1990. The Observer's Sky Atlas. Springer-Verlag, Berlin.
Kellogg, Charles. 1982. Knowledge Management: A Practical Amalgam of
Knowledge and Data Base Technology. In Proceedings of the American
Association for Artificial Intelligence 82 Conference (Pittsburgh, PA, Aug.).
King, Jonathan J. 1980. Intelligent Retrieval Planning. Proceedings of the First
National Conference on Artificial Intelligence. Stanford University.
Knuth, Donald E. 1973. Sorting and Searching. Addison-Wesley Publishing
Company, Reading, Massachusetts.
Laird, John E., Allen Newell and Paul S. Rosenbloom. 1987. SOAR: An
Architecture for General Intelligence. Artificial Intelligence, Vol. 33, No. 1
(September):1-64.
Laird, John, Paul Rosenbloom and Allen Newell. 1986. Universal Subgoaling
and Chunking. Kluwer Academic Publishers, Boston.
Langley, Russel. 1971. Practical Statistics for Non-mathematical People. Drake
Publishers, New York.
Lederberg, Joshua and Keith Uncapher. 1989. Towards a National Collaboratory.
Report of the Invitational Workshop at the Rockefeller University.
March.
Lehnert, Wendy G. 1978. The Process of Question Answering: A Computer
Simulation of Cognition. Lawrence Erlbaum Associates, Publishers, Hillsdale,
New Jersey. Distributed by the Halsted Press Division of John Wiley
& Sons, New York.
Lenat, D.B. 1975. Beings: Knowledge as interacting experts. Proceedings 4th
International Joint Conference on Artificial Intelligence.
Lenat, D.B. 1979. On Automated Scientific Theory Formation: A Case Study
using the AM Program. In Machine Intelligence 9 - Machine Expertise
and the Human Interface, edited by J.E. Hayes, Donald Michie and L.I.
Mikulich. Halsted Press: a division of John Wiley & Sons, New York.
Lenat, Douglas B. 1983. The Role of Heuristics in Learning by Discovery: Three
Case Studies. In Machine Learning - An Artificial Intelligence Approach,
Ryszard S. Michalski, Jaime G. Carbonell and Tom M. Mitchell (eds).
Morgan Kaufmann Publishers, Inc., Los Altos, CA.
Lenat, Doug, Mayank Prakash and Mary Shepherd. 1986. CYC: Using Common
Sense Knowledge to Overcome Brittleness and Knowledge Acquisition
Bottlenecks. The AI Magazine, 6, no. 4, pp. 65-85.
Lesser, Victor R. and Daniel D. Corkill. 1983. The Distributed Vehicle Monitoring
Testbed: A Tool For Investigating Distributed Problem Solving
Networks. The AI Magazine, Fall.
Lesser, Victor and Lee Erman. 1977. A Retrospective View of the Hearsay-II
Architecture. Proceedings, International Joint Conference on Artificial
Intelligence, (August).
Levesque, Hector J. 1986. Knowledge Representation and Reasoning. Annual
Review of Computer Science. Republished in Readings in Artificial Intelligence
and Databases. John Mylopoulos and Michael L. Brodie (eds). Morgan
Kaufmann Publishers, Inc., 1988.
McLeod, D.J. 1977. High level definition of abstract domain in a relational data
base system. Computer Languages 2, 3 (July 1977), 61-73.
Menzel, Donald H. and Jay Pasachoff. 1983. Peterson Field Guide to Stars and
Planets. Houghton Mifflin Company, Boston.
Michalski, Ryszard S. 1980. Pattern Recognition as Rule-Guided Inductive
Inference. IEEE Transactions on Pattern Analysis and Machine Intelligence,
Vol. PAMI-2, No. 4, July.
Michalski, Ryszard S. and Robert E. Stepp. 1983. Learning From Observation:
Conceptual Clustering. In Machine Learning: An Artificial Intelligence
Approach, Ryszard S. Michalski, Jaime G. Carbonell and Tom M. Mitchell
(eds). Morgan Kaufmann Publishers, Inc., Los Altos, CA.
Morgenstern, Matthew. 1984. Active Databases as a Paradigm for Enhanced
Computing Environments. Proceedings of the International Conference on
Very Large Databases.
Morgenstern, Matthew. 1987. Security and Inference in Multilevel Database and
Knowledge-Base Systems. ACM International Conference on Management
of Data (SIGMOD-87), San Francisco.
Morgenstern, Matthew. 1988. Controlling Logical Inference in Multilevel Database
Systems. Proceedings of the 1988 IEEE Symposium on Security and
Privacy, Oakland, California.
Motro, Amihai. 1986. BAROQUE: A Browser for Relational Databases. ACM
Transactions on Office Information Systems, Vol. 4, No. 2, pp. 164-181.
Motro, Amihai. 1986b. Supporting Goal Queries in Relational Databases.
Proceedings First International Conference on Expert Database Systems,
Kiawah Island, South Carolina.
Motro, Amihai. 1988. FLEX: A Tolerant and Cooperative User Interface to
Databases. Computer Science Department, University of Southern California,
Los Angeles, CA.
Parzen, Emanuel. 1960. Modern Probability Theory and Its Applications. John
Wiley & Sons, New York.
Pearl, Judea. 1988. Probabilistic Reasoning in Intelligent Systems: Networks
of Plausible Inference. Morgan Kaufmann Publishers, Inc., San Mateo,
California.
Ramakrishnan, Raghu and Avi Silberschatz. 1985. The MR Diagram - A Model
for Conceptual Database Design. Proceedings of Very Large Database
Conference. Stockholm; pp. 376-393.
Rich, Charles and Richard Waters. 1988a. Automatic Programming: Myths
and Prospects. Computer (August):40-51.
Rich, Charles and Richard Waters. 1988b. The Programmer's Apprentice: A
Research Overview. Computer (November):10-25.
Rich, Elaine. 1983. Artificial Intelligence. McGraw-Hill Book Company.
Robertson, G., D. McCracken and A. Newell. 1981. The ZOG approach to
man-machine communications. International Journal of Man-Machine Studies,
Vol. 14:461-488.
Sacerdoti, Earl D. 1974. Planning in a Hierarchy of Abstraction Spaces. Artificial
Intelligence, 5:115-132.
Salton, Gerald and Michael J. McGill. 1983. Introduction to Modern Information
Retrieval. McGraw-Hill Book Company, New York.
Schank, Roger and Robert Abelson. 1977. Scripts, Plans, Goals and Understanding.
Lawrence Erlbaum Associates, Publishers, Hillsdale, New Jersey.
Scheid, John and Steve Holtsberg. 1989. Ina Jo Specification Language Reference
Manual. Technical Report TM-6021/001/05. Unisys Corporation,
Culver City, CA.
Schwartz, Michael F. 1988. The Networked Resource Discovery Project: Goals,
Design, and Research Efforts. CU-CS-387-88, Department of Computer
Science, University of Colorado, Boulder, CO, May.
Shasha, Dennis. 1985. NetBook - a data model to support knowledge exploration.
Proceedings of Very Large Database Conference. Stockholm; pp.
418-425.
Sherrod, P. Clay with Thomas L. Koed. 1981. A Complete Manual of Amateur
Astronomy. Prentice Hall Press, New York.
Shipman, David W. 1981. The Functional Data Model and the Data Language
DAPLEX. ACM Transactions on Database Systems. Association for
Computing Machinery.
Simon, Herbert A. 1983. Why Should Machines Learn? In Machine Learning -
An Artificial Intelligence Approach. Ryszard S. Michalski, Jaime G.
Carbonell and Tom M. Mitchell (eds.). Morgan Kaufmann Publishers, Inc.,
Los Altos, CA.
Smith, Reid G. 1980. The Contract Net Protocol: High-Level Communication
and Control in a Distributed Problem Solver. IEEE Transactions on
Computers, Vol. C-29, December.
Smith, Reid G. and Randall Davis. 1981. Frameworks for Cooperation in
Distributed Problem Solving. IEEE Transactions on Systems, Man, and
Cybernetics. January.
Smith, John Miles. 1986. Expert Database Systems: A Database Perspective.
Proceedings of the First International Conference Workshop.
Smith, Terence, Donna Peuquet, Sudhakar Menon and Pankaj Agarwal. 1987.
KBGIS-II: A knowledge-based geographical information system. International
Journal of Geographical Information Systems, Vol. 1, No. 2, 149-172.
Star, Jeffrey and John Estes. 1990. Geographical Information Systems: An
Introduction. Prentice Hall.
Stefik, Mark. 1981a. Planning with Constraints (MOLGEN: Part 1). Artificial
Intelligence. 111-140.
Stefik, Mark. 1981b. Planning and Meta-Planning (MOLGEN: Part 2). Artificial
Intelligence. 141-170.
Stepp, Robert E. III and Ryszard S. Michalski. 1986. Conceptual Clustering:
Inventing Goal-Oriented Classifications of Structured Objects. In Machine
Learning, Volume II. Morgan Kaufmann Publishers, Inc., Los Altos, CA.
Stevens, S.S. 1946. “On the theory of scales of measurement”, Science, 103,
677-680.
Stonebraker, Michael and Joseph Kalash. 1982. Proceedings Eighth International
Conference on Very Large Data Bases, Mexico City. (September),
1-10.
Stonebraker, Michael. 1988. Readings in Database Systems. Morgan Kaufmann
Publishers, Inc., San Mateo, CA.
Suchman, Lucy A. 1987. Plans and Situated Actions - The problem of
human-machine communication. Cambridge University Press.
Thompson, R.H. and W.B. Croft. 1989. Support for browsing in an intelligent
text retrieval system. International Journal of Man-Machine Studies,
Volume 30, 639-668.
Tou, F.N., M.D. Williams, R. Fikes, A. Henderson, and T. Malone. 1982.
RABBIT: An Intelligent Database Assistant. Proceedings of the National
Conference on Artificial Intelligence. Pittsburgh, (August).
Ullman, Jeffrey D. 1988. Principles of Database and Knowledge-Base Systems,
Volume 1. Computer Science Press.
Wakerly, John F. 1990. Digital Design Principles and Practices. Prentice Hall,
Englewood Cliffs, New Jersey.
Wiederhold, Gio. 1984. Knowledge and Database Management. IEEE Software,
(January).
Appendix A
QBC Database
This appendix includes the deep space database that the QBC research targeted.
The deep space database contains the first 35 Messier objects that are either
nebulas or galaxies, plus a selection of double stars. This information was taken
from (Menzel and Pasachoff 1983) and (Karkoschka 1990).
Due to page size limitations, the Nebula, Galaxy and Double Star databases
are each shown as two databases, although in the QBC implementation they are
each a single database.
The Nebula database consists of all the Messier objects that are nebulas
with Messier numbers less than 43 plus all of the Messier objects that are in the
Northern Sky (e.g., north of about +30 Dec).
The Nebula database is contained in table A .l and table A.2. In the QBC
implementation some of these fields are divided into two fields. The RA field is
divided into RA-hr and RA-min and the Dec field is divided into Dec-deg and
Dec-min. In addition, the first num ber in the m agnitude field of table A.2 is
called the m agnitude and the second number is called the surface brightness in
the schema used in the QBC code. In addition, types and shapes are divided
into distinct fields based on whether the nebula is of type diffuse or planetary.
The Galaxy information for the QBC Galaxy database is contained in Table A.3
and Table A.4. In the QBC code, the RA field is divided into an RA-hr
and RA-min field. Likewise, the Dec field is divided into a Dec-deg and Dec-min
field.
The double star database was derived by taking the first binary star entry
for each constellation group from those listed in (Menzel and Pasachoff 1983). It
is shown in Table A.5 and Table A.6. It will be noted in the QBC listing that
some of the fields have been subdivided into two fields. Also, due to limitations
of the page size, this database has been divided into two databases. In the QBC
implementation these are combined into a single database.
Table A.1: Nebula Database 1
Nebula Database 1
Messier NGC RA Dec
M1 1952 05 34.5 +22 01
M8 6523 18 03.7 -24 23
M16 6611 18 18.9 -13 47
M17 6618 18 20.8 -16 10
M20 6514 18 02.4 -23 02
M27 6853 19 59.6 +22 43
M42 1976 05 35.3 -05 23
M43 1982 05 35.5 -05 16
M57 6720 18 53.6 +33 02
M76 650 01 42.2 +51 34
M97 3587 11 14.9 +55 01
Table A.2: Nebula Database 2
Nebula Database 2
M NGC Mag Size Shape Type Distance
M1 1952 9-9 6 Fi 4 Diff 4000
M8 6523 5-10 60 Em4 Diff 5000
M16 6611 6-9 25 Em2 Diff 6000
M17 6618 6-10 40 Em2 Diff 5000
M20 6514 7-10 25 Em2 Diff 5000
M27 6853 8-8 8 A 4 Plan 1000
M42 1976 4-8 40 Em3 Diff 1500
M43 1982 9-10 12 Em3 Diff 1500
M57 6720 9-6 1.5 R 3 Plan 2000
M76 650 11-9 2.5 A 5 Plan 5000
M97 3587 11-10 3 D 0 Plan 3000
Table A.3: Galaxy Database 1
Galaxy Database 1
Messier NGC RA Dec
M31 224 00 42.7 +41 16
M32 221 00 42.7 +40 52
M33 598 01 33.8 +30 39
M51 5194 13 29.9 +47 12
M63 5055 13 15.8 +42 02
M77 1068 02 42.7 -00 01
M81 3031 09 55.8 +69 04
M82 3034 09 56.2 +69 42
M94 4736 12 50.9 +41 07
M101 5457 14 03.5 +54 21
M102 5866 references M101
M106 4258 12 19.0 +47 18
M108 3556 11 11.6 +55 40
M109 3992 11 57.7 +53 22
M110 205 00 40 +41 42
Table A.4: Galaxy Database 2
Galaxy Database 2
Messier NGC Magnitude Surface Brightness Size Distance Shape
M31 224 3.7 10 10’ 2.5 E 5
M32 221 8.5 8 4’ 2.5 E 2
M33 598 6 11 50’ 3 Sc 4
M51 5194 8.5 10 10’ 20 Sb 2
M63 5055 9 10 10’ 20 Sb 5
M77 1068 9 8 4’ 50 Sb 2
M81 3031 7 9 20’ 10 Sb 5
M82 3034 8.5 9 10’ 10 Ir 6
M94 4736 8.5 9 6’ 20 Sb 3
M101 5457 8 11 20’ 40 Sc 1
M102 5866 10.5 9 3’ 30 L 6
M106 4258 8.5 10 12’ 20 Sb 6
M108 3556 10 9 8’ 30 Sc 8
M109 3992 10 10 6’ 40 Sb 4
M110 205 9 10 10’ 2.5 E 5
Table A.5: Double Star Database 1
Double Star Database
ADS Name R.A. Dec. Mags. P.A. Sep.
558 55 Psc 00 39.9 +21 26 5.5-8.7 194 6.6
561 alpha Cas 00 40.5 +56 32 2.2-8.9 63
940 phi And 01 09.5 +47 15 4.6-5.5 133 .5
1507 gamma Ari 01 53.5 +19 18 4.6-4.7 0 7.8
1697 iota Tri 02 12.4 +30 18 5.3-6.9 71 3.9
1477 alpha UMi 02 31.8 +89 16 2.0-8.9 18
2080 gamma Cet 02 43.3 +03 14 3.6-6.2 297 2.8
2157 eta Per 02 50.7 +55 54 3.8-8.5 28
— theta Eri 02 58.3 -40 18 3.2-4.3 7.4
2888 epsilon Per 03 57.9 +40 01 2.9-8.0 9 9.0
3137 phi Tau 04 20.4 +27 21 5.1-8.7 50
3823 beta Ori 05 14.5 -08 12 0.2-6.7 206 9.2
4566 theta Aur 05 59.7 +37 13 2.6-7.1 320 3.0
5107 beta Mon 06 28.8 -07 02 4.6-5.1 132 7.2
5400 12 Lyn 06 46.2 +59 27 5.4-6.0 74 1.7
5423 alpha CMa 06 45.2 -16 43 -1.5-8.5 5 4.5
5983 delta Gem 07 20.1 +21 59 3.6-8.2 223 5.9
6988 iota Cnc 08 46.7 +28 46 4-6.6 307 30.4
7203 sigma UMa 09 10.4 +67 08 4.9-8.2 357 3.6
7724 gamma Leo 10 20.0 +19 51 2.2-3.5 124 4.4
8489 2 CVn 12 16.1 +40 40 5.9-9.0 260 11.5
8531 17 Vir 12 22.5 +05 18 6.5-8.6 337 20.6
— alpha Cru 12 26.6 -63 06 1.6-2.1 114 4.7
8630 gamma Vir 12 41.7 -01 27 3.5-3.5 287 3.0
— alpha Cen 14 39.6 -60 50 0-1.2 214 19.7
Table A.6: Double Star Database 2
Double Star Database
ADS Name R.A. Dec. Mags. P.A. Sep.
558 55 Psc 00 39.9 +21 26 5.5-8.7 194 6.6
9338 pi Boo 14 40.7 +16 25 4.9-5.8 108 5.7
9375 54 Hya 14 46 -25 27 5.2-7.2 126 8.8
9617 eta CrB 15 23.2 +30 17 5.6-5.9 27 1.0
9701 delta Ser 15 34.8 +10 32 4.1-5.2 179 3.9
9909 xi Sco AB 16 04.4 -11 22 4.9-4.9 44 0.7
10087 lambda Oph 16 30.9 +01 59 4.2-5.2 22 1.5
10157 zeta Her 16 41.3 +31 36 2.9-5.5 83 1.6
10345 mu Dra AB 17 05.3 +54 28 5.7-5.7 25 1.9
11635 epsilon Lyr 18 44.3 +39 40 5-6.1 353 3.7
12540 beta Cyg 19 30.7 +27 58 3.2-5.4 54 34.4
13632 alpha Cap 20 17.6 -12 31 4.3-9.0 221 45.5
14279 gamma Del 20 46.7 +16 08 4.3-5.2 268 9.8
15032 beta Cep 21 28.6 +70 34 3.2-7.8 250 13.7
15971 zeta Aqr 22 28.8 -00 01 4.3-4.5 207 1.9
Appendix B
Terms
This section is a glossary of terms used in QBC.
A-caps - Activity capabilities represent knowledge of an entity and its associated
attribute. An A-cap represents data that can be retrieved from a database.
AFS - Agent first set. This represents the set of partial clusters and associated
entity instance keys in which at least one of the attribute cluster values is
from the first cluster (e.g., the first quartile if four clusters per attribute
are used).
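The first-set selection can be sketched as follows; this is a hedged Python illustration rather than the dissertation's Common Lisp, assuming each entity instance has already been mapped to per-attribute cluster numbers, with 1 denoting the first (most preferred) cluster.

```python
def agent_first_set(clustered_rows):
    """Keep the entity instances for which at least one attribute's
    value fell into the first (most preferred) cluster."""
    return {key: clusters
            for key, clusters in clustered_rows.items()
            if 1 in clusters.values()}

rows = {
    "M31": {"magnitude": 1, "size": 2},  # magnitude is in the first cluster
    "M32": {"magnitude": 3, "size": 4},  # no attribute in the first cluster
}
print(agent_first_set(rows))  # only M31 qualifies
```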
A-sol - Activity solicitation represents the desire for an entity and its associated
attribute.
Activity - An action or state of being applied to an entity.
AC - Activity cluster, a cluster of APs.
Attribute Cluster Preference (ACP) - Indicates whether high values or low
values of an attribute are preferred by the current user.
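In the QBC code this preference is recorded per non-key attribute as a comparison operator (the relation metadata in Appendix C uses #'< and #'>). A minimal Python analogue, with illustrative attribute names:

```python
import operator

# Each non-key attribute carries its cluster preference: the relation a
# preferred value bears to a less preferred one.  For magnitude, lower
# numbers mean brighter objects, so "less than" marks the preferred direction.
acp = {"magnitude": operator.lt, "size": operator.gt}

def prefers(attribute, a, b):
    """True if value a is preferred to value b for this attribute."""
    return acp[attribute](a, b)

print(prefers("magnitude", 3.5, 8.5))  # the brighter object is preferred
```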
AP - Activity Primitives. Each AP consists of one activity associated with one
object. An example AP is (view nebula), in which the activity view is
associated with the object nebula.
CEC - Consistent Entity Candidate - A hypothesis that satisfies its context.
Cluster Independent r-frags - R-frags that are associated with components of
an AC, but are not specific to a particular AC.
Compound AC - An AC that itself contains ACs as components.
Data Dimension - The part of a world that contains data; if an associated
real dimension of the world exists, the data dimension describes the data
necessary to support the activity described in the real dimension.
Dec - Declination
DKE - Dominance-based Knowledge Factoring
E-cap - Entity capability represents knowledge about the existence of a particular
entity.
EFS - Entity first set, the combined set of partial entity clusters submitted by
each agent; the EFS is composed of all of the AFSs submitted by the
various agents.
EIEG - Explicit initial, explicit goal state problem
EIMG - Explicit initial, meta goal state problem
Entity - Describes some thing about which data is stored in a database. An
entity has attributes that describe the characteristics of the entity.
E-sol - Entity solicitation, represents the desire for a particular entity.
Final Entity State - The state of the query elaboration prior to the imposition
of filtering and lattice-based clustering.
Lattice-based clustering (LBC) - A method of clustering the query options based
on forming a lattice of the Cartesian product of the individual attribute
clusters.
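The lattice construction can be sketched as the Cartesian product of the per-attribute cluster labels. This is an illustrative Python sketch, not the QBC implementation; the attribute names and two-cluster setup are assumptions for demonstration.

```python
from itertools import product

def lattice_nodes(attribute_clusters):
    """Form the lattice nodes as the Cartesian product of the individual
    attribute clusters: one cluster label per attribute per node."""
    names = sorted(attribute_clusters)
    return [dict(zip(names, combo))
            for combo in product(*(attribute_clusters[n] for n in names))]

# Two attributes with two clusters each yield a 2 x 2 lattice of options.
nodes = lattice_nodes({"magnitude": [1, 2], "size": [1, 2]})
print(len(nodes))  # 4 combinations
```

Ordering the nodes by how many attributes sit in their first cluster recovers the levels of the lattice used to rank query options.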
Query by Committee System - The QBC system is both the implementation
that will be developed by this research and the approach to formulating
queries by committees consisting of a human and machines.
Query Committee - The query committee is the set of computer-based agents
that are collaborating on a particular query elaboration.
Query Elaboration - Once the user has presented his query kernel to the QBC
system, the QBC system begins the task of transforming this query kernel
into the final query to be presented for information retrieval. This
transformation process is called query elaboration.
Query Formulation - This term represents the "how" process of developing a
query, including both the presentation of the query kernel by the data
consumer and the elaboration of the query by the QBC system.
Query Kernel - The query kernel is the initial fragment of a request for data
that the system uses as the basis for elaborating the fragment into the final
request for data from the system.
RA - Right ascension
Real Dimension - The part of a world that contains descriptions about activities
in the real world, such as the act of viewing a particular nebula or being in
a particular city.
R-frag - A requisition fragment, which is a portion of a requisition.
Requisition - Contains the data required to support the activity described by
the state primitives in the associated real dimension.
Simple AC - An AC that does not contain other ACs; it contains only APs.
Universe - That which incorporates all data that is associated with a particular
QBC query elaboration session. A universe contains one or more worlds.
User - The human user of the QBC system. The user is the person with the
problem, who seeks to apply the QBC system to finding the necessary
information to solve his problem. Thus, the user is also called the problem
owner or the data consumer.
Worlds - Provide a means of encapsulating a description of some real world
activity.
W-relationship - World relationship that relates the state predicates of one world
to the state predicates of another world.
Appendix C
QBC Code
;;; This file contains the Query by Committee System
(in-package "USER")
(setf *print-level* nil) ;required to print on pollux
;;; Major structures used in QBC
;;; Terms are used for entities, activity primitives (APs),
;;; and world-to-world relationships. For entities, the
;;; context represents a list of attributes,
;;; for APs an entity and an activity, for world-to-world
;;; relationships, a set of APs.
(defstruct term
(ttype nil); e = entity, ap = AP, w = w-to-w,
;ac = activity, f-sol-user
(unspecified nil)
(parent nil) ;reference to dimension which references this term
(current nil); rank of current best syn -- the number, not the syn
(final nil) ; highest rank number which is best ranked item
(synl nil); populated with structures of type syn
(req nil) ;the requisition, data entities for capabilities,
;solicitations for APs, or transformations for AP to AP
;relationship
(hypothesis nil)) ; populated with structures of type hypoth
;;; very general synonym
(defstruct syn
(name)
(clusters nil) ; represents highest, 75%, 50%, 25% and low values
(value nil)
(reference nil) ;this is the term to which the quality is relative
(context nil) ;populated with structures of type term
(inherit nil); Holds set of syn names that can propagate to this syn
(propagate nil); Holds set of syn names to which this syn can
;propagate attributes and r-frags
;; Syns that are near each other will be on the same synl, but will not
;; inherit anything or propagate anything between themselves
(pedigree nil)
(rank nil)
(relation nil) ; hold the relation which this syn represents
; this is calculated from the data, thus may not be needed
(valuehigh nil)
(valuelow nil))
;;; hypothesis
(defstruct hypoth
(name) ; populated with name
(context nil)) ;populated with structures of type term
;;; requisition fragment
(defstruct rfrag
(terms nil); list of terms posted from agents
(values nil); list of values posted from agent databases
(esynl nil) ;list of entity syns used in the match for this frag
;to be added
(asynl nil));list of activity syns used in the match for this frag
;to be added
;;; database relation
(defstruct dbrel
(name)
(schema nil)
(nonkey nil)
(meta nil)
(db nil))
;;; Global lists, at least one for each dimension
(defvar *pdd* nil) ;primary world data dimension terms
(defvar *prd* nil) ;primary world real dimension terms
(defvar *srd* nil) ;secondary world real dimension terms
;;; Astronomical Data
;This relation combines galaxyl and galaxy2 into a single relation.
(setf ?galaxyla (make-dbrel
:name ’galaxy
:schema (list ’messier 'ngc ’ra-hr
’ra-min ’dec-deg ’dec-min 'magnitude
’surface-brightness ’size
’distance ’shape)
:nonkey 6
;; indicate the column where the non-key field begins
:meta (list 'key 'key 'key 'key 'key 'key #'<
            #'< #'> #'> '(S E L I))
;; meta contains information on which attributes are keys and for non-key
;; attribute, the activity cluster preference, indicated with the
;; relationship that a high quartile member is to have with a lower
;; quartile member
;; key indicates a key field which will not be part of the cluster
:db (list
    '(M31 224 00 42.7 +41 16 3.5 10 10 2.5 E)
    '(M32 221 00 42.7 +40 52 8.5 8 4 2.5 E)
    '(M33 598 01 33.8 +30 39 6 11 50 3 S)
    '(M51 5194 13 29.9 +47 12 8.5 10 10 20 S)
    '(M63 5055 13 15.8 +42 02 9 10 10 20 S)
    '(M77 1068 02 42.7 -00 01 9 8 4 50 S)
    '(M81 3031 09 55.8 +69 04 7 9 20 10 S)
    '(M82 3034 09 56.2 +69 42 8.5 9 10 10 I)
    '(M94 4736 12 50.9 +41 07 8.5 9 6 20 S)
    '(M101 5457 14 03.5 +54 21 8 11 20 40 S)
    '(M102 5866 15 06.5 +55 48 10.5 9 3 30 L)
    '(M106 4258 12 19.0 +47 18 8.5 10 12 20 S)
    '(M108 3556 11 11.6 +55 40 10 9 8 30 S)
    '(M109 3992 11 57.7 +53 22 10 10 6 40 S)
    '(M110 205 00 40 +41 42 9 10 10 2.5 E))))
(setf ?galaxyl (make-dbrel
  :name 'galaxy
  :schema (list 'messier 'ngc 'ra-hr
                'ra-min 'dec-deg 'dec-min 'magnitude)
  :nonkey 6
  ;; indicate the column where the non-key field begins
  :meta (list 'key 'key 'key 'key 'key 'key #'>)
  ;; meta contains information on which attributes are keys and for non-key
  ;; attribute, the activity cluster preference, indicated with the
  ;; relationship that a high quartile member is to have with a lower
  ;; quartile member
  ;; key indicates a key field which will not be part of the cluster
:db (list
    '(M31 224 00 42.7 +41 16 3.7)
    '(M32 221 00 42.7 +40 52 8.5)
    '(M33 598 01 33.8 +30 39 5.9)
    '(M51 5194 13 29.9 +47 12 8.4)
    '(M63 5055 13 15.8 +42 02 8.8)
    '(M77 1068 02 42.7 -00 01 9.1)
    '(M81 3031 09 55.8 +69 04 6.9)
    '(M82 3034 09 56.2 +69 42 8.7)
    '(M94 4736 12 50.9 +41 07 8.1)
    '(M101 5457 14 03.5 +54 21 8.1)
    '(M106 4258 12 19.0 +47 18 9)
    '(M108 3556 11 11.6 +55 40 10.5)
    '(M109 3992 11 57.7 +53 22 10.6)
    '(M110 205 00 40 +41 42 9))))
(setf ?galaxy2 (make-dbrel
  :name 'galaxy
  :schema (list 'messier 'ngc 'magnitude
                'surface-brightness 'size
                'distance 'shape)
  :nonkey 2
  :meta (list 'key 'key #'< #'< #'> #'> '(S E L I))
  ;; key indicates a key field which will not be part of the cluster
:db (list
    '(M31 224 3.5 10 10 2.5 E)
    '(M32 221 8.5 8 4 2.5 E)
    '(M33 598 6 11 50 3 S)
    '(M51 5194 8.5 10 10 20 S)
    '(M63 5055 9 10 10 20 S)
    '(M77 1068 9 8 4 50 S)
    '(M81 3031 7 9 20 10 S)
    '(M82 3034 8.5 9 10 10 I)
    '(M94 4736 8.5 9 6 20 S)
    '(M101 5457 8 11 20 40 S)
    '(M102 5866 10.5 9 3 30 L)
    '(M106 4258 8.5 10 12 20 S)
    '(M108 3556 10 9 8 30 S)
    '(M109 3992 10 10 6 40 S)
    '(M110 205 9 10 10 2.5 E))))
(setf ?nebulala (make-dbrel
  :name 'nebula
  :schema (list 'messier 'ngc 'ra-hr
                'ra-min 'dec-deg 'dec-min 'magnitude
                'surface-brightness 'size
                'plan-shape 'diff-shape
                'plan 'diff 'distance)
  :nonkey 6
  ;; indicate the column where the non-key field begins
  :meta
  (list 'key 'key 'key 'key 'key 'key #'< #'< #'>
        '(R D A 0) '(Em Fi R 0) '(P P P 0) '(D D D 0) #'>)
;; meta contains information on which attributes are keys and for non-key
;; attribute, the activity cluster preference, indicated with the
;; relationship that a high quartile member is to have with a lower
;; quartile member
;; key indicates a key field which will not be part of the cluster
:db (list
    '(M1 1952 05 34.5 +22 01 9 9 6 0 Fi 0 1 4000)
    '(M8 6523 18 03.7 -24 23 5 10 60 0 Em 0 1 5000)
    '(M16 6611 18 18.9 -13 47 6 9 25 0 Em 0 1 6000)
    '(M17 6618 18 20.8 -16 10 6 10 40 0 Em 0 1 5000)
    '(M20 6514 18 02.4 -23 02 7 10 25 0 Em 0 1 5000)
    '(M27 6853 19 59.6 +22 43 8 8 8 A 0 1 0 1000)
    '(M42 1976 05 35.3 -05 23 4 8 40 0 Em 0 1 1500)
    '(M43 1982 05 35.5 -05 16 9 10 12 0 Em 0 1 1500)
    '(M57 6720 18 53.6 +33 02 9 6 1.5 R 0 1 0 2000)
    '(M76 650 01 42.2 +51 34 11 9 2.5 A 0 1 0 5000)
    '(M97 3587 11 14.9 +55 01 11 10 3 D 0 1 0 3000))))
(setf ?nebulal (make-dbrel
  :name 'nebulal
  :schema (list 'messier 'ngc 'ra-hr
                'ra-min 'dec-deg 'dec-min)
  :nonkey 2
  ;; indicate the column where the non-key field begins
  :meta (list 'key 'key 'key 'key 'key 'key #'<)
  ;; meta contains information on which attributes are keys and for non-key
  ;; attribute, the activity cluster preference, indicated with the
  ;; relationship that a high quartile member is to have with a lower
  ;; quartile member
  ;; key indicates a key field which will not be part of the cluster
:db (list
    '(M1 1952 05 34.5 +22 01)
    '(M8 6523 18 03.7 -24 23)
    '(M16 6611 18 18.9 -13 47)
    '(M17 6618 18 20.8 -16 10)
    '(M20 6514 18 02.4 -23 02)
    '(M27 6853 19 59.6 +22 43)
    '(M42 1976 05 35.3 -05 23)
    '(M43 1982 05 35.5 -05 16)
    '(M57 6720 18 53.6 +33 02)
    '(M76 650 01 42.2 +51 34)
    '(M97 3587 11 14.9 +55 01))))
(setf ?nebula2 (make-dbrel
  :name 'nebula2
  :schema (list 'messier 'ngc 'magnitude
                'surface-brightness 'size
                'plan-shape 'diff-shape
                'plan 'diff 'distance)
  :nonkey 2
  ;; indicate the column where the non-key field begins
  :meta
  (list 'key 'key #'< #'< #'>
        '(R D A 0) '(Em Fi R 0) '(P P P 0) '(D D D 0) #'>)
  ;; meta contains information on which attributes are keys and for non-key
  ;; attribute, the activity cluster preference, indicated with the
  ;; relationship that a high quartile member is to have with a lower
  ;; quartile member
  ;; key indicates a key field which will not be part of the cluster
:db (list
    '(M1 1952 9 9 6 0 Fi 0 1 4000)
    '(M8 6523 5 10 60 0 Em 0 1 5000)
    '(M16 6611 6 9 25 0 Em 0 1 6000)
    '(M17 6618 6 10 40 0 Em 0 1 5000)
    '(M20 6514 7 10 25 0 Em 0 1 5000)
    '(M27 6853 8 8 8 A 0 1 0 1000)
    '(M42 1976 4 8 40 0 Em 0 1 1500)
    '(M43 1982 9 10 12 0 Em 0 1 1500)
    '(M57 6720 9 6 1.5 R 0 1 0 2000)
    '(M76 650 11 9 2.5 A 0 1 0 5000)
    '(M97 3587 11 10 3 D 0 1 0 3000))))
(setf ?doublestarla (make-dbrel
  :name 'double-star
  :schema (list 'ads 'name 'ra-hr
                'ra-min 'dec-deg 'dec-min 'min-magnitude
                'max-magnitude 'pa 'sep)
  :nonkey 6
  ;; indicate the column where the non-key field begins
  :meta (list 'key 'key 'key 'key 'key 'key #'< #'<
              #'> #'>)
  ;; meta contains information on which attributes are keys and for non-key
  ;; attribute, the activity cluster preference, indicated with the
  ;; relationship that a high quartile member is to have with a lower
  ;; quartile member
  ;; key indicates a key field which will not be part of the cluster
:db (list
’(558 55-Psc 00 39.9 +21 26 5.5 8.7 194 6.6)
'(561 alpha-Cas 00 40.5 +56 32 2.2 8.9 0 63)
’(940 phi-And 01 09.5 +47 15 4.6 5.5 133 .5)
’(1507 gamma-Ari 01 53.5 +19 18 4.6 4.7 0 7.8)
’(1697 iota-Tri 02 12.4 +30 18 5.3 6.9 71 3.9)
’(1477 alpha-UMi 02 31.8 +89 16 2.0 8.9 0 18)
’(2080 gamma-Cet 02 43.3 +03 14 3.6 6.2 297 2.8)
’(2157 eta-Per 02 50.7 +55 54 3.8 8.5 0 28)
'(-3 theta-Eri 02 58.3 -40 18 3.2 4.3 0 7.4)
’(2888 epsilon-Per 03 57.9 +40 01 2.9 8.0 9 9.0)
’(3137 phi-Tau 04 20.4 +27 21 5.1 8.7 0 50)
’(3823 beta-Ori 05 14.5 -08 12 0.2 6.7 206 9.2)
'(4566 theta-Aur 05 59.7 +37 13 2.6 7.1 320 3.0)
’(5107 beta-Mon 06 28.8 -07 02 4.6 5.1 132 7.2)
’(5400 12-Lyn 06 46.2 +59 27 5.4 6.0 74 1.7)
’(5423 alpha-CMa 06 45.2 -16 43 -1.5 8.5 5 4.5)
’(5983 delta-Gem 07 20.1 +21 59 3.6 8.2 223 5.9)
’(6988 iota-Cnc 08 46.7 +28 46 4 6.6 307 30.4)
’(7203 sigma-UMa 09 10.4 +67 08 4.9 8.2 357 3.6)
’(7724 gamma-Leo 10 20.0 +19 51 2.2 3.5 124 4.4)
’(8489 2-CVn 12 16.1 +40 40 5.9 9.0 260 11.5)
’(8531 17-Vir 12 22.5 +05 18 6.5 8.6 337 20.6)
’(-2 alpha-Cru 12 26.6 -63 06 1.6 2.1 114 4.7)
’(8630 Gamma-Vir 12 41.7 -01 27 3.5 3.5 287 3.0)
’(-1 alpha-Cen 14 39.6 -60 50 0 1.2 214 19.7)
’(558 55-Psc 00 39.9 +21 26 5.5 8.7 194 6.6)
’(9338 pi-Boo 14 40.7 +16 25 4.9 5.8 108 5.7)
’(9375 54-Hya 14 46 -25 27 5.2 7.2 126 8.8)
’(9617 eta-CrB 15 23.2 +30 17 5.6 5.9 27 1.0)
’(9701 delta-Ser 15 34.8 +10 32 4.1 5.2 179 3.9)
’(9909 xi-Sco-AB 16 04.4 -11 22 4.9 4.9 44 0.7)
’(10087 lambda-Oph 16 30.9 +01 59 4.2 5.2 22 1.5)
’(10157 zeta-Her 16 41.3 +31 36 2.9 5.5 83 1.6)
’(10345 mu-Dra-AB 17 05.3 +54 28 5.7 5.7 25 1.9)
’(11635 epsilon-Lyr 18 44.3 +39 40 5 6.1 353 3.7)
'(12540 beta-Cyg 19 30.7 +27 58 3.2 5.4 54 34.4)
’(13632 alpha-Cap 20 17.6 -12 31 4.3 9.0 221 45.5)
’(14279 gamma-Del 20 46.7 +16 08 4.3 5.2 268 9.8)
'(15032 beta-Cep 21 28.6 +70 34 3.2 7.8 250 13.7)
'(15971 zeta-Aqr 22 28.8 -00 01 4.3 4.5 207 1.9)
)))
;;; Entities with no attributes
;;;%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
(setf ?astro-location-name (make-term
  :ttype 'e
  :synl (list (make-syn :name 'astro-location-name))))
(setf ?owner (make-term
  :ttype 'e
  :synl (list (make-syn :name 'owner))))
(setf ?meeting (make-term
  :ttype 'e
  :synl (list (make-syn :name 'meeting))))
(setf ?electronic-meeting
  (make-term
    :ttype 'e
    :synl (list (make-syn :name 'electronic-meeting))))
(setf ?bulletin-board (make-term
  :ttype 'e
  :synl (list (make-syn :name 'bulletin-board))))
(setf ?news-group (make-term
:ttype ’e
:synl (list (make-syn :name ’news-group))))
(setf ?telephone-hotline
  (make-term
    :ttype 'e
    :synl (list (make-syn :name 'telephone-hotline))))
(setf ?utility (make-term
:ttype ’e
:synl (list (make-syn :name ’utility))))
(setf ?site (make-term
:ttype ’e
:synl (list (make-syn :name ’site))))
(setf ?museum (make-term
:ttype 'e
:synl (list (make-syn :name ’museum))))
(setf ?planetarium (make-term
  :ttype 'e
  :synl (list (make-syn :name 'planetarium))))
(setf ?observatory (make-term
:ttype ’e
:synl (list (make-syn :name ’observatory))))
(setf ?ngc (make-term
  :ttype 'e
  :synl (list (make-syn :name 'ngc))))
(setf ?messier (make-term
:ttype 'e
:synl (list (make-syn :name ’messier))))
(setf ?ra (make-term
:ttype ’e
:synl (list (make-syn :name ’ra)
(make-syn :name ’ra-hr))))
(setf ?ra-min (make-term
:ttype ’e
:synl (list (make-syn :name ’ra-min))))
;; Current limitations of the implementation make
;; the order of the syns critical. If
;; ra-hr is first, experiments 4 and 10 will have
;; different results.
(setf ?ra-hr (make-term
  :ttype 'e
  :synl (list (make-syn :name 'ra)
              (make-syn :name 'ra-hr))))
(setf ?dec (make-term
:ttype ’e
:synl (list (make-syn :name ’dec)
(make-syn :name ’dec-deg))))
;; Current limitations of the implementation make
;; the order of the syns critical. If
;; dec-deg is first, experiments 4 and 10 will have
;; different results.
(setf ?dec-deg (make-term
:ttype 'e
:synl (list (make-syn :name ’dec)
(make-syn :name ’dec-deg))))
(setf ?dec-min (make-term
  :ttype 'e
  :synl (list (make-syn :name 'dec-min))))
(setf ?magnitude (make-term
:ttype 'e
:synl (list (make-syn :name ’magnitude))))
(setf ?size (make-term
:ttype 'e
:synl (list (make-syn :name ’size))))
(setf ?type (make-term
:ttype 'e
:synl (list (make-syn :name ’type))))
(setf ?shape (make-term
rttype ’e
:synl (list (make-syn :name ’shape))))
(setf ?distance (make-term
  :ttype 'e
  :synl (list (make-syn :name 'distance))))
(setf ?ads (make-term
:ttype ’e
:synl (list (make-syn :name ’ads))))
(setf ?name (make-term
:ttype ’e
:synl (list (make-syn :name ’name))))
(setf ?surface-brightness (make-term
  :ttype 'e
  :synl (list (make-syn :name 'surface-brightness
                        :inherit (list 'magnitude))
              (make-syn :name 'magnitude))))
(setf ?magnitudel (make-term
  :ttype 'e
  :synl (list (make-syn :name 'magnitudel
                        :inherit (list 'magnitude))
              (make-syn :name 'magnitude))))
(setf ?magnitude2 (make-term
:ttype ’e
:synl (list (make-syn :name ’magnitude2
:inherit (list ’magnitude))
(make-syn :name ’magnitude))))
(setf ?pa (make-term
:ttype ’e
:synl (list (make-syn :name ’pa))))
(setf ?sep (make-term
:ttype ’e
:synl (list (make-syn :name ’sep))))
(setf ?constellation (make-term
:ttype ’e
:synl (list (make-syn :name ’constellation))))
(setf ?magnitude-range
  (make-term
    :ttype 'e
    :synl (list (make-syn :name 'magnitude-range
                          :inherit (list 'magnitude))
                (make-syn :name 'magnitude))))
(setf ?period (make-term
:ttype ’e
:synl (list (make-syn :name ’period))))
(setf ?spectral-type (make-term
:ttype ’e
:synl (list (make-syn :name ’spectral-type))))
(setf ?flamsteed-number (make-term
:ttype 'e
:synl (list (make-syn :name ’flamsteed-number))))
(setf ?bayer-designation (make-term
:ttype ’e
:synl (list (make-syn :name ’bayer-designation))))
(setf ?other-name (make-term
  :ttype 'e
  :synl (list (make-syn :name 'other-name
                        :inherit (list 'name))
              (make-syn :name 'name))))
(setf ?visual-magnitude
  (make-term
    :ttype 'e
    :synl (list (make-syn :name 'visual-magnitude
                          :inherit (list 'magnitude))
                (make-syn :name 'magnitude))))
(setf ?absolute-magnitude
  (make-term
    :ttype 'e
    :synl (list (make-syn :name 'absolute-magnitude
                          :inherit (list 'magnitude)))))
(setf ?color-index (make-term
  :ttype 'e
  :synl (list (make-syn :name 'color-index))))
(setf ?spectral-type (make-term
:ttype 'e
:synl (list (make-syn :name 'spectral-type))))
(setf ?luminosity-class (make-term
:ttype ’e
:synl (list (make-syn :name 'luminosity-class))))
(setf ?notes (make-term
:ttype ’e
:synl (list (make-syn :name ’notes))))
(setf ?method (make-term
:ttype ’e
:synl (list (make-syn :name ’method))))
(setf ?advantage (make-term
  :ttype 'e
  :synl (list (make-syn :name 'advantage))))
(setf ?disadvantage (make-term
:ttype ’e
:synl (list (make-syn :name 'disadvantage))))
(setf ?use (make-term
:ttype 'e
:synl (list (make-syn :name ’use))))
(setf ?maximum-exposure (make-term
:ttype 'e
:synl (list (make-syn :name ’maximum-exposure))))
(setf ?asa-no (make-term
  :ttype 'e
  :synl (list (make-syn :name 'asa-no))))
(setf ?color-sensitivity (make-term
:ttype ’e
:synl (list (make-syn :name ’color-sensitivity))))
(setf ?resolution (make-term
:ttype ’e
:synl (list (make-syn :name ’resolution))))
(setf ?contrast (make-term
:ttype ’e
:synl (list (make-syn :name ’contrast))))
(setf ?advantages (make-term
:ttype ’e
:synl (list (make-syn :name ’advantages))))
(setf ?use (make-term
:ttype *e
:synl (list (make-syn :name ’use))))
(setf ?maximum-exposure (make-term
  :ttype 'e
  :synl (list (make-syn :name 'maximum-exposure))))
(setf ?subject (make-term
:ttype ’e
:synl (list (make-syn :name ’subject))))
(setf ?naked-eye (make-term
:ttype ’e
:synl (list (make-syn :name ’naked-eye))))
(setf ?binocular (make-term
:ttype 'e
:synl (list (make-syn :name ’binocular))))
(setf ?refractor (make-term
:ttype ’e
:synl (list (make-syn :name ’refractor))))
(setf ?newtonian (make-term
  :ttype 'e
  :synl (list (make-syn :name 'newtonian))))
(setf ?cassegrain (make-term
:ttype ’e
:synl (list (make-syn :name ’cassegrain))))
(setf ?schmidt-cassegrain (make-term
:ttype ’e
:synl (list (make-syn :name ’schmidt-cassegrain))))
(setf ?maksutov (make-term
:ttype ’e
:synl (list (make-syn :name ’maksutov))))
(setf ?deep-sky (make-term
  :ttype 'e
  :synl (list (make-syn :name 'deep-sky))))
;;; Entities with attributes
;;;%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
(setf ?galaxy (make-term
  :ttype 'e
:synl
(list (make-syn
:name ’galaxy
:relation ?galaxyla
:inherit (list ’astro-object)
:context
(list ?messier ?ngc ?ra-hr ?ra-min
?dec-deg ?dec-min ?magnitude
?surface-brightness
?size ?distance ?shape)))))
(setf ?nebula (make-term
:ttype ’e
:synl
(list (make-syn
:name ’nebula
:relation ?nebulala
:context
(list ?ngc ?ra ?dec ?magnitude
      ?surface-brightness
      ?size ?shape ?distance)))))
(setf ?double-star (make-term
:ttype ’e
: synl
(list (make-syn
:name ’double-star
:relation ?doublestarla
:context
(list ?ads ?name
?ra ?dec ?magnitudel
?magnitude2 ?pa ?sep)))))
(setf ?double-star2 (make-term
:ttype ’e
: synl
(list (make-syn
:name ’double-star
:context
(list ?name ?distance)))))
(setf ?variable-star2 (make-term
:ttype ’e
:synl
(list (make-syn
:name ’variable-star
:context
(list ?name ?spectral-type)))))
(setf ?variable-star (make-term
:ttype ’e
: synl
(list (make-syn
:name ’variable-star
:context
(list ?name ?constellation ?type
?ra ?dec ?magnitude
?magnitude-range ?period)))))
(setf ?star (make-term
:ttype ’e
: synl
(list (make-syn
:name ’star
:context
(list ?flamsteed-number
?bayer-designation ?other-name
?ra ?dec ?absolute-magnitude
?spectral-type
?luminosity-class ?distance
?notes)))))
(setf ?astrophotography-systems (make-term
  :ttype 'e
: synl
(list (make-syn
:name ’astrophotography-systems
:context
(list ?method ?advantage ?disadvantage
      ?use ?maximum-exposure)))))
;;; from Sherrod and Koed "A Complete Manual of Amateur Astronomy"
(setf ?black-and-white-film (make-term
:ttype 'e
:synl
(list (make-syn
:name 'black-and-white-film
:context
(list ?asa-no ?color-sensitivity
?resolution ?contrast
?advantage ?disadvantage
?use ?maximum-exposure)))))
;;; from Sherrod and Koed "A Complete Manual of Amateur Astronomy"
(setf ?color-film (make-term
:ttype ’e
: synl
(list (make-syn
:name 'color-film
:context
(list ?asa-no ?color-sensitivity
?resolution ?contrast
?advantage ?disadvantage
?use ?maximum-exposure)))))
(setf ?astro-view-location (make-term
  :ttype 'e
  :synl
  (list (make-syn
          :name 'astro-view-location
          :context
          (list ?astro-location-name ?owner)))))
;;; from Sherrod and Koed
;;; "A Complete Manual of Amateur Astronomy"
(setf ?astro-instrument
  (make-term
    :ttype 'e
: synl
(list (make-syn
:name ’astro-instrument
:context
(list ?subject ?naked-eye ?binocular
?refractor ?newtonian ?cassegrain
?schmidt-cassegrain ?maksutov)))))
;;; Support Entities
(setf ?value (make-term
:ttype ’e
: synl
(list (make-syn
:name ’value))))
(setf ?date (make-term
: ttype ’e
: synl
(list (make-syn
:name 'date
:context
(list ?value)))))
;;; Entity to Entity Knowledge
;;;%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
;;; Implementation note - currently a database must show
;;; the propagate list from the dominant entity to the
;;; dominated entity. QBC will not post syn elaboration
;;; information if the database does not show the propagates.
;;; This is because the syn elaboration is based on a match and
;;; currently the match must be on the dominant syn
(setf ?information-source (make-term
  :ttype 'e
  :synl
  (list
    (make-syn :name 'information-source
              :propagate (list 'meeting
                               'site))
    (make-syn :name 'meeting)
    (make-syn :name 'site))))
(setf ?meeting (make-term
:ttype ’e
:synl
(list
(make-syn :name ’meeting
:propagate (list ’electronic-meeting
’physical-meeting))
(make-syn :name ’electronic-meeting)
(make-syn :name 'physical-meeting))))
(setf ?electronic-meeting (make-term
:ttype ’e
:synl
(list
(make-syn :name ’electronic-meeting
:propagate (list 'bulletin-board
                 'news-group
'telephone-hot-line
’utility))
(make-syn :name ’bulletin-board)
(make-syn :name ’news-group)
(make-syn :name ’telephone-hot-line)
(make-syn :name ’utility))))
(setf ?physical-meeting (make-term
:ttype ’e
: synl
(list
(make-syn :name ’physical-meeting
:propagate (list ’astronomy-club))
(make-syn :name ’astronomy-club))))
(setf ?site (make-term
:ttype ’e
: synl
(list
(make-syn :name ’site
:propagate (list
’museum ’observatory
’planetarium))
(make-syn :name 'museum)
(make-syn :name ’observatory)
(make-syn :name ’planetarium))))
(setf ?astro-object (make-term
:ttype ’e
: synl
(list
(make-syn :name 'astro-object
          :propagate
          (list 'planet 'star 'galaxy
                'nebula 'quasar 'meteor
                'comet 'asteroid
                'constellation)))))
(setf ?film (make-term
:ttype ’e
: synl
(list
(make-syn :name ’film
:propagate (list ’black-and-white-film
’color-film))
(make-syn :name ’black-and-white-film)
(make-syn :name ’color-film))))
(setf ?deep-space (make-term
:ttype ’e
: synl
(list
(make-syn :name 'deep-space
          :inherit (list 'deep-sky))
(make-syn :name ’deep-sky
:inherit (list ’deep-space))
(make-syn :name ’galaxy
:inherit (list ’deep-space))
(make-syn :name ’nebula
:inherit (list ’deep-space))
(make-syn :name ’star-cluster
:inherit (list ’deep-space)))))
(setf ?telescope (make-term
:ttype ’e
: synl
(list
(make-syn :name ’telescope)
(make-syn :name ’refractor
:inherit (list ’telescope))
(make-syn :name ’newtonian
:inherit (list ’telescope))
(make-syn :name ’cassegrain
:inherit (list ’telescope))
(make-syn :name ’schmidt-cassegrain
:inherit (list ’telescope))
(make-syn :name ’maksutov
:inherit (list ’telescope)))))
;;; Activities
(setf ?photograph (make-term
:ttype ’ac
: synl
(list (make-syn
        :name 'photograph
        :context (list ?size))
      (make-syn
        :name 'view
        :context (list ?size)
        :propagate (list 'photograph)))))
(setf ?view (make-term
:ttype ’ac
: synl
(list (make-syn
:name ’view
:propagate (list ’photograph))
(make-syn
:name ’photograph))))
;;;%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
;;; Activity Primitives
(setf ?view*deep-sky (make-term
:ttype ’ap
: synl
(list
(make-syn
  :name 'view*deep-sky
  :context
  (list ?view
        ?deep-sky)))
:req (list
(make-term
:unspecified t
:ttype ’e
: synl
(list
(make-syn
:name '(unspecified)
:context
(list
(make-term
:synl (list
(make-syn
:name ’ra-hr)))
(make-term
:synl (list
(make-syn
:name ’dec-deg))))))))))
(setf ?view*deep-space (make-term
:ttype 'ap
:synl (list
(make-syn
:name ’view*deep-space
:context
(list ?view ?deep-sky)))
:req (list
(make-term
:unspecified t
:ttype ’e
:synl
(list
(make-syn
:name ’(unspecified)
:context
(list
(make-term
:synl (list
(make-syn
:name ’ra)))
(make-term
:synl (list
(make-syn
:name ’dec))))))))))
(setf ?photograph*deep-space (make-term
:ttype ’ap
:synl
(list
(make-syn
:name 'photograph*deep-space
:context
(list
(make-term
:ttype ’ac
: synl
(list (make-syn
:name 'photograph)))
(make-term
:ttype ’e
:synl (list
(make-syn
:name ’deep-space))))))
:req (list ?astrophotography-systems
?film)))
;;; Old version that used ?variables, which had the effect of
;;; dragging in all knowledge of these variable before processing
;;; even began.
;;(setf ?photograph*deep-space (make-term
;; :ttype ’ap
;; :synl (list (make-syn :name 'photograph*deep-space
;; :context
;; (list ?photograph
;; ?deep-space)))
;; :req (list ?astrophotography-systems
;; ?film)))
;;;%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
;;; Activity Clusters
(setf ?star-party (make-term
:ttype 'acluster
:synl (list
(make-syn
:name 'starparty
:context
(list ?view*deep-space
?photograph*deep-space)))
:req (list ?astro-view-location)))
;;;%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
;;; World-to-world relationships
;;; could also provide requisition of other techniques such as a
;;; particular expert system. But the relationships can also be
;;; handled with a context match. The context discovery will
;;; discover all of the terms that are related to the use-to-instrument
;;; term.
(setf ?use-to-instrument
(make-term
:ttype ’w
: synl
(list
(make-syn
:name ’use-to-instrument
:context
(list
(make-term
:ttype 'ap
:synl
(list
(make-syn
:name ’view*deep-sky)))
(make-term
:ttype 'e
:synl
(list
(make-syn
:name ’astro-instrument))))))))
;; Original with ?variables
(setf ?use-to-instrument (make-term
:ttype ’w
:synl
(list
(make-syn
:name ’use-to-instrument
:context
(list ?view*deep-sky
?astro-instrument)))))
;; ****************************************************************
;; Utility Functions
;; ****************************************************************
;; The various utility functions are
;; coded as source*output*typeoutput
;; with the output indicated as name (n), term (t), syn (s) or
;; hypoth (h) if single elements or
;; with the output indicated as nl, tl, si or hi if
;; the output is a list of these
;;
;; context is ctx
;;
;; pterm is used to represent a term
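;; A few illustrative readings of this convention (drawn from the
;; functions defined below):
;;   s*ctx*nl -- syn (s) to context (ctx), output as a name list (nl)
;;   t*syn*nl -- term (t) to syns, output as a name list (nl)
;;   h*ctx*nl -- hypoth (h) to context, output as a name list (nl)
;;   n*ctx*tl -- names (n) to context, output as a term list (tl)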
;;; syn to context - name output
;;; ****************************************************************
;;; Will provide the context of a single syn
;;; Output is list of term names
;;; old-name syncontextlist which was incorrect and single-syn-context
(defun s*ctx*nl (synelement)
(let ((olist nil))
(dolist (conelement (syn-context synelement) olist)
(dolist (consyn (term-synl conelement) t )
(setf olist (cons (syn-name consyn) olist))))))
;;; ****************************************************************
;;; syn to context - term output
;;; ****************************************************************
;;; Will provide the context of a single syn
;;; Output is list of terms
(defun s*ctx*tl (synelement)
(let ((olist nil))
(dolist (conelement (syn-context synelement) olist)
(dolist (consyn (term-synl conelement) t )
(setf olist (cons consyn olist))))))
;;; ****************************************************************
;;; term to syns - name output
;;; ****************************************************************
;;; Will provide the set of syns for a term
;;; Output is list of syn-names
;;; old-name synlist
(defun t*syn*nl (pterm)
(cond ((term-p pterm)
(let ((olist nil))
(dolist (element (term-synl pterm) olist); element is syn
(setf olist (cons (syn-name element) olist)))))
(t nil)))
;;; ****************************************************************
;;; term to context - name output
;;; ****************************************************************
;;; Will provide the entire context for all of the syns of a term
;;; output is list of names in the context of all the terms syns
;;; old name contextlist
(defun t*ctx*nl (pterm)
(let ((olist nil))
(dolist (synelement (term-synl pterm) olist); list of syns
;; synelement takes on the value of a syn
(setf olist (append (s*ctx*nl synelement) olist)))))
;;; term to context - term output
;;; Will provide the entire context for all of the syns of a term
;;; output is list of terms in the context of all of the term's syns
(defun t*ctx*tl (pterm)
(let ((olist nil))
(dolist (synelement (term-synl pterm) olist) ; list of syns
;; synelement takes on the value of a syn
(setf olist (append (s*ctx*tl synelement) olist)))))
;;; ****************************************************************
;;; term to hypothesis - name output
;;; Will provide the set of syns for a term
;;; Output is list of names for all of the hypotheses of a term
;;; old-name hypolist
;;; note that hypothesis names are stored as names and not syns
;;; but they should be stored as syns
(defun t*hyp*nl (pterm)
(let ((olist nil))
(dolist (element (term-hypothesis pterm) olist)
(setf olist (cons (hypoth-name element) olist)))))
;;; hypoth to context - name output
;;; Will provide the set of names for all of the terms in the
;;; context of a single hypothesis
;;; Output is list of names for all context terms of a hypothesis
(defun h*ctx*nl (hypterm)
(let ((olist nil))
(dolist (conterm (hypoth-context hypterm) olist)
;;conterm is a term from which the syns will be extracted
(setf olist (append (t*syn*nl (eval conterm)) olist)))))
;;; name to context term - term output
;;; transforms context names to context terms
;;; output is a list of context terms from pterm, or nil
;;; old name conname-to-terms
(defun n*ctx*tl (names pterm)
(let ((olist nil))
(dolist (synelmt (term-synl pterm) olist); list of syns
;; synelement takes on the value of a syn
(dolist (ctxterm (syn-context synelmt) t)
;; ctxterm represents a context term
(dolist
(consyn (term-synl ctxterm) t)
;; consyn represents a syn from the syn list of a context term
(when
(member (syn-name consyn) names)
(setf olist (cons ctxterm olist))))))))
;;; name to syn - syn output
;;; transforms syn names to a list of syns
;;; output is a list of syns from pterm, or nil
(defun n*syn*tl (names pterm)
(let ((olist nil))
(dolist (synelmt (term-synl pterm) olist); list of syns
;; synelement takes on the value of a syn
(dolist (sname (syn-name synelmt) t)
(when (and (member sname names)
(not (member synelmt olist)))
(setf olist (cons synelmt olist)))))))
;;; name to syn - syn output
;;; transforms syn name to a list of syns
;;; output is a list of syns from pterm, or nil
;;; This was modified to make syninfo work
;;; This has been modified from n*syn*tl to
;;; assume that names is a list of one name and
;;; that the names retrieved from the pterm
;;; using (syn-name synelmt) can be either a list
;;; or a single element. There is a test and
;;; different logic depending upon the result
(defun sinfo*n*syn*tl (inname pterm)
(let ((olist nil) (names (list inname)))
(dolist (synelmt (term-synl pterm) olist); list of syns
;; synelement takes on the value of a syn
;;; Note that (syn-name synelmt) is not always a list; this
;;; has caused problems.
;;; If names comes as a list, then do one thing, otherwise
;;; do another. This is a bug fix. The same basic
;;; process works fine without this bug fix for the
;;; context discovery routines; not sure what the problem is here
(cond
((listp (syn-name synelmt))
(dolist
(sname (syn-name synelmt) t)
(when (and (member sname names)
(not (member synelmt olist)))
(setf olist (cons synelmt olist)))))
(t (when
(and
(member (syn-name synelmt) names)
(not (member synelmt olist)))
(setf olist (cons synelmt olist))))))))
;;; name to syn - syn output -- modified to work with postcluster
;;; transforms syn names to a list of syns
;;; output is a list of syns from pterm, or nil
;;; difference is that (list ) has been applied to
;;; syn-name synelmt to make postcluster work
(defun postcluster*n*syn*tl (names pterm)
(let ((olist nil))
(dolist (synelmt (term-synl pterm) olist); list of syns
;; synelement takes on the value of a syn
(dolist (sname (list (syn-name synelmt)) t)
(when (and (member sname names)
(not (member synelmt olist)))
(setf olist (cons synelmt olist)))))))
;;; set membership
;;; will take sets of the form
;;; setl ((GALAXY))
;;; set2 ((UNSPECIFIED) (PLANETARY-NEBULA) (GALAXY))
;;; and return true for (setmember setl set2)
;;; at least one member of a subset of the first set must agree with
;;; at least one member of the second set for this to return true
(defun setmember (setl set2)
(let ((name nil) (result nil))
(dolist (sub2 set2 name)
(if (setf result (intersection setl sub2))
(setf name result)))))
;;; pretty print term
(defun ppterm (interm)
(let ((pterm (eval interm)))
(dolist
(synelmt (term-synl pterm) nil)
;;synelmt is a syn
(terpri)
(cond
((null pterm) nil)
(t (princ "Syn name = ")
(princ (syn-name synelmt))
(if (syn-context synelmt)
(dolist
(conterm (syn-context synelmt) nil)
(terpri)
(princ "Context Term ")
(ppterm conterm)
(terpri))))))
(dolist
(rfragelmt (term-req pterm) nil)
;; rfragelmt is an rfrag
(terpri)
(cond ((null pterm) nil)
((null rfragelmt) nil)
((listp (rfrag-terms rfragelmt))
(princ "Requisition fragments ")
(dolist (rfragterm (rfrag-terms rfragelmt) nil)
(terpri)
(princ "Rfrag Term ")
(ppterm rfragterm)
(terpri)))
(t nil)))))
;; pretty print hypothesis
(defun pphypoth (interm)
(let ((hyplist interm))
(dolist
(hypoelmt hyplist nil)
;;(print "inside pphypoth -- hypoelmt")
;;(pprint hypoelmt)
;;hypoelmt is a hypothesis of type hypo
(terpri)
(princ (hypoth-name hypoelmt))
(if (hypoth-context hypoelmt)
(dolist (conterm (hypoth-context hypoelmt) nil)
(terpri)
(princ "Context Term ")
(ppterm conterm))))))
;;; Collect r-frags
;;;%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
;;; This consists of two recursive functions that work on syns
;;; or terms. Each will collect rfrags
(defun crfrag*term (dimension)
(let ((trfraglist nil))
(if (null dimension) nil
(dolist
(dterm dimension trfraglist)
(setf trfraglist
(append
(term-req (eval dterm))
trfraglist (crfrag*syn (term-synl (eval dterm)))))))))
(defun crfrag*syn (synlist)
(let ((srfraglist nil))
(if (null synlist) nil
(dolist (synelmt synlist srfraglist)
(setf srfraglist (append (crfrag*term (syn-context synelmt))
srfraglist))))))
; * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
; Julian Date Calculation
; * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
;Julian date calculation from
; Jean Meeus, Astronomical Formulae for Calculators
; published by William-Bell, Inc, Richmond, VA
; Test code
;;(setf yyyy 1978)
;;(setf mm 11)
;;(setf dd 13)
;;(print "Julian date")
;;(print (julian yyyy mm dd))
(defun julian (year month day)
(let ((jd nil))
(cond ((> month 2) t)
(t (setf year (1- year)) (setf month (+ month 12))))
;; a and b are corrections if the date is greater than 1582,
;; when the Julian calendar was corrected. Since QBC is not
;; concerned with historical dates, this correction will
;; always be made.
(setf a (truncate (/ year 100)))
(setf b (+ (- 2 a) (truncate (/ a 4))))
(setf jd
(+ (truncate (* 365.25 year))
(truncate (* 30.6001 (1+ month)))
(+ day 1720994.5) b))))
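;; Example check (not from the original test runs): per Meeus's tables,
;; 1978 November 13.0 UT corresponds to Julian date 2443825.5, so the
;; call below should return 2443825.5.
;;(print (julian 1978 11 13))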
;; the following was taken from Meeus Chapter 7
;; it will calculate the julian date for any UT at greenwich
(defun sidereal-time (year month day ut)
(let ((time nil) (tt nil) (statOOut nil))
;; statOOut is the sidereal time at Oh UT in Greenwich
(setf tt (/ (- (julian year month day) 2415020) 36525))
(setf revolutions
(+ .276919398
(* 100.0021359 tt)
(* .000001075 (* tt tt))))
(setf statOOut (* (rem revolutions 1) 24))
;(print "sideral time at Oh")
;(print statOOut)
;(print "ut")
;(print ut)
;(print "added amount")
;(print (* ut 1.002737908))
(setf time (mod (+ statOOut (* ut 1.002737908)) 24))))
;(setf yyyy 1978)
;(setf mm 11)
;(setf dd 13)
;(setf ut 4.5791698)
;(print "Example from Meeus for Nov 13, 1978 at 4.5791698 UT")
;(print (sidereal-time yyyy mm dd ut))
;(setf yyyy 1990)
;(setf mm 2)
;(setf dd 10)
;(setf ut 19 )
;(print "sidereal time at Greenwich, Feb 10, 1990 at 19h using Meeus")
;(print (sidereal-time yyyy mm dd ut))
;; the following was taken from Karkoschka pg 121
; ; although the algorithm for calculating the julian date
;; was taken from Meeus
;; added .5 correction after the 2447790.66
(defun sidereal-time2 (year month day local-time ut-diff east-long)
(let ((time nil))
(setf time
(mod (+ (/ (- (julian year month (+ day (/ ut 24)))
2447790.66) 15.2184)
(+ local-time ut-diff) (/ east-long 15)) 24))))
;;;------------------------------------------------------------
;;; database print
;;;------------------------------------------------------------
;;; prints each row of the database on a different line
(defun dbprint (db)
(dolist (row db nil)
(print row)))
;;; Syn Elaboration
;;; ****************************************************************
;;; Syn elaboration involves three functions. Under the first, a
;;; syn exists with a propagate and/or inherit list. The entities
;;; listed in these lists should be transformed into syns and added
;;; to the syn list of the syn in which they were originally found.
;;; If all agents also do this, then matches can be made based on
;;; syns alone. If a match is made, then any additional syns are
;;; added to the syn list of the evolving query kernel.
;;; The second function is a match on syns, similar to the match on
;;; syns used for adding additional context.
;; The third function should propagate context, propagation and
;; inheritance lists to all of those syns that are dominated
;; proplist will propagate proplist and
;; inherit list will propagate inherit lists
;; Missing context propagation
;; Missing rfrag propagation
;; instantiate syn
;; This function instantiates syns from the inheritance or
;; propagation lists of syns already on the syn list of a term.
;; Its input is a syn, and its output is the term with an
;; extended syn list consisting of newly instantiated terms
;; Test data
; ; (setf testerm testterm2)
;; (print "Test term prior to Syn Elaboration")
;; (ppterm testerm)
;; (instsyn testerm)
;; (print "Test term after syn elaboration")
;; (ppterm testerm)
(defun instsyn (pterm)
(let ((newsynnames nil) (syname nil) (newprop nil)
(newinherit nil) (newpropinherit nil))
(dolist (synelmt (term-synl pterm) pterm)
(setf newsynnames
(append
(set-difference (syn-propagate synelmt)
(t*syn*nl pterm))
(set-difference (syn-inherit synelmt)
(t*syn*nl pterm))))
(if newsynnames
(dolist
(syname newsynnames nil)
(setf (term-synl pterm)
(cons (make-syn :name (list syname))
(term-synl pterm))))))
;;; The following establishes the propagation and inheritance
;;; lists for the newly created syns.
(setf newpropinherit
(propinherit (term-synl pterm)))
(if newpropinherit
(setf (term-synl pterm) newpropinherit))
(setf newprop (prop (term-synl pterm)))
(if newprop (setf (term-synl pterm) newprop))
(setf newinherit (inherit (term-synl pterm)))
(if newinherit (setf (term-synl pterm ) newinherit))))
;;; Collect additional syn information
;;;
;;; This functions will collect information that agents have
;;; in their database on specialization or generalization
;;; relationships specifically, and additional information
;;; contained in propagate, inherit lists
;;; Used gencontext as basis
;;; dterm is a term from a database
;;; qterm is a term from the query elaboration on the blackboard
(defun syninfo (dterm qterm)
(let ((intersect nil) (newprop nil) (newinherit nil)
(dataterm (eval dterm)) (queryterm (eval qterm))
(change nil))
;;; first test is that there must be some common syns
;;; between the database and the evolving query.
;;; intersect holds this intersection
(setf dbnames (t*syn*nl dataterm))
(setf qnames (t*syn*nl queryterm))
(if (not (listp dbnames))
(setf dbnames (list (list dbnames))))
(if (not (listp qnames))
(setf qnames (list (list qnames))))
(setf intersect (setmember dbnames qnames))
;;; cases
;;; syn intersect null => do nothing since no common names
;;; syn intersect non-null then check for any additional
;;; propagate or inherit data
;;; otherwise do nothing
;;; returns the modified qterm if there is something to add,
;;; otherwise nil
(cond
((not intersect) nil) ;no common syn names
( t
(setf name (car intersect))
(dolist
(dsyn (sinfo*n*syn*tl name dataterm) queryterm)
(dolist
(qsyn (sinfo*n*syn*tl name queryterm) queryterm)
(when (setf newprop
(set-difference (syn-propagate dsyn)
(syn-propagate qsyn)))
(setf (syn-propagate qsyn)
(append newprop (syn-propagate qsyn)))
(setf change t))
(when (setf newinherit
(set-difference (syn-inherit dsyn)
(syn-inherit qsyn)))
(setf (syn-inherit qsyn)
(append newinherit (syn-inherit qsyn)))
(setf change t))))))
(if change qterm nil))) ; returns modified qterm if any change
;;; Prop and Inherit Discovery
;;; similar to context-discovery
;;; Loops through database seeing if there is any additional
;;; propagation or inheritance information to be added
;;; returns nil if nothing can be done
(defun syninfo-discover (datalist querylist)
(let ((result nil))
(dolist
(pddterm querylist querylist)
(dolist (dbrelation datalist result)
(setf result (syninfo dbrelation pddterm))
(if result (setf pddterm result))))))
;;;-------------------------------------------------------------
;;; hypothesis generation
;;;-------------------------------------------------------------
;;; Database generates hypothesis if there is a match on
;;; context returns queryterm if hypothesis can be
;;; generated otherwise nil.
(defun genhypoth (dterm qterm)
(let ((intersect nil) (hypocandidate nil) (hyponames nil)
(dataterm (eval dterm)) (queryterm (eval qterm)))
(when
(term-unspecified queryterm) ;proceed if queryterm unspecified
;; first test is that there must be some common terms
;; between the database and the evolving query.
;;intersect holds this intersection
;; the context lists for this function will be in terms of
;; terms for context
(setf intersect
(intersection (t*ctx*nl dataterm)
(t*ctx*nl queryterm)))
;; pull terms that correspond to the names of context
;; terms that are common to queryterm and dataterm
;; cases
;; intersect null => do nothing since no common names
;; intersect non-null and
;; hypothesis name on hypothesis list then add context
;; hypothesis name not on hypothesis list then
;; add name and context
(cond ((null intersect) nil) ; no common names return nil
;; FUTURE - may want to merge new syns into existing hyp,
;; based on checking subset relationship rather than
;; intersection
;; Now if only one term is common, no way to add to syns
;; of hypothesis if dataterm has additional syns. Need merge synl
((intersection
(t*hyp*nl queryterm) (t*syn*nl dataterm))
;;hypothesis name posted
(dolist
(oldhypoth (term-hypothesis queryterm) qterm)
;; first find the hypothesis to which
;; this dataterm applies - name intersect
(if (intersection
(t*syn*nl dataterm)
(hypoth-name oldhypoth))
(dolist
(contextcand intersect queryterm)
;; returns queryterm
(if (not
(member
contextcand
(t*syn*nl
(hypoth-context oldhypoth))))
(setf (hypoth-context oldhypoth)
(append
(n*ctx*tl contextcand dataterm)
(hypoth-context oldhypoth))))))))
;; need to add hypothesis name and context
(t (setf (term-hypothesis queryterm)
(cons
(make-hypoth :name (t*syn*nl dataterm)
:context (n*ctx*tl intersect queryterm))
(term-hypothesis queryterm)))
qterm))))) ;returns modified query term
;;; Generate Hypotheses
;;; ****************************************************************
;;; applies genhypoth with respect to each relation
;;; accesses global *db* and *pdd*
;;; *db* holds all of the relations
;;; *pdd* holds all of the terms of the primary world data dimension
(defun gen-entity-hypothesis (datalist querylist)
;; if genhypoth returns nil, then qterm is not modified
;; and is added back
;; to output list, otherwise the result of genypoth is a modified
;; query term that is added to olist
;; outputs a new querylist
(let ((result nil) (olist nil))
(dolist (relation datalist querylist)
(dolist (qterm querylist t)
(setf result (genhypoth relation qterm))
(if result (setf qterm result))))))
; ; ; matchnr
;;; compares a synelement to a hypothesis element.
; ; ; in general, if the context of a synelement is a subset
;;; of the context of a hypothesis element, then the hypothesis
;;; is a good one.
;;; Since each context is a syn, to do this correctly, there must be
;; ; at least one match between each of the contexts of the synelement
;;; and each of the contexts of the hypothesis. This function replaces
;;; an earlier approach in which the set of names associated with the
;;; syn are checked to see if they are a subset of the set of names
;;; associated with the hypothesis. The problem with this earlier
;;; approach is that all of the syns of the synelement must be a
;;; subset of the set of context names associated with the hypothesis.
;;; In addition to handling the synelement to hypothesis element
;;; comparison correctly, it also keeps track of the number of
;;; context terms that match. And, in fact, this is what is returned.
;;; In this way, the hypothesis that has the best match can be selected,
;;; if there is no hypothesis that completely covers all of the
;;; context term of the unspecified synelement.
(defun matchnbr (synelmt hypoelmt)
(let ((count 0) (match nil))
(dolist (unspecctxterm (syn-context synelmt) count)
(setf match nil)
(dolist
(hypoctxterm (hypoth-context hypoelmt) match)
(if (intersection (t*syn*nl unspecctxterm)
(t*syn*nl hypoctxterm))
(setf match t)))
(if match (setf count (1+ count))))))
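;; Illustrative call (not from the original test runs; assumes the
;; make-term, make-syn, and make-hypoth constructors used above, and
;; hypothetical context terms ctx-a and ctx-b). Only the ra-hr context
;; term is covered by the hypothesis, so one match is counted:
;;(setf ctx-a (make-term :synl (list (make-syn :name '(ra-hr)))))
;;(setf ctx-b (make-term :synl (list (make-syn :name '(dec-deg)))))
;;(matchnbr (make-syn :name '(unspecified) :context (list ctx-a ctx-b))
;;          (make-hypoth :name '(deep-space) :context (list ctx-a)))
;; should return 1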
;;; ****************************************************************
;;; Post new syns based on acceptable hypothesis
;;; Hypothesis Consistency check
;;; Finds those hypotheses that have the best consistency
;;; with the contexts of an unspecified term
;;; If a hypothesis is consistent with the context of any
;;; syn then it will be taken as a good hypothesis and
;;; turned into a syn.
;;; If there is more than one good hypothesis, then a new syn will
;;; be created for each hypothesis
;;; no attempt is made to separate syns at this point
;;; This applies only to unspecified terms
;;; Under version 1, wanted to see if syn context is a subset of the
;;; hypoth context.
;;; if so then add the hypothesis to the synl
;;; under version 2, find that hypothesis for which there is the best
;;; match between the desired context of the unspecified term and the
;;; context of the hypothesis. In this case, best is determined by
;;; the number of context terms which match. Since a context term is
;;; represented as a polytyped term, a polyterm matches another
;;; polyterm if there is at least one common syn
;;; Also change the term unspecified slot to nil to signal
;;; that this is no longer an unspecified term.
;;; The approach has the advantage of leaving the original state of the
;;; unspecified syn list
;;; this will return a queryterm, modified if appropriate or unmodified
;;; Since elements are constantly being added to synl, need to
;;; filter out all syns that are not unspecified when going through
;;; process of adding new syns.
;;; returns nil if no change to queryterm
(defun hyp-to-syn (qterm)
(let ((conlist nil) (newsyns nil) (queryterm nil) (best nil)
(bestcount 0) (newcount 0) (bestlist nil))
(setf queryterm (eval qterm))
(cond
((not(term-unspecified queryterm)) nil); term not unspecified
(t (dolist
(hypoelmt (term-hypothesis queryterm) qterm )
;; provides a list of hypoth structures
;; each having a contextlist
;; ensure that proplists and inherit
;; lists are up to date
(prop (term-synl queryterm))
(inherit (term-synl queryterm))
(dolist
(synelmt (term-synl queryterm) queryterm)
(cond
((and (equalp (syn-name synelmt)
'(unspecified))
(setf newcount (matchnbr
synelmt hypoelmt)))
;;; a higher count means more context terms were matched,
;;; thus replace the old best hypothesis with the new best.
;;; If the counts are the same, then add this new hypothesis to the list.
(cond ((> newcount bestcount)
(setf bestcount newcount)
(setf newcount 0 )
(setf bestlist (list hypoelmt)))
((= newcount bestcount)
(setf bestlist
(cons hypoelmt bestlist))
(setf newcount 0))
(t nil)))
;;; the following is concerned with hypotheses satisfiying
;;; constrained unspecified types. For it to work effectively,
;;; there must be a function which expands syn lists to
;;; instantiate syns that are referenced as propagated
;;; to or inherited from
((and (syn-name synelmt)
(hypoth-name hypoelmt)
(setf newcount (matchnbr
synelmt hypoelmt)))
;;; a higher count means more context terms were matched,thus replace
;;; the old best hypothesis with the new best. If the counts
;;; are the same, then add this new hypothesis to the list.
(cond ((> newcount bestcount)
(setf bestcount newcount)
(setf newcount 0)
(setf bestlist (list hypoelmt)))
((= newcount bestcount)
(setf bestlist
(cons hypoelmt bestlist))
(setf newcount 0 ))
(t nil))))))
(dolist (best bestlist qterm)
(setf (term-synl queryterm)
(cons (make-syn
:name (hypoth-name best)
;;; note that the hypothesis context is used rather than the
;;; original context of the unspecified syn. Thus some if
;;; not all of the next stage of
;;; processing occurs (i.e. context discovery)
:context (hypoth-context
best))
(term-synl queryterm))))
(setf (term-unspecified queryterm) nil)
qterm);returns modified query term
(t nil))))
;;; Resolve hypotheses
(defun resolve-hypoth (querylist)
;;; if hyp-to-syn returns nil, then qterm is not modified
;;; and is added back to output list, otherwise the result
;;; of hyp-to-syn is a modified
;;; query term that is added to olist
;;; outputs a new querylist
(let ((result nil) (olist nil))
(dolist (qterm querylist querylist)
(setf result (hyp-to-syn qterm))
(if result (setf qterm result)))))
;;; Collect additional context
;;; If match on at least one syn name, see if dataterm has any context
;; ; terms that are not on queryterm, and if so, add them.
;;; NEED TO MODIFY TO CHECK TO SEE IF CONTEXT IS ALREADY ON
;;; LIST PRIOR TO ADDING IT -- BUT NEWCONTEXT SHOULD CATCH THIS
;;; PROBLEM DETECTED IN PRD ENTITY DISCOVERY DEMO
;;; Database generates hypothesis if there is a match on syn
(defun gencontext (dterm qterm)
(let ((intersect nil) (newcontext nil) (newcontexterms nil)
(dataterm (eval dterm)) (queryterm (eval qterm)))
;;; first test is that there must be some common syns
;;; between the database and the evolving query.
;; intersect holds this intersection
(setf newcontext
(set-difference
(t*ctx*nl dataterm)
(t*ctx*nl queryterm)))
(setf intersect
(setmember
(t*syn*nl dataterm)
(t*syn*nl queryterm)))
;;; cases
;;; syn intersect null => do nothing since no common names
;;; syn intersect non-null and
;;; queryterm context subset of dataterm context,
;;; add missing context
;;; otherwise do nothing
(cond ((not intersect) nil) ;no common syn names
(newcontext ;some new context terms exist on dataterm
(dolist (syns (n*syn*tl intersect queryterm) qterm)
;; ; syns is syn for the queryterm syns that are common
(setf
(syn-context syns)
(append
(syn-context syns)
(n*ctx*tl newcontext dataterm)))))
(t nil)))) ; no additional context to add
;;; Context Discovery
(defun context-discover (datalist querylist)
(dolist (pddterm querylist querylist)
(dolist (dbrelation datalist t)
(setf result (gencontext dbrelation pddterm))
(if result (setf pddterm result)))))
;;; syndom
;;; ****************************************************************
;;; takes two syns as arguments
;;; returns true if syna dominates synb as indicated by a
;;; comparison of its propagation and inherit lists.
;;; This function looks only at
;;; the current lists associated with the syns and does not attempt
;;; to generate the transitive closure of the lists
;;; Determines if one syn dominates another
(defun syndom (syna synb)
(if (or (member (syn-name synb) (syn-propagate syna))
(member (syn-name syna) (syn-inherit synb))) t))
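;; Illustrative call (not from the original test runs; assumes the
;; make-syn constructor used above). A syn whose propagate list names
;; 'photograph dominates a 'photograph syn:
;;(syndom (make-syn :name 'view :propagate (list 'photograph))
;;        (make-syn :name 'photograph))
;; should return t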
;;; syndom2
;;; takes two syns as arguments
;;; returns true if syna dominates synb as indicated by a comparison
;;; of its propagation and inherit list. This function looks only at
;;; the current lists associated with the syns and does not attempt
;;; to generate the transitive closure of the lists
;;; Determines if one syn dominates another
;;; in contrast to syndom, syndom2 assumes that the syn names are of
;;; the form (name)
(defun syndom2 (syna synb)
(when (and (syn-p syna) (syn-p synb))
(if (or (member (car (syn-name synb)) (syn-propagate syna))
(member (car (syn-name syna)) (syn-inherit synb))) t)))
;;; stopped here
;;; ****************************************************************
;;; proplist
; ; ; * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
;;; Given a synlist, will make one pass through the list, merging all
;;; proplists into the dominant syn's proplist. Each syn in the synlist
;;; is considered in turn and compared as the syn of interest (SI). The SI
;;; is compared with other syns (SOs) of the list. If the SI dominates
;;; the SO and if the SO has a propagation that the SI does not have
;;; then the additional propagation is added to the SI's propagate list.
;;; returns synlist with modified propagate lists if there has been a
;;; change in the synlist, otherwise nil is returned
(defun proplist (synlist)
(let ((change nil) (diff nil))
(dolist (syna synlist synlist)
(dolist (synb synlist t)
(when (syndom syna synb)
(setf diff (set-difference
(syn-propagate synb)
(syn-propagate syna)))
(when diff
(setf (syn-propagate syna)
(append (syn-propagate syna)
diff))
(setf change t)))))
(if change synlist nil))) ;; return synlist if change, nil if no change
; ; ; * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
;;; inheritlist
; ; ; * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
;;; Given a synlist, will make one pass through the list, merging all
;;; inherit lists into the dominated syn's inherit list. Each syn in the synlist
;;; is considered in turn and compared as the syn of interest (SI). The SI
;;; is compared with other syns (SOs) of the list. If the SO dominates
;;; the SI and if the SO has an inheritance that the SI does not have
;;; then the additional inheritance is added to the SI's inheritance list.
;;; returns synlist with modified inherit lists if there has been a
;;; change in the synlist, otherwise nil is returned
(defun inheritlist (synlist)
(let ((change nil) (diff nil))
(dolist (syna synlist synlist)
(dolist (synb synlist t)
(when (syndom synb syna)
(setf diff (set-difference
(syn-inherit synb)
(syn-inherit syna)))
(when diff
(setf (syn-inherit syna)
(append (syn-inherit syna)
diff))
(setf change t)))))
(if change synlist nil)));return synlist if change or nil if nochange
;;; propinherit
;;; Given a synlist, will make one pass through the list,
;;; If syn X propagates to syn Y, then Y's inheritance list should
;;; include X.
;;; Likewise, if X inherits from Y,
;;; then Y should indicate a propagation to X
;;; This function makes one pass through the synlist making this
;;; adjustment. In contrast to proplist and inheritlist, there are
;;; no transitive relationships, so one pass is sufficient
(defun propinherit (synlist)
(let ((change nil) (diff nil))
(dolist (syna synlist synlist)
(dolist (synb synlist t)
(when (syndom synb syna)
(when (not (member (syn-name synb)
(syn-inherit syna)))
(setf (syn-inherit syna) (cons
(syn-name synb) (syn-inherit syna)))
(setf change t))
(when (not (member (syn-name syna)
(syn-propagate synb)))
(setf (syn-propagate synb) (cons
(syn-name syna) (syn-propagate synb)))
(setf change t)))))
(if change synlist nil))) ; return synlist if change, nil if not
;;;*******************************************************************
;;; prop
;;; Continues to call proplist until propagation action stops, as
;;; indicated by proplist making no changes and returning nil.
;;; prop returns the modified synlist
(defun prop (synlist)
(let ((newlist nil))
(setf newlist (proplist synlist)) ; newlist holds modified list or nil
(if newlist (prop newlist) synlist)))
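The proplist/prop pair computes a closure by repeating single passes until a pass makes no change. As an illustration only (a Python paraphrase, not part of the dissertation code), the same fixed-point idea can be sketched as follows; the function name and the `dominates` predicate are hypothetical stand-ins for the syn structures and `syndom`:

```python
def propagate_to_fixed_point(syns, dominates):
    # syns: dict mapping syn name -> set of propagations
    # dominates(a, b): True when syn a dominates syn b (stand-in for syndom)
    # Repeat single passes (like proplist) until a pass makes no change,
    # mirroring the recursive driver prop.
    changed = True
    while changed:
        changed = False
        for a in syns:
            for b in syns:
                if dominates(a, b):
                    diff = syns[b] - syns[a]   # set-difference
                    if diff:
                        syns[a] |= diff
                        changed = True
    return syns
```

With a dominance chain a-over-b and b-over-c, two passes are needed before the propagations stop flowing, which is why prop keeps calling proplist until it returns nil.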
;;; inherit
;;;*******************************************************************
;;; Continues to call inheritlist until inheritance action stops, as
;;; indicated by inheritlist making no changes and returning nil.
;;; inherit returns the modified synlist
(defun inherit (synlist)
(let ((newlist nil))
(setf newlist (inheritlist synlist))
; newlist holds modified list or nil
(if newlist (inherit newlist) synlist)))
;;;*******************************************************************
;;; post content - match on single syn
;;; A source term (sterm) and a target term (tterm) are compared.
;;; If there is a match on syns, then an rfrag is made and appended
;;; to the requisition list of the target term. This rfrag holds the
;;; contents of the requisition slot of the source term and the entity
;;; syns on which the match was made.
;;; This assumes that source contents has been fully instantiated.
;;; The match is based only on a single syn, thus it is called conmatch1
;;; This function returns the modified tterm
(defun conmatch1 (sterm tterm)
(let ((intersect nil))
(setf intersect (intersection (t*syn*nl sterm) (t*syn*nl tterm)))
(when intersect
(setf (term-req tterm)
(append (term-req tterm)
(list (make-rfrag
:terms (term-req sterm)
:esynl intersect)))))
tterm))
;;; post content2 - match on syns from two contexts
;;; This is used in the activity match, where the activity must
;;; match an element from an entity list and an element from an
;;; activity list.
;;; The input is two activity terms, which are normal terms with
;;; the syn having two contexts -- an activity and an entity context.
;;; An activity term has two context terms. Normally these are entity and
;;; activity terms. While the positions are not typed, the terms are.
;;; This function returns the target term if a modification occurs
;;; or nil if not.
(defun conmatch2 (stermin ttermin)
(let ((sterm (eval stermin)) (tterm (eval ttermin)))
(let (
(s-ent nil) ;source entity terms from context list
(s-act nil) ;source activity terms from context list
(t-ent nil) ;target entity terms from context list
(t-act nil) ;target activity terms from context list
(a-intersect nil) ;activity intersection
(e-intersect nil) ;entity intersection
;first context term from first syn of source
(sfirstcon (first (syn-context (first (term-synl sterm)))))
;second context term from first syn of source
(ssecondcon (second (syn-context (first (term-synl sterm)))))
;first context term from first syn of target
(tfirstcon (first (syn-context (first (term-synl tterm)))))
;second context term from first syn of target
(tsecondcon (second (syn-context (first (term-synl tterm))))))
;;next cond sorts out which are entity and which are activity terms for
;; the source
(cond ((equalp (term-ttype sfirstcon) 'e)
(setf s-ent sfirstcon)
(setf s-act ssecondcon))
(t (setf s-ent ssecondcon)
(setf s-act sfirstcon)))
;;next cond sorts out which are entity and which are activity terms for
;; the target
(cond ((equalp (term-ttype tfirstcon) 'e)
(setf t-ent tfirstcon)
(setf t-act tsecondcon))
(t (setf t-ent tsecondcon)
(setf t-act tfirstcon)))
;; intersection needs to be in terms of names
(if (and (term-p s-ent) (term-p t-ent))
(setf e-intersect (intersection (t*syn*nl s-ent)(t*syn*nl t-ent))))
(if (and (term-p s-act) (term-p t-act))
(setf a-intersect (intersection (t*syn*nl s-act)(t*syn*nl t-act))))
(cond ((and e-intersect a-intersect)
(setf (term-req tterm)
(append (term-req tterm) (list
(make-rfrag
:terms (term-req sterm)
:esynl e-intersect
:asynl a-intersect))))) ; if match return
;modified target term
(t nil))))) ; no match return nil
;;;*******************************************************************
;;; Applies the database to the prd query kernel terms
;;; where prd stands for primary world, real dimension
;;;*******************************************************************
(defun prdcontent (datalist querylist)
;; if conmatch2 returns nil, then qterm is not modified and is added back
;; to output list, otherwise the result of conmatch2 is a modified
;; query term that is added to olist
;; outputs a new querylist
(let ((result nil) (olist nil))
(dolist (relation datalist querylist)
(dolist (qterm querylist t)
(setf result (conmatch2 relation qterm))
(if result (setf qterm result))))))
;;;DSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSD
;;; QBC Script Functions
;;;DSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSDSD
;;;************************************************************
;;; Entity Elaboration
;;;************************************************************
(defun entity-elaboration (pdd db)
;;; given a primary world data dimension (pdd) and a list
;;; of database terms (db) this will discover the entity
;;; and then discover any additional attributes that
;;; go with the entity.
;;; returns pdd, modified or not
(print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
(print "startsyn Dimension State Prior to Syn Discovery Elaboration")
(print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
(dolist (aterm pdd t)
(ppterm (eval aterm)))
(print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
(print "After Syn Elaboration")
(print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
(syninfo-discover db pdd)
(dolist (aterm pdd t)
(instsyn (eval aterm)))
(dolist (aterm pdd t)
(ppterm (eval aterm)))
;(syninfo-discover db pdd)
(dolist (aterm pdd t)
(ppterm (eval aterm)))
;; (pprint (eval (first pdd)))
(print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
(print "Dimension State After Syn Hypothesis Generation")
(print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
(setf pdd (gen-entity-hypothesis db pdd))
(terpri)
(dolist (aterm pdd t)
(pphypoth (term-hypothesis (eval aterm))))
;;(pprint (eval (first pdd)))
(print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
(print "Syn Hypotheses Transformed into Syns")
(print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
(setf pdd (resolve-hypoth pdd))
(terpri)
(dolist (aterm pdd t)
(ppterm aterm))
;(pprint (eval (first pdd)))
(print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
(print "After Syn Elaboration")
(print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
;;(syninfo-discover (eval db) (eval pdd))
(dolist (aterm pdd t)
(instsyn (eval aterm))
(ppterm (eval aterm)))
;;(syninfo-discover (eval db) (eval pdd))
(print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
(print "Discovery of Additional Syn Context")
(print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
(terpri)
(setf pdd (context-discover db pdd))
(dolist (aterm pdd pdd)
(ppterm aterm)))
;;;*************************************************************
;;; Real Dimension content elaboration
;;;*************************************************************
(defun prd-elaboration (prd db)
;;; given a primary world real dimension (prd) and a list
;;; of database terms (db) this will elaborate the content
;;; of terms that have a match on both an entity syn and an activity syn
(print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
(print "startcon - Dimension State Prior to Content Elaboration")
(print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
(dolist (aterm prd t)
(ppterm (eval aterm)))
(print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
(print "Dimension State After Content Elaboration")
(print "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
(setf prd (prdcontent db prd))
(terpri)
(dolist (aterm prd prd)
(ppterm (eval aterm))))
;;;*************************************************************
;;;*************************************************************
;;; Specification world to primary world real dimension
;;; And then to primary world data dimension
;;;*************************************************************
;;; Utility function
;;;*************************************************************
;;; Extract new real dimension
;;; Given an elaborated world relationship,
;;; this will extract the real dimension that
;;; was elaborated.
;;; It will also prepare it for further processing by
;;; ensuring that all context names are lists.
(defun extractnrd (wrel)
;;; given a wrel, that has been entity-elaborated to discover
;;; a new ap, extracts the new ap.
(let ((pterm nil))
(setf pterm
(second (syn-context
(first (term-synl (eval (first wrel)))))))
(if (not (listp (syn-name (first (term-synl pterm)))))
(setf (syn-name (first (term-synl pterm)))
(list (syn-name (first (term-synl pterm))))))
pterm))
;;; Main specification to data dimension function
;;; spec-to-data
;;; This assumes that the new primary world is the second member of the
;;; list of context terms
(defun spec-to-data (spec realdb datadb)
;; realdb contains the agents that are to participate in
;; real world elaborations
;; datadb contains the database that are to participate in
;; retrieval of actual data from the data repository agents
(let ((srd1 nil) (newrd nil) (srd1a nil) (srd2 nil)
(srd3 nil) (newdd nil) (pdd4 nil))
(setf srd1 (entity-elaboration spec realdb))
;;; want to elaborate the new ap that was found, which is view*deep-sky
;;; the following code extracts the new context that was found.
(setf newrd (list (extractnrd srd1)))
;; This will discover additional context for the newly
;; discovered real dimension
(setf srd1a (entity-elaboration newrd realdb))
;; This will discover any new rfrags for the newly
;; discovered real dimension
(setf srd2 (prd-elaboration srd1a realdb))
;; This will extract the rfrags and use these as the
;; basis for data world elaboration
(setf srd3 (crfrag*term srd2))
(dolist (apterm srd3 nil)
(setf newdd (rfrag-terms apterm))
(setf pdd4 (entity-elaboration newdd realdb))
(filterpostcluster pdd4 datadb))))
;;;*************************************************************
;;; Database Value Functions
;;;*************************************************************
;;; equalselrow
;;;*************************************************************
;;; Equal select row
;;; This is a relational select based on an equality test.
;;; The following function works with both numeric and
;;; non-numeric data since the test uses an equal.
;;; Input must be a list of rows, not a dbrel structure.
;;; Attribute number 0 is the first attribute.
(defun equalselrow (keyvalue attribnr database)
(let ((found nil))
(dolist (row database found)
(if (equal keyvalue (nth attribnr row))
(setf found (cons row found))))))
;;; rangeselrow
;;; Range select row
;; This is a relational select operation based on a range check.
;; The following function works with numeric data only
;; since it tests for a range of values
;; input must be a list of rows, not a dbrel structure
;; Attribute number 0 is the first attribute.
(defun rangeselrow (lowvalue highvalue attribnr database)
(let ((found nil))
(dolist (row database found)
(if (and (>= highvalue (nth attribnr row))
(<= lowvalue (nth attribnr row)))
(setf found (cons row found))))))
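For illustration, a Python paraphrase of this select (hypothetical name, not part of the listing); note that, like the Lisp, matched rows come out in reverse order because each hit is consed onto the front of the result:

```python
def range_select_rows(low, high, attrib_nr, rows):
    # Keep rows whose attrib_nr-th value (0-based) lies in [low, high].
    found = []
    for row in rows:
        if low <= row[attrib_nr] <= high:
            found.insert(0, row)  # cons onto the front, as in the Lisp
    return found
```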
;;; collectval
;;;*************************************************************
;;; Collects all of the values from one column of the relation
(defun collectval (attribnr database)
;; assumes that the first attribute is attribute 0
;; The values from the last row are first on the returned list.
(let ((valist nil))
(dolist (row database valist)
(setf valist (cons (nth attribnr row) valist)))))
;;; quarsetup
;;; Calling sequence:
;;;   attribnr : the attribute number for which quartiles are to be
;;;              generated. The first attribute is attribute 0
;;;   relop    : the relation that a more desirable element is
;;;              to have with respect to a less desirable one
;;;   database : a list of tuples
;;;   level    : the levels of recursion. Each level partitions the
;;;              set into two parts. Thus level 2 partitions the set
;;;              into four parts, or quartiles.
;;; sample call: (quarsetup 6 #'> (dbrel-db ?galaxy1) 2)
;;; As noted, database is a list of tuples.
; This function returns the ranges of values for the first, second,
; third and fourth quartile.
; relop is the operator that is used to sort such that the most
; desirable value is first in the list
; vlist is value list
; levels is the number of levels of recursion that are to be run
; level 2 partitions list into quartiles
; the first attribute is attribute 0
; quarsetup calls the recursive function quartileform
; Returns a list of numbers representing the values at each end of each
; partition. For example, the following output
; (10.6 9 9 8.7 8.5 8.1 6.9 3.7)
; would indicate the following partitioning:
; 10.6 to 9 first quartile
; 9 to 8.7 second quartile
; 8.5 to 8.1 third quartile
; 6.9 to 3.7 fourth quartile
; Since these quartile end points are based on actual data, there may
; be gaps between quartiles, as in the gap between 8.1 and 6.9, for
; which there was no data.
(defun quarsetup (attribnr relop database level)
(let ((quarlist nil) (vlist nil) (dbsort nil))
(when (not (null database))
(setf quarlist (quartileform level 1 0 (1- (length database))))
(setf dbsort (sort (collectval attribnr database) relop))
(dolist (item quarlist vlist)
(setf vlist (cons (nth item dbsort) vlist)))
(setf vlist (reverse vlist)))))
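The example output in the header can be reproduced with a compact Python paraphrase of quarsetup at level 2 (illustrative only; the name is hypothetical): sort most-desirable-first, then pick off the end-point indexes the quartileform recursion would produce.

```python
import math

def quartile_endpoints(values):
    # Sort most-desirable-first, then read off the end-point indexes
    # that the quartileform recursion produces at level 2 (quartiles).
    def indexes(maxlevel, level, top, bottom):
        size = bottom - top + 1
        mid = top + math.ceil(size / 2) - 1
        if level == maxlevel:
            return [top, mid, mid + 1, bottom]
        return (indexes(maxlevel, level + 1, top, mid)
                + indexes(maxlevel, level + 1, mid + 1, bottom))
    s = sorted(values, reverse=True)
    return [s[i] for i in indexes(2, 1, 0, len(s) - 1)]
```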
;;; quartileform
;;; quartile formation
;;; a recursive function that generates a list of indexes for
;;; partitioning a set. Level 2 partitions into quartiles.
;;; This function does not deal with the actual data, but with the
;;; number of items in the list and the quartile break-points for
;;; a list with N items.
(defun quartileform (maxlevel currentlevel top1 bottom2)
(let ((top2 nil) (bottom1 nil) (lsize (1+ (- bottom2 top1))))
(setf top2 (+ top1 (1- (ceiling (/ lsize 2)))))
(setf bottom1 (1+ top2))
;;(print "top1 and top2 ")
;;(print top1)
;;(print top2)
;;(print "bottom1 and bottom2 ")
;;(print bottom1)
;;(print bottom2)
;;(print "currentlevel")
;;(print currentlevel)
(cond ((= currentlevel maxlevel)
(list top1 top2 bottom1 bottom2))
(t (append
(quartileform maxlevel
(1+ currentlevel) top1 top2)
(quartileform maxlevel
(1+ currentlevel) bottom1 bottom2))))))
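The index arithmetic is easiest to check in isolation. A direct Python transliteration (for illustration, not part of the original code):

```python
import math

def quartileform(maxlevel, currentlevel, top1, bottom2):
    # Split the index range [top1, bottom2] in half, recursing until
    # maxlevel levels have produced 2**maxlevel partitions.
    lsize = bottom2 - top1 + 1
    top2 = top1 + math.ceil(lsize / 2) - 1
    bottom1 = top2 + 1
    if currentlevel == maxlevel:
        return [top1, top2, bottom1, bottom2]
    return (quartileform(maxlevel, currentlevel + 1, top1, top2)
            + quartileform(maxlevel, currentlevel + 1, bottom1, bottom2))
```

For 8 items the break indexes are simply 0..7 (two per quartile); for 10 items the uneven halves give (0 2) (3 4) (5 7) (8 9).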
;;;*************************************************************
;;; between
;;; returns true if a value is between two other values
(defun between (target number1 number2)
(if (or (and (<= target number1) (>= target number2))
(and (<= target number2) (>= target number1))) t nil))
;;; quarcolset
;;; quartile column set
;;; This function replaces the column values with the quartile
;;; values for an entire column
(defun quarcolset (attribnr database quarlist)
;; assumes that the first attribute is attribute 0
(let ((val nil))
(dolist (row database val)
(setf val (nth attribnr row))
(cond ((between val (first quarlist)
(second quarlist))
(setf (nth attribnr row) '1))
((between val (third quarlist)
(fourth quarlist))
(setf (nth attribnr row) '2))
((between val (fifth quarlist)
(sixth quarlist))
(setf (nth attribnr row) '3))
((between val (seventh quarlist)
(eighth quarlist))
(setf (nth attribnr row) '4))))))
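The per-value replacement can be paraphrased as a lookup against the eight quarsetup end points (a Python sketch with a hypothetical name; a value falling in a gap between quartiles maps to nothing):

```python
def quartile_label(value, quarlist):
    # quarlist holds 8 end points (q1-hi q1-lo q2-hi q2-lo ...), as
    # produced by quarsetup; return the quartile number 1-4, or None
    # for a value that falls in a gap between quartiles.
    for q in range(4):
        hi, lo = quarlist[2 * q], quarlist[2 * q + 1]
        if min(hi, lo) <= value <= max(hi, lo):
            return q + 1
    return None
```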
;;; nonumquarcolset
;;; non-numeric quartile column set
;;; This function replaces the column values with the quartile
;;; values for an entire column
(defun nonumquarcolset (attribnr database quarlist)
;; assumes that the first attribute is attribute 0
;; quarlist contains the values that represent the various quartiles
(let ((val nil))
(dolist (row database val)
(setf val (nth attribnr row))
(cond ((equalp val (first quarlist))
(setf (nth attribnr row) '1))
((equalp val (second quarlist))
(setf (nth attribnr row) '2))
((equalp val (third quarlist))
(setf (nth attribnr row) '3))
((equalp val (fourth quarlist))
(setf (nth attribnr row) '4))))))
;;; trimkey
;;; trims off the leading key fields using nonkeystart to
;;; indicate where the keys stop
(defun trimkey (target nonkeystart)
(let ((projection nil))
(dolist (row target (reverse projection))
(setf projection (cons (nthcdr nonkeystart row)
projection)))))
;;;*************************************************************
;;; relquargen
;;; relation quartile generate
;;; generate the quartiles for an entire relation, and projects out
;;; the key fields, so that only the non-key, quartile representations
;;; of the relation remain.
;;; it strips off of key fields and replaces the values in the
;;; relation with the quartile values
(defun relquargen (relation)
(let ((target nil) (projection nil)
(attribute (dbrel-nonkey relation))
(quarlist nil)
(acp (nthcdr (dbrel-nonkey relation) (dbrel-meta relation))))
;;acp is attribute cluster preference
(setf target (copy-tree (dbrel-db relation)))
(dolist (relop acp target)
(cond ((listp relop)
(nonumquarcolset attribute target relop))
(t (setf quarlist (quarsetup attribute relop target 2))
(quarcolset attribute target quarlist)))
(setf attribute (1+ attribute)))))
;;; quarvaluelist
;;;*************************************************************
;;; quartile value list
;;; generate the quartiles values for all nonkey attributes of a relation,
;;; derived from relquargen, but does not modify the database
(defun quarvaluelist (relation)
(let ((attribute (dbrel-nonkey relation))
(quarlist nil)
(finallist nil)
(target (dbrel-db relation))
(acp (nthcdr (dbrel-nonkey relation) (dbrel-meta relation))))
;;acp is attribute cluster preference
(dolist (relop acp (reverse finallist))
(cond ((listp relop)
(setf finallist (cons relop finallist)))
(t (setf quarlist (quarsetup attribute relop target 2))
(setf finallist (cons quarlist finallist))))
(setf attribute (1+ attribute)))))
;;;********************************************************************
;;;********************************************************************
;;; Cut set determination
;;;********************************************************************
;;; The following programs are used to determine the cut set
;; These programs will determine the display set given clusters for all
;; of the data in a database.
;; It also can be used to compare two sets to see the dominance
;; relationship that exists between them.
(defun dom-analysis (tobechecked list2)
;;This takes two lists and prepares a dominance analysis.
(let ((olist nil) (temp nil))
(dolist (element tobechecked olist)
(cond ((setf temp (dom element list2))
(setf olist (append temp olist)))
(t (setf temp (list element '* '(**** *)))
(setf olist (cons temp olist)))))))
(defun dom (cluster clist)
;; cluster is a cluster list of the form (1 2 1)
;; clist is a list of clusters of the form ((1 1 1) (2 1 3))
;; if cluster dominates some member of the list or is dominated
;; by some member then that member is returned.
;; If cluster is not comparable to any members of the list then
;; nil is returned
(let ((olist nil) (temp nil))
(dolist (element clist olist)
(cond ((cluster= cluster element)
(setf temp (list cluster '= element))
(setf olist (cons temp olist)))
((doml cluster element)
(setf temp (list cluster '> element))
(setf olist (cons temp olist)))
((doml element cluster)
(setf temp (list cluster '< element))
(setf olist (cons temp olist)))
(t nil)))))
;; may not be needed if cluster= is eliminated
(defun domonly (cluster clist)
;; only looks at dominates of cluster over clist
;; cluster is a cluster list of the form (1 2 1)
;; clist is a list of clusters of the form (( 1 1 1 ) (2 13))
;; if cluster dominates all members
;; then that cluster is returned.
;; If cluster is not comparable to any members of the list
;; then that is indicated
(let ((olist nil) (temp nil))
(dolist (element clist olist)
(cond ((doml cluster element)
(setf temp (list cluster '> element))
(setf olist (cons temp olist)))
(t nil)))))
(defun cluster= (element1 element2)
;(print "here is element1 and 2 ")
;(print element1)
;(print element2)
(cond ((or (null element1) (null element2)) t)
((if (= (car element1) (car element2))
(cluster= (cdr element1) (cdr element2))))
(t nil)))
(defun doml (element1 element2)
;(print "here is element1 and 2 ")
;(print element1)
;(print element2)
(cond ((or (null element1) (null element2)) t)
((if (or (< (car element1) (car element2))
(= (car element1) (car element2)))
(doml (cdr element1) (cdr element2))))
(t nil)))
(defun find-dom-set (dom-list original)
;;takes input from the results of dom-analysis and throws out all elements
;; that are dominated by something
;; builds list of things that are dominated by something, and then
;;; subtracts this list from original list used in dom-analysis
(let ((olist nil) (temp nil) (temp2 nil))
(setf temp (dolist (element dom-list olist)
(cond ((equalp (cadr element) '<)
(setf olist (cons (car element) olist)))
(t nil))))
(setf temp2 (set-difference original temp))))
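Taken together, dom-analysis and find-dom-set keep only the clusters that no other cluster strictly dominates. A condensed Python paraphrase (illustrative only; `dom1` plays the role of doml, with 1 the best quartile and 4 the worst):

```python
def cut_set(clusters):
    # Keep clusters that are not strictly dominated by any other
    # cluster; dom1(c1, c2) is componentwise <=, as in doml.
    def dom1(c1, c2):
        return all(a <= b for a, b in zip(c1, c2))
    return [c for c in clusters
            if not any(dom1(o, c) and not dom1(c, o) for o in clusters)]
```

Incomparable clusters (better on one attribute, worse on another) all survive into the cut set, which is the point of the display-set computation.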
;;; getrange
;;; returns range of values given a quartile
;;; quartile is a single number that indicates the quartile
;;; value list is the list of values that define the
;;; various quartiles
;;; attribute is the number of the attribute
;;; attvals is the list of quartile values for each attribute
(defun getrange (attribute quartile attvals)
(let ((valuelist nil))
(setf valuelist (nth attribute attvals))
(cond ((> (length valuelist) 4)
; more than 4 elements: numeric quartile range end points;
; a 4-element value list is a non-numeric ordering, handled below
(case quartile
(1 (cons (first valuelist) (second valuelist)))
(2 (cons (third valuelist) (fourth valuelist)))
(3 (cons (fifth valuelist) (sixth valuelist)))
(4 (cons (seventh valuelist) (eighth valuelist)))))
(t
(case quartile
(1 (first valuelist))
(2 (second valuelist))
(3 (third valuelist))
(4 (fourth valuelist)))))))
;;; cutsettoval
;;; Transforms a cut set to its value ranges
;;; clusters are the list of clusters
;;; cvals are the list of cluster values, with the first attribute
;;; value list first, etc.
(defun cutsettoval (clusters cvals)
(let ((newclusters nil) (attribute nil))
(setf newclusters (copy-tree clusters))
(dolist (row newclusters newclusters)
(setf attribute 0 )
(dolist (element row row)
(setf (nth attribute row)
(getrange attribute element cvals))
(setf attribute (1+ attribute))))))
;(pprint (quarsetup 6 #'> (dbrel-db ?galaxy1) 2))
;(setf quartile-db (copy-list (dbrel-db ?galaxy2)))
;(print quartile-db)
;(setf quartile-list (quarsetup 5 #'> (dbrel-db ?galaxy2) 2))
;(print "quartile-list")
;(print quartile-list)
;(quarcolset 5 quartile-db quartile-list)
;(print quartile-db)
;(print "Here is ?galaxy2")
;(pprint ?galaxy2)
;;;*************************************************************
;;; gencluster
;;; Given a relation, generates the cutset.
;;; Prints
;;; The relations in terms of keys and clusters
;;; The cluster cut set
;;; The list of values that delineate the quartiles
;;; the cut set in terms of quartile, with the various
;;; clusters in the same order as presented in the cluster cutset
;;; printing.
(defun gencluster (relation)
(let ((rel2 (copy-tree relation))
(clusterswithkeys nil)
(clusters nil)
(domcluster nil)
(cvalist nil)
(cutsetvals nil)
(final-clusters nil))
(setf clusterswithkeys (relquargen rel2))
(terpri)
(princ "Clusters Associated with Keys in ")
(princ (dbrel-name relation))
(terpri)
(dbprint clusterswithkeys)
(setf clusters (trimkey clusterswithkeys (dbrel-nonkey relation)))
(setf domcluster (dom-analysis clusters clusters))
(setf final-clusters (find-dom-set domcluster clusters))
(terpri)
(terpri)
(princ "Cluster Cut-Set from ")
(princ (dbrel-name relation))
(terpri)
(print "Attributes represented by cut-set")
(print (nthcdr (dbrel-nonkey relation) (dbrel-schema relation)))
(dbprint final-clusters)
(setf cvalist (quarvaluelist relation))
(terpri)
(terpri)
(print "List of quartile values for each attribute")
(dbprint cvalist)
(terpri)
(terpri)
(princ "Cluster Cut-Set In Terms of Quartile Value Ranges from ")
(princ (dbrel-name relation))
(terpri)
(setf cutsetvals (cutsettoval final-clusters cvalist))
(dbprint cutsetvals)))
;;; Value test data
;(print "Galaxy 1")
;(gencluster ?galaxy1)
;(print "Galaxy 2")
;(gencluster ?galaxy2)
;(print "Nebula 2")
;(gencluster ?nebula2)
;(print "Double Star")
;(gencluster ?doublestar)
;;; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
;;; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
;;; Processing to post clusters based on solicitations
;;; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
;;;***************************************************************
;;; Postfilter
;;;***************************************************************
;;; This function posts a filter predicate on the specified
;;; attribute name for the specified syn.
;;; dimension is a particular level of the blackboard
;;; this function posts the filters in the dimension as a side effect
(defun postfilter (attributename syname lowvalue highvalue dimension)
(let ((targetsynlist nil))
(dolist
(pterm dimension nil)
(dolist
(synel (term-synl (eval pterm)) nil)
;;synel is a single syn
(when (and (or (equalp syname (syn-name synel))
(and (listp (syn-name synel))
(member syname (syn-name synel))))
;; above handles the case where syn-name of synel is
;; not a list
(member attributename
(s*ctx*nl synel)))
(dolist
(conterm (syn-context synel) nil)
;;conterm is a term of a syn context
(dolist (synelmt (term-synl conterm) nil)
(when
(equalp attributename
(syn-name synelmt))
(setf
(syn-valuelow synelmt)
lowvalue)
(setf
(syn-valuehigh synelmt)
highvalue)))))))))
;;; Getfilterval
;;;***************************************************************
;;; Given an attribute and syn, this function returns a list
;;; containing (valuelow valuehigh)
;;; attribute name for the specified syn.
;;; dimension is a particular level of the blackboard
;;; this function retrieves the filter values from the dimension
;;; Since filter triggers are stored in the same way as filter values,
;;; this function can also be used to retrieve these.
(defun getfilterval (attributename syname dimension)
(let ((result nil) (lowvalue nil) (highvalue nil))
(dolist
(pterm dimension nil)
(dolist
(synel (term-synl (eval pterm)) nil)
;; synel is a single syn
(when (and (member syname (syn-name synel))
(member attributename
(s*ctx*nl synel)))
(dolist
(conterm (syn-context synel) nil)
;; conterm is a term of a syn context
;; check each syn for filter values
;; if more than one filter exists,
;; this will return the first found
(dolist
(synelmt (term-synl conterm) nil)
(when (equalp
attributename
(syn-name synelmt))
(setf lowvalue
(syn-valuelow
synelmt))
(setf highvalue
(syn-valuehigh
synelmt))
(if (null result)
(setf result
(list
lowvalue highvalue)))))))))
result))
;;; newgetfilterval
;;;***************************************************************
;;; used by filteragent
;;; protects against parameters not being lists
(defun newgetfilterval (attributename syname dimension)
(let ((result nil) (synlist nil) (names nil) (context nil)
(s-context nil) (lowvalue nil) (highvalue nil))
(if (not (listp dimension)) (setf dimension (list dimension)))
(dolist
(pterm dimension nil)
(setf synlist (term-synl (eval pterm)))
(if (not (listp synlist)) (setf synlist (list synlist)))
(dolist
(synel synlist nil)
;; synel is a single syn
(setf names (syn-name synel))
(if (not (listp names)) (setf names (list names)))
(setf context (s*ctx*nl synel))
(if (not (listp context)) (setf context (list context)))
(when (and (member syname names)
(member attributename context))
(setf s-context (syn-context synel))
(if (not (listp s-context))
(setf s-context (list s-context)))
(dolist (conterm s-context nil)
;; conterm is a term of a syn context
;; check each syn for filter values
;; if more than one filter exists,
;; this will return the first found
(dolist
(synelmt (term-synl conterm) nil)
(when (equalp
attributename
(syn-name synelmt))
(setf lowvalue
(syn-valuelow
synelmt))
(setf highvalue
(syn-valuehigh
synelmt))
(if (null result)
(setf result
(list
lowvalue highvalue)))))))))
result))
;;; Transform solicitation to capability - does not use filters
;;;***************************************************************
;;; This function is invoked by the database to see if there
;;; exists any solicitation to which the database agent can
;;; respond. If such a solicitation exists, the agent
;;; presents display clusters to the user
(defun postcluster (pdd db)
(let ((relation nil) (synnamelist nil) (synndb nil) (synpdd nil)
(synlist nil))
(dolist
(pddterm pdd nil)
(dolist
(singledb db nil)
(setf synndb (t*syn*nl (eval singledb)))
(setf synpdd (t*syn*nl (eval pddterm)))
(setf synnamelist
(setmember (t*syn*nl (eval singledb))
(t*syn*nl (eval pddterm))))
(cond ((and (setf synnamelist
(setmember (t*syn*nl (eval singledb))
(t*syn*nl (eval pddterm)))))
(setf synlist
(postcluster*n*syn*tl synnamelist
(eval singledb)))
(dolist (selenient synlist nil)
(if (setf relation (syn-relation selement))
(gencluster relation)))))))))
;;this is the point to extract the filter and then filter
;; the data, but first see if can just get the data printed.
;;;***************************************************************
;;; Transform solicitation to capability with filters
;;; This function is invoked by the database to see if there
;;; exist any solicitations to which the database agent can
;;; respond. If such a solicitation exists, the agent extracts
;;; any filters and then presents display clusters to the user
;;; pdd is primary world data dimension query kernel
;;; db is the agent databases
(defun filterpostcluster (pdd db)
(let ((relation nil) (synnamelist nil) (synndb nil) (synpdd nil)
(attrilist nil) (attrel nil) (filterl nil) (newrel nil) (synlist nil))
(dolist
(pddterm pdd nil)
(dolist
(singledb db nil)
(setf synndb (t*syn*nl (eval singledb)))
(setf synpdd (t*syn*nl (eval pddterm)))
(setf synnamelist
(setmember (t*syn*nl (eval singledb))
(t*syn*nl (eval pddterm))))
(cond ((and (setf synnamelist
(setmember (t*syn*nl (eval singledb))
(t*syn*nl (eval pddterm)))))
(setf synlist
(postcluster*n*syn*tl synnamelist
(eval singledb)))
(dolist
(selement synlist nil)
;selement is a syn
(when
(setf relation
(syn-relation selement))
;; the following is necessary to get a pass-by-
;; value effect; copy-tree does not seem to
;; do this, so the following is required,
;; otherwise the database is permanently altered
(setf newrel (make-dbrel
:name (dbrel-name relation)
:schema (dbrel-schema relation)
:nonkey (dbrel-nonkey relation)
:meta (dbrel-meta relation)
:db (dbrel-db relation)))
(setf attrilist (dbrel-schema newrel))
(dolist
(attrel attrilist nil)
(setf filterl
(getfilterval attrel
(dbrel-name
newrel)
pdd))
(when (or
(first filterl)
(second filterl))
(setf (dbrel-db newrel)
(rangeselrow (first filterl)
(second filterl)
(position attrel attrilist)
(dbrel-db
newrel))))))
(gencluster newrel))))))))
; ; ; Filters
;;;**************************************************
;;;**************************************************
;;;**************************************************
;;; Applies RA filter based on UT, which can
;;; be taken as local standard time plus or minus
;;; an hour.
;;; The filter calculates the RA for the UT and then
;;; uses this as the minimum RA. The maximum RA is
;;; this initial number plus six.
(defun rafilter (yyyy mm dd ut dimension)
(let ((ralow nil) (rahigh nil))
(setf ralow (sidereal-time yyyy mm dd ut))
(setf rahigh (+ ralow 6 ))
(postfilter 'ra-hr 'galaxy ralow rahigh dimension)
ralow))
;;; Same as rafilter, except it assumes that its
;;; input is provided on a list in order to work with
;;; filteragent, the general agent filter function
(defun rafilter2 (yyyy mm dd ut dimension)
(let ((ralow nil) (rahigh nil))
(setf ralow (sidereal-time yyyy mm dd ut))
(setf rahigh (+ ralow 6 ))
(postfilter 'ra-hr 'galaxy ralow rahigh dimension)
ralow))
;;;**************************************************
;;; filter agent
;;;**************************************************
;;; This function represents a general filter agent.
;;; It first checks to see if there exist any solicitations
;;; for which it has a filter. If so, it then checks to see
;;; if there exist any triggers. If not, it posts a
;;; solicitation for a trigger on the trigger list.
;; If it finds a trigger either on the trigger database or in
;; the dimension, then it extracts the trigger value,
;; computes the trigger function and then posts the results
;; on the appropriate attribute.
;; t-attribute is the attribute from which the trigger will be taken
;; t-syn is the name of the syn from which the trigger will come
;; s-attribute - solicitation attribute is the name of the attribute
;; to which the effect of the trigger will be applied
;; s-synname is the name of the syn to which the effect will be
;; applied
;; f-function is the function that maps trigger values to solicitation
;; filter values.
;; t-dim is the dimension at which the filter agent is looking
;; for a trigger
;; s-dim is the dimension in which the agent will post a filter if
;; it finds one.
;; f-list is the list of filter trigger solicitations
;; If solicitation to which filter applies exists, and trigger context
;; exists, then posts filter on trigger context and returns nil.
;; If solicitation to which filter applies exists, but trigger context
;; does not exist, then adds trigger to global trigger list
(defun filteragent (t-att t-synname s-att
                    s-synname f-function t-dim s-dim f-list)
  (let ((f-trigg1 nil) (f-trigg2 nil) (solexists nil) (f-arg nil)
        (synnames nil)
        (context nil) (result nil) (synlist1 nil))
    ;;; first see if an appropriate solicitation exists, if not, then exit.
    ;;; This filter agent assumes that only a single value is used
    ;;; as the filter trigger
    ;(print "entering agentfilter")
    (setf solexists
          (dolist (pterm t-dim result)
            (setf synlist1 (term-synl (eval pterm)))
            (if (not (listp synlist1)) (setf synlist1 (list synlist1)))
            ; (print "is synlist1 a list")
            ; (pprint (listp synlist1))
            (dolist (synel synlist1 result)
              ;;synel is a single syn
              (setf synnames (syn-name synel))
              (if (not (listp synnames))
                  (setf synnames (list synnames)))
              (setf context (s*ctx*nl synel))
              (if (not (listp context))
                  (setf context (list context)))
              (if (and (member s-synname synnames)
                       (member s-att context))
                  (setf result t)))))
    (when solexists
      ;; there is a solicitation that the filter matches
      ;; now it must see if there is a trigger, or
      ;; post a solicitation for a trigger
      (setf f-trigg1 (newgetfilterval t-att t-synname s-dim))
      (cond ((or (setf f-trigg1
                       (newgetfilterval t-att t-synname s-dim))
                 (setf f-trigg1
                       (newgetfilterval t-att t-synname f-list)))
             ;; f-trigg1 exists in dimension or on the trigger list
             ;; extract it
             ;; assume only the first value is used for the trigger value
             (setf f-arg (first f-trigg1))
             (funcall f-function (first f-arg) (second f-arg)
                      (third f-arg) (fourth f-arg) s-dim))
            (t ; no triggers exist, post solicitation for one
             (setf *f-list*
                   (cons (make-term
                          :ttype 'f-sol-user
                          :synl (list (make-syn
                                       :name t-synname
                                       :context
                                       (list (make-term
                                              :ttype 'e
                                              :synl (list (make-syn
                                                           :name t-att)))))))
                         *f-list*)))))))
(defun dateuserinterface (dimension)
;;; This function looks for filter trigger value solicitations and
;;; requests these of the user
(let ((result nil) (year nil) (yyyy nil) (mm nil) (dd nil) (hh nil)
(time nil) (ut nil))
(dolist
(pterm dimension nil)
(when (and (equalp (term-ttype (eval pterm)) 'f-sol-user)
(dolist
(synel (term-synl (eval pterm)) result)
;; synel is a single syn
(if (not (listp (syn-name synel)))
(setf (syn-name synel)
(list (syn-name synel))))
(if (or (member 'date (syn-name synel))
        (member 'date (s*ctx*nl synel)))
(setf result t))))
(print "provide four digit year yyyy")
(setf yyyy (read *terminal-io*))
(print yyyy)
(print "provide two digit month mm")
(setf mm (read *terminal-io*))
(print mm)
(print "provide two digit day dd")
(setf dd (read *terminal-io*))
(print dd)
(print "provide universal time in two digit 24 hour form hh")
(setf hh (read *terminal-io*))
(print hh)
(postfilter 'date 'date
(list yyyy mm
dd hh )
nil dimension)))))
;;; Query Kernels
;;; Test term
(setf termv1 (make-term
              :unspecified t
              :ttype 'e
              :synl
              (list
               (make-syn
                :name '(unspecified)
                :context
                (list (make-term
                       :synl (list (make-syn :name 'magnitude)))
                      (make-term
                       :synl (list (make-syn :name 'size)))
                      )))))
(setf term1 (make-term
             :unspecified t
             :ttype 'e
             :synl
             (list
              (make-syn
               :name '(unspecified)
               :context
               (list (make-term
                      :synl (list (make-syn :name 'magnitude)))
                     (make-term
                      :synl (list (make-syn :name 'size)))
                     )))))
(setf term3 (make-term
             :unspecified t
             :ttype 'e
             :synl
             (list
              (make-syn
               :name '(unspecified)
               :context
               (list (make-term
                      :synl (list (make-syn :name 'magnitude)))
                     (make-term
                      :synl (list (make-syn :name 'period)))
                     (make-term
                      :synl (list (make-syn :name 'size)))
                     )))))
(setf term4 (make-term
             :unspecified t
             :ttype 'e
             :synl
             (list
              (make-syn
               :name '(astro-object)
               :context
               (list (make-term
                      :synl (list (make-syn :name 'magnitude)))
                     (make-term
                      :synl (list (make-syn :name 'size)))
                     )))))
(setf term5 (make-term
             :unspecified t
             :ttype 'e
             :synl
             (list
              (make-syn
               :name '(galaxy)))))
(setf prdterm1 (make-term
                :unspecified t
                :ttype 'ap
                :synl
                (list
                 (make-syn
                  :name '(unspecified)
                  :context
                  (list (make-term
                         :synl
                         (list
                          (make-syn
                           :name 'view
                           ))))))))
(setf prdterm2 (make-term
                :ttype 'ap
                :synl (list
                       (make-syn
                        :name 'photograph*deep-space
                        :context
                        (list
                         (make-term
                          :ttype 'ac
                          :synl
                          (list (make-syn
                                 :name 'photograph)))
                         (make-term
                          :ttype 'e
                          :synl (list
                                 (make-syn
                                  :name 'deep-space))))))))
(setf srdterm1
      (make-term
       :ttype 'wr
       :synl
       (list
        (make-syn
         :name '(use-to-instrument)
         :context
         (list (make-term
                :synl (list
                       (make-syn
                        :name 'astro-instrument
                        ))))))))
(setf testsyn1
      (list
       (make-syn :name 'telescope
                 :propagate
                 (list 'refractor 'cassegrain 'schmidt-cassegrain
                       'maksutov))
       (make-syn :name 'refractor
                 :propagate (list 'celestron))
       (make-syn :name 'celestron
                 :propagate (list 'spotting))
       (make-syn :name 'cassegrain
                 :propagate (list 'celestron))
       (make-syn :name 'schmidt-cassegrain
                 :propagate (list 'telescope))
       (make-syn :name 'spotting
                 :propagate (list 'birdwatcher))
       (make-syn :name 'birdwatcher
                 :propagate (list 'cheap-birdwatcher))
       ))
(setf testsyn12
      (list
       (make-syn :name 'telescope
                 :inherit (list 'instrument))
       (make-syn :name 'refractor
                 :inherit (list 'telescope))
       (make-syn :name 'instrument
                 :inherit (list 'things))
       (make-syn :name 'cassegrain
                 :inherit (list 'telescope))
       (make-syn :name 'schmidt-cassegrain
                 :inherit (list 'cassegrain))
       (make-syn :name 'maksutov ;use to test loops
                 :inherit (list 'schmidt-cassegrain))
       ))
(setf testterm1 (make-term
                 :ttype 'ap
                 :synl
                 (list (make-syn
                        :name 'view*deep-space
                        :context
                        (list ?view
                              ?deep-space)))
                 ))
(setf testsyn3
      (list
       (make-syn
        :name 'telescope
        :propagate (list 'refractor 'cassegrain 'schmidt-cassegrain
                         'maksutov))
       (make-syn
        :name 'refractor
        :propagate (list 'celestron))
       (make-syn
        :name 'celestron
        :inherit (list 'refractor 'schmidt-cassegrain))))
(setf testterm2 (make-term
                 :ttype 'e
                 :synl testsyn3))
;;; QBC Test Sequences
(print "111111111111111111111111111111111111111111111111111111")
(print "Data World Entity Elaboration")
(print "Starting with unspecified entity with context of")
(print "magnitude and size")
(print "111111111111111111111111111111111111111111111111111111")
(setf term1 (make-term
             :unspecified t
             :ttype 'e
             :synl
             (list
              (make-syn
               :name '(unspecified)
               :context
               (list (make-term
                      :synl (list (make-syn :name 'magnitude)))
                     (make-term
                      :synl (list (make-syn :name 'size)))
                     )))))
(setf *kernel1pdd* '(term1))
(setf *db* '(?galaxy ?nebula ?double-star ?double-star2
             ?variable-star ?star ?variable-star2))
(entity-elaboration *kernel1pdd* *db*)
(print "222222222222222222222222222222222222222222222222222222")
(print "Primary world, real dimension elaboration")
(print "Starts with an unspecified activity primitive")
(print "with context view and tries to find an activity")
(print " primitive that fits")
(print "222222222222222222222222222222222222222222222222222222")
(setf prdterm1 (make-term
                :unspecified t
                :ttype 'ap
                :synl
                (list
                 (make-syn
                  :name '(unspecified)
                  :context
                  (list (make-term
                         :synl
                         (list
                          (make-syn
                           :name 'view
                           ))))))))
(setf *kernel2prd* '(prdterm1))
(setf *prddb* '(?galaxy ?nebula ?double-star ?double-star2
                ?variable-star ?star ?variable-star2
                ?view*deep-space ?photograph*deep-space))
(entity-elaboration *kernel2prd* *prddb*)
(print "333333333333333333333333333333333333333333333333333333")
(print "Starts with photograph*deep-space and finds the data")
(print " requisition that should go with this")
(print "333333333333333333333333333333333333333333333333333333")
(setf prdterm3 (make-term
                :ttype 'ap
                :synl (list
                       (make-syn
                        :name 'photograph*deep-space
                        :context
                        (list
                         (make-term
                          :ttype 'ac
                          :synl
                          (list (make-syn
                                 :name 'photograph)))
                         (make-term
                          :ttype 'e
                          :synl (list
                                 (make-syn
                                  :name 'deep-space))))))))
(setf *kernel3prd* '(prdterm3))
(setf *prddb* '(?galaxy ?nebula ?double-star ?double-star2
                ?variable-star ?star ?variable-star2
                ?view*deep-space ?photograph*deep-space))
(prd-elaboration *kernel3prd* *prddb*)
(print "44444444444444444444444444444444444444444444444444444")
(print "Starts with an unspecified activity primitive with context")
(print "view and tries to find an activity primitive that fits")
(print "Then it finds the data requisition for the activity")
(print "primitive that it finds,")
(print "44444444444444444444444444444444444444444444444444444")
(setf prdterm4 (make-term
                :unspecified t
                :ttype 'ap
                :synl
                (list
                 (make-syn
                  :name '(unspecified)
                  :context
                  (list (make-term
                         :synl
                         (list
                          (make-syn
                           :name 'view
                           ))))))))
(setf *kernel4prd* '(prdterm4))
(setf *prddb* '(?galaxy ?nebula ?double-star ?double-star2
                ?variable-star ?star ?variable-star2
                ?view*deep-space ?photograph*deep-space
                ?view*deep-sky))
(setf *prddb2* '(?galaxy ?nebula ?double-star))
(setf *prd1* (entity-elaboration *kernel4prd* *prddb*))
(setf *prd2* (prd-elaboration *prd1* *prddb*))
(setf *prd3* (rfrag-terms (first (term-req (eval (first *prd2*))))))
(setf *prd4* (entity-elaboration *prd3* *prddb*))
(postfilter 'ra-hr 'galaxy 1 5 *prd4*)
;(postfilter 'ra-hr 'galaxy 9 15 *prd4*)
(filterpostcluster *prd4* *prddb2*)
(print "5555555555555555555555555555555555555555555555555555")
(print "Data World Entity Elaboration to Display Clusters")
(print "Starting with unspecified entity with context of")
(print "magnitude and size")
(print "5555555555555555555555555555555555555555555555555555")
(setf term5 (make-term
             :unspecified t
             :ttype 'e
             :synl
             (list
              (make-syn
               :name '(unspecified)
               :context
               (list (make-term
                      :synl (list (make-syn :name 'magnitude)))
                     (make-term
                      :synl (list (make-syn :name 'size)))
                     )))))
(setf *kernel1pdd* '(term5))
(setf *db* '(?galaxy ?nebula ?double-star))
(setf pdd (entity-elaboration *kernel1pdd* *db*))
; March 23, 1991 at 21 local time
(terpri)
(terpri)
(setf low (rafilter 1991 3 23 21 pdd))
(princ "RA =")
(print low)
(princ "RA + 6 hours - range of RA of interest")
(print (+ low 6))
(filterpostcluster pdd *db*)
(print "66666666666666666666666666666666666666666666666666")
(print "Specification world to data world")
(print "Beginning with astro-instrument and w-relationship")
(print "use-to-instrument will find view*deep-sky as")
(print " a use of astro-instrument")
(print " Then pulls data based on requisition")
(print "66666666666666666666666666666666666666666666666666")
(setf srdterm6
      (make-term
       :ttype 'wr
       :synl
       (list
        (make-syn
         :name '(use-to-instrument)
         :context
         (list (make-term
                :synl (list
                       (make-syn
                        :name 'astro-instrument
                        ))))))))
(setf *kernel6srd* '(srdterm6))
(setf *db* '(?galaxy ?nebula ?double-star
             ?view*deep-sky ?photograph*deep-space
             ?use-to-instrument
             ?view*deep-space))
(setf *db2* '(?galaxy ?nebula ?double-star))
(spec-to-data *kernel6srd* *db* *db2*)
(print "77777777777777777777777777777777777777777777777777777")
(print "Data World Entity Elaboration")
(print "Starting with unspecified entity with context of")
(print "magnitude and size and period — results in Galaxy")
(print "which has no period, but has magnitude and size and nebula")
(print "which is similar to galaxy, and variable star which")
(print "has period and magnitude, but not size since stellar object")
(print "This tests QBC’s ability to make selections which")
(print "do not satisfy all of the constraints.")
(print "777777777777777777777777777777777777777777777777777777")
(setf term7 (make-term
             :unspecified t
             :ttype 'e
             :synl
             (list
              (make-syn
               :name '(unspecified)
               :context
               (list (make-term
                      :synl (list (make-syn :name 'magnitude)))
                     (make-term
                      :synl (list (make-syn :name 'period)))
                     (make-term
                      :synl (list (make-syn :name 'size)))
                     )))))
(setf *kernel7pdd* '(term7))
(setf *db* '(?galaxy ?nebula ?double-star ?double-star2
             ?variable-star ?star ?variable-star2))
(entity-elaboration *kernel7pdd* *db*)
(print "888888888888888888888888888888888888888888888888888888")
(print "Data World Entity Elaboration")
(print "Starting with constrained entity of type astronomical object")
(print "with context of magnitude and size")
(print "This tests QBCs capability to deal with constrained type.")
(print "888888888888888888888888888888888888888888888888888888")
;;; This will find additional syns, but the syns may or may not
;;; have magnitude and size. While this has a negative impact
;;; on precision, it is good for recall and ensures that no option
;;; will be missed. A new function could be implemented to garbage
;;; collect all syns that do not satisfy the initial constraints, but
;;; this is left as a future extension.
(setf term8 (make-term
             :unspecified t
             :ttype 'e
             :synl
             (list
              (make-syn
               :name '(astro-object)
               :context
               (list (make-term
                      :synl (list (make-syn :name 'magnitude)))
                     (make-term
                      :synl (list (make-syn :name 'size)))
                     )))))
(setf *kernel8pdd* '(term8))
(setf *db8* '(?astro-object))
(entity-elaboration *kernel8pdd* *db8*)
(print "999999999999999999999999999999999999999999999999999999")
(print "test of finding additional context for specified term")
(print "999999999999999999999999999999999999999999999999999999")
(setf term9 (make-term
             :unspecified t
             :ttype 'e
             :synl
             (list
              (make-syn
               :name '(galaxy)))))
(setf *kernel9pdd* '(term9))
(setf *db9* '(?galaxy))
(entity-elaboration *kernel9pdd* *db9*)
(print "101010101010101010101010101010101010101010101010")
(print "Data World Entity Elaboration to Display Clusters")
(print "Starting with unspecified entity with context of")
(print "magnitude and size")
(print "Filters imposed and filteragent requests dates from")
(print "User to satisfy filter trigger")
(print "101010101010101010101010101010101010101010101010")
(setf term10 (make-term
              :unspecified t
              :ttype 'e
              :synl
              (list
               (make-syn
                :name '(unspecified)
                :context
                (list (make-term
                       :synl (list (make-syn :name 'magnitude)))
                      (make-term
                       :synl (list (make-syn :name 'size)))
                      )))))
(setf *kernel1pdd* '(term10))
(setf *db10* '(?galaxy ?nebula ?double-star ?variable-star))
(setf pdd (entity-elaboration *kernel1pdd* *db10*))
(setf *f-list* nil)
(filteragent 'date 'date 'ra-hr 'galaxy #'rafilter
             *db10* *db10* *f-list*)
; March 23, 1991 at 21 local time has significant number of galaxies
; October 10, 1991 at 21 local time has no galaxies
;; Looks for request for data information in pdd
(dateuserinterface pdd)
;; Looks for request for date information on *f-list*
(dateuserinterface *f-list*)
(filteragent 'date 'date 'ra-hr 'galaxy #'rafilter
             pdd pdd *f-list*)
(filterpostcluster pdd *db*)
Appendix D
Prototype Results
This section shows a single, complete QBC run in the data dimension. It
begins with a query kernel consisting of an unspecified entity with the context of
magnitude and size. QBC derives galaxy and nebula as the types that satisfy
these constraints. A galaxy filter agent, observing that a term of type galaxy
with context RA exists, posts a solicitation for a date and time with which to
activate the filter. A user interface agent, seeing this solicitation, requests a date
and time from the QBC user. QBC then goes on to present the lattice-based
clusters for the galaxy and nebula data associated with this
elaboration. This is test run 10 of the QBC listing contained in appendix C.
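The agent interplay described above can be condensed into a short sketch. This adds no new functionality; it simply mirrors the calls of test sequence 10 in appendix C (the variables *kernel1pdd*, *db10*, and *f-list* are the ones defined there):

```lisp
;;; Condensed sketch of the run described above, assuming the
;;; appendix C definitions (test sequence 10).
(setf pdd (entity-elaboration *kernel1pdd* *db10*)) ; derives galaxy and nebula
(setf *f-list* nil)
(filteragent 'date 'date 'ra-hr 'galaxy #'rafilter
             pdd pdd *f-list*)       ; no date trigger yet: posts a solicitation
(dateuserinterface *f-list*)         ; user supplies the date and time
(filteragent 'date 'date 'ra-hr 'galaxy #'rafilter
             pdd pdd *f-list*)       ; trigger now present: the RA filter fires
(filterpostcluster pdd *db10*)       ; presents the display clusters
```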
" 101010101010101010101010101010101010101010101010"
"Data World Entity Elaboration to Display Clusters"
"Starting with unspecified entity with context of"
"magnitude and size"
"Filters imposed and filteragent requests dates from"
"User to satisfy filter trigger"
"101010101010101010101010101010101010101010101010"
"startsyn Dimension State Prior to Syn Discovery Elaboration"
Syn name = (UNSPECIFIED)
Context Term
Syn name = MAGNITUDE
Context Term
Syn name = SIZE
"After Syn Elaboration"
"%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%"
Syn name = (UNSPECIFIED)
Context Term
Syn name = MAGNITUDE
Context Term
Syn name = SIZE
Syn name = (UNSPECIFIED)
Context Term
Syn name = MAGNITUDE
Context Term
Syn name = SIZE
"Dimension State After Syn Hypothesis Generation"
"%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%"
(VARIABLE-STAR)
Context Term
Syn name = MAGNITUDE
(DOUBLE-STAR)
Context Term
Syn name = MAGNITUDE
(NEBULA)
Context Term
Syn name = SIZE
Context Term
Syn name = MAGNITUDE
(GALAXY)
Context Term
Syn name = SIZE
Context Term
Syn name = MAGNITUDE
"Syn Hypotheses Transformed into Syns"
"%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%"
Syn name = (NEBULA)
Context Term
Syn name = SIZE
Context Term
Syn name = MAGNITUDE
Syn name = (GALAXY)
Context Term
Syn name = SIZE
Context Term
Syn name = MAGNITUDE
Syn name = (UNSPECIFIED)
Context Term
Syn name = MAGNITUDE
Context Term
Syn name = SIZE
"%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%"
"After Syn Elaboration"
Syn name = (NEBULA)
Context Term
Syn name = SIZE
Context Term
Syn name = MAGNITUDE
Syn name = (GALAXY)
Context Term
Syn name = SIZE
Context Term
Syn name = MAGNITUDE
Syn name = (UNSPECIFIED)
Context Term
Syn name = MAGNITUDE
Context Term
Syn name = SIZE
"Discovery of Additional Syn Context"
"%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%"
Syn name = (NEBULA)
Context Term
Syn name = SIZE
Context Term
Syn name = MAGNITUDE
Context Term
Syn name = DISTANCE
Context Term
Syn name = SHAPE
Context Term
Syn name = SURFACE-BRIGHTNESS
Syn name = MAGNITUDE
Context Term
Syn name = DEC
Syn name = DEC-DEG
Context Term
Syn name = DEC
Syn name = DEC-DEG
Context Term
Syn name = RA
Syn name = RA-HR
Context Term
Syn name = RA
Syn name = RA-HR
Context Term
Syn name = NGC
Syn name = (GALAXY)
Context Term
Syn name = SIZE
Context Term
Syn name = MAGNITUDE
Context Term
Syn name = SHAPE
Context Term
Syn name = DISTANCE
Context Term
Syn name = SURFACE-BRIGHTNESS
Syn name = MAGNITUDE
Context Term
Syn name = DEC-MIN
Context Term
Syn name = DEC
Syn name = DEC-DEG
Context Term
Syn name = DEC
Syn name = DEC-DEG
Context Term
Syn name = RA-MIN
Context Term
Syn name = RA
Syn name = RA-HR
Context Term
Syn name = RA
Syn name = RA-HR
Context Term
Syn name = NGC
Context Term
Syn name = MESSIER
Syn name = (UNSPECIFIED)
Context Term
Syn name = MAGNITUDE
Context Term
Syn name = SIZE
"provide four digit year yyyy" 1991
1991
"provide two digit month mm" 03
3
"provide two digit day dd" 23
23
"provide universal time in two digit 24 hour form hh"
21
Clusters Associated with Keys in GALAXY
(M109 3992 11 57.700000000000003 53 22 3 2 3 1 1)
(M108 3556 11 11.6 55 40 3 1 3 2 1)
(M106 4258 12 19.0 47 18 1 2 1 3 1)
(M102 5866 15 6.5 55 48 4 1 4 2 3)
(M101 5457 14 3.5 54 21 1 4 1 1 1)
(M94 4736 12 50.899999999999999 41 7 1 1 3 3 1)
(M63 5055 13 15.800000000000001 42 2 3 2 2 3 1)
(M51 5194 13 29.899999999999999 47 12 1 2 2 3 1)
Cluster Cut-Set from GALAXY
"Attributes represented by cut-set"
(MAGNITUDE SURFACE-BRIGHTNESS SIZE DISTANCE SHAPE)
(3 2 3 1 1)
(3 1 3 2 1)
(1 2 1 3 1)
(1 4 1 1 1)
(1 1 3 3 1)
"List of quartile values for each attribute"
( 8 8.5 8.5 8.5 9 10 10 10.5)
(9 9 9 10 10 10 10 11)
(20 12 10 10 8 6 6 3)
(40 40 30 30 20 20 20 20)
(S E L 1)
Cluster Cut-Set In Terms of Quartile Value Ranges from GALAXY
((9 . 10) (9 . 10) (8 . 6) (40 . 40) S)
((9 . 10) (9 . 9) (8 . 6) (30 . 30) S)
((8 . 8.5) (9 . 10) (20 . 12) (20 . 20) S)
((8 . 8.5) (10 . 11) (20 . 12) (40 . 40) S)
((8 . 8.5) (9 . 9) (8 . 6) (20 . 20) S)
Clusters Associated with Keys in NEBULA
(M1 1952 5 34.5 22 1 3 2 3 4 2 4 1 2)
(M8 6523 18 3.7000000000000002 -24 23 1 3 1 4 1 4 1 1)
(M16 6611 18 18.899999999999999 -13 47 1 2 2 4 1 4 1 1)
(M17 6618 18 20.800000000000001 -16 10 1 3 1 4 1 4 1 1)
(M20 6514 18 2.3999999999999999 -23 2 2 3 2 4 1 4 1 1)
(M27 6853 19 59.600000000000001 22 43 2 1 3 3 4 1 4 4)
(M42 1976 5 35.299999999999997 -5 23 1 1 1 4 1 4 1 3)
(M43 1982 5 35.5 -5 16 3 3 2 4 1 4 1 3)
(M57 6720 18 53.600000000000001 33 2 3 1 4 1 4 1 4 3)
(M76 650 1 42.200000000000003 51 34 4 2 4 3 4 1 4 1)
(M97 3587 11 14.9 55 1 4 3 3 2 4 1 4 3)
Cluster Cut-Set from NEBULA
"Attributes represented by cut-set"
(MAGNITUDE SURFACE-BRIGHTNESS SIZE PLAN-SHAPE DIFF-SHAPE PLAN DIFF
DISTANCE)
(1 3 1 4 1 4 1 1)
(1 2 2 4 1 4 1 1)
(1 3 1 4 1 4 1 1)
(2 1 3 3 4 1 4 4)
(1 1 1 4 1 4 1 3)
(3 1 4 1 4 1 4 3)
(4 2 4 3 4 1 4 1)
(4 3 3 2 4 1 4 3)
"List of quartile values for each attribute"
(4 6 6 8 9 9 11 11)
(6 8 9 9 10 10 10 10)
(60 40 25 12 8 3 2.5 1.5)
(R D A 0)
(EM FI R 0)
(P P P 0)
(D D D 0)
(6000 5000 5000 4000 3000 1500 1500 1000)
Cluster Cut-Set In Terms of Quartile Value Ranges from NEBULA
((4 . 6) (10 . 10) (60 . 40) 0 EM 0 D (6000 . 5000))
((4 . 6) (9 . 9) (25 . 12) 0 EM 0 D (6000 . 5000))
((4 . 6) (10 . 10) (60 . 40) 0 EM 0 D (6000 . 5000))
((6 . 8) (6 . 8) (8 . 3) A 0 P 0 (1500 . 1000))
((4 . 6) (6 . 8) (60 . 40) 0 EM 0 D (3000 . 1500))
((9 . 9) (6 . 8) (2.5 . 1.5) R 0 P 0 (3000 . 1500))
((11 . 11) (9 . 9) (2.5 . 1.5) A 0 P 0 (6000 . 5000))
((11 . 11) (10 . 10) (8 . 3) D 0 P 0 (3000 . 1500))
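A note on reading the cut-set output above: each cut-set row holds per-attribute quartile indices, and each index appears to select one pair from the printed quartile-value list for that attribute (with eight values per attribute, index i picks positions 2(i-1) and 2(i-1)+1). The following is a hedged sketch of that mapping, not part of the original listing; quartile-range is a hypothetical helper name:

```lisp
;;; Hedged sketch (not from the original listing): illustrates how a
;;; 1-based quartile index i appears to map onto the pair of values at
;;; positions 2(i-1) and 2(i-1)+1 of a printed quartile-value list.
(defun quartile-range (index quartile-values)
  "Return the (low . high) pair selected by 1-based quartile INDEX."
  (let ((start (* 2 (- index 1))))
    (cons (nth start quartile-values)
          (nth (1+ start) quartile-values))))

;; Checking against the GALAXY output: cut-set row (3 2 3 1 1) with the
;; magnitude quartiles (8 8.5 8.5 8.5 9 10 10 10.5) gives (9 . 10),
;; matching the first entry of the printed range ((9 . 10) ...).
;; (quartile-range 3 '(8 8.5 8.5 8.5 9 10 10 10.5)) => (9 . 10)
```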