An automatic programming approach to high level program monitoring and measuring
AN AUTOMATIC PROGRAMMING APPROACH TO HIGH LEVEL
PROGRAM MONITORING AND MEASURING
by
Yingsha Liao
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Science)
May 1992
Copyright 1992 Yingsha Liao
UMI Number: DP22850
All rights reserved
INFORMATION TO ALL USERS
The quality of this reproduction is dependent upon the quality of the copy submitted.
In the unlikely event that the author did not send a complete manuscript
and there are missing pages, these will be noted. Also, if material had to be removed,
a note will indicate the deletion.
UMI DP22850
Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author.
Microform Edition © ProQuest LLC.
All rights reserved. This work is protected against
unauthorized copying under Title 17, United States Code
ProQuest LLC.
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106-1346
UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90007
This dissertation, written by
Yingsha Liao
under the direction of his Dissertation
Committee, and approved by all its members,
has been presented to and accepted by The
Graduate School, in partial fulfillment of
requirements for the degree of
DOCTOR OF PHILOSOPHY
Dean of Graduate Studies
Date 9 9 2
DISSERTATION COMMITTEE
To my parents,
Shengyu Liao and Songbi Wan
and my brother and sisters,
Yiping Liao, Yihong Liao, Xiangyan Liao
Acknowledgments
I am very grateful to my advisor Bob Balzer for his encouragement and support
throughout my years at ISI. His guidance was essential in teaching me how to
conduct research. Bob greatly improved the presentation of the material through his
careful reading and enlightening comments, and taught me a great deal about good
scholarship.
I would also like to thank my committee members Richard Hull and Alvin
Despain for their valuable comments on the dissertation proposal, drafts, and final
defense.
Special thanks to Donald Cohen, Neil Goldman, and Krishnamurthy Narayanaswamy
for spending a great deal of time with me and with this document. They were always
there as patient listeners whenever I had questions or needed feedback.
Many thanks to Dean Jacobs and Paul Rosenbloom for serving on my guidance
committee and providing suggestions and comments.
I wish to thank Dennis Allard, Kevin Benner, Martin Feather, Edward Ipser Jr.,
Lewis Johnson, Surjatini Widjojo, Dave Wile, and Lorna Zorman and the research
and support staff at the software division who made my time at ISI enjoyable and
productive.
CONTENTS
Acknowledgments
Abstract
1 Introduction
1.1 Program Monitoring and Measuring at High Level
1.2 Thesis and Problem Statement
1.3 Overview of SmartMonitor
1.3.1 A High Level PMM System
1.3.2 A High Level PMM Scenario
1.4 Contributions
1.5 Scope of the Research
1.6 Organization of the Dissertation
2 Background and Related Work
2.1 Specifying PMM Requirements
2.1.1 Data-Based Program Monitoring and Measuring
2.1.2 Model-Based Program Monitoring and Measuring
2.1.3 Very High Level Languages and PMM
2.2 Data Collection
2.3 Data Processing
2.4 Related Research
2.4.1 Program Transformation
2.4.2 Static Analysis of Programs
2.4.3 Incremental Generation of Derived Data
2.5 AP5 Computation Model - A PMM View
3 A Program Monitoring and Measuring Specification Language
3.1 PMM Specification Language Objectives
3.2 A PMM Specification Language
3.2.1 Program Monitoring and Measuring Event Model
3.2.2 Predefined Event Types and Control Relations
3.2.3 PMM Specifications
3.3 Syntax and Semantics of the Language
3.3.1 Syntax of the PMM Specification Language
3.3.2 Semantics of the PMM Specification Language
3.3.3 Properties of Valid PMM Specifications
3.4 Summary
4 An Automatic Programming System for PMM Specification
4.1 Aspects of Instrumentation Generation
4.1.1 Event Schema Generation
4.1.2 Instrumentation Site Selection
4.1.3 Instrumentation Code Generation
4.1.4 Approach Summary
4.2 Static Approximation
4.2.1 Representing Sites and Their Relationships
4.2.2 Static Approximation Mapping
4.2.3 Static Approximation Transformation Algorithms
4.2.4 A Static Approximation Example
4.3 Instrumentation for Primitive Events
4.3.1 An Abstract Instrumentation Interface
4.3.2 Instrumentation Generation for Primitives
4.4 Instrumentation Generation Algorithms
4.5 Summary
5 On Incremental Computation of Monitoring Results
5.1 Issues of Incremental Computation
5.1.1 Determining Temporal Dependency
5.1.2 Determining the Data to Record
5.1.3 Generating Instrumentation
5.2 Representing Temporal Relationship
5.2.1 Event Dependent Graph
5.2.2 Building Event Dependent Graphs
5.3 Run Time Filtering
5.3.1 Intra-Event Conditions
5.3.2 Inter-Event Conditions
5.3.3 Scope Analysis
5.4 Incremental Instrumentation Generation
5.4.1 Dealing With Derived Relations
5.4.2 Time and Space Compromises in PMM
5.4.3 Other Optimizations
5.5 Summary
6 SmartMonitor Evaluation
6.1 Application Program and Question
6.2 The Experiment
6.3 Experiment Analysis
6.4 Summary
7 Conclusions and Future Work
7.1 Summary
7.2 Main contributions
7.3 Limitations of the Research
7.4 Future Work
7.5 Concluding Remarks
Appendix A
The PMM Specification Language
A.1 Primitives and Attributes
A.2 Control Relations
A.3 Value Types and Other Relations
A.4 PMM Language Syntax
Appendix B
The Prototype Implementation
B.1 SmartMonitor Implementation Overview
B.2 SmartMonitor Components
B.2.1 PMM Specification Language
B.2.2 Automatic Programmer
B.2.3 Static Approximation
B.2.4 Instrumentation Interface
B.2.5 Generating Instrumentation
B.3 Implementation Summary
Reference List
LIST OF FIGURES
1.1 A High Level Program Monitoring and Measuring System
4.1 A Piece of Source Program
4.2 A PMM Specification
4.3 A Source Program Representation
5.1 An Event Dependent Graph
Abstract
Program monitoring and measuring (PMM) is the activity of collecting empirical
data of a program's execution to answer questions about the program's performance.
PMM is usually done by altering the program to collect interesting data as it runs.
Unfortunately, this is itself an arduous task involving all the difficulties of program
construction and maintenance. Generally there are three tasks involved in altering
the program to answer performance questions: determining what data has to be
collected, determining where in the program that data can be collected, and adding
code to the program to collect that data and to process it to produce the desired
results. As programs become larger and as the computational structures employed
by languages become more complicated, performing those tasks becomes a challenge
for human programmers.
This dissertation presents a system that automates each of these tasks. Its input
is a high level specification of PMM questions and the source program. Its output
is an augmented version of the program whose execution produces both the results
of the original program and the answers to the PMM questions. This system has
proved to be very effective. It exploits several techniques not previously used. First,
PMM questions are specified in a specification language that facilitates both question
specification and automatic program augmentation. The language is based on an
Entity-Relationship model and a set of programming language dependent primitives,
so that the data to be collected can be determined from the primitives
used in the questions. Determining where to insert instrumentation is done by
relating run time program behaviors to the source program constructs that produce
them, and by a static analysis that locates those program constructs in the
source program. Code to collect that data is added efficiently by
instantiating generic instrumentation templates associated with each of the programming
language primitives. Second, the instrumentation for computing the answers to the
questions is constructed automatically by filtering, combining and merging the data
collected by these primitive templates using the formalism and power of a relational
query processor. Finally, to minimize the collection of extraneous data, static
analysis is used to filter out irrelevant sites at compile time and temporal analysis is
used to filter out data at run time.
Chapter 1
Introduction
Program monitoring and measuring (PMM) is the activity of collecting information
about the performance characteristics of a program. Such data is useful for finding
performance bottlenecks and understanding performance tradeoffs, which in turn
can be used to improve the performance of the program. In the absence of special
purpose hardware, monitoring and measuring is done by altering the program to
collect interesting data as it runs. Unfortunately, this is itself an arduous task
involving all the difficulties of program construction and maintenance. Generally
the programmer starts with some performance questions. In order to answer these
questions he has to do the following tasks:
• determine what data has to be collected
• determine where in the program that data can be collected
• add code to the program to collect that data and to process it to produce the
desired results
The programmer then runs the instrumented program on some data of interest to
obtain the answer to his question. This generally leads either to changes in the
program or to further questions, and so the process iterates. As programs become
larger and as the computational structures employed by languages become more
complicated, correctly and efficiently fulfilling each of the tasks becomes a challenge
for human programmers. This dissertation presents an automatic programming
approach to monitoring and measuring program execution that automates each of the
three tasks mentioned above. This approach shows how programming language
features, source programs, and program monitoring and measuring needs can be
incorporated into an automatic programming system to monitor and measure the
performance of program execution. The system's input is the source program and
a high level specification of what the programmer wants to know. The output is an
instrumented version of the program whose execution produces both the results of
the original program and the answers to the specified questions.
1.1 Program Monitoring and Measuring at High
Level
High level program monitoring and measuring means that the specification is written
in terms of what the programmer wants to know, rather than in terms of how a
machine can compute what he wants to know; that is, it is expressed in terms of high
level abstractions. Notice that he may be interested in "low level" activities, such as
how much time the program spends paging, but he can ask such questions without
having to describe how to figure out the answers.
Program monitoring and measuring requires knowledge of the monitored
program, the semantics of the programming language, and the semantics of monitoring
and measuring questions. Because the specifications are oriented towards human
understanding rather than machine execution, those specifications are declarative and
must be translated by the program monitoring and measuring system into procedural
code. Usually performance data of a program's execution can only be collected
at a low level and its volume is huge. Simply collecting all low level data is not only
inefficient but also impractical. Some performance data is not directly accessible, for
instance, the history of program execution. In order to get the required results,
directly accessible performance data needs to be recorded so that the required results
can be derived from it. However, collecting low level data requires dealing with the
implementation of some high level constructs of the programming language. Because
the collected low level data is not oriented towards human understanding, it needs
to be translated and aggregated into forms that are easy for human consumption.
The translation is even more important when very high level languages are used
because the gap between the level where the data is collected and the level where
the required results are specified is very large. Without complete understanding of
the semantics of the programming language and of the organization of the source
program, correct and efficient translation would not be possible. Requiring the
programmer to manually do the monitoring and measuring of program execution not
only creates a burden for him but also carries a high risk of introducing additional
maintenance problems [BP88]. Hence, the process involved in program monitoring and
measuring should be automated. In addition, applying high level program monitoring
and measuring by using our automatic programming approach makes program
monitoring and measuring easier, faster, and less error prone.
Despite major difficulties met by general purpose automatic programming
systems, progress has been made in domain specific automatic programming [KB81,
Kan83, Bar85, RW88, Qia90]. There are several reasons why automatic programming
for program monitoring and measuring is feasible. First, program monitoring
and measuring is an information processing process [Sno88]. Development in data
retrieval techniques makes it possible to automatically transform low level data into
higher level forms. Second, collecting performance data of program execution can be
decomposed into collecting performance data of the execution of the particular
syntactic constructs that compose the program. In a programming language, there are
a few constructs that are the building blocks of programs. Based on the semantics of
the programming language, instrumentation plans can be constructed beforehand for
monitoring and measuring the execution of those constructs [Bal69, Knu71]. Third,
both the PMM specification and the source program are available before program
execution. They can be used to determine where in the program the relevant data
can be collected and how to compute the answers to the questions defined in the
specification from the collected data.
1.2 Thesis and Problem Statement
The thesis of this dissertation is that the task of installing software performance
instrumentation can and should be automated.
The problem we address is as follows:
Given:
- A source program
- A set of performance questions about the run-time behaviors of the program
Domain: Programs written in the AP5 language [Coh88]
Generate: An augmented program that satisfies the following constraints:
- The augmented source program preserves the functionality of the original
source program
- The execution of the augmented program produces answers to the questions.
There are two reasons for choosing AP5 programs as this research domain: First,
AP5 is a very high level programming language; second, it is in daily use in my
research environment [Bal85]. In other words, the ideas of this research can be
practically implemented and demonstrated with the available facilities.
1.3 Overview of SmartMonitor
The following issues are addressed in this dissertation:
• The development of a specification language for stating PMM questions.
• The identification of the necessary reasoning processes for generating the
augmented program from a set of PMM questions in this language and a source
AP5 program.
• The architecture of an automatic programming system that performs that
reasoning and generates the required instrumentation to answer the PMM
questions.
They are addressed by building an automatic programming system, called
SmartMonitor, for high level PMM of AP5 programs. SmartMonitor is a high level
program monitoring and measuring system. As illustrated in Figure 1.1, SmartMonitor
takes both a source program and a set of PMM questions about the performance
of the program as inputs and produces an augmented source program. The
augmented program is then compiled and executed to produce both the results of the
original program execution and the answers to the questions. The set of questions
is written in a specification language, called the PMM specification language.
SmartMonitor is an automatic programming system for program monitoring and
measuring of AP5 program execution. By automatic programming, we mean that
the answers to performance questions are computed without any human assistance
after the set of questions is given.
[Figure 1.1: A High Level Program Monitoring and Measuring System. The source
program and the PMM specification are inputs to the PMM compiler, which produces
the augmented source program; compiling and executing it yields the answers and
results.]
1.3.1 A High Level PMM System
SmartMonitor requires the programmer to provide a declarative description of what
he wants to know about the execution of his program. This is done via a specification
language which embodies a model of program execution. At the most primitive
level, the activity of measuring and monitoring involves interrupting the program at
relevant points during its execution and gathering data. The data model supported
by SmartMonitor is similar to an Entity-Relationship model [Che76] in which entities
and relationships are categorized as follows:
• Point events: These are events which occur at a single point of execution.
Examples are the entry of a function, or the arrival of an interrupt.
• Interval events: These are events associated with a pair of execution points,
called the starting and ending points. An example is the execution of a
function.
• Attributes of events: These associate events with data that can be observed
when a point event occurs or at the starting or ending points of interval events.
Examples include the time at which the event occurs, the name of the function
of which this event is an execution, the value of program variables at that time,
etc.
• Control relations: These are relations among events. An example is the calls
relation which indicates that one function execution calls another.
• Other relations among non-events: These include any other relations among
data objects, e.g., the values of the parameters passed to different executions.
An example is the fact that one value is greater than another.
From these entities and relationships one can define further relationships of
interest by composing expressions in the relational calculus. These expressions define
abstractions and aggregations. Chapter 3 discusses the specification language and
issues that arise in this design.
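The event model above can be pictured with a small sketch. It is written in Python rather than AP5, and every name in it (`PointEvent`, `IntervalEvent`, `direct_callees`, the sample data) is a hypothetical illustration, not part of SmartMonitor or its specification language. It assumes interval events carry their attributes as fields and that the calls control relation is a set of (caller, callee) identifier pairs; the derived relation is then an ordinary query over those stored facts.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class PointEvent:
    """An event that occurs at a single point of execution."""
    kind: str    # e.g. "function-entry" (illustrative label)
    time: float  # attribute: when the event occurred


@dataclass(frozen=True)
class IntervalEvent:
    """An event delimited by a starting and an ending execution point."""
    event_id: int
    function: str                     # attribute: name of the executed function
    start: float                      # attribute: time at the starting point
    end: float                        # attribute: time at the ending point
    parameters: Tuple[Any, ...] = ()  # attribute: parameter values


def direct_callees(events, calls, caller_name):
    """Derived relation: executions directly called by an execution of caller_name."""
    by_id = {e.event_id: e for e in events}
    return [by_id[callee] for caller, callee in sorted(calls)
            if by_id[caller].function == caller_name]


# Hypothetical recorded data: F calls G and H; H calls G.
events = [
    IntervalEvent(1, "F", 0.0, 10.0),
    IntervalEvent(2, "G", 1.0, 3.0, (7,)),
    IntervalEvent(3, "H", 4.0, 9.0),
    IntervalEvent(4, "G", 5.0, 8.0, (8,)),
]
calls = {(1, 2), (1, 3), (3, 4)}  # control relation as (caller_id, callee_id)
```

In this sketch the list comprehension plays the role that a relational calculus expression plays in the specification language: it defines a new relationship from the primitive entities and the calls relation.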
The PMM questions about the execution of a program are asked before its
execution. SmartMonitor modifies the source program to collect only the relevant data,
i.e., the data that are needed to compute the answers to these questions. This is
important for two reasons. First, directly collectible performance data is at a very low
level with very large volume. Collecting irrelevant data takes not only extra space
but also extra time. Hence, there is a twofold reason to reduce recording irrelevant
data. Second, the cost of the instrumentation code is part of what is ultimately
measured. Because the programmer is interested in the performance of the original
code, not the instrumentation code, the cost of the instrumentation code is thus a
perturbation of the data of interest, and should therefore be minimized. Of course,
this cost cannot be totally eliminated as long as we restrict ourselves to software
monitoring. Rather our goal is to provide substantial automation of an extremely
useful, albeit imperfect, activity that programmers now do manually.
However, selectively collecting the required data must deal with data collection
freedom, i.e., there are many ways to collect the same data. Furthermore,
performance data of program execution are computation state dependent, i.e., whether
or not the data collected at one place is relevant depends on the data collected at
some other places. These difficulties raise challenges as to how to generate the
augmented program taking advantage of the source program structures, the semantics
of the programming language, and the semantics of the questions. In particular,
given a set of questions about a program's execution, the following questions must
be answered:
• What data must be collected?
• Where in the source program can/should that data be collected?
• How can/should code be added to the program to collect that data and to
process it to produce the desired answers?
• How can the dynamic aspects of testing conditions and collecting data be
managed?
In implementation, SmartMonitor uses AP5 relations to model a program
execution activity. In particular, it uses a relational schema to model the execution
activity described in the PMM specification. This schema is expressed in terms of
stored relations (ones whose tuples are explicitly asserted) and derived relations.
SmartMonitor only needs to generate instrumentation to collect data for the stored
relations because the derived relations, including the answers to the PMM questions,
can be automatically computed from the stored relations using database query
evaluation techniques [Coh89b, Ull89]. The process of recording data for the stored
relations is called data collection. The process of computing data for the questions
is called data processing. Because data collection progresses gradually as the
program execution proceeds, SmartMonitor needs to determine whether or not it
should do data processing incrementally, i.e., processing the data as soon as they
are available, as discussed in Chapter 5.
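The distinction between data collection and data processing, and between batch and incremental processing, can be sketched as follows. This is a Python illustration under stated assumptions, not the AP5 mechanism: `StoredRelation`, `RunningTotal`, and the tuple layout `(event_id, elapsed)` are hypothetical names chosen for the example.

```python
class StoredRelation:
    """A relation whose tuples are explicitly asserted by instrumentation."""

    def __init__(self, name):
        self.name = name
        self.tuples = []
        self._listeners = []

    def on_assert(self, callback):
        """Register a callback run on every newly asserted tuple."""
        self._listeners.append(callback)

    def assert_tuple(self, *row):
        """Data collection: record one tuple and notify incremental consumers."""
        self.tuples.append(row)
        for callback in self._listeners:
            callback(row)


def total_time_batch(relation):
    """Batch data processing: derive the answer after all tuples are stored."""
    return sum(elapsed for _event_id, elapsed in relation.tuples)


class RunningTotal:
    """Incremental data processing: update the answer as each tuple arrives."""

    def __init__(self, relation):
        self.value = 0.0
        relation.on_assert(self._update)

    def _update(self, row):
        self.value += row[1]


# Hypothetical stored relation holding (event_id, elapsed) tuples.
g_execution = StoredRelation("G-execution")
running = RunningTotal(g_execution)
g_execution.assert_tuple(1, 2.0)
g_execution.assert_tuple(2, 3.0)
```

Both routes yield the same total here. The incremental one never needs the stored tuples again once they are folded into the running value, which is the kind of time and space tradeoff the text defers to Chapter 5.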
SmartMonitor works as follows. First, it checks the PMM specification
to ensure it is well defined. Second, it checks the specification against the source
program to find out where instrumentation for data collection should go and to check
whether the instrumented source program could provide answers to the questions.
The places in the source program where instrumentation for data collection will be
placed are called instrumentation sites. Next, the PMM specification is translated
into instrumentation code. Finally, that instrumentation code is merged into the
source program at the selected sites.
How can the system generate instrumentation for various PMM questions? Based
on the techniques to be discussed in Chapters 3, 4, and 5, a set of
instrumentation templates for primitives and control relations is presented together
with ways of filling them in and combining them. These building blocks characterize
the primitives, control relations, and relationships between source programs and
performance questions so that a high level PMM question can be decomposed into
a set of primitives, control relations, and the abstractions built on them. Using
a small set of simple primitives and control relations, the system can combine the
instrumentation generated for these primitives to answer any PMM question.
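As a rough picture of what a generic instrumentation template for one primitive might look like, the sketch below uses a Python decorator in place of AP5 source transformation. The name `function_execution_template`, the record layout, and the injectable `clock` parameter are all assumptions made for illustration; the dissertation's actual templates are described in Chapter 4.

```python
import functools
import time


def function_execution_template(record, clock=time.perf_counter):
    """Generic template for the 'function execution' interval event.

    Instantiating it for a function yields a wrapper that appends
    (function name, parameters, start time, end time) to `record`
    while preserving the function's original result.
    """
    def instrument(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = clock()
            try:
                return fn(*args, **kwargs)
            finally:
                # Recorded even if fn raises, so every execution is observed.
                record.append((fn.__name__, args, start, clock()))
        return wrapper
    return instrument


# Hypothetical instantiation of the template for a function G.
g_records = []


@function_execution_template(g_records)
def G(x):
    return x + 1
```

Calling `G(2)` still returns 3, and one tuple describing that execution is appended to `g_records`. The template is written once per primitive and filled in per site, which mirrors the idea of constructing instrumentation plans beforehand.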
1.3.2 A High Level PMM Scenario
Suppose the programmer is considering the installation of a local software cache to
record the result of function G when it is directly called by function F. In order to
estimate the value of the change he would like to know how often such a cache would
hit, and how much time would be saved. In the absence of a tool like SmartMonitor,
a programmer would install instrumentation manually. First he must determine
what data to collect. Then he must determine where in his program the data can
be collected. Next, he must insert the appropriate pieces of code into the places
identified, ensuring that in each case the correct data is collected. Finally he needs
to insert code to report results. All of this is what SmartMonitor does automatically
from a specification of what the programmer wants to know.
In this example, the question is how much time is spent in how many calls to
G from F that have the same parameters as previous calls to G from the same
invocation of F.
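To make the question concrete, here is a minimal Python sketch of how the answer could be derived once the relevant data has been recorded. The function name `cache_estimate` and the tuple layout `(f_invocation_id, parameters, elapsed)` are assumptions for this example only; they are not SmartMonitor's representation.

```python
def cache_estimate(g_executions):
    """Count repeated G calls per F invocation and the time they consumed.

    g_executions: iterable of (f_invocation_id, parameters, elapsed) tuples,
    in execution order, one per G execution directly called by F.
    Returns (number of would-be cache hits, total time a cache would save).
    """
    seen = set()
    hits = 0
    saved = 0.0
    for f_invocation_id, parameters, elapsed in g_executions:
        key = (f_invocation_id, parameters)
        if key in seen:
            # Same parameters as an earlier call within the same F invocation.
            hits += 1
            saved += elapsed
        else:
            seen.add(key)
    return hits, saved


# Hypothetical recorded data: two F invocations (ids 1 and 2).
recorded = [
    (1, (7,), 2.0),
    (1, (7,), 2.5),  # repeat within invocation 1
    (1, (8,), 1.0),
    (2, (7,), 2.0),  # same parameters, but a different F invocation
    (1, (7,), 1.5),  # repeat within invocation 1
]
```

With this data, `cache_estimate(recorded)` returns `(2, 4.0)`: two calls would have hit the cache, accounting for 4.0 time units. Keying on the F invocation as well as the parameters reflects the "local" cache in the scenario, where hits do not carry over between invocations of F.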
The system first analyzes the question to figure out what data must be collected.
It determines that data of the executions of G and F are needed. It then uses static
analysis to find out where in the source program this data can be recorded. Here
there are some choices. Data of the executions of G can be collected by inserting
instrumentation code either into the definition of function G or around the function
calls of G inside the source program. Similarly there are choices in collecting data
of the executions of F. Because the answers to the question depend only on those
G executions that are directly called by F executions, only those G executions that
are directly called by F are relevant. Since there is a very close relationship between
the PMM specification language and the semantics of the programming language
in which the source program is written, as is discussed in Chapter 3, the system
can figure out that only those function calls of G that appear inside the definition
of function F are relevant, and they are chosen as the instrumentation sites of G
executions. Because all F executions are needed to compute the answers, all function
calls of F are relevant. Instead of inserting instrumentation code for all function calls
of F, the system inserts instrumentation code into the definition of function
F.
The system then creates a relation for each kind of data so that data of the G and
F executions can be separately recorded and referenced. Let us assume that relation
G-execution and relation F-execution are used to record the data of the executions of
G and F respectively. Two attributes, Parameters and Caller, are used for recording
attributes of function executions. Relation G-execution-counter is used to record the
number of times G is executed. Finally, the system generates instrumentation code
for the question using instrumentation generation plans to combine these function
executions and the control relation calls that groups together all G executions that
occur within a single F execution. The instrumentation code is merged into the
source program to generate an augmented source program. Because the question
only asks for the number of the required G executions, the collected data of G and F
executions are only used internally to compute that answer. As soon as the answer
is computed, they are no longer needed. The system generates instrumentation code
that only records the number of the required G executions based on the conditions
mentioned above. Because of these conditions, the system also generates some filters
to test those conditions at run time to ensure that the data collected is relevant. For
the example question, the filter is that there is a previous G execution having the
same actual parameter values as the current one. In order to check the condition,
actual parameter values of G executions must be collected and recorded so that they
can be referenced later.
The instrumentation code generated for function calls of function G inside the
definition of F to answer the posed question uses a relation called G-execution to
keep track of the actual parameters and the caller for each invocation of function
G, and a relation G-execution-counter to keep track of the number of executions of G
that had the same actual arguments and caller as some previous invocation. This
instrumentation, paraphrased in English to make it more readable, is:
IF there is a tuple in the G-execution relation with
the same actual parameters and the same caller
THEN increase the G-execution-counter by one
ELSE insert a tuple in the G-execution relation with
the current actual parameters and caller
The condition of the IF statement is used as a run-time filter to filter out those G
executions that do not satisfy the condition required in the question.
The instrumentation code generated for the definition of function F is:
Insert a tuple into the F-execution relation
The augmented source program will be compiled and executed. After the execution
of the augmented program, the number recorded in the relation G-execution-counter
is the answer to the posed question.
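The scenario can be approximated in a few lines of Python. This is only a sketch under stated assumptions, not the AP5 code that SmartMonitor generates: the relations are modeled as Python sets, the names g_execution, f_execution, g_execution_counter, and monitored_call_G are hypothetical, and each F invocation is identified by a generated integer so that the Caller attribute can distinguish invocations.

```python
import itertools

# Hypothetical stand-ins for the relations named in the text.
g_execution = set()        # tuples of (actual parameters, caller invocation id)
f_execution = set()        # one tuple per F invocation
g_execution_counter = 0    # G calls whose parameters repeat within one F call

_invocation_ids = itertools.count(1)


def G(x):
    """Placeholder body for the function whose results might be cached."""
    return x * x


def monitored_call_G(caller, *args):
    """Instrumented call site of G inside the definition of F."""
    global g_execution_counter
    key = (args, caller)
    if key in g_execution:          # run-time filter taken from the question
        g_execution_counter += 1
    else:
        g_execution.add(key)
    return G(*args)


def F(values):
    """Instrumented definition of F: record the execution, then run the body."""
    me = next(_invocation_ids)
    f_execution.add(me)
    return [monitored_call_G(me, v) for v in values]


F([1, 2, 1, 1])   # two repeated calls G(1) within this invocation
F([1, 3])         # a new invocation of F, so G(1) is not a repeat here
print(g_execution_counter)
```

Only the call sites of G inside F are wrapped, and the counter is the only value reported at the end, which mirrors the choice of instrumentation sites and the run-time filter described above.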
1.4 Contributions
The primary contributions of the research are the design and implementation of the
automatic programming system that supports high level program monitoring and
measuring and the demonstration of the viability of this approach. They include:
• a model of program execution and a language in which high level questions
about executions can be expressed
• a demonstration that these questions can be automatically transformed into
instrumentation that computes the required answers
• an understanding of the techniques for performing that transformation and
the tradeoffs inherent in their use
More specifically, they include:
• using a specificational approach to high level program monitoring and
measuring
• designing a specification language for specifying PMM questions of AP5
program execution
• applying automatic programming for high level program monitoring and
measuring
— the principles of choosing monitoring and measuring primitives for a very
high level language
— the methodology of computing answers to PMM questions by merging
PMM specifications into the monitored source program so that only
relevant data are collected
— the design and implementation of the instrumentation for primitives of
the AP5 language
— the methodology of applying static analysis to specification merging
— the methodology of incrementally computing temporally dependent derived
performance data.
1.5 Scope of the Research
The main focus of the research is to provide a framework and the required underlying
support for a high level PMM. The framework is based on program executions
on single instruction flow and single dataflow machines with a central clock. The
source programs are written in a very high level imperative programming language,
in particular, the AP5 language. The framework is not intended for
monitoring real-time programming systems[Pla84].
Many issues related to PMM are not addressed here. Such issues include:
- PMM compensation: the current framework does not take the overhead
introduced by software monitoring and measuring into consideration when it
does program monitoring and measuring. In order to accurately monitor and
measure program execution, compensation should be provided for this overhead
- user interface: the current system does not address user interface issues
- choosing representation for intermediate data: the current framework does
not try to choose representations for the intermediate data to enhance the
performance of the augmented (i.e., instrumented) program
- sharing: the approach described in this dissertation performs optimization
only on the basis of local analysis of data dependency. Thus while determining
which tuples might be needed to answer the question, the method examines
data flow through each of the questions separately.
1.6 Organization of the Dissertation
The rest of the dissertation is organized as follows: Chapter 2 gives a brief review
of related work and the AP5 computation model; Chapter 3 describes in detail our
PMM language and the ideas behind it; Chapter 4 describes our automatic
programming system, SmartMonitor, and the transformation techniques for the PMM
language; Chapter 5 describes our techniques for incrementally computing
monitoring results; Chapter 6 reports some experimental results and experiences of the
SmartMonitor system; and Chapter 7 summarizes our accomplishments and
describes some future work.
Chapter 2
Background and Related Work
Program Monitoring and Measuring (PMM) has been an essential activity since
programmers came to deal with the difficulties of programming[PN81]. Conceptually,
PMM involves determining what to monitor and measure, collecting necessary data,
and processing the collected data. There are many aspects of a program’s execution
that PMM can focus on. First, there is a control aspect which consists of points of
control in program execution. Second, there is a data aspect which consists of all
input data and internal data of the program’s execution.
Model[Mod79] pointed out that programmers want to monitor and measure their
programs’ execution in a way that reflects the nature of the high-level control regime.
Furthermore, they want the monitoring and measuring results to be expressed in the
terms in which they have structured their programs. However, traditional PMM facilities
are oriented towards the control structure of the computation and the state of data
elements at too low a level. This results in an overwhelming flood of details that
obscure rather than illuminate the activities of program execution that are relevant to
the programmers. Moreover, the programmers may not understand the
implementation level details that would be obtained with traditional systems, as they may know
little about the implementation or about the underlying programming language
details. This is especially true for very high level programming languages, such as
AP5[Coh88], REFINE[Sys86] and SETL[SSS81]. Furthermore, data structures and
control structures in programs written in high level languages may have so much
information associated with them that the programmers would not want to see it
all, even if it were completely comprehensible. It is important that high-level PMM
facilities be able to answer questions about data structure and operations instead
of simply showing them in full detail. It is also helpful for the system to compile
low-level bits of information into higher-level structures that provide answers to the
programmers’ question at the level on which it was conceived.
In this chapter, we discuss the various areas that PMM touches and provide
an overview of related research in the areas. Specifically, we discuss the issues of
specifying PMM requirements, collecting data during program executions, and
processing the collected data. For each of them, we discuss various aspects of the issue,
related research in the area, and the impact of using very high level programming
languages.
2.1 Specifying PMM Requirements
Specifying PMM requirements is to tell a PMM system what to monitor and
measure. There is a spectrum of ways for programmers to communicate with a
monitoring system about what they want to know. At one extreme, a system provides
a fixed set of options with predefined meanings about what data can be collected
and lets programmers choose from them. Usually, the data includes the number of
times a statement or function is executed or the amount of time spent in executing
a statement or function. At the other extreme, a system provides a language to let
programmers state what they want to monitor and measure. Within the spectrum,
various intermediate positions are possible. For instance, some systems are capable of
collecting a set of data about program execution and provide a very simple language
that enables programmers to tell the system under what conditions that data should
be collected [Sym86, Ben88].
Like any other specifications, program monitoring and measuring specifications
include three aspects[Gol83]. They are coverage, which concerns what activities of
program execution to monitor and measure; extent, which concerns the level of detail
at which PMM should occur; and time, which concerns how frequently the monitoring
and measuring of a program execution should be done. The existing approaches are
described from those aspects.
There are two basic schemes for PMM[CC76, PN81].
1. A set of data is collected while a program is being executed without knowing
any specific question. The data is used to answer general questions about the
program’s execution.
2. Given a program and a set of questions, data is collected to answer the set of
specific questions.
We call the first scheme data-based PMM and the second scheme model-based PMM.
2.1.1 Data-Based Program Monitoring and Measuring
Data-based PMM systems routinely collect data about the performance of program
execution to satisfy PMM requirements. In data-based PMM systems, there is
only a trivial language, if any, in which programmers state their PMM requirements.
Usually, programmers can only choose from a few fixed PMM categories of
different performance aspects of a program’s execution. For example, those categories
cover time or space usages of program execution. Profile based systems are typical
data-based PMM systems. An execution profile apportions the execution cost of
a profiled program to its component parts[Knu71]. The cost is usually the time
spent in executing the component parts or the number of times the components are
executed. The level of program decomposition (the extent of PMM) for profiling
depends on the language in which the program is written. For languages with
explicit control-flow, statement level profiling is appropriate[Knu71]. If the language
encourages small routines, then routine level profiling may provide as much
information as statement oriented profiling[G+83]. Higher-level languages (those with
control-flow implicit in operators or generators) may require profiling on individual
operators[C+83]. In those systems, very limited information about all program
components, e.g., their execution time, is collected. Hence, not
all information of program execution, e.g., the values passed to the parameters of
functions, is available for programmers to derive what they need. Little or no
support is provided for programmers to selectively tell the systems which
components to monitor. Therefore, it is hard or impossible to monitor the executions of
the selected components that satisfy some higher level conditions because there is
no proper way to describe those conditions to the PMM systems.
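A routine level profile of the kind described here can be sketched in Python as follows. This is an illustrative sketch, not a reconstruction of any cited profiler: the decorator profiled and the table profile are hypothetical names, and the times are inclusive (the time charged to f includes the time spent in g).

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical profile table: routine name -> [call count, total seconds].
profile = defaultdict(lambda: [0, 0.0])


def profiled(fn):
    """Charge every execution of fn to its entry in the profile table."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            entry = profile[fn.__name__]
            entry[0] += 1
            entry[1] += time.perf_counter() - start
    return wrapper


@profiled
def g(x):
    return x + 1


@profiled
def f(xs):
    return [g(x) for x in xs]


f([1, 2, 3])
f([4])
for name, (calls, seconds) in sorted(profile.items()):
    print(f"{name}: {calls} calls, {seconds:.6f} s")
```

The table holds only counts and times per routine. Nothing about argument values or callers is kept, so a question such as "how many calls of g from f repeat an earlier argument" cannot be answered from it, which is the limitation the paragraph above points out.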
There are several shortcomings in profile based systems. First, some desired
behaviors which programmers seek and high level concepts used in their programs
cannot be explicitly expressed; they are implicitly represented in the processing
phase and are compiled in their code (sometimes in their mind). In order to get
what they want, programmers are forced to translate their PMM requirement that
could be naturally expressed in high level concepts into the implementation of these
concepts so as to use the systems and later translate it back. Second, it is hard or
impossible for programmers to specify requirements that focus on only a particular
part of their programs’ executions. Finally, since no data about the control aspect
of programs’ execution and no data about some of the data aspects of program
execution (e.g., the actual parameters to some functions) are recorded, they are not
available to the programmers.
2.1.2 Model-Based Program Monitoring and Measuring
Many researchers [Mod79, Sno84, OCH91] realized the problems of the data-based
PMM systems mentioned in the previous section. They figured out that if a system
lets programmers state what they want to know then the system can do much
better in avoiding collecting irrelevant lower level data. In model-based PMM systems,
there is a PMM language, which is different from the programming language in which
the monitored program is written, to represent what programmers want to monitor
and measure. Explicitly representing programmers’ PMM requirements enables the
system to focus on relevant data and to filter out irrelevant data. Unlike data-based
program monitoring and measuring tools, model-based tools only collect data that
are used in the programmers’ PMM specification, i.e., their coverage is selective.
The extent of model-based tools is toward programmer defined program activities.
Those activities are usually at a higher level than that of data-based PMM tools.
Conditions are defined on when monitoring and measuring should happen, thus
monitoring and measuring is more selective in the sense that not all collected data
is recorded. Relational database based PMM [PL83, Sno84] is a model-based PMM.
Powell and Linton[PL83] realized that the relational representation[Cod70, Ull88] is
a very general representation. Given some high level PMM questions, the answers to
the questions can be computed from the collected data via relational query
evaluation techniques once the collected data is represented in relations. Snodgrass[Sno84]
applied that idea to computer system monitoring. In his system, programmers
provide primitive data descriptions of their program execution (e.g., the schema of the
primitive data) and insert sensors manually to collect that data in computer
systems. Primitive data collected by the sensors is recorded in relations. Programmers
then state what they want to know in terms of the primitive data before program
execution via a query language called TQuel[Sno87], which is an extension to the
relational query language Quel[HSW85]. Queries can only be made on the primitive
data which has been installed. The high level data is computed from the low level
data as the execution of the program progresses.
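The relational idea can be illustrated with a small sketch that uses Python's standard sqlite3 module. SQL stands in for TQuel here purely for illustration, and the relation g_call, its schema, the sensor function, and the recorded values are all assumptions made for the example rather than details of the cited systems.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical schema for primitive data reported by hand-placed sensors.
db.execute("CREATE TABLE g_call (caller TEXT, arg INTEGER, ticks INTEGER)")


def sensor(caller, arg, ticks):
    """Record one primitive observation as a tuple in the relation."""
    db.execute("INSERT INTO g_call VALUES (?, ?, ?)", (caller, arg, ticks))


# Observations a monitored run might report (made-up values).
for observation in [("F", 1, 5), ("F", 2, 7), ("F", 1, 5), ("H", 1, 5)]:
    sensor(*observation)

# High level question 1: how many calls of G from F repeat an earlier argument?
repeats = db.execute(
    "SELECT COALESCE(SUM(n - 1), 0) FROM "
    "(SELECT COUNT(*) AS n FROM g_call WHERE caller = 'F' GROUP BY arg)"
).fetchone()[0]

# High level question 2: how many ticks were spent in G when called from F?
ticks_in_g_from_f = db.execute(
    "SELECT COALESCE(SUM(ticks), 0) FROM g_call WHERE caller = 'F'"
).fetchone()[0]

print(repeats, ticks_in_g_from_f)
```

Both answers are derived by query evaluation over the primitive relation, so a question can only be asked about data for which a sensor was installed.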
Model-based approaches have been used in program debugging as well. In
program debugging, many researchers recognized the need to let programmers state
what they want to know about their program’s behaviors at a higher level of
abstraction. The event based view of program behavior[Wil86, Pnu86], originally
developed for concurrent or distributed systems, has been used in many debugging
systems to describe program behaviors [BH83, BW83, Bru85, Bat88, HK88]. By
this view, an event models something occurring and any behavior of a system is
considered to be an event. An interesting activity consists of a sequence of events,
where the particular events in the sequence may be at any appropriate level of
abstraction. Since any interesting programming system is capable of producing a
vast number of different behaviors, depending upon different inputs it receives, the
event-based view can be very expressive in describing possible behaviors of a
system with a set of sequences of events[BW83, Bat88]. This approach provides a set
of predefined primitive events and a language based on EBBA upon which high level
events can be defined [BW83, Bat88]. Typical primitive events are function
executions, file operations, and data references. Relationships between different events
are captured by the order in which they occur. Path expressions [Bru85] are used
to represent that order. Information available to define an event is the data defined
in users’ programs, calling sequence of functions, and primitive operations on data.
Programmers specify what they want to monitor by defining events in terms of the
predefined primitive events and other defined events. A monitoring system only
collects data for the predefined events. Defined events are recognized by using finite
state automata techniques with syntactic pattern matching [BW83, Bru85, Bat88].
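The recognition step can be pictured with a small finite state automaton. The sketch below is a generic illustration and not the EBBA or path expression machinery of the cited systems: the high level event "session" (an open, any number of reads, then a close), the state names, and the event names are all hypothetical.

```python
# Hypothetical transition table for the high level event
# "session" = open, then zero or more reads, then close.
TRANSITIONS = {
    ("idle", "open"): "opened",
    ("opened", "read"): "opened",
    ("opened", "close"): "done",
}


def recognize(primitive_events, wanted=("open", "read", "close")):
    """Count completed sessions in a stream of primitive event names."""
    state, sessions = "idle", 0
    for event in primitive_events:
        if event not in wanted:      # pattern filter: drop irrelevant events
            continue
        # Unexpected events reset the automaton to its start state.
        state = TRANSITIONS.get((state, event), "idle")
        if state == "done":
            sessions += 1
            state = "idle"
    return sessions


stream = ["open", "read", "call", "read", "close", "read", "open", "close"]
print(recognize(stream))
```

The filter discards primitive events that cannot take part in the defined event, and the automaton then tracks the order of the remaining ones, which corresponds to the two steps named in the text.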
Up to now, we have discussed those PMM languages that are used as a
separate language. There have been some efforts to enhance programming languages
to include some monitoring and measuring facilities. These enhancements
explicitly represent the information which was traditionally left implicit in program
execution and make the information accessible within the programming languages.
Many programming languages include some monitoring statements
[Han87, LW69, Smi84, Dav80, Bat83]. For instance, SNOBOL[Han87] has facilities
that support three kinds of monitoring. First, it supports accessing program
internal parameters through keywords. Second, it supports using pattern matching to
invoke monitoring and measuring actions. Third, it supports some condition
testing operators so that programmers can do monitoring and measuring conditionally.
Accessing program internal parameters is done by tracing. Tracing can be applied
to different parameters. First, value tracing accesses the value of a variable whenever a
value is assigned to a traced variable by an assignment statement or as a result of
value assignment in pattern matching. Second, function tracing traces the entry point
and return point of a function. Third, label tracing causes a trace printout whenever
transfer is made to a label and only under some condition. Fourth, keyword tracing
causes a trace printout when the value of a named keyword is changed.
Programmers can define trace functions to be invoked when any of the events
mentioned above occurs. Special variables and functions are introduced in the
language to facilitate program monitoring and measuring. For example, a special
variable $stno is used to record the number of the statement currently being executed
and a special function arg(fn,n) is used to access the nth parameter value of the
function fn.
The facilities described above make PMM more convenient. However, in order
to monitor program execution programmers must explicitly write monitoring code
using these facilities and insert the code into their programs. Many found that
explicitly enhancing a programming language to do program monitoring and
measuring has a lot of restrictions[PN81, BH83]. Moreover, directly changing a program
for monitoring purposes could introduce some additional maintenance problems.
Hence, a separate language is often used to describe PMM requirements[Bru85,
Sno82, Sno84, Sno88, SPSB91].
2.1.3 Very High Level Languages and PMM
The gap between very high level programming languages and the underlying
computation environment upon which the programs written in the languages are executing
is larger than that of most conventional high-level programming languages and
environments. Using very high level programming languages has many impacts on
specifying PMM requirements. First, the costs of program execution are not
apparent from the source code. Hence, PMM systems must provide lower level details to
describe some execution activities that are not part of the high level languages’
execution model. Moreover, those systems must provide a language for specifying those
details. Second, because those details useful for describing the
performance of program execution are not part of the languages’ execution model, it is
more important to let programmers state them declaratively and let systems
figure out how to get them. It is more efficient and less error prone for a monitoring
system to do that. Third, the data structures and control structures in programs
written in very high level languages may have so much information associated with
them that it is more important for programmers to ask high level questions which
use information of both data state and control data of programs. Doing so enables
the programmers to focus on what they want to know without being overwhelmed
by details. Fourth, the large gap also makes it more important for the PMM
language to support abstractions so that programmers can introduce their own terms
and some idiomatic usages in specifying what they want. Supporting abstractions,
e.g., aggregations and specializations, also makes it easy for programmers to tell
monitoring systems what they want. That is especially true if what they want to
know includes those execution details introduced for the PMM purposes.
2.2 Data Collection
In program monitoring and measuring, the data collected is usually the time and
space usages of program execution. Data collection is very important in PMM
because it gathers the data upon which information needed by
programmers is based and because the range of data that can be collected determines what
questions can be answered.
Data collection can be done by either hardware or software. Systems using special
hardware to collect data can be found in [Ben88, HW90, CLW90]. Using special
hardware to collect data of program execution has the advantage that data collection
does not affect the performance of program execution. The disadvantages relative
to software data collection are that it requires special hardware, the data collected
is very massive and at a very low level, it is very inflexible, and the collected data is very
expensive to analyze. There are many systems that collect data of program execution
via software[Knu71, CC76, Sno82, G+83, C+83, Pla84, Sno88]. Some systems do
data collection by altering the compiler of the programming language in which the
monitored program is written[CC76, G+83, C+83]. Other systems directly alter the
monitored source programs to collect data about them[Knu71, Sno82, Pla84, Sno88].
With either hardware data collecting or software data collecting there is an issue as
to how much data to collect. There are various ways of determining how much data
to collect. Methods range from collecting all data that can be collected[CC76, G+83]
to collecting only the specified data[Sno82, Pla84, Sno88, LL89].
Systems also differ in what data to collect. Some systems[Bal69, Sam89, HW90]
record execution traces. Other systems[Knu71, G+83, C+83] collect less data.
Profile based systems collect only the time spent on program components and/or the
number of times these components were executed during program execution. Most
of those systems do not have the flexibility of controlling what to monitor. Usually
only a few predefined forms of measurement are supported (e.g., how many times
each procedure is called and the time it consumes). In data-based systems, because
it is hard or impossible for programmers to communicate with monitoring and
measuring systems to tell what they want, the systems usually collect too much data
and most of that data is not relevant to what the programmers want. Being told
what to collect, model-based systems collect all primitive data upon which other
data are based. When processing PMM requirements of a program’s execution,
these systems depend on processing techniques to filter out irrelevant data after it
is collected.
How to tell the system what to monitor at the primitive level has a significant
effect on how data are collected[PN81]. Various methods are used.
• Labeling: Programmers are required to label the statements that they want to
monitor. The system will collect execution data of those labeled statements
[LL89].
• Position: Programmers specify the statements that they want to monitor by
their positions in the source program[BH83]. For instance, main>Q>P is used
to tell the system that what they want to know is the execution of statement P
which is inside the Q statement of the function main.
• Index: Some systems give each line of the source program a line number and
use it to locate statements of interest[Sno82, GYK90].
• Category: Some systems categorize the syntactic constructs into some
categories and use the names of the constructs to name the statements of interest.
For example, if P is an assignment statement inside function main then it can
be referred to as the assignment statement inside function main. If there are
several assignment statements inside main then the textual order of the
statements is used to specify the required statement. Some methods mentioned above
could be used together with this method[PN81].
Requiring programmers to figure out what data to collect and where that data
can be collected is a burden for them. That is especially bad when very high
level languages are used because determining what to collect and where to collect it
might require knowledge that is either not available to ordinary programmers or
very difficult to master. When very high level languages are used some execution
activities may not correlate to statements. Hence, only using source level statements
to describe monitoring and measuring requirements is not good enough. New terms
must be introduced to describe the activities of program execution that are specific
to the computation model used by the programming languages but not correlated
to the executions of statements.
2.3 Data Processing
Data processing transforms the low level data collected into derived forms to satisfy
programmers’ PMM requirements.
In data-based systems, because the systems do not let programmers describe higher level
questions, there is little or no data processing explicitly supported. For those that
collect all primitive data available, programmers usually do the data processing
either by reading the data or by writing programs to do it. Cohen[CC76] provided
a procedure based language for programmers to ask questions to a database which
contains data about program execution. Samadi[Sam89] described a system that
uses an expert system to interpret the collected data.
In model-based systems, relational query evaluation and optimization methods
are used to derive high level data from primitives. Snodgrass[Sno82] uses relational
algebra to compute the required high level data from the collected data. His system
first represents the collected primitive data in relations. It then computes answers
to programmer questions, which are represented as TQuel queries, using a database
query processor that is based on relational algebra. Using relational algebra to derive
required high level data from the collected data enables the PMM system to use the
optimization techniques used in database query evaluation. Because low level data
is collected incrementally, Snodgrass realized that it is not very efficient to collect
all low level data first and then process that data. He introduced an incremental
derived data computation algorithm that computes derived data incrementally. In
his algorithm, incremental computation is achieved by generating low level data
and sending them to an event processor. The event processor first builds a match
network that defines how derived forms can be computed from the low level data and
then dynamically interprets the match network on the low level data. In the EBBA
based framework of Bates[Bat88] and in the path expression based frameworks of
Bruegge[Bru85] and Hseush[HK88], the computation of high level data from low
level data is done using finite state automata methods plus pattern matching. In those
frameworks, low level data are first collected and then sent to an event recognizer
to be processed. The event recognizer first uses the pattern of the low level data to
filter out those data that do not match the required patterns. It then uses finite
state automata methods to compute the derived data.
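The difference between collecting everything first and computing derived data incrementally can be shown with a very small sketch. It does not reproduce the match network algorithm described above; it only illustrates the general idea with a running mean, and the class IncrementalMean, the function batch_mean, and the sample durations are assumptions made for the example.

```python
class IncrementalMean:
    """Maintain the count and mean of a stream without storing its items."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        """Fold one newly collected low level datum into the derived value."""
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


def batch_mean(values):
    """Collect-then-process alternative: needs every datum kept in memory."""
    return sum(values) / len(values)


durations = [4.0, 6.0, 5.0, 9.0]   # made-up execution times
inc = IncrementalMean()
for d in durations:
    inc.update(d)

print(inc.mean, batch_mean(durations))
```

Both computations give the same derived value, but the incremental version holds only two numbers at any time, whereas the batch version must retain the whole list of low level data until the end of the run.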
Using very high level programming languages makes PMM data processing more
difficult because of the large gap between very high level programming languages
and the underlying computing environment upon which programs written in the
languages are executing. Because the PMM data collected directly is usually at low
levels that reflect the implementation of the languages, the large gap requires much
more complicated computation to compute the required high level forms from the
low level data.
2.4 Related Research
Our PMM system is a model-based software monitoring system. It lets
programmers specify what they want to monitor and measure before program execution in a
specification language. It then merges the PMM specification with the monitored
source program using program transformation techniques. In doing so it uses static
analysis to figure out where in the monitored source program data needed to
satisfy PMM requirements can be collected. It then transforms the requirements into
instrumentation code so that data can be collected and processed into the derived
data of interest to the programmers. In the following subsections, we briefly discuss
program transformation, static analysis, and incremental derived data computation.
2.4.1 Program Transformation
Program transformation[BD77] is a means to formally develop efficient programs
from specifications. Program transformations are widely used in automatic
programming [Bal85, Bar85, Bal86, Ric86, RW88, Smi90] and program synthesis [MW80,
Qia90]. A transformation system accepts a source program and advice on how to do
the transformation (e.g., choices of data representations). It then generates an
efficient program based on a set of correctness-preserving transformation rules
under the guidance of the advice. Surveys and studies comparing different
existing transformation systems can be found in [PS83, Fea86].
Evolution Transformations[JF90] are transformations whose purpose is to
elaborate and change specifications in specific ways. They are used to support program
evolution. Unlike traditional program transformations, evolution transformations
are not correctness-preserving. Instead, they add new semantics into the
modified specifications on purpose. An example is adding a new parameter to a function,
which changes the semantics of the program. In program development such a change
might be exactly what one wants.
Our system supports a set of domain specific transformations that preserve the
functionality of the source program while, like evolution transformations, adding
something new: instrumentation code that collects and computes data for the PMM
specification. It uses knowledge of the source program, knowledge of the PMM
specification, and knowledge of the relationship between the PMM specification and
the source program to optimize the instrumentation code needed for computing answers
to the PMM specification. Unlike other transformation systems, it generates different
instrumentation for the same PMM specification for different source programs.
2.4.2 Static Analysis of Programs
Static analysis is the analysis of source programs to find syntactic and/or semantic
relationships of program components, for instance, what functions call other
functions or where variables are bound, set, or referenced. Static analysis has been used
in many applications [PC90, OO90, Xer83, RD90, Nar89], for instance, global
optimization by compilers [ASU86, FOW87, RP88, RD90, Wol90], programming error
checking [OO90, CMB91], program maintenance [PC90], and program evolution
support [Nar89].
In our PMM system, static analysis is used to select relevant instrumentation
sites of a PMM specification. Static analysis is used differently from how it is used
in other applications in that the relationships among program execution activities
and the relationships between program execution activities and syntactic constructs
in a source program are used to find instrumentation sites of the specification.
Relationships between program execution activities, like one function execution calling
another, are used as the filtering conditions for selecting instrumentation
sites so as to eliminate those sites that can be proven irrelevant at compile time. Both
the types of program execution activities and the instances of program execution
activities can be used to describe the sequence of instances, so that the performance
of a program's execution can be specified more accurately than by using the types of
program execution activities alone as in other work, e.g., Cecil[OO90]. This is
because we focus on dynamic properties of computer programs while others like
Cecil focus on static properties.
2.4.3 Incremental Generation of Derived Data
Incremental generation of derived data is very much like database query evaluation, by
which derived data is generated from stored data [Kin81a, Kin81b, CGM90, Ull89,
JK84, GMN84, Coh86]. However, unlike database query evaluation, the data upon
which the derived data is based is collected incrementally and is not available all
at once in program monitoring and measuring. Cohen[Coh89a] studied incremental
generation of derived data in compiling database transition triggers. In his
system, triggers are compiled into a match network so that efficiency can be achieved
by sharing partial results. In studying integrity constraints of database systems,
incremental generation is used to compute derived data that violate database
constraints [Nic82, QW86, SJGP90]. Incremental generation of derived data is also
studied by Forgy[For79] in efficiently implementing production systems. In Cohen's
work[Coh89a] a language is used to specify conditions becoming true in databases.
The language is a temporal extension to the language of first order logic which
enables references to both the state before and the state after a database transition. The
required derived data is specified as triggers using the language. Triggers are
compiled into a network of matchnodes, each of which has an associated description
and a program and is connected to other nodes, its predecessors and successors.
Data satisfying a description is computed at the node with which the description
is associated and is used as the input to the successors of the node. The output
of the nodes without successors is the derived data needed. This system assumes
that there is no relationship between the data upon which the derived data is based
other than the relationships specified in defining the derived data. In our PMM
system, derived data is usually based on many primitive data and there are some
relationships among the primitive data that could be used to make the computation
more efficient (e.g., the temporal relationships).
2.5 AP5 Computation Model — A PMM View
An AP5 program uses relations to represent the state of the data manipulated by a
program. Program execution changes these relations and thus moves from one data
state to another. Furthermore, these transitions are atomic. Within a state, data
is accessed via associative retrieval. The consistency of the states is automatically
maintained relative to a set of user defined consistency rules. Whenever a transition
is proposed, these rules' consistency conditions are checked. If there is no violation
of the conditions, the transition is made to the proposed state. If there are
violations, the rules may propose additional updates to restore their conditions. The
augmented transition is then attempted just like the original. This in turn may lead
to more violations and repairs. Only if a consistent transition can be found is the
database updated. Otherwise the transition is aborted (i.e., no change is made to
the database) and an exception is raised.
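The check, repair, and retry cycle described above can be sketched in Python as follows. This is not AP5 code and does not reflect the AP5 implementation: the representation of a state as a set of tuples, the `(check, repair)` rule pairs, the `max_rounds` bound, and all names are assumptions made for illustration only.

```python
class TransitionAborted(Exception):
    """Raised when no consistent transition can be found."""


def attempt_transition(state, adds, deletes, rules, max_rounds=10):
    """Return the new state, or raise and leave `state` untouched.

    state   : frozenset of tuples (hypothetical stand-in for the database)
    rules   : list of (check, repair) pairs; check(candidate) -> bool,
              repair(candidate) -> (extra_adds, extra_deletes)
    """
    adds, deletes = set(adds), set(deletes)
    for _ in range(max_rounds):
        candidate = (set(state) - deletes) | adds
        violated = [repair for check, repair in rules if not check(candidate)]
        if not violated:
            return frozenset(candidate)       # consistent: commit
        grew = False
        for repair in violated:
            more_adds, more_deletes = repair(candidate)
            grew = grew or not (more_adds <= adds and more_deletes <= deletes)
            adds |= more_adds
            deletes |= more_deletes
        if not grew:
            break                             # repairs proposed nothing new
    raise TransitionAborted("no consistent state found; database unchanged")


# Hypothetical rule: everyone who works must also be recorded as a person.
def workers_are_persons(candidate):
    return all(("person", t[1]) in candidate for t in candidate if t[0] == "works")


def add_missing_persons(candidate):
    return ({("person", t[1]) for t in candidate if t[0] == "works"}, set())
```

Because the candidate state is built separately and only returned on success, an aborted transition leaves the original state unchanged, as the text requires.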
The success of an atomic transition may also trigger user defined automation
rules, which fire on every successful transition that satisfies their predicates. These
predicates are expressed in a two-state relational logic enabling reference to the
before and after states of a transition. The body of an automation rule is an AP5 program
whose execution may cause further state changes to occur. More information about
AP5 can be found in [Coh88].
Our program monitoring and measuring system uses AP5 both as the source
programming language and as its implementation language (to simplify the recording
and access of instrumentation data and the computation of answers to the PMM
questions from the primitive data collected).
Chapter 3
A Program Monitoring and Measuring
Specification Language
High level program monitoring and measuring requires an explicit specification of
the program execution activities to monitor and a set of questions about those
activities. The purpose of the explicit specification (including the PMM questions)
is to describe the activities of interest, i.e., to tell the PMM system what to
monitor and measure. By doing so the PMM system can focus only on the execution
activities specified and avoid collecting data that is irrelevant to the questions.
Explicitly specifying program execution activities to monitor and the PMM questions
about them requires a language. Although a number of fundamental principles of
good specification languages[BG79] have been proposed and a few
specification languages have been used successfully[SSS81, Sys86], there is currently
no specification language that is designed specifically for program monitoring and
measuring. This chapter presents such a language designed for specifying program
execution activities to monitor and questions about those activities.
The rest of the chapter is about a PMM specification language for the program
monitoring and measuring of AP5 programs. First, we discuss the objectives of
the PMM specification language. Second, we present a model of program execution
activities, called the PMM event model. Third, we introduce a vocabulary for
specifying execution activities that are specific to the AP5 computation model.
Finally, we define the syntax and semantics of this specification language and the
properties of valid PMM specifications.
3.1 PMM Specification Language Objectives
The goals of the PMM specification language are analogous to those of very high
level languages [BCG83]: to allow a programmer to describe in the most natural
terms what he wants. The specification language is an interface between a
programmer and an automatic programming system. It plays both an external and an
internal role. The external role is as a specification language to specify PMM
requirements. It enables a programmer to state what he wants to know instead of how
it is to be determined. It also allows him to state his PMM requirements in terms
of his problem domain. It thus makes specifying PMM requirements easier.
Satisfying these objectives requires the PMM language to be high level, declarative, and
expressive. Being expressive requires the language to be able to accurately describe the
performance of program execution. More specifically, the language must be able to
identify the events of interest in the program execution and the relevant portions of
the control and data states needed to define those events, including access to dynamic
program execution information, like the contents of the runtime stack or the history
of data state changes, which is not directly accessible in conventional programming
languages. These abilities enable the programmer to focus on the portions of the
program execution in which he is interested. Finally, the language should support
abstraction so that a programmer can build higher level terms from lower ones and
avoid irrelevant details.
The internal role for the PMM specification language is to tell the system what
to monitor and measure. In particular, besides the specification requirements
mentioned above, the PMM specification language needs to satisfy internal practicality
and efficiency requirements. First, the transformation from a PMM specification to
the instrumentation used to collect data and process the data during the program
execution must be doable automatically. We call this the operational requirement.
Second, the instrumentation generated must be efficient in both execution time and
space. We call this the efficiency requirement.
There is a conflict between these two roles played by the specification language.
On one hand, program monitoring and measuring needs to transform the PMM
specifications into a set of data to collect and a method of computing the answers from
this data. Hence, the closer the level of the specification of execution activities is to
the computation environment, the easier the mapping will be. On the other hand,
specification languages attempt to move the specifications closer to the problem
domain and farther from the details of the computing environment. Thus, the
higher the level of the specification language, the more complex the mapping that
the automatic programming system must perform.
The PMM specification language is designed to incorporate the two roles
mentioned above within a coherent framework. The specification language provides
a vocabulary of primitive program execution activities and their relationships to
model the features of the AP5 computation model. They enable a programmer to
specify program execution activities such as atomic execution and rule triggering as
well as relationships such as the atomic execution that triggered some rule. High
level activities are defined in terms of primitive activities and their relationships.
The principles for composing these primitives into high level abstractions in ways
that facilitate monitoring and measuring are:
• Primitives in the activity model correspond to the execution of some specific
syntactic structures.
• Behavior composition allows simplification and abstraction.
The expressivity and efficiency requirements indicate several desirable properties
of the specification language: activities should be represented as a hierarchy where
activities are represented as nodes and the leaves correspond to the observable
primitive activities, so that the mapping between the information collected during the program
execution and these primitive activities is simple. Other nodes in the hierarchy are
defined in terms of the leaves and nodes that have already been defined. Not only does
this simplify computing high level mappings from lower level ones, represented by
the leaves, but it also greatly simplifies observing the behavior of a complex program
from the details of its statements by specifying the intended activities at multiple
levels of abstraction. The nodes in the hierarchy are activities which can be
computed from the nodes below them. This also facilitates behavior abstraction and
program monitoring and measuring since the system can use information about the
hierarchy to filter out some irrelevant activities.1 Hence, a programmer can build
up high level terms and specify PMM requirements with them. Simultaneously,
the automatic programming system can use these terms to build up an efficient
implementation of mapping low level data of program execution into a level that is
comprehensible to the programmer.
1 How that will facilitate program monitoring and measuring will be discussed in chapter 4.
3.2 A PMM Specification Language
Intuitively, in the specification language the execution of a program is modeled as a
sequence of states. The transitions from one state to the next are called events. Each
event is of some type and occurs at a different time. For monitoring purposes we
instantiate this model with a very fine grain size of states. Most of these execution
events are related to syntactic constructs of the programming language that appear
in the program being monitored, i.e., they're part of some statement. However they
generally correspond much more strongly to an activity model that is not part of the
programming language semantics, but is nevertheless specific to the programming
language.
3.2.1 Program Monitoring and Measuring Event Model
Our specification language is based on an extended Entity-Relationship data model
[Che76], in which program execution activities (called events) are modeled by
entities, and relationships among them are modeled as relations (called control
relations). The primary components of our PMM language are the explicit definition
of events, attributes of and relationships among events, constructors for building
derived events, and performance questions about those events.
To describe the model in more detail we will proceed in four steps. We begin by
describing event types that model the classes of the program execution activities.
Second, we describe control relations that model the relationships among events.
Third, we describe derived relations and event types. These are built from event
types and control relations. Finally, we describe PMM specifications.
Events and Event Types
Events are used to model activities in AP5 program execution. Each event has a
type. Information useful to describe the performance aspects of the activities is
abstracted into a set of attributes defined for each event type. In this model, we
distinguish event types and value types. Value types are types whose meaning can
be universally understood, while the meaning of event types can only be understood
by their relationships to other event types or value types. Event types are organized
as a hierarchy by using specialization. For example, Integer and List are value types
and function-execution is an event type. There are two kinds of primitive event
types: point event types and interval event types.
A point event type models a set of points on the execution trace of a program.
For example, during program execution entering or leaving a function is a point
event. Each member of the point event type is called a point event.
An interval event type models a set of pairs of point events on the execution path
of a program. Each member of the interval event type is called an interval event.
An interval event takes place over a consecutive segment of time. For example, a
function execution is an interval event.
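One possible way to represent the two kinds of primitive events in code is sketched below in Python. The class names, field names, and the integer clock are illustrative assumptions and are not part of the PMM language definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointEvent:
    event_type: str   # e.g. "function-entry" (hypothetical type name)
    time: int         # clock value when the event occurred


@dataclass(frozen=True)
class IntervalEvent:
    event_type: str           # e.g. "function-execution"
    begin_point: PointEvent   # point event that starts the interval
    end_point: PointEvent     # point event that finishes the interval

    def __post_init__(self):
        # An interval covers a consecutive segment of time, so it cannot
        # end before it begins.
        if self.end_point.time < self.begin_point.time:
            raise ValueError("interval ends before it begins")

    @property
    def duration(self) -> int:
        return self.end_point.time - self.begin_point.time
```

An interval event is stored here as a pair of point events, matching the description of an interval event type as a set of pairs of point events.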
Relationship Among Events
Events are introduced to model program execution activities. Control relations are
introduced to model the relationships among events. In the PMM specification
language we distinguish three kinds of relationships: first, temporal relations model
the temporal relationships between two events, for instance, the execution order
of two atomic executions in AP5 programs; second, attributes model events and
their execution environment, for example, the duration of an atomic execution; and
third, Static Relations are used to define conditions on values, for instance, the
value passed to the first parameter of a function execution is not equal to zero. A
relation can be stored or derived. A stored relation consists of relationships that
are explicitly asserted. A derived relation consists of relationships that are derived
from stored relations or other derived relations. In the following sections we present a
general temporal relation among events and define some computation model specific
relationships among events for the AP5 computation model. We assume that the
AP5 programs to be monitored are running on a SISD machine with a centralized
clock.
The execution order of events is a temporal relationship in program execution.
In the PMM language, there is a time attribute defined on each of the point event
types. The value of the time attribute is the value of the clock when the event
occurs. Temporal relationships of events are defined on the values of their time
attributes. Because there is a natural order among the values of the time attribute,
the value of the time attribute is used to describe the execution order of point
events. Hence, for two different points P1 and P2 the following relation is true: (OR
(Before P1 P2) (Before P2 P1)), where (Before P1 P2) means that P1 was executed
earlier than P2 was. The execution order among interval events is defined in terms
of the point events associated with them. For an interval event, we call its
starting point event the begin-point and its finishing point event the end-point.
Correspondingly, we call the time attributes of the two point events begin-time and
end-time. Suppose X and Y are interval events; then (Before X Y) is defined to be
true if the finishing point of X is earlier than the starting point of Y. More precisely,
(Before x y) = ((x y) (∃ (tx ty) (AND (end-time x tx) (begin-time y ty) (< tx ty))))
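The definition of Before can be illustrated with a small Python sketch. The `Point` and `Interval` types are hypothetical stand-ins for point and interval events, and allowing a point to be compared with an interval is an assumption of this sketch that the text does not state.

```python
from typing import NamedTuple


class Point(NamedTuple):
    time: int          # value of the time attribute


class Interval(NamedTuple):
    begin_time: int    # time attribute of the begin-point
    end_time: int      # time attribute of the end-point


def before(x, y) -> bool:
    """True when x finishes strictly earlier than y starts."""
    end_x = x.end_time if isinstance(x, Interval) else x.time
    begin_y = y.begin_time if isinstance(y, Interval) else y.time
    return end_x < begin_y
```

For two point events with different clock values exactly one of `before(p1, p2)` and `before(p2, p1)` holds, whereas two overlapping intervals are unordered in both directions.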
Derived Event Types and Derived Relations
The control relations and event types in the specification language allow a complex
program activity to be described, but it would be more convenient to monitor and
measure a large program if programmers can define their own terms. It would also
be more convenient if they can define high level concepts in terms of low level ones.
Derived event types enable them to do that.
Intuitively, higher level activities are modeled by clustering and filtering.
Clustering aggregates low level events and previously defined high-level events into higher
level events. Such higher level events specify a collection of events and describe how
they relate to each other. Filtering serves to eliminate from consideration events
that are not relevant to an activity model being investigated. Filtering is effected
by specifying required relationships among cluster members. The relationships of
events forming an activity model are expressed in terms of temporal constraints
designating acceptable orderings, and relational constraints among attributes defined
by different events.
A derived relation is defined on event types and control relations. It enables
a programmer to build up application specific terms (derived events). A derived
relation has two parts:
- Name, specifying the name of the defined relation.
- A specification of the events or relations the defined relation is based on and
the relationships they must satisfy. It is of the form
{(x_1, x_2, ..., x_n) | F(x_1, x_2, ..., x_n)}
where F(x_1, x_2, ..., x_n) is a first order predicate formula with event types and
control relations used as predicates.
A derived event type is a unary derived relation.
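The set-builder form of a derived relation maps naturally onto a filtered Cartesian product. The Python sketch below evaluates such a definition over finite collections of already recorded events. The function name `derived_relation` and the tuple layout of the example events are hypothetical, and the sketch deliberately evaluates everything at once rather than incrementally.

```python
from itertools import product


def derived_relation(domains, formula):
    """Evaluate {(x_1, ..., x_n) | F(x_1, ..., x_n)}.

    domains : list of n finite collections of candidate events
    formula : callable taking n events and returning True or False
    """
    return {combo for combo in product(*domains) if formula(*combo)}


# Hypothetical recorded events: (event type, name, begin-time, end-time).
EVENTS = [
    ("function-execution", "f", 0, 25),
    ("function-execution", "g", 3, 5),
    ("relation-test", "r", 6, 7),
]

# A derived event type is a unary derived relation, for example the
# function executions that lasted longer than 10 clock ticks.
slow_executions = derived_relation(
    [EVENTS],
    lambda x: x[0] == "function-execution" and x[3] - x[2] > 10,
)
```

A binary derived relation is obtained the same way by passing two domains and a two-argument formula.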
3.2.2 Predefined Event Types and Control Relations
Primitive events (primitives) are the basic building blocks for describing the
execution activities of a program. It is therefore crucial in designing a PMM specification
language to choose an appropriate level of detail for these primitive building blocks.
This choice makes three fundamental commitments. First, it makes some activities
unobservable, e.g., in the AP5 language, if we choose atomic execution as a
primitive activity then the intermediate state of the database is unobservable. Second,
it makes some activities describable only as part of composite activities, e.g., again
in the AP5 language, if we choose rule triggering as a primitive activity then it can
only be part of an atomic execution. Third, the lower the level of primitive activities,
the more work is needed in defining execution activities in terms of these primitives.
Thus there is a tradeoff to be made between the level of detail the primitives
describe and the efficiency of reasoning with them: more detail makes observation more
accurate but results in more work.
We use the following principles to choose the primitive event types for the PMM
specification language. The main purpose of monitoring and measuring is to
understand the performance tradeoffs of program execution. Therefore, the execution
data collected should help a programmer focus on specific portions of his program.
The guideline for choosing a set of primitives for a computation model is that there
is a very close correspondence between those primitives and source programming
language constructs so that data about those primitives can be easily associated
with the parts of the source program written in the language.2 Because a
programming language has only a small set of primitive constructs and a source program
is constructed from them, choosing those constructs as primitives makes it easy to
associate instrumentation data with those building blocks.
This guideline can be realized by the following steps:
1. Identify a set of important language constructs in the source language. For the
AP5 language, this set includes function, relation, rule, and atomic statement.
2. If a construct identified is a data structure used in program execution, all
of the operations on it are also selected as primitives. For example,
operations on relations in AP5, such as relation insertion and relation deletion, are
primitives.
3. If a construct identified is a piece of executable program, the execution of the
construct is selected as a primitive. For example, the execution of a function
is a primitive.
4. Make temporal relationships among the selected primitives and the idiomatic
usages of them predefined control relations. For example, the relation before
is used to specify that one event occurs before another.
The input and output parameters of primitives are chosen as attributes of that
event type. In addition, we also define the time attribute for each point event type,
which is the value of the clock when the event occurs. Finally, the data state of the
monitored program is defined as an attribute for primitive events. For example, in
AP5 programs, the truth or falsity of predicates on their data state can be used as
constraints on an event. This enables programmers to use the data state of their
program to describe monitoring activities.3
2 This will also help the automatic programmer for the PMM language to select the places in
the source program to insert instrumentation in Chapter 4.
As examples of applying these principles, we describe how some of the predefined
primitives are selected next. See Appendix A for a complete list of AP5 primitives
and their attributes.
Relation
In the AP5 language, Relations4 are used to represent data and the relationships
among them. There are two kinds of Relations: stored relations and defined
relations. Stored Relations are primitive and Defined Relations are defined in terms of
Stored Relations or other Defined Relations. The operations defined on Relations
are testing, inserting, deleting, generating, and triggering. There is a set of
parameters associated with each of these operations. For example, the inserting and
deleting operations require a Relation name as one of their parameters. Based on
the guidelines stated above we have the following primitives and attributes. The
primitive event types defined on relations are: relation-insertion, relation-deletion,
relation-test, relation-generation, and relation-triggering. The attributes defined for
relation-insertion and relation-deletion are the name of a relation and a tuple of
values to be inserted or deleted. In addition to the attributes mentioned above, there
is an attribute Test-result for relation-test. Because there are more parameters used
in relation generation, there are a few additional attributes for relation-generation.
They are generation-pattern, whose value is the generation pattern used in generating
the relation, and generated-tuples, whose value is the set of tuples generated for the
relation. More detailed descriptions of the attributes and primitive types defined on
Relations can be found in the appendixes.
3 It also enables them to access application specific terms defined in their program. For example,
if both person and manager are types in a program then it is possible to check if a person is a
manager.
4 The relation used here is a data representation of the AP5 language. It is different from the
relations used in the PMM specification language.
Given the terms defined above we can describe program execution activities
about Relations. For example, suppose we have a ternary relation work-on-% with
the first parameter of type person, the second parameter of type project, and the
third one of type integer. (work-on-% 'John 'FSD 50) means John works on the
FSD project 50% of the time. The primitives defined on Relation can be used to
describe activities like testing the relation Work-on-% with the last parameter equal
to 50, or generating relation Work-on-% with generation-pattern (input input output),
as follows:
{X | (relation-test X 'Work-on-%) ∧
(parameter X 3 50)}
{Y | (relation-generation Y 'Work-on-%) ∧
(Generation-pattern Y '(input input output))}
where parameter and generation-pattern are attributes of event types relation-test and
relation-generation respectively.
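To make the first query concrete, the Python sketch below evaluates it over a small list of recorded events. The dictionary layout of an event record and the helper `parameter` are assumptions made for illustration and do not describe how the PMM system stores or evaluates events.

```python
# Hypothetical recorded events; the field names are illustrative only.
RECORDED = [
    {"type": "relation-test", "relation": "Work-on-%",
     "parameters": ("John", "FSD", 50)},
    {"type": "relation-test", "relation": "Work-on-%",
     "parameters": ("Mary", "FSD", 20)},
    {"type": "relation-insertion", "relation": "Work-on-%",
     "parameters": ("Mary", "AP5", 80)},
]


def parameter(event, position, value):
    """Mimic (parameter X position value), counting positions from 1."""
    return event["parameters"][position - 1] == value


# {X | (relation-test X 'Work-on-%) ∧ (parameter X 3 50)}
matches = [
    e for e in RECORDED
    if e["type"] == "relation-test"
    and e["relation"] == "Work-on-%"
    and parameter(e, 3, 50)
]
```

Each conjunct of the specification becomes one condition of the filter, so adding a further constraint to the specification only adds a further condition here.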
Atomic Statement
In the AP5 language, an atomic execution moves program execution from one data
state to another. Although in the AP5 language semantics an atomic execution
appears to be atomic, it is really made up of many steps and performs many internal
tasks. An atomic execution makes changes to the AP5 database, checks consistency
of the database, and triggers demonic actions. The outcome of an atomic execution
depends on the set of updates proposed, the consistency conditions specified
by the consistency rules defined in the program, and the demonic conditions specified
by the automation rules. Syntactically there is an atomic statement whose execution
is an atomic execution. Because atomic execution corresponds to the execution
of an atomic statement we choose it as a primitive, called atomic-execution. Its
attributes are: (1) Data-gathering-time, whose value is the time spent in computing
the proposed updates of the atomic execution; (2) Proposed-updates, whose value is the
proposed updates for relations; (3) Updates-done, whose value is the updates done
by the atomic execution; (4) C-rules-triggered, whose value is the list of consistency
rules triggered; (5) A-rules-triggered, whose value is the list of automation rules
triggered; (6) Rules-triggered, whose value is the list of rules triggered.
These primitives can be combined to describe high level activities such as an
atomic execution that triggers the rule one-person-with-two-offices and updates the
relation office-of-person as follows:
{X | (atomic-execution X) ∧
(Rules-triggered X '(ONE-PERSON-WITH-TWO-OFFICES)) ∧
(Updates-done X '(OFFICE-OF-PERSON))}
where both Rules-triggered and Updates-done are attributes defined for event type
atomic-execution.
Rule
In the AP5 language, rules are used to maintain the consistency of the AP5 database
and to support demonic actions. Rules enable an AP5 program to react to state
changes of the AP5 database. Rule execution occurs inside an atomic execution. It
has two steps. First, the AP5 system checks rules against the changes in the database
to see if any rule is triggered (violations). Second, if some rules are triggered,
the action part of the rules is executed using the values that triggered the rules as
parameters. The execution of the action part of the rules may suggest some repairs
for the violations of the database or introduce further changes to the database. Using
the principles mentioned above, the primitives chosen are: (1) Rule-execution,
which represents the rule triggering and rule action executing of a rule; (2)
Rule-triggering, which represents the rule triggering part of the execution; (3)
Rule-body-execution, which means the execution of the action part of a rule. Some of
their defined attributes are: Values-triggered, whose value is the list of values that
triggered the rule, and Proposed-updates, whose value is the list of updates proposed
by the rule.
For example, activities like triggering a rule one-person-with-two-offices that
proposed some updates to relation office-of-person can be expressed using the primitives
defined on rule as follows:
{X | (Rule-triggering X 'ONE-PERSON-WITH-TWO-OFFICES) ∧
     (Proposed-updates X '(OFFICE-OF-PERSON))}
where Proposed-updates is a defined attribute for event type Rule-triggering.
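As a rough illustration only, a set expression of this kind can be read as a filter over recorded events. The sketch below is written in Python rather than AP5; the Event record, its field names, and the sample log are assumptions made for the example and are not part of the PMM language.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str                # primitive event type, e.g. "rule-triggering"
    name: str                # rule involved (hypothetical field name)
    proposed_updates: tuple  # relations the rule proposed to update

# Hypothetical event log; in the thesis this data is collected by instrumentation.
log = [
    Event("rule-triggering", "ONE-PERSON-WITH-TWO-OFFICES", ("OFFICE-OF-PERSON",)),
    Event("rule-triggering", "ONE-PERSON-WITH-TWO-OFFICES", ()),
    Event("rule-triggering", "SOME-OTHER-RULE", ("OFFICE-OF-PERSON",)),
]

def matching_events(events):
    """Events X with (Rule-triggering X 'ONE-PERSON-WITH-TWO-OFFICES)
    and (Proposed-updates X '(OFFICE-OF-PERSON))."""
    return [e for e in events
            if e.kind == "rule-triggering"
            and e.name == "ONE-PERSON-WITH-TWO-OFFICES"
            and e.proposed_updates == ("OFFICE-OF-PERSON",)]

selected = matching_events(log)
```

Only the first entry of the sample log satisfies both conjuncts, which mirrors how each conjunct of the formula narrows the set of events.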
Predefined Control Relations of AP5 Primitives
There are some common control relationships in the AP5 computation model, such
as one function execution calling another. It is both convenient and efficient to make
them part of the specification language. Suppose X and Y are function-execution
events; then (Calls X Y) means that Y is an execution of some function which is
directly called during the execution of X. Triggers is another example of a commonly
used control relation in the AP5 computation model. Assuming that X is an
atomic-execution and Y is a rule-triggering, (Triggers X Y) means that the execution of X
triggered Y. Introducing these idiomatic descriptions of the relationships among events
not only makes it easy to describe the program activities to monitor but also makes the
implementation more efficient.⁵
Aggregate Operators
In specifying program monitoring and measuring requirements, a programmer
sometimes needs operators that are defined on a set of events. Aggregate operators
such as count, average, sum, min, and max are introduced to make it easy to specify
aggregate information about program execution activities. Count is defined on any
event type. It counts the number of times events of the type occurred. Average,
sum, min, and max are defined on numerical attributes of event types. They compute
the average value, the sum, the minimum value, and the maximum value of
attribute values, respectively.
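The aggregate operators can be pictured as ordinary reductions over the attribute values of a set of events. The following Python sketch is illustrative only; the event tuples, the duration attribute, and the helper name attribute_values are assumptions, not PMM syntax.

```python
from statistics import mean

# Hypothetical recorded events: (event type, function name, duration in ms).
events = [
    ("function-execution", "FOO", 12.0),
    ("function-execution", "FOO", 8.0),
    ("function-execution", "BAR", 30.0),
]

def attribute_values(evts, fn):
    """Durations of all function-execution events of the given function."""
    return [d for (kind, name, d) in evts
            if kind == "function-execution" and name == fn]

foo = attribute_values(events, "FOO")
summary = {
    "count": len(foo),       # defined on any event type
    "average": mean(foo),    # the rest need a numerical attribute
    "sum": sum(foo),
    "min": min(foo),
    "max": max(foo),
}
```

Count needs only the events themselves, while the other four operators need a numerical attribute such as the duration used here.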
3.2.3 PMM Specifications
A PMM specification specifies a programmer's monitoring and measuring requirements
about the execution of his program. It has two parts. The first part defines
high level events. The second part specifies what data to record for those events.
For example, if the activity of interest is the execution of a function, then the data
to record might be the values passed to it or the time spent in executing it. We call
the second part of a PMM specification the PMM questions. They can be defined
on the primitive events and control relations or on the high level events defined in
the first part. PMM questions are modeled as queries over these primitive events,
control relations, and higher level events.
⁵ Making the idiomatic usages part of the PMM specification language enables the automatic
programming system to build special implementations for them and thus makes the
implementation more efficient. This is discussed further in Chapter 4.
Thus, a programmer can not only write a PMM specification in terms of the
predefined vocabulary but can also define his own terms using abstractions and
aggregations.
Behavior Abstraction
Derived event types can support behavior abstraction. There are two dimensions
of abstraction supported. First, the level of detail at which portions of an event
can be viewed; e.g., sometimes we only pay attention to whether an event occurred,
while other times we also pay attention to its attributes. Second, the derived event
mechanism enables a programmer to define derived information in terms of more
primitive events. Abstraction permits details about the attributes of derived events
to be ignored. For example, if we are only interested in whether a deletion from
relation FOO occurs and are not interested in its attributes, we can define a new
event type, say delete-FOO, as follows:
{X | (relation-deletion X 'FOO)}
Once the new term is defined, it can be used in the specification in the same way as
the predefined primitives.
Behavior Aggregation
Aggregation clusters low level activities together to constitute high level activities.
Aggregation is realized by treating several different events as components of a derived
event. For example, suppose we are only interested in whether or not relation
work-on-% is updated and do not care whether that update is an insertion or a deletion.
We can define a new event type, say update-work-on-%, as follows:
{X | (relation-insertion X 'work-on-%) ∨
     (relation-deletion X 'work-on-%)}
Again, this new term can be used to define other terms in the same way as the
predefined primitives.
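One way to picture abstraction and aggregation is as the construction of new predicates from primitive ones. The Python sketch below is an illustration under assumed names: the relation name "work-on" is a stand-in for the relation in the example above, and primitive and either are hypothetical helpers, not PMM operators.

```python
# Hypothetical primitive events as (event type, relation name) pairs.
events = [
    ("relation-insertion", "work-on"),
    ("relation-deletion", "work-on"),
    ("relation-deletion", "FOO"),
    ("relation-insertion", "office-of-person"),
]

def primitive(kind, relation):
    """Predicate for a primitive event type applied to one relation."""
    return lambda e: e == (kind, relation)

def either(*preds):
    """Aggregation: an event is of the derived type if any component matches."""
    return lambda e: any(p(e) for p in preds)

# Abstraction: only the occurrence of a deletion on FOO matters.
delete_foo = primitive("relation-deletion", "FOO")

# Aggregation: insertions and deletions are clustered into one derived type.
update_work_on = either(primitive("relation-insertion", "work-on"),
                        primitive("relation-deletion", "work-on"))

deleted = [e for e in events if delete_foo(e)]
updated = [e for e in events if update_work_on(e)]
```

Because a derived type is itself a predicate, it can be passed to either again, which corresponds to using a new term to define further terms.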
3.3 Syntax and Semantics of the Language
The PMM language extends the AP5 relation definition and query language with
predefined primitive event types, predefined control relations, and derived event
type and monitoring question definitions.
3.3.1 Syntax of the PMM Specification Language
As described in previous sections, a PMM specification has two parts: a PMM
event model and a question specification. A PMM event model consists of a set of
derived relation or derived event type definitions. A derived relation or event type
definition has the following syntax:
(defevent Name :definition (V s.t. WFF))
where V is a list of variables and WFF is a relational-calculus-like formula in which
the variables in V are free.
A question definition has the following form:
(defquestion Question-Name :definition (U s.t. WFF))
where U is the desired set of data and WFF is a relational calculus formula that
relates these data to the defined events.
See Appendix A for the complete syntax of the PMM specification language.
3.3.2 Semantics of the PMM Specification Language
This section has two parts: first, an operational semantics of the PMM specification
language; second, some constraints on the specifications written in the language to
ensure that the specified activities are finite.
The Relation Defined by an Event Type Definition
An event type definition defines a set of values denoted by
$\{(x_1, x_2, \ldots, x_n) \mid F(x_1, x_2, \ldots, x_n)\}$
i.e., it defines the set of values that satisfy the formula $F$. More specifically, suppose
that $p_1, \ldots, p_n$ is the list of all predicates⁶ used in the event definition $ED$, and
suppose $P_1, \ldots, P_n$ are relations, where $P_i$ consists of all those tuples $(a_1, \ldots, a_k)$ such
that $p_i(a_1, \ldots, a_k)$ is known to be true. An event $E$ of event definition $ED$ is made
true by this substitution if the following hold:
1. If $E$ is an ordinary event type or control relation, then $E$ becomes $p(b_1, \ldots, b_k)$
under this substitution, and $(b_1, \ldots, b_k)$ is a tuple in the relation $P$
corresponding to $p$.
2. If $E$ is a static relation, then under this substitution $E$ becomes (θ b c)⁷ and
the relation (θ b c) is true.
Syntactic Constraints for the PMM Specification Language
In the PMM specification language, if we consider all primitive event types and
control relations as stored relations and derived event types as derived relations, we
need to put some constraints on the way a derived relation or event type is defined
so that the derived events of the type can be obtained by applying some operations
to the stored relations. One property we would like to have is finiteness, i.e., given
a set of finite stored relations, the derived relations defined on them should be finite
as well.
One simple approach [Ull88] to avoiding event definitions that create infinite
relations from finite ones is to insist that each definition used in a PMM specification
satisfy a set of rules. Satisfying the set of rules ensures that the events defined by
the definition are finite. The set of rules is as follows:
1. There are no uses of the universal quantifier $\forall$.
2. Whenever an OR operator is used, the two formulas connected, say $F_1 \lor F_2$,
have the same set of free variables; i.e., they are of the form
$F_1(X_1, \ldots, X_n) \lor F_2(X_1, \ldots, X_n)$
⁶ Both event types and control relations can be considered as predicates for the specification
language in the sense that they can be used to tell if an instance is of the specified type or a part
of the specified relation.
⁷ θ is used to represent a static relation like EQ.
3. Consider any maximal subformula consisting of the conjunction of one or more
formulas $F_1 \land \ldots \land F_m$. Then all variables appearing free in any of these $F_i$'s
must be limited in the following sense.
(a) A variable is limited if it is free in some $F_i$, where $F_i$ is not an arithmetic
comparison and is not negated.
(b) Any variable X that appears in a formula (θ X a) or (θ a X), where a is
a constant, is limited.
(c) Variable X is limited if it appears in an event (θ X Y) or (θ Y X), where
Y is a variable already known to be limited.
4. A negation operator may only apply to a term in a conjunction of the type
discussed in (3).
We require every definition used in the PMM specification to be safe in this way.
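The "limited variable" condition of rule 3 can be checked mechanically for a single conjunction. The Python sketch below is only an illustration under stated assumptions: the Atom record, the three atom kinds, and the "?" prefix for variables are invented for the example; it additionally assumes that static (θ) atoms must be non-negated to limit a variable; and it does not check rules 1, 2, or 4.

```python
from typing import NamedTuple

class Atom(NamedTuple):
    kind: str        # "event", "static" (e.g. EQ), or "compare" (arithmetic)
    args: tuple      # variables are strings starting with "?"
    negated: bool = False

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def limited_variables(conjuncts):
    limited = set()
    # Rule (a): free in a non-negated conjunct that is not an arithmetic comparison.
    for atom in conjuncts:
        if atom.kind == "event" and not atom.negated:
            limited |= {t for t in atom.args if is_var(t)}
    # Rules (b) and (c): propagate through static atoms until nothing changes.
    changed = True
    while changed:
        changed = False
        for atom in conjuncts:
            if atom.kind != "static" or atom.negated or len(atom.args) != 2:
                continue
            x, y = atom.args
            for var, other in ((x, y), (y, x)):
                bound = (not is_var(other)) or other in limited
                if is_var(var) and var not in limited and bound:
                    limited.add(var)
                    changed = True
    return limited

def is_safe(conjuncts):
    """True if every free variable of the conjunction is limited."""
    free = {t for atom in conjuncts for t in atom.args if is_var(t)}
    return free <= limited_variables(conjuncts)

safe = [Atom("event", ("?x", "FOO")),
        Atom("static", ("?y", "?x")),
        Atom("compare", ("?y", 0), negated=True)]
unsafe = [Atom("compare", ("?p", 0))]
```

In the safe example, ?x is limited by rule (a) and ?y by rule (c); in the unsafe example, ?p occurs only in an arithmetic comparison and is therefore never limited.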
3.3.3 Properties of Valid PMM Specifications
There are several restrictions on constructing a derived event type from other
event types:
IC1 There are no recursive definitions.
IC2 There are no temporal dependency cycles. Suppose that $P_1, P_2, \ldots, P_n$ are point
events. If $P_1 \prec P_2 \prec \cdots \prec P_n$, where $P_1 \prec P_2$ means $P_1$ occurred earlier than
$P_2$ did, then $i \neq j \Rightarrow P_i \neq P_j$. Because of the linear time model we use, it
does not make sense for an event to occur both earlier and later than another
event.
IC3 Every definition in a PMM specification is safe.
IC4 Attributes and the events on which the attributes are based are compatible, i.e.,
only those attributes defined by an event type can be accessed.
The first and third restrictions are adopted because of the operational requirement
and the efficiency requirement [HK87]. The second and fourth restrictions are
adopted because they prevent PMM specifications from denoting empty behaviors,
i.e., behaviors that no event will ever satisfy.
3.4 Summary
The PMM specification language is tailored for high level program monitoring and
measuring. The framework of events, control relations, and abstractions is designed to keep
the syntax and semantics of this specification language very simple. There are three
key reasons that it can be so simple and still allow the specification of monitoring
and measuring of complex program behaviors. First, there is a rich vocabulary of
primitive program behaviors to describe AP5 program behaviors and their relationships.
Second, there are principles by which complex behavior specifications can be built
from the specifications of simple behaviors and their relationships. Third, there is
a very important relationship between the vocabulary used to describe program
behaviors and the syntactic structures used to write source programs, which facilitates
program monitoring and measuring.
Chapter 4
An Automatic Programming System for PMM
Specification
This chapter presents an automatic programming system (called SmartMonitor)
that accepts a source program and a valid specification of program monitoring
and measuring requirements, written in the PMM specification language defined
in Chapter 3, and automatically generates the required instrumentation to satisfy
the specification and merges this instrumentation into the source program. This
generated instrumentation collects data during program execution and computes
the specified monitoring results without changing the functionality of the original
source program.
The goal of instrumentation generation is not merely to generate instrumentation
that records data so that the specified results can be computed, but to do so
efficiently by recording only relevant data. Because the instrumentation code is
generated at compile time, while determining whether a particular piece
of data is relevant and should be recorded generally needs run time information,
that determination must act as a run time filter that restricts the recording to data
that is truly relevant. Recording irrelevant data takes not only space but also time.
Moreover, because subsequent analysis must process all the recorded data, irrelevant
data incurs an additional computation cost as well. Hence, there is a twofold reason
to record as little irrelevant data as possible.
(defun BOTH (C)
  (when (> C 0)
    (FOO C))
  (BAR C))
(defun IR (D)
  (BAR D))
Figure 4.1: A Piece of Source Program
4.1 Aspects of Instrumentation Generation
Because SmartMonitor needs to generate instrumentation code and merge it with the
source program before program execution, it has to answer the following questions.
First, what data is relevant to the events specified in the specification? Second, how
should the collected data be mapped into answers for the queries? Third, where in the
source program should instrumentation be inserted? Finally, how can the dynamic aspects
of testing conditions and collecting data be managed?
SmartMonitor has three components:
• Event Schema Generator, which determines what instrumentation data to collect
and how that instrumentation data will be stored and later retrieved. Its
function is to answer the first two questions.
• Instrumentation Site Generator, which determines where in the source program
the generated instrumentation should go. Its function is to answer the third
question.
• Instrumentation Code Generator, which generates instrumentation code for each
place identified by the Instrumentation Site Generator and merges it into the
source program. Its function is to answer the fourth question.
The following subsections explain how each of the three parts works and how they
work together. We have selected a single example which we will use throughout this
chapter.
Suppose the source program to monitor is as in Figure 4.1 (only relevant parts
are shown). Suppose we have a PMM specification as in Figure 4.2.
(defevent fns-call-FOO-and-BAR
  :definition
  ((x) | (Exist (f FOO BAR p)
           (AND (Function-execution FOO 'FOO)
                (Function-execution BAR 'BAR)
                (Function-execution x f)
                (Parameter x 1 p)
                (Not (= p 0))
                (Calls x FOO)
                (Calls x BAR)))))
(defquestion Monitor-fns-dura-params
  :definition
  ((x y) | (Exist (z)
             (and (fns-call-FOO-and-BAR z)
                  (duration z x)
                  (parameters z y)))))
Figure 4.2: A PMM Specification
The paraphrase of the specification is as follows. The event type definition defines
a set of events to be monitored. The events, in this case, are function executions that
directly call both function FOO and function BAR, and whose first actual parameter is
non-zero. The question definition specifies which data about these events to record, in
this case, the duration of the defined events and the parameters of the function calls.
4.1.1 Event Schema Generation
The event schema generator determines what data to collect. It creates a relational
schema for storing instrumentation data collected during program execution so that
answers to the questions defined in the specification can be computed at the end of
the program's execution using query evaluation techniques [Coh89b, Ull89].
Since PMM specifications are written in the PMM specification language, and
the specification language is a relational calculus language based on a set
of primitive event types, attributes, and control relations, representing those
event types, attributes, and control relations as relations reduces the task of the
automatic programming system in two ways. First, it only needs to generate the
instrumentation to collect and store the required data. Once this data is stored as
relations, the questions can be answered by the relational calculus query processor.
Furthermore, because this query processor automatically computes derived relations,
the automatic programming system only needs to collect and store the primitive
events, attributes, and control relations upon which the defined derived relations
are based.
The data to be collected include primitive events, their attributes, and primitive
control relations used in the specification. The PMM specification specifies not only
the attributes of primitive events to collect but also the conditions which those
primitive events must satisfy to be relevant. Because these conditions are defined
in terms of the attribute values of events, the generator needs to determine which
attributes of the events are needed to test the conditions specified.
Being able to collect instrumentation data is not enough, however. Instrumentation
data must be stored so as to be retrievable when needed. The event schema
generator creates a relational schema that provides such an accessible storage
structure for the instrumentation data collected during program execution. One of the
advantages of representing instrumentation data via a relational schema is that
computing the answers to the questions can be done by a sophisticated and already
available query processor, as discussed above. In particular, primitive event types
are represented as stored relations, derived event types and control relations are
represented as derived relations, and events, or events that satisfy some control
relations, are tuples in the relation used to store data for the event type or control
relation respectively. Attributes of an event type are represented as binary relations
between events of that type and attribute values.
For the example we used, the system creates a relation for each of the primitives,
attributes, and control relations used in the sample specification, such as
function-execution, calls, parameters, and duration. It defines derived relations for the
derived event fns-call-FOO-and-BAR and the question monitor-fns-dura-params. For
simplicity, it gives the created relations the same names as the primitives or control
relations whose data they are used to store. After the execution of the augmented
source program, the data specified by the PMM specification will be available in the
relation monitor-fns-dura-params. For convenience, the process of recording primitive
events and storing them into stored relations is called data collection.
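To make the stored/derived distinction concrete, the sketch below models stored relations as Python sets of tuples and computes the two derived relations of Figure 4.2 from them. It is an illustration only: the event identifiers, the sample tuples, and the tuple layouts are assumptions, and the real system relies on the AP5 query processor rather than hand-written loops.

```python
# Stored relations, filled in by data collection (sample tuples are hypothetical).
function_execution = {("e1", "BOTH"), ("e2", "FOO"), ("e3", "BAR"),
                      ("e4", "BOTH"), ("e5", "BAR"),
                      ("e6", "IR"), ("e7", "BAR")}
calls = {("e1", "e2"), ("e1", "e3"), ("e4", "e5"), ("e6", "e7")}
parameter = {("e1", 1, 5), ("e4", 1, 0), ("e6", 1, 3)}   # (event, position, value)
duration = {("e1", 40), ("e4", 7), ("e6", 9)}            # (event, time units)
parameters = {("e1", (5,)), ("e4", (0,)), ("e6", (3,))}  # (event, argument list)

def fns_call_foo_and_bar():
    """Derived relation for the event type defined in Figure 4.2."""
    result = set()
    for x, _f in function_execution:
        first_nonzero = any(e == x and pos == 1 and p != 0
                            for e, pos, p in parameter)
        calls_foo = any((x, y) in calls
                        for y, name in function_execution if name == "FOO")
        calls_bar = any((x, y) in calls
                        for y, name in function_execution if name == "BAR")
        if first_nonzero and calls_foo and calls_bar:
            result.add(x)
    return result

def monitor_fns_dura_params():
    """Derived relation answering the question defined in Figure 4.2."""
    relevant = fns_call_foo_and_bar()
    return {(d, ps)
            for z, d in duration
            for z2, ps in parameters
            if z == z2 and z in relevant}

answer = monitor_fns_dura_params()
```

With this sample data only execution e1 qualifies: e4 has a zero first parameter and therefore never called FOO, and e6 is an execution of IR, which calls only BAR.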
4.1.2 Instrumentation Site Selection
As described in the previous section, the system needs to collect data on primitive
events and control relations. The instrumentation site generator determines at
compile time where in the source program the specified data can be collected. It locates
those program constructs in the source program whose execution may contribute
data to the events used in the PMM specification.
Given a primitive event type, there may be many constructs whose execution is
associated with data relevant to the events used in the PMM specification. For example,
for primitive event type function-execution of FOO there are two kinds of constructs
whose execution is associated with this type of event: first, the definition of
function FOO; second, the function calls to function FOO. By inserting instrumentation
code in either kind, the system can record data about function execution events. We
call the places (definitions, function calls) where event instances of primitive event
types can be recorded the instrumentation sites of the primitive event types. Thus, each
of the function calls of function FOO is an instrumentation site of the primitive event
type function-execution of FOO.
A PMM specification may use many primitive events. For each of the primitive
events there may be many instrumentation sites. To compute answers to the
questions, the system uses events recorded in all or some of the sites. For the example
PMM specification and the source program, only the definition of function BOTH
is a relevant site of event f (some function that calls both FOO and BAR) in the
event definition of fns-call-FOO-and-BAR, in the sense that instances of that event
might satisfy the control conditions used in the event definition. The reason that
this is the only such site is that BOTH is the only function in the example program
that directly calls both FOO and BAR, and hence only executions of it can possibly
satisfy the stated conditions. We call the events that satisfy the conditions specified
in specifications the relevant events of the specification. We call the sites where the
relevant events can be recorded the relevant sites of the specification. We should note
that not all of the events occurring at relevant sites necessarily satisfy the specified
control conditions. Whether they are really relevant may depend on the state of the
computation of the source program. For example, for function BOTH the events are
relevant only when (> C 0) is true. Run time filters are included in the generated
instrumentation to dynamically check the computation state as needed to determine
which of the occurring events are relevant and to record only those.
Selecting relevant instrumentation sites of a PMM specification needs knowledge
of the source program. As discussed in Chapter 3, the primitives of the PMM
specification language are selected in such a way that there is a very strong
correspondence between them and the execution of particular syntactic constructs in the
programming language. Furthermore, runtime relationships among primitive events
are strongly affected by the syntactic relationships among their sites. For example,
in order to satisfy an event specification of a function-execution of BOTH which
directly calls a function-execution of FOO, there must be at least one function call
statement of function FOO lexically inside the definition of function BOTH. The
syntactic relationships among instrumentation sites of some events are necessary
conditions of the control conditions defined on the events. Using the correspondence
between primitive events and their sites, the system can find all potential sites
for the primitive events used in the specification. Using the relationships between
control relations on primitive events and the syntactic relationships among their
sites, the system can rule out some irrelevant sites. For instance, in the example
PMM specification in Figure 4.2, the desired derived events are function-execution
events that directly call both FOO and BAR. Since function IR in the example
source program in Figure 4.1 directly calls only BAR, the instrumentation site at
the function definition of IR is irrelevant.
Hence, the knowledge has two parts:
1. Mapping between primitive event types of the PMM specification language
and the syntactic constructs of the programming language which are their
instrumentation sites.
2. Mapping between control relations among primitive events and the syntactic
relationships between their instrumentation sites.
The instances of these syntactic constructs and their relationships are obtained by
statically analyzing the source program using a static analysis program [Pro88]. This
static analysis information is stored in a program representation to be discussed in
the next section.
This compile time elimination of irrelevant instrumentation sites is achieved by
retaining only those that are potentially relevant, that is, those forms for which some
run time state exists in which they would be part of an event (primitive or
compound) used to answer a PMM question. The selected sites are a first-cut
approximation of the relevant instrumentation sites of the PMM specification, which is later
refined by dynamic run time filters.
The essential idea of eliminating irrelevant sites is to prove at compile time,
using static analysis of the source program, that some potential sites cannot satisfy
the specification. Because that step is done at compile time, it cannot eliminate all
irrelevant events or their instrumentation sites. The reason is that whether an event
is relevant might depend on the execution state of the source program. For instance,
in the example source program in Figure 4.1, whether or not the execution of BOTH
is relevant depends on the run time value of its parameter C, which cannot
in general be determined at compile time. Therefore, in order to ensure that the answers
computed are correct, the system must make a very conservative approximation:
if there is a possibility that a site is relevant, it will be retained.
4.1.3 Instrumentation Code Generation
The instrumentation code generator generates the instrumentation code for the
instrumentation sites that have been identified and merges that code with the
monitored program. The instrumentation code will collect the monitoring data needed
and store them into the relations created by the event schema generator.
Having determined the sites for instrumentation and the relations to store the
instrumentation data, the source program must be modified to collect the data.
Moreover, even if instrumentation sites are determined, whether or not the required
data should be collected depends on run time conditions. Therefore, the instrumentation
code generator also needs to generate filters that test these conditions at run
time.
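The effect of merging instrumentation and a run time filter into a site can be sketched by hand. The Python version of BOTH below is a hypothetical stand-in for generated code, not the output of SmartMonitor: the bodies of FOO and BAR, the event counter, and the two dictionaries acting as stored relations are all assumptions made for the illustration.

```python
import time

duration = {}     # stored relation: event id -> elapsed seconds
parameters = {}   # stored relation: event id -> argument tuple
_counter = 0      # source of unique event identifiers

def FOO(c):
    return c + 1

def BAR(c):
    return c * 2

def BOTH(c):
    """Hand-written stand-in for an instrumented version of BOTH."""
    global _counter
    _counter += 1
    event_id = _counter
    start = time.perf_counter()
    called_foo = called_bar = False
    if c > 0:
        FOO(c)
        called_foo = True       # inserted at the FOO call site
    result = BAR(c)
    called_bar = True           # inserted at the BAR call site
    elapsed = time.perf_counter() - start
    # Run time filter: record only executions that satisfy the specification.
    if c != 0 and called_foo and called_bar:
        duration[event_id] = elapsed
        parameters[event_id] = (c,)
    return result

def IR(d):
    return BAR(d)               # left uninstrumented: IR never calls FOO

BOTH(3)
BOTH(0)
BOTH(-2)
IR(4)
```

Only the first call is recorded: BOTH(0) fails the non-zero parameter test, and BOTH(-2) never reaches FOO. The value returned by BOTH is the same as in the uninstrumented program, which reflects the requirement that instrumentation not change the functionality of the source program.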
Sometimes the data required to answer particular questions, such as the paging
done by the operating system or the garbage collection done by the runtime system
in Lisp, depends upon the internal structure of the execution environment and is
beyond the knowledge or control of ordinary programmers. Even if the runtime
system does provide some sort of hook for collecting performance data, the ability to
translate this data into something that is meaningful at the higher level in which the
program is written will require specialized knowledge that is not available to most
programmers. Ideally, one programmer with specialized knowledge of such
facilities should provide it to a tool which will then make it accessible to everyone
by doing the translation automatically. SmartMonitor provides such an interface
for accessing arbitrary run time performance data available within an execution
environment. As a result, it remains independent of these performance collection
mechanisms, but can generate instrumentation code that interfaces to and uses
them.
4.1.4 Approach Summary
Our goal of instrumentation generation is not merely to generate instrumentation
that records data so that the specified answers can be computed, but to minimize the
data recorded. That goal is realized in two steps. First, the system selects only
relevant sites for the PMM specification. Second, the system provides a filter for
each potential instrumentation site to eliminate irrelevant events at run time.
We first discuss what knowledge about source programs can help the system to
find the relevant instrumentation sites. Then we discuss what instrumentation is
needed to record events of a primitive event type given a specific instrumentation
site. Finally, we discuss transforming a specification into instrumentation to record
relevant data using the knowledge and the instrumentation mentioned above.
4.2 Static Approximation
As described above, static analysis can be used to find all the sites in the source
program where a particular event could possibly occur. However, it is often possible
with additional analysis to prove at compile time that some of these sites cannot
satisfy the specification. The focus of this section is to provide a framework for
using static analysis of the source program to select relevant instrumentation sites
of a PMM specification. The essential idea is to map control relations of the PMM
specification language that are defined on the primitive events used in a specification
into the necessary syntactic conditions on the instrumentation sites corresponding
to them. Those sites that cannot satisfy the necessary syntactic conditions are
eliminated from this static approximation. Hence, the effectiveness of the static
analysis in eliminating irrelevant sites depends on its ability to analyze the source
program. In general, the more sophisticated the static analysis program is, the
more effectively it can check the necessary syntactic conditions, and the better the
approximation will be.
SmartMonitor uses a static analyzer to identify potential sites in the source
program and the syntactic relationships among those sites that are implied by the
control relations used in the specification language. This analysis is recorded in a
database (called the static analysis database). Like the PMM specification language,
the static analysis program representation uses an Entity-Relationship model to
represent the monitored source program. Entities are used to model those syntactic
constructs of the programming language (such as function calls, atomic transitions,
etc.) whose execution corresponds to primitive events of the PMM specification
language. These syntactic constructs are the instrumentation sites in the source
program. Relations are used to model the syntactic relationships between these
constructs (such as directly calls, updates, etc.). These relationships imply the
PMM control relations among the primitive events corresponding to the sites of
these events.
For the AP5 programming language, relation, function, rule, and atomic have
been chosen as primitive syntactic constructs. The execution of these constructs and
the operations defined on them have also been chosen as primitive constructs (e.g.,
function-execution, atomic-execution, and relation-insertion). For each primitive
control relation used in the PMM specification language there is a corresponding
relation in the representation language. For convenience, we use the same name as
the one used for the control relation in the specification language with a suffix
"-static" for the relation name. For example, Calls is a primitive control relation. Its
corresponding relation in the representation language is Calls-static, which is defined
between two sites in which a call to the latter appears lexically inside the former.
Similarly, the suffix is used for the entity types of the representation language. Hence,
function-static represents sites corresponding to function definitions and
function-execution-static represents sites corresponding to function call sites.
(function-static f1 'BOTH)
(function-static f2 'IR)
(function-execution-static fe1 'FOO)
(function-execution-static fe2 'BAR)
(function-execution-static fe3 'BAR)
(calls-static f1 fe1)
(calls-static f1 fe2)
(calls-static f2 fe3)
where f1, f2, fe1, fe2, fe3 are identifiers assigned for instrumentation sites of
the source program.
f1  ------> (defun BOTH (C)
              (when (> C 0)
fe1 ------>     (FOO C))
fe2 ------>   (BAR C))
f2  ------> (defun IR (D)
fe3 ------>   (BAR D))
Figure 4.3: A Source Program Representation
4.2.1 R e p r e se n tin g S ite s and T h eir R ela tio n sh ip s
Given a source program, the system uses a static analyzer to assign a unique
identifier to each occurrence of the syntactic constructs (sites) defined in the
representation language. Hence, the system can distinguish among these
occurrences (the potential instrumentation sites). For instance, the identifiers
for the two function calls of BAR in the example source program are different.
The static analyzer also analyzes the source program to record relationships
among different constructs in the relations of the representation language.
Because there is a unique identifier for each of the constructs, given an
identifier the system can identify the particular instance of the syntactic
construct the identifier corresponds to. Figure 4.3 is the representation of our
sample source program, where function-static, function-execution-static, and
calls-static are relations in the representation language representing function
definitions, function calls, and the direct call of one function by another,
respectively. For
instance, (function-static f1 'BOTH) represents that f1 is the site of function
BOTH, (function-execution-static fe1 'FOO) represents that fe1 is the site of a
function call to FOO, and (calls-static f1 fe1) represents that the site of the
latter, the function call fe1, is lexically inside the site of the former, the
definition of BOTH (f1).
The system uses this representation of the source program to find the
instrumentation sites of a PMM specification. The system can do this because the
instrumentation sites of all primitive events are explicit in this program
representation, it knows how to select among these (described in section 4.2.3),
and it knows how to combine these primitive events to detect derived ones.
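As an illustration only, the representation of Figure 4.3 can be pictured as a small set of facts that is queried by relation name. The sketch below uses Python rather than the system's own Lisp-based notation; the tuple encoding and the helper names call_sites_of and definitions_calling are assumptions made for this example, not part of the PMM system.

```python
# Static analysis facts of Figure 4.3, stored as (relation, arg1, arg2) tuples.
# The tuple encoding is an assumption made for this sketch.
FACTS = {
    ("function-static", "f1", "BOTH"),
    ("function-static", "f2", "IR"),
    ("function-execution-static", "fe1", "FOO"),
    ("function-execution-static", "fe2", "BAR"),
    ("function-execution-static", "fe3", "BAR"),
    ("calls-static", "f1", "fe1"),
    ("calls-static", "f1", "fe2"),
    ("calls-static", "f2", "fe3"),
}


def call_sites_of(name):
    """Return the identifiers of all call sites of the named function."""
    return {site for rel, site, fn in FACTS
            if rel == "function-execution-static" and fn == name}


def definitions_calling(name):
    """Return the definition sites that lexically contain a call to `name`."""
    calls = call_sites_of(name)
    return {outer for rel, outer, inner in FACTS
            if rel == "calls-static" and inner in calls}


# Definition sites that contain both a call to FOO and a call to BAR.
both = definitions_calling("FOO") & definitions_calling("BAR")
```

Because every occurrence has its own identifier, the two calls of BAR (fe2 and fe3) remain distinguishable, and only f1 contains calls to both FOO and BAR.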
4.2.2 Static Approximation Mapping
Selecting relevant instrumentation sites for the events used in a specification
is done by transforming the specification into a set of queries over the static
analysis database. The results of the queries are the relevant instrumentation
sites of the specification. The system uses a transformation language, called
Syntactic MAPping (SMAP), to represent a set of source-to-target correspondences
from the PMM specification language to the static analysis result representation
language. A correspondence has the following form:
Source-wff => Target-wff;
where Source-wff is a well formed formula in the PMM specification language and
Target-wff is a formula of the representation language. The Source-wff is defined
on the events of the PMM specification language while the Target-wff is defined
on the instrumentation sites of the events. By making the correspondence, the
system can select instrumentation sites for primitive events and transform the
relationships among events into necessary syntactic relationships among the
instrumentation sites of the events.
For example, the following is one of the correspondences for a primitive event:
(function-execution *) => (function-execution-static *)
where * represents the tail of a list. Given a WFF of the PMM specification
language, for instance, (function-execution x 'FOO), the system makes the
following transformation using the correspondence above.
(function-execution x 'FOO) => (function-execution-static x 'FOO)
The result is (function-execution-static x 'FOO), where function-execution-static
is a static analysis result binary relation. The first argument is an
instrumentation site of type function-execution; the second argument is a
function name. The correspondence transforms the event instance of type
function-execution in the PMM specification language into a formula that
represents all function execution sites of function FOO in the source program.
Similarly, there is a mapping between control relations and the necessary
conditions on the sites corresponding to the events involved in the control
relations. For example, the following is one of the correspondences for a
control relation:
(contains *) => (OR (calls-static *) (updates-static *) (triggers-static *))
where * represents the tail of a list and calls-static, updates-static, and
triggers-static are static analysis binary relations. The first argument of
those relations is an instrumentation site of type function-static; the second
argument is an instrumentation site of type function-execution-static,
relation-update-static, or rule-triggering-static respectively. The
correspondence transforms the control relation contains on event instances of
type function-execution in the PMM specification language into a formula that
represents all function execution sites where the second parameter is a site
inside the site represented by the first parameter. All of the transformations
are driven by syntactic knowledge of the correspondence between constructs in
the PMM language and the syntactic constructs of the programming language.
4.2.3 Static Approximation Transformation Algorithms
Static Approximation Transformation (SAT) algorithms accept a PMM specification
and generate a set of queries to the static analysis result database of the
source program. The results of those queries are the instrumentation sites of
the primitive events used in the PMM specification. Because PMM specifications
are written
in the PMM specification language, which is a relational calculus language, and
the queries to the static analysis database are also expressed via a relational
calculus language (query language), a question in the PMM specification language
can be transformed into a set of queries to the static analysis database by
recursively transforming each of the logic constructs into the corresponding
constructs in the query language. Hence, the SAT algorithms consist of one
algorithm for each logical construct. Each of the following subsections
describes how to transform a PMM specification into queries over the static
analysis database so that relevant instrumentation sites can be selected.
Primitive WFFs
The SMAP provides a mapping between primitive WFFs in the PMM specification
language and formulae in the representation language. The SMAP mapping is used
to transform the primitive WFFs of the PMM specification language into the
formulae of the representation language.
There is a mapping in SMAP for each primitive event type of the specification
language. However, for efficiency reasons, the SMAP does not provide a mapping
for every formula used in the specification language. For instance, there is no
mapping for parameters because it requires some very complicated flow analysis
to utilize the data information of the parameters of functions. So for
transforming primitive formulae of the PMM specification language into formulae
of the representation language, there are two cases to consider. First, a
mapping between the primitive WFF and a formula in the representation language
exists. In this case, the mapping is used to transform the primitive WFF into
the corresponding formula in the representation language. Second, no such
mapping exists. This case only arises for formulae that are either an attribute
or a control relation. If the formula is an attribute then the system simply
ignores the formula. This is safe because the mapping between primitive events
and instrumentation sites does not depend on attributes of the events. If the
formula is a control condition and there is no mapping for it then the system
ignores it as well. The resulting approximation might not eliminate as many
irrelevant instrumentation sites as the approximation obtained by taking the
control condition into consideration could. However, since the control conditions
are used as run time filtering conditions to eliminate irrelevant events, they
can be safely ignored during static approximation.
Conjunctions
$\{V \mid W_1 \land W_2 \land \dots \land W_n\}$ is transformed into
$\{V \mid W_{static_1} \land W_{static_2} \land \dots \land W_{static_n}\}$,
where $W_i$, $1 \le i \le n$, is transformed into $W_{static_i}$. For those
$W_i$ for which there is no static approximation, $W_i$ will be tested at
runtime and the system will drop it from the conjunction together with the
variables solely used in it.
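The conjunction rule can be sketched as follows, using the example question of Figure 4.2. This is a simplified Python illustration, not the system's algorithm: the MAPPED table, the tuple encoding of conjuncts, and the function approximate_conjunction are assumptions, and a conjunct is treated as unmapped simply when its head has no entry in the table.

```python
# Hypothetical table of conjunct heads that have a static counterpart.
MAPPED = {"function-execution": "function-execution-static",
          "calls": "calls-static"}


def approximate_conjunction(quantified, conjuncts):
    """Keep only conjuncts with a static approximation, renamed to their
    static relation, and drop variables used solely by removed conjuncts."""
    kept = [(MAPPED[c[0]], *c[1:]) for c in conjuncts if c[0] in MAPPED]
    still_used = {arg for c in kept for arg in c[1:]}
    return [v for v in quantified if v in still_used], kept


quantified = ["f", "FOO", "BAR", "p"]
conjuncts = [
    ("function-execution", "FOO", "'FOO"),
    ("function-execution", "BAR", "'BAR"),
    ("function-execution", "x", "f"),
    ("parameter", "x", 1, "p"),      # no static approximation
    ("not", ("=", "p", 0)),          # tested at runtime instead
    ("calls", "x", "FOO"),
    ("calls", "x", "BAR"),
]
variables, static_conjuncts = approximate_conjunction(quantified, conjuncts)
```

In this sketch the variable p disappears from the quantifier list because the only conjuncts that mention it were dropped.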
Disjunctions
$\{V \mid W_1 \lor W_2 \lor \dots \lor W_n\}$ is transformed into
$\{V \mid W_{static_1} \lor W_{static_2} \lor \dots \lor W_{static_n}\}$,
where $W_i$, $1 \le i \le n$, is transformed into $W_{static_i}$.
Existential Quantification
As above, $\{V \mid \exists U\, W\}$ is transformed into
$\{V \mid \exists U\, W_{static}\}$, in which $W$ is transformed into
$W_{static}$. However, the system needs to determine instrumentation sites for
all primitive events, including those primitive events that are existentially
quantified. Therefore, the system needs to determine instrumentation sites for
each of the primitive events that could instantiate the variables in $U$. For
instance, for the example question used in Figure 4.2, the system needs to
determine instrumentation sites not only for x but also for FOO and BAR. The
system first transforms the form $\{V \mid \exists U\, W\}$ into the form
$\{V \mid \exists U\, W_{static}\}$ and uses that query to find instrumentation
sites for $V$. The system then uses these instrumentation sites to select
instrumentation sites for each of the variables in $U$ with some primitive event
instantiation as follows: suppose that $U$ is $\{u_1, u_2, \dots, u_n\}$; the
system generates the following query for each $1 \le i \le n$:
$\{u_i \mid \exists (u_1, \dots, u_{i-1}, u_{i+1}, \dots, u_n, V)\, W_{static}\}$
The reason that this transformation is correct is that
$\{u_i \mid \exists (u_1, \dots, u_{i-1}, u_{i+1}, \dots, u_n, V)\, W_{static}\}$
is the set of sites that make the formula $W_{static}$ true, while in order to
determine the instrumentation sites of $V$ only a subset is needed. Hence, this
transformation is very conservative. It selects a superset of the possible
instrumentation sites for each of the existentially quantified variables that
could be instantiated by a primitive event.
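The query generation for existentially quantified variables can be sketched as below. The triple (targets, quantified variables, body) used to represent a query, the event_vars argument, and the function name site_queries are assumptions introduced for this Python illustration.

```python
def site_queries(free_vars, quantified, event_vars, body):
    """Build one query for the free variables V, plus one query per
    quantified variable that can be instantiated by a primitive event.

    Each query is a (targets, quantified_variables, body) triple.
    """
    queries = [(list(free_vars), list(quantified), body)]
    for u in quantified:
        if u not in event_vars:
            continue  # e.g. a function-name variable, which is not an event
        # u becomes the target; every other variable, including V,
        # is existentially quantified.
        rest = [v for v in quantified if v != u] + list(free_vars)
        queries.append(([u], rest, body))
    return queries


# V = (x), U = (f FOO BAR); only FOO and BAR stand for primitive events.
queries = site_queries(["x"], ["f", "FOO", "BAR"], {"FOO", "BAR"}, "W_static")
```

For the example question this yields three queries, one each for x, FOO, and BAR, all sharing the same body.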
Negations
The PMM specification language is designed in such a way that negation of a
primitive event cannot violate the safety constraints of the language. If a
negation is followed by a WFF then it must be a part of a larger formula. The
system must be able to test the WFF using the bindings of the other parts of the
formula. The safety conditions (discussed in Chapter 3) ensure this. In
particular, if the negated WFF is existentially quantified then the system needs
to select instrumentation sites for those existentially quantified events so
that they can be recorded and used in testing the WFF in computing the answers
to the questions. Selecting instrumentation sites for them uses the algorithm
for existentially quantified WFFs described above.
Universal Quantification
The safety conditions of the PMM specification language rule out the possibility
that an event will be defined in terms of universal quantifications. Hence, the
system does not need to deal with this case.
4.2.4 A Static Approximation Example
The example PMM specification in Figure 4.2 shows how the transformation from
PMM questions to queries over the static analysis database is done. An arrow
(=>) is used to show this process: the formula on the left hand side is the
formula to be transformed and the formula on the right hand side is the
resulting formula.
Existential Quantifier Transformation
The system first transforms the definition into a set of queries so that
instrumentation sites for all primitive events (including those existentially
quantified) will be selected.
((x) | (Exist (f FOO BAR p)
         (AND (Function-execution FOO 'FOO)
              (Function-execution BAR 'BAR)
              (Function-execution x f)
              (Parameter x 1 p)
              (Not (= p 0))
              (Calls x FOO)
              (Calls x BAR))))
=>
((x) | (Exist (f FOO BAR p)
         (AND (Function-execution FOO 'FOO)
              (Function-execution BAR 'BAR)
              (Function-execution x f)
              (Parameter x 1 p)
              (Not (= p 0))
              (Calls x FOO)
              (Calls x BAR))))
((FOO) | (Exist (f x BAR p)
           (AND (Function-execution FOO 'FOO)
                (Function-execution BAR 'BAR)
                (Function-execution x f)
                (Parameter x 1 p)
                (Not (= p 0))
                (Calls x FOO)
                (Calls x BAR))))
((BAR) | (Exist (f x FOO p)
           (AND (Function-execution FOO 'FOO)
                (Function-execution BAR 'BAR)
                (Function-execution x f)
                (Parameter x 1 p)
                (Not (= p 0))
                (Calls x FOO)
                (Calls x BAR))))
Testing Condition Elimination
As explained earlier, some attributes and control relations are not explicitly
represented in the static analysis database. A query on that database cannot
include any of these attributes and control relations. Such attributes and
control relations, together with the variables solely used by them, are
eliminated from the query. In the example, (Parameter x 1 p) and (Not (= p 0))
are dropped from the conjunction together with the variable p.
((x) | (Exist (f FOO BAR p)
         (AND (Function-execution FOO 'FOO)
              (Function-execution BAR 'BAR)
              (Function-execution x f)
              (Parameter x 1 p)
              (Not (= p 0))
              (Calls x FOO)
              (Calls x BAR))))
=>
((x) | (Exist (f FOO BAR)
         (AND (Function-execution FOO 'FOO)
              (Function-execution BAR 'BAR)
              (Function-execution x f)
              (Calls x FOO)
              (Calls x BAR))))
Similarly, the same transformation applies to the other two formulae.
Transformation for Primitives
Event transformation uses the SMAP to map primitive events and control relations
into queries over the static analysis database as follows:
(function-execution x 'FOO)
=>
(function-execution-static x 'FOO)
(function-execution z 'BAR)
=>
(function-execution-static z 'BAR)
(function-execution z f)
=>
(function-execution-static z f)
(Calls x FOO)
=>
(Calls-static x FOO)
Using the generated queries for each of the primitives, the derived formula is
transformed into a query over the static analysis database as follows (here only
one of the final queries is listed):
((x) | (Exist (f FOO BAR)
         (AND (Function-execution FOO 'FOO)
              (Function-execution BAR 'BAR)
              (Function-execution x f)
              (Calls x FOO)
              (Calls x BAR))))
=>
((x) | (Exist (f FOO BAR)
         (AND (Function-execution-static FOO 'FOO)
              (Function-execution-static BAR 'BAR)
              (Function-execution-static x f)
              (Calls-static x FOO)
              (Calls-static x BAR))))
The result of the static approximation for our example is the following three
queries:
((x) | (Exist (f FOO BAR)
         (AND (Function-execution-static FOO 'FOO)
              (Function-execution-static BAR 'BAR)
              (Function-execution-static x f)
              (Calls-static x FOO)
              (Calls-static x BAR))))
((FOO) | (Exist (f x BAR)
           (AND (Function-execution-static FOO 'FOO)
                (Function-execution-static BAR 'BAR)
                (Function-execution-static x f)
                (Calls-static x FOO)
                (Calls-static x BAR))))
((BAR) | (Exist (f FOO x)
           (AND (Function-execution-static FOO 'FOO)
                (Function-execution-static BAR 'BAR)
                (Function-execution-static x f)
                (Calls-static x FOO)
                (Calls-static x BAR))))
They are used to select instrumentation sites for x, FOO, and BAR respectively.
Evaluating these generated queries against the static analysis database yields
{f1}, {fe1}, and {fe2} respectively, where f1, fe1, and fe2 are defined as in
Figure 4.3.
Because there are several ways of recording data for a primitive event, a
primitive event might have several potential instrumentation sites, each of
which can be used as an instrumentation site of the primitive in the PMM
specification. For instance,
for the function-execution primitive event, the instrumentation sites could be
either the definition of the function or its function calls. Given a source
program, there are many function calls. The system uses a very simple heuristic
to choose among instrumentation sites that are equivalent with regard to
functionality. If all of the function calls of a function are selected as the
instrumentation sites of a primitive then the function definition of the
function is chosen as the instrumentation site; otherwise, the selected function
calls of the function are used as the instrumentation sites.
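The heuristic just described is small enough to state as code. The following Python sketch is illustrative only; the function choose_sites and the site identifier "d_bar" (standing for a definition site of BAR, which Figure 4.3 does not show) are hypothetical.

```python
def choose_sites(definition_site, all_call_sites, selected_call_sites):
    """Pick the definition site when every call site was selected;
    otherwise keep only the selected call sites."""
    all_calls = set(all_call_sites)
    selected = set(selected_call_sites)
    if all_calls and all_calls <= selected:
        return {definition_site}
    return selected


# Both calls of BAR selected: instrument the definition once.
whole = choose_sites("d_bar", {"fe2", "fe3"}, {"fe2", "fe3"})
# Only one call selected: instrument that call site.
partial = choose_sites("d_bar", {"fe2", "fe3"}, {"fe2"})
```

Instrumenting the definition once in the first case avoids inserting the same code at every call site.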
4.3 Instrumentation for Primitive Events
Having identified an instrumentation site of a primitive event used in the PMM
specification, the system needs to generate instrumentation code for it and to
merge the generated instrumentation code into the site. The generated
instrumentation code records events and their attribute values and tests any
specified filtering conditions as soon as possible to record only relevant
events. In the following subsections, we first present an abstract interface for
the instrumentation that characterizes it with six parameters. Properly setting
these parameters enables the system to record the appropriate data and test
conditions. Next, we present an abstract interface generator that transforms
requirements on what attributes to record for a primitive event into parameters
of this abstract interface so that the proper data can be recorded and the
proper tests performed.
4.3.1 An Abstract Instrumentation Interface
The instrumentation code for an instrumentation site of a primitive event tests
conditions defined on the event's data and records data about the event. In
general, it must record data and test conditions both before and after the event
happens. For instance, in order to record the duration of a function execution,
the system needs to access the time before and after the function execution. It
also needs to test conditions on data recorded by other sites and/or before and
after the event's execution. Because instrumentation data must be recorded to be
accessible, space must be allocated to hold it. In order to avoid conflict with
the execution of the source program, local variables with names that are
disjoint from those used in
the source program must be introduced so that instrumentation will not affect
the execution of the source program. This ensures that the instrumentation code
can only access but not change data in the original program, and operates on a
data space which is disjoint from that of the original program. Finally, actions
must be taken to transform the recorded data into the proper form so that they
are accessible by other instrumentation code or by PMM queries. We provide an
interface with six parameters, which we call interface parameters, to fulfill
the roles discussed above:
• Pre-condition: This is a relevance test based on data that is available before
the event. For instance, the precondition of a function call could depend on
the values of its parameters.
• Local-variables: These are used to store local data as described above. For
instance, if the duration of an event is needed, a local variable is needed to
record the starting time.
• Before-form: This is code to collect data available before the event. For
instance, this could read the clock and store the result into a variable
allocated for the start time.
• After-form: This is code to collect data after the event. For instance, this
could store the return values of a function invocation.
• Post-condition: This is a relevance test based on data that is available after
the event. For instance, the postcondition of a function call could compare
the return value to a parameter.
• Action: This is code that stores data in the instrumentation database. It is
also used in some cases to remove data that is no longer needed.
The reason for having both a Before-form and an After-form is that the system
may need to record data and test conditions both before and after a
function-execution event has happened. Pre-condition and Post-condition are used
for testing conditions based on the data recorded at other sites and/or based on
the data recorded by the Before-form and After-form. Local-variables is
introduced so that the instrumentation will not have any data conflicts with the
execution of the original source program.
Finally, Action is used to record instrumentation data when both the
Pre-condition and the Post-condition are true.
Having abstracted the instrumentation for a site into the above six categories,
generating the instrumentation of a function execution event for an
instrumentation site amounts to generating the code for each of these
categories. To understand the interface, let us consider a function definition
as our instrumentation site. The six parameters for a function definition can be
characterized as follows:
1. Pre-condition, which is used to determine whether the event recorded at the
site is relevant based on the computation state of the program before the event
occurs. This code can access the actual parameters of the function call.
2. Local-variables, which is a list of variables to be declared locally for
instrumentation. Each can be initialized by a form with access to the function
call's actual parameters.
3. Before-form, which is a form to be executed before the execution of the
function's body. Its purpose is to collect data about the computation
environment before the function execution. It can access both the actual input
parameters to the function call and any local variable introduced.
4. After-form, which is a form to be executed after the execution of the
function's body. Its purpose is to record data about the computational
environment after the execution of the function. It can access the actual input
parameters to the function call, any local variables introduced, and the results
of the function's body.
5. Post-condition, which is used to determine whether the event recorded at the
site is relevant based on the computation state of the program after the event
has happened. It can access the actual parameters of the function call and the
evaluation results of Pre-condition, Before-form, and After-form.
6. Action, which is a form to be executed after After-form. This form moves the
collected data to a location so that they can be used either for determining
other conditions or for computing answers to the specified questions. The
Action form can access the actual input parameters to the function call, any
local variables introduced, the return values of the function execution, and the
evaluation results of both Before-form and After-form.
Merging instrumentation into a program is done by wrapping the site identified
within a template containing the code for each of the parameter categories.
Suppose that the system has already generated instrumentation for the definition
of function FOO; the instrumentation could be merged into the definition of
function FOO as follows:
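(defun FOO (parameters)
  (let (Local-variables return-values)
    (if Pre-condition
        (unwind-protect
            (progn
              Before-form
              (values-list
                (setq return-values (multiple-value-list
                                      (funcall FOO-original parameters)))))
          (progn
            After-form
            (when Post-condition
              Action)))
        (funcall FOO-original parameters))))
where FOO-original is the original definition of function FOO, parameters stands
for the parameters of FOO, and Local-variables, Pre-condition, Before-form,
After-form, Post-condition, and Action stand for the generated values of the six
interface parameters.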
If the instrumentation site is a function call of FOO, the following slightly
different template is used to merge the instrumentation into the site.
(let (Local-variables return-values
      (p1 a1) (p2 a2) ... (pk ak))
  (if Pre-condition
      (unwind-protect
          (progn
            Before-form
            (values-list
              (setq return-values (multiple-value-list
                                    (FOO p1 p2 ... pk)))))
        (progn
          After-form
          (when Post-condition
            Action)))
      (FOO p1 p2 ... pk)))
where (FOO a1 a2 ... ak) is the original function call to function FOO. The
unwind-protect statement used in both of the templates is to ensure that both
After-form and Post-condition are always executed; p1, p2, ..., pk are
introduced to enable
the instrumentation to reference actual parameter values without computing the
parameter expressions multiple times.
The essence of the two templates is to insert filters and forms at the beginning
of a statement and at the end of a statement so that the computation state of
the computation environment before and after the execution of the statement can
be recorded. Whether or not the recorded data will be relevant depends on the
filter test outcomes. But no matter what the outcome of the filter condition
tested and data recorded, the statement of the original program is executed in
the augmented program if it is executed in the original program. Hence, although
the transformations which introduce these templates change both the data flow
and control flow of the source program, they do not alter the functionality of
the source program.
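To make the role of the six interface parameters concrete, the wrapping idea can be sketched in Python, with try/finally standing in for the Lisp unwind-protect. This is a loose analogue under stated assumptions, not the system's template: the function instrument, its keyword arguments, the dictionary used for local variables, and the list standing in for the parameters relation are all hypothetical.

```python
def instrument(original, *, pre_condition, before_form, after_form,
               post_condition, action):
    """Wrap `original` with probes modeled on the six interface parameters."""
    def wrapper(*args):
        local = {}  # Local-variables: storage private to the probe
        if not pre_condition(args):
            return original(*args)  # irrelevant event: run unmodified
        try:
            before_form(local, args)
            local["return_values"] = original(*args)
            return local["return_values"]
        finally:  # plays the role of unwind-protect
            after_form(local, args)
            if post_condition(local, args):
                action(local, args)
    return wrapper


parameters = []  # stands in for the stored relation "parameters"


def foo(c):
    return c * 2


foo_instrumented = instrument(
    foo,
    pre_condition=lambda args: args[0] != 0,          # a local filter
    before_form=lambda local, args: local.update(params=args),
    after_form=lambda local, args: None,
    post_condition=lambda local, args: True,
    action=lambda local, args: parameters.append(local["params"]),
)

result = foo_instrumented(3)   # recorded: the filter accepts the call
skipped = foo_instrumented(0)  # not recorded: the filter rejects the call
```

In both cases the original function runs and returns its normal value; only the recording differs, which mirrors the claim that the templates leave the functionality of the source program unchanged.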
4.3.2 Instrumentation Generation for Primitives
With these templates, instrumentation generation for primitives is carried out
by generating values for the categories that fill the template and thus act as
parameters to it. The system has knowledge of how to transform requirements on
attribute values of a primitive event into proper values of the template
parameters for the event's instrumentation site. The system also has knowledge
of how to generate and/or test a primitive control relation by instrumenting the
instrumentation sites of the events involved so that proper data can be recorded
and the proper test can be performed. This knowledge is represented by a set of
correspondences called the Instrumentation Generation Mapping (IGM) between
requirements on primitive events and control relations of the PMM specification
language and the appropriate values of the six template parameters for the
instrumentation sites of the events.
Each primitive event used in a derived definition is an event of some particular
type. Each of these event types in the PMM specification language has a set of
attributes defined on it. Its IGM specifies the correspondence between
requirements on these attribute values and the appropriate values for the six
instrumentation template parameters for events of that type.
Generating proper values for these six parameters depends on the types of the
primitive events, the sites where the instrumentation is going to be inserted,
the attribute names whose values are needed, and the relations used to store the
collected data. The IGM for primitive events has the following format:
Type
Site
Attribute
Relation
=>
Site
Local-variables
Pre-condition
Before-form
After-form
Post-condition
Action
where Type is the type of the event, Site is an instrumentation site of the
event, Attribute is an attribute defined on the event's type, and Relation is
the relation used to hold the recorded attribute value. For instance, in the
example PMM specification in Figure 4.2, the parameters of event x of type
function-execution are needed. The IGM for getting the parameters of type
function-execution is:
Type              function-execution
Site              f1
Attribute         parameters
Relation          parameters
=>
Site              f1
Local-variables   (id params)
Pre-condition     T
Before-form       (progn
                    (setq id (get-id function-execution))
                    (setq params (get-parameters)))
After-form        nil
Post-condition    T
Action            (++ parameters id params)
where f1 is the instrumentation site of event x, id and params are local
variables used to store collected data, and get-id and get-parameters are two
functions supplied by the PMM system to respectively create an event of the type
and collect the actual values of the formal parameters. Paraphrased in English,
this transformation records the parameter values of events of type
function-execution in the relation parameters by creating an identifier for the
instance of the event type, capturing its parameter values at the before-form of
the site, and then inserting this collected data into the relation parameters.
In this simple example no condition had to be tested in the pre or post
conditions and no action had to be taken in the after-form.
The IGMs for primitive control relations are different from those of primitive
events. Because control relations specify relationships among events that occur
at different times, the IGM needs to specify what data to record in earlier
events, how they should be stored, and what conditions to check to ensure that
only relevant data are recorded for later use. The IGM also specifies how to
reference the data from these earlier events when the later events occur so that
the conditions specified by the control relations can be checked at that point.
More specifically, the IGM for primitive control relations specifies what
attributes of the events involved are needed, where they should be stored, and
what condition checking on the stored data should be performed. They have the
following format:
Type
(Site1 Type1)
(Site2 Type2)
...
(Siten Typen)
Relation
=>
Site1
(Local-Variables1 ... Action1)
...
Siten
(Local-Variablesn ... Actionn)
where Type is the control relation used in the specification, and Relation is
the relation defined to record data of that control relation. Site1, Site2, ...,
Siten are instrumentation sites of the events used in the control relation.
Type1, Type2, ..., Typen are types of the events used in the control relation.
The right hand side of the transformation defines the instrumentation template
parameter values for each of the sites. For example, the IGM for the primitive
control relation Calls (which records when one event instance invokes another)
is as follows:
Calls
(f1 function-execution)
(fe1 function-execution)
Calls
=>
Site              f1
Local-variables   (id Callee)
Pre-condition     T
Before-form       (setq id (get-id function-execution))
After-form        nil
Post-condition    Callee
Action            nil
Site              fe1
Local-variables   nil
Pre-condition     T
Before-form       (setq Callee (get-id function-execution))
After-form        nil
Post-condition    T
Action            (++ Calls id Callee)
where f1 and fe1 are the instrumentation sites of events x and FOO respectively,
id and Callee are local variables used to store collected data, get-id is a
function supported by the PMM system to create an event of the specified type,
and Calls is a relation to record pairs of events in which one calls the other.
Paraphrasing it in English: First, the caller event is captured in the
Before-form of the caller event site. Second, the callee event is captured in
the Before-form of the callee event site. Third, the relationship between this
pair of events is recorded in the binary relation Calls.
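The three steps paraphrased above can be sketched in Python. Note one deliberate difference, which is an assumption of this sketch: the IGM shares the local variables id and Callee through the lexical scope of the caller site, whereas the code below passes the caller's identifier through an explicit stack. The names caller_site, callee_site, get_id, and the set standing in for the Calls relation are all hypothetical.

```python
import itertools

_ids = itertools.count(1)
calls = set()        # stands in for the stored relation Calls
_caller_stack = []   # assumption: how a callee finds its caller's id


def get_id():
    """Create a fresh event identifier."""
    return next(_ids)


def caller_site(fn):
    def wrapper(*args):
        _caller_stack.append(get_id())   # Before-form of the caller site
        try:
            return fn(*args)
        finally:
            _caller_stack.pop()
    return wrapper


def callee_site(fn):
    def wrapper(*args):
        callee = get_id()                # Before-form of the callee site
        if _caller_stack:
            # Corresponds to the Action (++ Calls id Callee).
            calls.add((_caller_stack[-1], callee))
        return fn(*args)
    return wrapper


@callee_site
def foo(c):
    return c + 1


@callee_site
def bar(c):
    return c - 1


@caller_site
def both(c):
    if c > 0:
        foo(c)
    return bar(c)


both(1)  # calls now holds {(1, 2), (1, 3)}
```

Each pair links one execution of BOTH to one execution of FOO or BAR, so a later query can test whether a single caller event invoked both functions.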
With the IGMs, one can simply state what conditions to test and what attributes
are needed when generating instrumentation for primitive events. The system will
automatically transform each of them into proper instrumentation code,
abstracted into the six template parameters, and merge them into the source
program. This also ensures that the generated instrumentation code does not
introduce any maintenance problems. For the example specification in Figure 4.2,
one simply needs to specify that the attributes needed for primitive event x at
instrumentation site BOTH are duration and parameters. The system transforms
these requirements into proper settings of the six template parameters at the
instrumentation site.
4.4 Instrumentation Generation Algorithms
Once the instrumentation sites for each of the events are determined, the system
then determines what attributes of those events are used in the specification,
and generates instrumentation for recording those attribute values using the
IGM. Finally the system merges the instrumentation generated into the source
program using the transformations described in section 4.3.1.
The instrumentation of a PMM specification is generated by generating
instrumentation for each of the questions one by one. The Instrumentation
Generation Algorithm (IGA) can be understood in three levels as described below.
Primitives and Primitives with Local Filters
Generating instrumentation for a primitive is done by directly using the IGMs of
the primitive type. This level also handles primitives with a local filter, that
is, a condition that can be tested using only the attribute values of the
primitive event being tested. For example, testing whether or not the first
parameter value of a function-execution event is equal to zero can be handled
with a local filter. Local filters defined on an event are transformed into
filtering conditions at the instrumentation sites of the event.
Primitives with Control Relations
Generating instrumentation for a conjunct of primitive events with some control
relation defined on them is done by first applying the IGMs of the control
relation to determine the data needed for the primitive events involved and then
applying the IGMs of the primitives with the attributes used in the conjunct to
determine the setting of the six parameters of the event sites.
General Derived Events
Generating instrumentation for a question whose definition is not in the format of
either of the first two levels is done by transforming the definition into Disjunctive
Normal Form (DNF) and applying the above algorithms to each of the conjuncts.
In general, the instrumentation generation algorithm for a PMM question is as
follows:
Algorithm 1 (Instrumentation Generation) Generating instrumentation for a
PMM question.
Input: a PMM question and the static analysis database of the source program.
Output: instrumentation for computing the answers to the question.
Method:
1. Create a schema to hold the required instrumentation data by creating a
stored relation for each of the primitive events, attributes, and control
relations used in the PMM specification. Create derived relations for
derived events, derived control relations, and questions using the stored
relations.
FOO
  Local-variables   nil
  Pre-condition     T
  Entry-form        (setq FOO-id (get-id function-execution))
  Exit-form         nil
  Post-condition    T
  Action            (++ Calls f-id FOO-id)
BAR
  Local-variables   nil
  Pre-condition     T
  Entry-form        (setq BAR-id (get-id function-execution))
  Exit-form         nil
  Post-condition    T
  Action            (++ Calls f-id BAR-id)
BOTH
  Local-variables   (f-id FOO-id BAR-id params t1 t2)
  Pre-condition     T
  Entry-form        (setq f-id (get-id function-execution)
                          t1 (time)
                          params (get-parameters))
  Exit-form         (setq t2 (time))
  Post-condition    (AND FOO-id BAR-id)
  Action            (progn
                      (++ Parameters f-id params)
                      (++ Duration f-id (- t2 t1)))
Table 4.1: Generated Instrumentation
2. Apply the static approximation algorithm to select instrumentation sites
for each of the primitive events used in the question.
3. Generate instrumentation code for each instrumentation site of a primitive
event using the IGA to collect only data of the event used in the
specification or test filtering conditions defined on them.
The output of the algorithm is the instrumentation code in the form of the six
template parameters to be merged into each of the instrumentation sites.
For the sample source program in Figure 4.1 and the example PMM specification
in Figure 4.2, using the algorithm mentioned above leads to the following:
Event Name   Sites   Attributes
x            (f1)    (id parameters duration (parameter 1))
FOO          (fe1)   (id)
BAR          (fe2)   (id)
where id is an attribute of function-execution whose value is the function-execution
event.
Applying the Instrumentation Generation algorithm to the example PMM
specification in Figure 4.2, the six forms of each of the instrumentation sites are
generated as in Table 4.1.
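As an informal illustration of how the six template parameters in Table 4.1 cooperate at one instrumentation site, the following Python sketch wraps a function with an entry form, an exit form, and a guarded action. It is only a sketch under stated assumptions: the thesis system generates Lisp forms and merges them into the source program, whereas the `instrument` wrapper, the `relations` dictionary, and the use of a non-zero first parameter as the post-condition are hypothetical stand-ins chosen for this example.

```python
import time

# Hypothetical stand-in for the relational schema: relation name -> list of tuples.
relations = {"Parameters": [], "Duration": []}

def instrument(fn, pre_condition, post_condition, action):
    """Wrap fn the way a site template might: entry form, exit form, guarded action."""
    def wrapped(*args):
        if not pre_condition(args):              # Pre-condition: skip all recording
            return fn(*args)
        # Local-variables and Entry-form: capture parameters and start time.
        local = {"params": args, "t1": time.perf_counter()}
        result = fn(*args)
        local["t2"] = time.perf_counter()        # Exit-form: capture end time
        if post_condition(local):                # Post-condition guards the action
            action(local)
        return result
    return wrapped

def record(local):
    # Action: add tuples to stored relations, in the spirit of (++ Parameters ...).
    relations["Parameters"].append(local["params"])
    relations["Duration"].append(local["t2"] - local["t1"])

def square(n):
    return n * n

monitored = instrument(square,
                       pre_condition=lambda args: True,
                       post_condition=lambda local: local["params"][0] != 0,
                       action=record)
monitored(3)
monitored(0)   # filtered out by the (assumed) post-condition
```

The wrapper never changes the value returned by the monitored function, which mirrors the requirement that instrumentation must not interfere with the computation of the source program.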
4.5 Summary
We have presented an automatic programming system that transforms a source
program and a PMM specification into an augmented source program that incorporates
the needed instrumentation. The augmentation is introduced only for monitoring
purposes and does not interfere with or change the computation of the source
program with regard to its functionality. By executing the augmented program, all and
only the relevant data for the PMM specification is recorded.
Since the PMM specification is written in the PMM specification language, which
is a relational calculus language based on a set of primitive building blocks, the
required results can be obtained by collecting the data of the primitives and computing
the results from the collected data. The system first determines what data to record
by analyzing which primitives the PMM specification is defined upon.
It then defines a relational schema to store the collected data of the primitives used
in the specification.
Next, the system determines where in the source program instrumentation should
go. The static analysis database is used to select relevant instrumentation sites for
the PMM specification. By using knowledge of the mapping between the building
blocks of the specification language and the syntactic constructs and relationships
among the constructs, the system eliminates many irrelevant instrumentation sites
at compile time, thus reducing the recording of irrelevant data at run time.
Finally, the system generates instrumentation and merges it into the source
program. The system contains knowledge of how to instrument the source program
so that the needed data of the building blocks can be recorded. The knowledge is
represented via a set of transformations. Having determined what primitive data
to record, the system applies the transformations to generate instrumentation at the
instrumentation sites identified so as to record the necessary data. The answers
to the PMM questions are computed from this recorded data via database query
evaluation algorithms after the program's execution.
Chapter 5
On Incremental Computation of Monitoring
Results
In this chapter, we examine the problem of generating instrumentation that
incrementally computes the required monitoring results as soon as the data is available, so
that irrelevant data is not collected and data that is no longer needed is removed as
quickly as possible. The instrumentation computes these results at run time as the
data is collected. It tests conditions defined on the collected data and eliminates the
data that does not satisfy the stated conditions, thus avoiding recording irrelevant
data. We examine this incremental computation problem in terms of generating
instrumentation for computing the answers to monitoring questions. Our solution
is based on the use of temporal conditions defined on the events in the questions, both to
determine which data to record and as run-time relevancy filters on the events
producing that data. Central to this solution is a representation that we have defined
that makes the temporal dependency among events explicit, and an algorithm that
uses this representation to generate instrumentation that incrementally computes
the answers to the posed PMM questions.
In order to simplify our description of the incremental computation problem and
our solution to it, we assume that monitoring questions in our relational calculus
PMM specification language are defined as a conjunction of events and control
relations. This assumption does not affect the generality of the approach since more
general questions can always be transformed into Disjunctive Normal Form (DNF)
and then each of those disjuncts (which is a conjunction) can be dealt with
individually. Hence, this chapter addresses the problem of generating instrumentation that
incrementally computes
$\{x_1, x_2, \ldots, x_k \mid \exists (x_{k+1} \ldots x_n)\ (\mathrm{AND}\ E_1, E_2, \ldots, E_n, C_1, C_2, \ldots, C_m)\}$,
where $x_1, x_2, \ldots, x_n$ are events of the types $E_1, E_2, \ldots, E_n$ and
$C_1, C_2, \ldots, C_m$ are conditions defined over those events. We first discuss the case
where $E_1, E_2, \ldots, E_n$ are all primitives and then discuss the case where some of
$E_1, E_2, \ldots, E_n$ are derived event types. Furthermore, because multiple questions can
be defined on the primitive event types, we assume that there is a separate relational
schema for each of the questions used in a PMM specification. As a result of this
assumption, we need only deal with one question at a time without worrying about
interference among questions.
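The reduction to conjunctions can be pictured with a small Python sketch that rewrites a nested AND/OR question into a list of conjuncts. The tuple encoding and the `to_dnf` name are assumptions made for this illustration, and atoms are treated as opaque strings (a negated condition is assumed to be written as a single atom), so this is not the PMM system's own transformation.

```python
from itertools import product

def to_dnf(expr):
    """Return a list of conjuncts; each conjunct is a list of atoms (strings).

    expr is either an atom (str) or a tuple ("AND" | "OR", sub-expression, ...).
    """
    if isinstance(expr, str):
        return [[expr]]
    op, *args = expr
    parts = [to_dnf(a) for a in args]
    if op == "OR":
        # A disjunction contributes the conjuncts of every alternative.
        return [conj for part in parts for conj in part]
    if op == "AND":
        # A conjunction distributes over the alternatives of its arguments.
        return [sum(choice, []) for choice in product(*parts)]
    raise ValueError(f"unknown operator: {op}")

question = ("AND", "E1", ("OR", "C1", "C2"))
conjuncts = to_dnf(question)
```

Each element of `conjuncts` is then a question of the conjunctive form assumed in the rest of the chapter and can be instrumented on its own.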
5.1 Issues of Incremental Computation
The data upon which the answers to a monitoring question are computed are
collected at the sites of the primitive events used in the question. In order for the data
collected at a primitive event's site to be relevant, it must satisfy the conditions
(relationships) defined on the primitive event. There are two kinds of relationships
(conditions) that can be defined in a PMM question: comparisons among event
attribute values and temporal relationships among events. Because all attributes
referenced in a comparison have to be accessible to make the comparison, referenced
attribute values of earlier events must be recorded so that the comparisons can be
made when the later events occur. Hence, a precondition to generating
instrumentation to incrementally compute answers to the question is to first determine the
temporal order among the involved events. Then the system can determine which
event attributes need to be recorded and create a relational schema to record that
data. Finally, it can generate the instrumentation to test conditions that depend only
on attributes of the event in which the test is performed and/or the events that
occurred before this event (and whose needed attribute values have been recorded
in the relational schema). Only when these conditions are satisfied are the required
attributes of the current event recorded for use by later condition testers or as part
of the answer to the posed PMM questions.
Relation Name   Format           Temporal Relation (a)
Before          (Before f g)     $(\prec f_e\ g_b)$ (b)
Calls           (Calls f g)      $(\prec f_b\ g_b) \land (\prec g_e\ f_e)$
Calls*          (Calls* f g)     $(\prec f_b\ g_b) \land (\prec g_e\ f_e)$
Contains        (Contains f g)   $(\prec f_b\ g_b) \land (\prec g_e\ f_e)$
Triggers        (Triggers f g)   $(\prec f_b\ g_b) \land (\prec g_e\ f_e)$
Updates         (Updates f g)    $(\prec f_b\ g_b) \land (\prec g_e\ f_e)$
(a) $\prec$ is a temporal control relation defined on two point events. Given two point
events A and B, $(\prec A\ B)$ is true iff A occurred before B.
(b) A point event with a subscript is used to represent the beginning and ending
point events associated with an interval event. For instance, $f_b$ and $f_e$ represent
the begin point event and end point event of interval event f, respectively.
Table 5.1: Temporal Relationships Specified by Control Relations
5.1.1 Determining Temporal Dependency
Relationships among events are specified by control relations. Control relations
define or imply temporal relationships among the events. For instance, if F calls
G, then the begin event of F must precede the begin event of G. In other words,
the begin event of F must occur before that of event G to satisfy F calls G. If no F
has begun (and not yet ended) when a G occurs, then this G is not being called by
F. Hence, knowing these timing relationships between events is very important. It
helps the system determine which data to store about earlier events and how to use
this stored data to filter later events. Since the conditions are usually defined on
the attribute values of events and those attribute values are associated with point
events (i.e., attribute values are available at particular points of program execution),
the system needs to generate instrumentation for each of those point events to store
data and test conditions. Therefore, temporal relationships need to be represented
in terms of point events. Because the computation model on which the system
is based is a sequential model, the temporal relationships among point events can
be represented by a single temporal relation, precede, which we represent as $\prec$.
The temporal relationships among interval events are represented in terms of the
temporal relationships of the point events associated with them.
For each of the primitive control relations defined for a computation model, the
system must be given the temporal relationships among the events implied by those
control relations. Table 5.1 lists the temporal relationship knowledge for the control
relations of the AP5 computation model, represented by $\prec$. For instance, given
two interval events, the function executions of F and G, F calls G implies that the
begin point of F occurred earlier than that of G and the end point of F occurred
later than that of G.
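The knowledge in Table 5.1 can be thought of as a lookup from a control relation to the point-event precedences it implies. The Python sketch below shows one possible encoding; the `(interval, "b" | "e")` pair representation and the names `TEMPORAL_KNOWLEDGE` and `precedences` are assumptions of this illustration, and the `Before` entry follows the reading that f ends before g begins.

```python
def nested(f, g):
    # f begins before g begins, and g ends before f ends.
    return [((f, "b"), (g, "b")), ((g, "e"), (f, "e"))]

def sequential(f, g):
    # f ends before g begins.
    return [((f, "e"), (g, "b"))]

# Control relation name -> function producing (earlier, later) point-event pairs.
TEMPORAL_KNOWLEDGE = {
    "Before": sequential,
    "Calls": nested,
    "Calls*": nested,
    "Contains": nested,
    "Triggers": nested,
    "Updates": nested,
}

def precedences(relation, f, g):
    """Expand one control relation on interval events f and g into point-event pairs."""
    return TEMPORAL_KNOWLEDGE[relation](f, g)

calls_pairs = precedences("Calls", "F", "G")
```

Keeping this knowledge in a table rather than in code paths is what would let the same graph-building procedure serve a different computation model with different control relations.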
5.1.2 Determining the Data to Record
Attribute values of the primitive events used in the PMM question are recorded
either for computing final answers or for testing conditions. A relational schema is
created to store this collected data so that it can be retrieved and combined with
other data to compute the answers.
Our goal is to generate instrumentation that computes the answer to the PMM
question while minimizing the instrumentation space and time overhead. This is
achieved by recording only relevant data. There are two dimensions to recording
only relevant data:
• only record data of primitive events that satisfy the conditions defined on them
in the question
• incrementally compute partial results of the answers and eliminate the
collected data when it is no longer needed.
First, since the question is defined as a conjunction of conditions over a set of
primitive events, each condition in the conjunct must be satisfied for the primitive events
to be relevant. Hence, for each primitive event used in the question, instrumentation
must be generated to check the conditions defined on it. If there is a single violation,
the primitive event cannot be relevant, and thus the data for this event should not be
collected or stored. Because the comparisons in conditions can only be tested when
the data they depend on are available, the conditions which can be checked are
those that depend only on the data of the current event or of events that occurred
earlier. Conditions that depend on data of later events cannot yet be used as filtering
conditions; they must wait until those later events occur.
Second, since the data is collected incrementally, the answers can also be
computed and stored incrementally. As discussed in Chapter 3, the specification language
is designed in such a way that there is a very strong relationship between the
primitive event types used in the PMM specification language and the data and control
constructs of the programming language in which the source program is written.
Primitive events usually correspond to the executions of syntactic constructs of the
source program. Control relations correspond to the run-time relationships of events.
In program execution, both data and control constructs of the monitored program
have their scope [Ste90]. Scope refers to the spatial or textual region of the program
within which references may occur. Temporal conditions can be used to determine
the scope of the events used in PMM questions to reduce the recording of
irrelevant data. Instrumentation data can be eliminated whenever their scope is over.
For instance, if there is an interval event in a question such that every point event
in the definition is contained in the interval event (we call such an interval event the
outmost interval), then the answers to the question can be computed at the end of
the interval. Furthermore, the data collected during the interval is no longer useful
for computing answers that are outside the interval. That is because once a
new outmost interval starts, the data collected during the previous interval would
not satisfy the condition that every point event is contained in the new interval.¹
5.1.3 Generating Instrumentation
Generating instrumentation for incremental processing is achieved by sorting the
primitive events according to temporal order. Based on the conditions to be tested in
the events and based on the data needed for computing the answers, instrumentation
is generated to record data of the earlier events and to test conditions as soon as
the data on which the conditions depend is available.
Generating this instrumentation includes the following steps:
1. explicitly represent the temporal relationships among the primitive events
referenced in a PMM question,
2. for each point event in this set determine what other events in this set will
occur before it,
3. determine what data should be collected and stored at each point event,
4. for each point event in this set determine what conditions can be tested based
on the data collected at the earlier events,
5. for each point event install as its filtering conditions those conditions testable
on the data of earlier events.
¹ If there is a possibility of the two outmost intervals overlapping then some additional condition
is needed to ensure this is true.
These steps ensure that the data collected and stored at each point of program
execution are relevant based on the data available at that point. There are some
cases where irrelevant data cannot be eliminated until analysis time, as is discussed
later.
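Steps 2, 4 and 5 can be sketched in Python as follows: for each condition, find the point event at which every other point event it depends on is guaranteed to have occurred, and install the condition there as a filter. The function names and the representation of a condition as the set of point events whose data it needs are assumptions of this illustration; conditions with no such host are deferred, matching the remark that some data cannot be eliminated until analysis time.

```python
def ancestors(node, edges):
    """All point events that must occur before node, following (earlier, later) edges."""
    found, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for earlier, later in edges:
            if later == current and earlier not in found:
                found.add(earlier)
                frontier.append(earlier)
    return found

def assign_filters(nodes, edges, conditions):
    """conditions: name -> set of point events whose data the condition needs."""
    filters = {n: [] for n in nodes}
    deferred = []
    for name, needed in conditions.items():
        # A host is a point event preceded by every other event the condition needs.
        hosts = [n for n in needed if needed - {n} <= ancestors(n, edges)]
        if hosts:
            filters[hosts[0]].append(name)
        else:
            deferred.append(name)      # testable only at analysis time
    return filters, deferred

nodes = ["fb", "gb", "hb"]
edges = {("fb", "gb")}                 # fb is known to precede gb; hb is unordered
conditions = {
    "first_param_nonzero": {"fb"},
    "same_argument": {"fb", "gb"},
    "unordered_pair": {"gb", "hb"},
}
filters, deferred = assign_filters(nodes, edges, conditions)
```

In an acyclic ordering at most one point event can host a given multi-event condition, so taking the first host found is unambiguous.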
5.2 Representing Temporal Relationship
The temporal relationships among the events referenced in a PMM question are
explicitly represented as a graph called the Event Dependent Graph (EDG), which
explicitly represents the temporal order among the point events used in the
question. By representing the temporal relationships as a graph, the system can use
graph algorithms to ensure that no temporal dependency cycle exists in a question
definition. The graph representation also enables the system to use a graph traversal
algorithm when generating instrumentation for the question.
5.2.1 Event Dependent Graph
Given a PMM question definition, its EDG is defined as follows:
Definition 1 (EDG) An EDG is a directed graph whose nodes are point events
used in the definition and whose edges indicate the temporal order among the point
events. Given two nodes $N_1$ and $N_2$, there is an edge e from $N_1$ to $N_2$ iff $N_1 \prec N_2$
and there is no node $N_m$ such that $N_1 \prec N_m \prec N_2$.
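Read literally, Definition 1 keeps only the immediate precedences, that is, the transitive reduction of the precede relation. The Python sketch below computes those edges from a set of known precedence pairs; it assumes the pairs are acyclic, and the function names are chosen for this example rather than taken from the thesis.

```python
def transitive_closure(pairs):
    """All (a, b) such that a precedes b, directly or through intermediate events."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def edg_edges(pairs):
    """Keep a pair only if no node lies strictly between its two ends."""
    closure = transitive_closure(pairs)
    nodes = {n for pair in closure for n in pair}
    return {(a, b) for a, b in closure
            if not any((a, m) in closure and (m, b) in closure for m in nodes)}

known = {("fb", "FOOb"), ("FOOb", "FOOe"), ("FOOe", "fe"), ("fb", "fe")}
immediate = edg_edges(known)
```

Here the redundant pair `("fb", "fe")` disappears because `FOOb` and `FOOe` lie between the two point events.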
An EDG of a valid PMM question is acyclic,² although it is not necessarily
connected. Since an EDG is an acyclic graph, for convenience, we call those nodes
without an incoming edge roots and those nodes without an outgoing edge leaves.
All other nodes are called inner nodes. Given a node in the graph, those nodes with
an edge leading to the node are called the predecessors of the node; those nodes with
an edge leading from the node are called the successors of the node.
An edge in the graph represents a temporal constraint on the nodes involved. If
there is no path from $N_i$ to $N_j$ or from $N_j$ to $N_i$, then any temporal order could exist
between $N_i$ and $N_j$. Because $\prec$ is a transitive relation, if $N_i \prec N_j$ and $N_j \prec N_k$, then
$N_i \prec N_k$. Hence, for any point event, all of the point events on the paths from any
root to it should occur before it; all of the point events on the paths from it to any
leaf should occur after it. Those points that cannot reach or be reached from
the point could occur either before or after the event.
5.2.2 Building Event Dependent Graphs
The goal of building an EDG for a question definition is to make the temporal
dependency among point events explicit. The graph is built by examining each of the
control relations to check what temporal constraints are implied among the events
upon which the control relation is defined. The knowledge of what temporal
constraints are implied by primitive control relations on events is explicitly represented
in the system and is used to build the graph. In addition, the system also knows
that the end event of an interval event always occurs after the begin event of the
interval event.
The following algorithm builds an EDG for a PMM question definition.
Algorithm 2 (Build an EDG) Suppose question $Q = (\mathrm{AND}\ E_1, E_2, \ldots, E_n, C_1,
C_2, \ldots, C_m)$. Let M be a set of conditions to be tested, initially $C_1, C_2, \ldots, C_m$. Let
CLUSTERS be a set of connected graph components, initially all point events in
Q's definition.
Repeat the following steps until $M = \emptyset$.
1. Select a C from M; suppose C has two parameters $E_i$, $E_j$.
2. If the point events associated with $E_i$ and $E_j$ are not in CLUSTERS, report an
error; otherwise connect the point events with directed edges if their temporal
order can be deduced from the semantics of C.
3. Set $M = M - \{C\}$.
Finally, for all $E_i$ and $E_j$ which are the begin and end point events of some interval
event, if $E_i$ is a leaf then connect an edge from $E_i$ to $E_j$.
² The validation constraints of the PMM specification language require that no cyclic temporal
dependency exists.
Figure 5.1: An Event Dependent Graph
The algorithm first makes each point event a node. It then examines one control
relation at a time and connects edges among events involved in the control relation
using the explicitly represented temporal knowledge of the control relation. For
instance, the EDG for the following event definition (which is the example PMM
specification defined in Chapter 4) is shown in Figure 5.1:
(defevent fns-call-FOO-and-BAR
  :definition
  ((x) | (Exist (f FOO BAR p)
           (AND (Function-execution FOO 'FOO)
                (Function-execution BAR 'BAR)
                (Function-execution x f)
                (Parameter x 1 p)
                (Not (= p 0))
                (Calls x FOO)
                (Calls x BAR)))))
where the node set of the EDG is $V = \{f_b, f_e, FOO_b, FOO_e, BAR_b, BAR_e\}$ and the edge
set is $E = \{(f_b, FOO_b), (f_b, BAR_b), (FOO_b, FOO_e), (BAR_b, BAR_e), (FOO_e, f_e), (BAR_e, f_e)\}$.
There is one leaf, $f_e$, and one root, $f_b$.
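The construction can be replayed on this example with a short Python sketch in the spirit of Algorithm 2. It is a simplification under stated assumptions: only the `Calls` relation is supported, point events are encoded as `(interval, "b" | "e")` pairs, and the `build_edg` name is hypothetical.

```python
def build_edg(intervals, conditions):
    """intervals: names of interval events; conditions: (relation, f, g) triples."""
    nodes = {(i, p) for i in intervals for p in ("b", "e")}
    edges = set()
    for relation, f, g in conditions:
        if relation != "Calls":
            raise ValueError("this sketch only knows the Calls relation")
        if f not in intervals or g not in intervals:
            raise ValueError("condition refers to an unknown event")
        edges.add(((f, "b"), (g, "b")))    # f begins before g begins
        edges.add(((g, "e"), (f, "e")))    # g ends before f ends
    # Final step: a begin point that is still a leaf is linked to its own end point.
    for i in intervals:
        if not any(src == (i, "b") for src, _ in edges):
            edges.add(((i, "b"), (i, "e")))
    return nodes, edges

nodes, edges = build_edg(["f", "FOO", "BAR"],
                         [("Calls", "f", "FOO"), ("Calls", "f", "BAR")])
roots = {n for n in nodes if not any(dst == n for _, dst in edges)}
leaves = {n for n in nodes if not any(src == n for src, _ in edges)}
```

The resulting six edges, single root and single leaf agree with the node and edge sets listed above.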
5.3 Run Time Filtering
Minimizing instrumentation overhead is achieved by inserting instrumentation code
that tests conditions at run time to eliminate irrelevant data and that computes
and stores partial results and eliminates the no longer useful data upon which those
partial results were based. Based on what data is needed to test a condition, there
are two cases to consider: first, conditions that depend only on the data associated
with the event; second, conditions that depend on the data associated with one
or more other events. We call the first case intra-event (or local) conditions and
the second case inter-event conditions. In the following subsections, we discuss the
two kinds of conditions and examine conditions under which partial results can be
computed.
5.3.1 Intra-Event Conditions
Intra-event conditions are conditions that can be tested by using the local data
available at certain execution points. The essential idea is to test a condition before
data is stored. For example, if the programmer is interested in a function-execution
event whose first parameter value is not equal to zero, then the system could generate
instrumentation that first tests whether the first parameter value of a function invocation
equals zero. Only if the first parameter value is not zero will the data representing
the occurrence of the event be recorded.
5.3.2 Inter-Event Conditions
Inter-event conditions are conditions defined on the data associated with one or
more events other than the one in which the condition is defined. In this case, the
required data from the earlier events needs to be recorded when those events occur
so that the conditions can be tested later. Rather than testing these inter-event
conditions as a unit after all the referenced events have occurred, they can be used
as filtering conditions for each of those events by decomposing them into conditions
testable at each of the instrumentation sites of the referenced events. Whenever one
of those events occurs, the portion of the inter-event condition apportioned to it will
ensure that it is a relevant event.
Depending on whether or not there is a temporal relationship specified among
the events, there are two cases to consider: first, there is some temporal order among
the involved events; second, there is no temporal order among the involved events.
If there is some temporal order among the involved events, then the events can
be sorted according to the temporal order. Based on the temporal order, the system
can generate instrumentation to store data needed from the earlier events and test
conditions on that stored data in the later events. By the time the later events
occur, at least some earlier events should have occurred; otherwise those later events
are not relevant ones which could satisfy the condition.
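A minimal Python sketch of this ordered case, assuming a single question about F directly calling G: data stored when an earlier event begins (here, a stack of active functions) is used to filter a later event at its own site. The `begin`/`end` hooks, the stack representation, and the hard-coded names F and G are assumptions of the illustration, not the generated instrumentation itself.

```python
stack = []    # functions currently executing, innermost last (data from earlier events)
calls = []    # recorded (caller, callee) pairs that satisfy the question

def begin(name):
    # Inter-event filter at G's begin site: G is relevant only if an F has begun,
    # has not yet ended, and is the direct caller.
    if name == "G" and stack and stack[-1] == "F":
        calls.append(("F", "G"))
    stack.append(name)

def end(name):
    # The begin-time data for this execution is no longer needed once it ends.
    stack.pop()

# Simulated trace: F calls G, then G runs on its own.
begin("F"); begin("G"); end("G"); end("F")
begin("G"); end("G")
```

The second execution of G is discarded at run time because no F is active when it begins, so nothing about it ever reaches the stored relations.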
If there is no temporal order among the involved events, then there are no
constraints on the sequence in which the involved events can occur. Hence, when such
an event occurs and no other relevant events have yet occurred, it is still possible
that they could occur later. Hence, the system must generate instrumentation to
record the needed data of the involved events and do the condition checking either
when all the involved events have occurred or at analysis time.
5.3.3 Scope Analysis
As discussed previously, the temporal constraints in a question definition can help
the system determine the scope of the data collected during program execution. The
scope is used to determine when to compute partial results and when to eliminate
data that are no longer useful after computing the partial results.
Definition 2 (Outmost Interval Event) A question definition is said to have an
outmost interval iff there is an interval event $e_{outmost}$ in the definition that satisfies
the following condition: for any event e used in the definition other than $e_{outmost}$ itself,
(Contains $e_{outmost}$ e) is true.
Given the conditions about the execution order of the point events used in the
question definitions, the outmost interval for each of the question definitions can be
computed using the following algorithm:
Algorithm 3 (Outmost Interval) This algorithm computes the outmost interval
event of a question definition if there is one.
• Input: point events $p_1, p_2, \ldots, p_n$ used in a question definition and the
conditions $C_1, C_2, \ldots, C_m$ on them. Suppose $C_i$ is of the format $(\prec p_{i1}\ p_{i2})$.
• Output: the outmost interval if there is one.
• Method: Apply topological sort on $p_1, p_2, \ldots, p_n$ using $C_i$ for $1 \le i \le m$. Let
$\mathcal{S}$ be the earliest point events and $\mathcal{L}$ be the latest point events. If both $\mathcal{S}$ and
$\mathcal{L}$ contain one point event and there is an interval event O whose begin event
and end event are in $\mathcal{S}$ and $\mathcal{L}$ respectively, then output O; otherwise there is no
outmost interval event.
If there is an outmost interval event and there is no recursion, then data collected
for the events of the type can be released at the end of the interval. If recursion is
possible, then we must ensure that collected data is released only at the end of the
last interval of the type.
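The check performed by Algorithm 3 can be sketched in Python as below. Instead of running a full topological sort, this sketch only looks for the point events with no predecessor and those with no successor, which are the earliest and latest sets the algorithm inspects; that substitution, the `(interval, "b" | "e")` encoding, and the function name are assumptions of the example.

```python
def outmost_interval(points, precedes):
    """points: (interval, 'b' | 'e') pairs; precedes: set of (earlier, later) pairs."""
    earliest = [p for p in points if not any(later == p for _, later in precedes)]
    latest = [p for p in points if not any(earlier == p for earlier, _ in precedes)]
    if len(earliest) == 1 and len(latest) == 1:
        (first, first_kind), (last, last_kind) = earliest[0], latest[0]
        # The unique earliest and latest points must be the begin and end of one interval.
        if first == last and (first_kind, last_kind) == ("b", "e"):
            return first
    return None

points = [("f", "b"), ("f", "e"), ("FOO", "b"), ("FOO", "e"), ("BAR", "b"), ("BAR", "e")]
precedes = {
    (("f", "b"), ("FOO", "b")), (("f", "b"), ("BAR", "b")),
    (("FOO", "b"), ("FOO", "e")), (("BAR", "b"), ("BAR", "e")),
    (("FOO", "e"), ("f", "e")), (("BAR", "e"), ("f", "e")),
}
outmost = outmost_interval(points, precedes)
```

For the FOO and BAR example the outmost interval is the execution of f, so data collected for one execution of f could be released when that execution ends, provided f is not recursive.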
5.4 Incremental Instrumentation Generation
There are three steps in the incremental computation:
1. Temporal analysis, which determines the temporal dependency among the
primitive data. It is based on the Event Dependent Graph.
2. Schema generation, which creates a relational schema to record the
instrumentation data collected during the program's execution. It creates a relation
for each of the primitive events used in a question definition to avoid data
conflict among questions.
3. Instrumentation code generation, which generates instrumentation code for
each instrumentation site using the schema and temporal order from the two
previous steps.
Because the EDG of a question is a directed acyclic graph, the instrumentation
generation algorithm is described in terms of a graph traversal of the EDG. In the
following, we first assume that the EDG is a connected graph. We then discuss how
to deal with the case in which the EDG is not a connected graph.
Algorithm 4 (Dynamic Filtering) The input of the algorithm is the EDG of a
question and the relational schema generated for the question. The output of the
algorithm is the instrumentation generated for each of the primitive events used in
the definition.
Traverse the graph and do the following starting at root nodes:
1. Root nodes: generate instrumentation for recording the data needed by its
successors and store it in the relations for each of the root nodes.
2. Inner nodes: if all of its predecessors have been visited then generate
instrumentation to collect data needed by its successors and to test the conditions
that depend only on the data collected at its predecessors and the data collected
by itself. Store the collected data only when the conditions are satisfied.
3. Leaf nodes: if all of its predecessors have been visited then generate
instrumentation to collect data needed for computing final answers and to test conditions
that depend on data collected at its predecessors and the data collected by
itself. Compute partial results based on the data of its predecessors and the data
collected by itself and store them in the relations defined for it. If this is an
end point event of an outmost interval event, eliminate the data stored at its
predecessors.
This algorithm starts from root nodes, generating instrumentation for recording the
attribute values used for either condition testing or final result computation. It then
visits the immediate successors of the nodes that have been visited and generates
instrumentation code for those nodes whose immediate predecessors have all been
visited. The instrumentation code generated tests the conditions that depend only
on the data collected at its predecessors and the data collected at this site, and
collects the required data for this site. The collected data will not be stored unless
the conditions tested are true. If the node is a leaf node, partial results are computed
if it is an end point event of an outmost interval and the stored temporal data is
eliminated.
If an EDG is not a connected graph then the above algorithm is applied to each
of its connected components and the required monitoring results are computed at
analysis time.
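The traversal order used by Algorithm 4 can be sketched in Python as a breadth-first topological visit that handles a node only after all of its predecessors. This sketch only classifies each node and fixes the visiting order; the actual generation of recording and filtering code is left out, and the string node names and the `plan_instrumentation` function are assumptions of the example.

```python
from collections import deque

def plan_instrumentation(nodes, edges):
    """Return (node, kind) pairs in an order where predecessors always come first."""
    preds = {n: set() for n in nodes}
    succs = {n: set() for n in nodes}
    for a, b in edges:
        preds[b].add(a)
        succs[a].add(b)
    pending = {n: len(preds[n]) for n in nodes}
    queue = deque(sorted(n for n in nodes if pending[n] == 0))
    plan = []
    while queue:
        n = queue.popleft()
        if not preds[n]:
            kind = "root"     # record the data needed by successors
        elif not succs[n]:
            kind = "leaf"     # test conditions and compute partial results
        else:
            kind = "inner"    # test conditions, store data only if they hold
        plan.append((n, kind))
        for s in sorted(succs[n]):
            pending[s] -= 1
            if pending[s] == 0:   # all predecessors of s have now been visited
                queue.append(s)
    return plan

nodes = ["fb", "fe", "FOOb", "FOOe", "BARb", "BARe"]
edges = {("fb", "FOOb"), ("fb", "BARb"), ("FOOb", "FOOe"),
         ("BARb", "BARe"), ("FOOe", "fe"), ("BARe", "fe")}
plan = plan_instrumentation(nodes, edges)
```

Applied to a disconnected EDG, the same function simply interleaves the components, which is consistent with treating each connected component separately.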
5.4.1 Dealing With Derived Relations
Derived events or derived relations are defined on either primitive events and
control relations or other derived events and relations. Hence, they cannot be collected
directly during the program's execution. Instead, they are computed from the
primitive events and control relations upon which they are defined, just like the PMM
questions described above. If derived events or relations are used in a PMM
question, there are two alternatives. In the first alternative, the definitions of the derived
events or relations are expanded in the question definition. In that case, the
above algorithm applies unchanged. In the second alternative, the derived events and
relations are treated as if they were just primitive events and control relations by
generating instrumentation for computing them and by only letting other events
access the derived data, not the primitives and control relations they depend upon.
The first approach, which expands the derived events and control relations, needs to
deal only with the primitive events and control relations; hence, it is very simple.
However, by expanding the definitions it may lose some opportunity for sharing
common derived information. On the other hand, the second approach makes sharing
easy. The PMM system takes the first approach because the focus is on incremental
computation of the answers to PMM questions.
5.4.2 Time and Space Compromises in PMM
As described above, incremental computation depends upon extra instrumentation
to do run-time filtering and partial result computation. Because of this extra
instrumentation, the augmented program's execution has some additional overhead
with regard to both time and space. One natural question to ask is: when is the
incremental algorithm effective? That is, when does this extra effort result in an
overall reduction in the cost of collecting and analyzing the instrumentation data? We
first examine several special cases before considering the general case.
Depending on the temporal relationships existing among the events used in a
question definition, there are three cases of interest. First, there is no order among
the events in the definition. Second, there is a total order among the events of the
definition. Third, there is an outmost interval event in the event definition.
If there is no temporal order among the events used in a question definition,
then when an event occurs the generated instrumentation simply records the needed
data of the event, because there are no other events to check against and the event
might be needed later. In other words, even if the event is not relevant now, it
might still become relevant later, so only at analysis time can the system determine
whether it is relevant. This is a case where run-time testing is not very effective.
If there is a total order among the events used in a question definition, then when
an event occurs, only its immediate predecessor needs to be checked. If the event
and its immediate predecessor satisfy the conditions defined on them, the event and
its data are recorded. The recorded data represents the partial result of the event
and all of its predecessors, so each recorded event stands for the partial result
accumulated up to that point. When the last event occurs, the final result
is computed. Because the conditions defined in the question are used to filter out
irrelevant events at each of the point events, and no event is recorded unless it passes
all of the conditions defined on itself and all of its predecessors, this is a case where
run-time testing is most effective.
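The total-order case can be illustrated with a small sketch. It is not SmartMonitor's generated instrumentation; the class `TotalOrderMonitor`, the stage functions, and the sample events are hypothetical names chosen for illustration, and out-of-order events are simply ignored as a simplifying assumption. Each stage receives the partial result recorded for its predecessors and either extends it or rejects the event.

```python
from typing import Callable, List, Optional

# A stage takes the partial result recorded so far (None before the first
# event) plus the new event's data. It returns the extended partial result,
# or None if the event fails its conditions and must not be recorded.
Stage = Callable[[Optional[dict], dict], Optional[dict]]


class TotalOrderMonitor:
    """Hypothetical run-time filter for a question with totally ordered events."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = stages
        self.partial: Optional[dict] = None  # partial result of events so far
        self.next_stage = 0                  # index of the next expected event
        self.answers: List[dict] = []
        self.checked = 0
        self.recorded = 0

    def observe(self, stage: int, data: dict) -> None:
        self.checked += 1
        if stage != self.next_stage:
            return  # simplification: an out-of-order event extends nothing
        extended = self.stages[stage](self.partial, data)
        if extended is None:
            return  # filtered out at run time, nothing is stored
        self.recorded += 1
        self.partial, self.next_stage = extended, stage + 1
        if self.next_stage == len(self.stages):
            self.answers.append(extended)  # last event: final result
            self.partial, self.next_stage = None, 0


def first(_partial: Optional[dict], data: dict) -> Optional[dict]:
    # Condition on the first event alone.
    return {"file": data["file"]} if data["file"].endswith(".log") else None


def second(partial: Optional[dict], data: dict) -> Optional[dict]:
    # Condition relating the second event to its immediate predecessor.
    if partial is None or data["file"] != partial["file"]:
        return None
    return {**partial, "bytes": data["bytes"]}


monitor = TotalOrderMonitor([first, second])
events = [
    (0, {"file": "a.txt"}),
    (1, {"file": "a.txt", "bytes": 9}),
    (0, {"file": "b.log"}),
    (1, {"file": "c.log", "bytes": 5}),
    (1, {"file": "b.log", "bytes": 7}),
]
for stage, data in events:
    monitor.observe(stage, data)
```

Only events that pass every condition on themselves and their predecessors are stored, which is why this case benefits most from run-time testing.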
In general, determining the effectiveness of run-time testing requires the ability
to estimate the costs of alternative choices. This in turn relies on estimates (or
measurements) of such things as data volume and the likelihood of branch conditions,
which are neither readily deduced from the program itself nor readily available from
the programmer. The system uses simple heuristics to make such choices. As an
example, the specification may talk about invocations of function F with various
properties, such as an argument being less than zero, that can easily be checked
when F is called. In this case the argument is compared with zero at run time,
and if the condition fails no further monitoring activity is performed. On the other
hand, if the specification talks about pairs of invocations of the functions F and G
with the same arguments, the system simply records all the arguments of all invocations
and delays the comparisons until analysis time.
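The contrast between the two F and G examples can be sketched as follows. This is an illustrative sketch, not the system's output: the hook names `on_call_f` and `on_call_g` and the logs are hypothetical, and "pairs of invocations with the same arguments" is simplified to the set of argument values seen by both functions.

```python
from collections import defaultdict

f_negative_calls = []         # question 1: calls of F with a negative argument
pair_log = defaultdict(list)  # question 2: raw argument log per function


def on_call_f(arg):
    """Hypothetical instrumentation executed when F is invoked."""
    # Local filter: testable right at the call site, so it runs at run time.
    if arg < 0:
        f_negative_calls.append(arg)
    # Pair question: nothing can be decided yet, so every argument is logged.
    pair_log["F"].append(arg)


def on_call_g(arg):
    """Hypothetical instrumentation executed when G is invoked."""
    pair_log["G"].append(arg)


def matching_arguments():
    """Analysis-time comparison: argument values passed to both F and G."""
    return sorted(set(pair_log["F"]) & set(pair_log["G"]))


for a in [3, -1, 4, -5]:
    on_call_f(a)
for a in [4, 7, -5]:
    on_call_g(a)
```

The first question stores two of four F calls, while the second must keep all seven logged arguments until the analysis step runs.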
All other cases fall between these two extremes. In those cases, only some of the
point events' filtering conditions can be tested at run time. Among those cases, a
question with an outmost interval event is of particular interest. Because there is an
outmost event, by definition every event in the question is contained in that event.
Hence, when the end event of the interval event occurs, the answers are computed and
all of the recorded instrumentation data is then released.
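A minimal sketch of the outmost-interval case, under stated assumptions: the class name `IntervalScopedMonitor`, its methods, and the use of a sum as the "answer" are all hypothetical. It only illustrates that data recorded inside the interval can be dropped once the interval ends and the answer has been computed.

```python
class IntervalScopedMonitor:
    """Hypothetical monitor for a question with an outmost interval event."""

    def __init__(self):
        self.depth = 0      # nesting depth, so recursive intervals are handled
        self.buffer = []    # data recorded inside the current outmost interval
        self.answers = []

    def begin(self):
        """Begin point event of the interval."""
        self.depth += 1

    def record(self, value):
        """A contained event; events outside the interval are irrelevant."""
        if self.depth > 0:
            self.buffer.append(value)

    def end(self):
        """End point event: compute the answer, then release the data."""
        self.depth -= 1
        if self.depth == 0:
            self.answers.append(sum(self.buffer))
            self.buffer.clear()


m = IntervalScopedMonitor()
m.record(100)   # outside any interval, ignored
m.begin()
m.record(1)
m.record(2)
m.end()
m.begin()
m.record(5)
m.end()
```

The buffer never holds more than one interval's worth of data, which is the space saving this case offers.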
5.4.3 Other Optimizations
In our approach, questions about a program's execution are transformed into
queries to a database whose schema is dynamically generated based on the PMM
questions that have been asked. Once data is collected, answers to the questions are
computed from the stored data. Because query evaluation is used in computing both
partial results and final results, the speed of computing the derived relations
matters, and various query optimization techniques [Ull89, Coh86, Coh89a]
could be used.
5.5 Summary
Incremental computation improves the efficiency of monitoring instrumentation
using the following techniques: (1) filter promotion, which tests conditions that an
event needs to satisfy before the data is recorded; (2) scope analysis, which uses the
scope of the instrumentation sites to eliminate some irrelevant intermediate data;
and (3) incremental partial result computation, which incrementally computes the
answers for the questions.
Dynamic filtering applies filter promotion techniques to the domain of program
monitoring and measuring. Dynamic filtering does the following things:
1. Testing intra-event filters, e.g., comparing the values of the attributes of some
event against a set of constants.
2. Collecting data and testing inter-event filters, e.g., comparing the attribute
values of two different events.
3. Computing answers to the PMM questions and eliminating data that is not
useful after certain points, e.g., computing and storing a partial result so that less
data needs to be stored.
The first step integrates intra-event filters with the primitive sources to test whether
an event is of interest before it is recorded. The second step integrates filter testing
against recorded partial results. The third step integrates techniques that enable the
answers to the questions to be computed as soon as the data those answers
depend on is available, so that data that is no longer useful can be discarded as soon
as possible.
Chapter 6
SmartMonitor Evaluation
This chapter focuses on the evaluation of the SmartMonitor system. It tries to answer
the following question: How well does the system perform?
The performance of the system can be measured by determining how effectively it
can filter out irrelevant data. To show the effectiveness of the methods introduced in
the previous sections, we apply them to a real program used in the CLF environment.
We compare the amount of data collected using the following methods:
1. Profile-based tools that record all data of interest.
2. Our method using only static filtering, which employs compile-time
optimization to eliminate irrelevant sites.
3. Our method using only dynamic filtering, which uses run-time filtering to
eliminate irrelevant data during program execution.
4. Our method using both static and dynamic filtering, which combines the
second and the third methods.
6.1 Application Program and Question
We present an AP5 example to illustrate how SmartMonitor supports very high
level languages. The example is drawn from a calendar application that keeps track
of people's schedules. This application defines two application-specific types,
Appointment and Trip. Each such object is related to a start time, a duration, and a
set of people. In addition to these types and relations, AP5 supports constraints. In
particular, the calendar application contains a constraint that prohibits scheduling
a person for two different appointments with overlapping times. (The constraint is
written here in a pseudo notation to simplify understanding.)
PROHIBIT Conflicting-appointments
    ∃(person, appointment1, appointment2)
        (appointment1 ≠ appointment2) ∧
        appointment-participant(person, appointment1) ∧
        appointment-participant(person, appointment2) ∧
        OVERLAP(appointment1, appointment2)
Constraints such as the one above provide an example of high level constructs
with performance consequences that are not readily apparent. In particular, if we
try to schedule an appointment, how long does it take for the AP5 runtime system
to determine whether there is a conflict? What choices in the source code affect this
time, and how? As described in Chapter 3, the AP5 version of SmartMonitor provides
event types that correspond to such high-level concepts as atomic executions and
rule executions (including constraints). Similarly, it provides a relation between
atomic executions and the rule firings that result from them (establishing this relation
required a small change to the AP5 implementation). As an example, the following
specification asks for the atomic executions (and their durations) that cause the
above rule (whose name is "conflicting-appointments") to run:
appointment-update-duration(x, t) is defined as
    ∃z atomic-execution(x) ∧
       duration(x, t) ∧
       triggers(x, z) ∧
       rule-execution(z, "CONFLICTING-APPOINTMENTS")
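Read as a relational query, the specification above joins four relations and projects out the rule execution z. The sketch below evaluates it over a few made-up tuples; the identifiers `a1`, `r1`, and so on, the durations, and the rule name `OTHER-RULE` are hypothetical sample data, and the Python sets merely stand in for the AP5 relations.

```python
# Hypothetical contents of the recorded relations.
atomic_execution = {"a1", "a2", "a3"}
duration = {("a1", 12), ("a2", 30), ("a3", 7)}
triggers = {("a1", "r1"), ("a2", "r2"), ("a3", "r3")}
rule_execution = {
    ("r1", "CONFLICTING-APPOINTMENTS"),
    ("r2", "OTHER-RULE"),
    ("r3", "CONFLICTING-APPOINTMENTS"),
}


def appointment_update_duration(rule_name="CONFLICTING-APPOINTMENTS"):
    """Return (x, t) pairs such that some z satisfies the four conjuncts."""
    return sorted(
        (x, t)
        for (x, t) in duration
        if x in atomic_execution
        and any(
            src == x and (z, rule_name) in rule_execution
            for (src, z) in triggers
        )
    )
```

With this sample data, the executions `a1` and `a3` qualify because each triggers an execution of the named rule, while `a2` does not.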
6.2 The Experiment
The experiment consisted of two hundred random atomic updates to either an
appointment or a trip. The execution results are summarized in the following table.
Item Name                 Number of entries
Total updates             200
Updates to trip           97
Updates to appointment    103
Relevant updates          24
where Total updates is the number of atomic executions that update either
Appointment or Trip, Updates to trip is the number of atomic executions that
update Trip, Updates to appointment is the number of atomic executions that
update Appointment, and Relevant updates is the number of updates that
actually triggered the rule conflicting-appointments, which is the data we were looking
for (more precisely, the durations of these atomic executions).
The effectiveness of the four different optimization methods on the monitoring
example, in terms of the number of entries checked and recorded, is summarized in
the following table:
Item Name                   Number of entries checked   Number of entries recorded
Straightforward (a)         200                         200
Static Approximation (b)    103                         103
Run-time Filtering (c)      200                         24
Combined Methods (d)        103                         24
(a) Profile-like tools which record all events of a specified type.
(b) Uses compile-time analysis alone.
(c) Uses run-time tests alone.
(d) Uses both compile-time and run-time analysis.
where number of entries checked represents the number of times that the inserted
instrumentation code is executed to see if the event meets the specified criteria, and
number of entries recorded represents the number of events that satisfied these criteria.
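The four rows of the table can be reproduced with a small simulation. This is an illustrative sketch rather than the experiment itself: the workload below is a deterministic stand-in built from the reported counts (97 trip updates, 103 appointment updates, 24 of them relevant), and the function `run` with its two flags is a hypothetical model of the four methods.

```python
# Deterministic stand-in workload matching the counts in the first table.
# Each entry is (kind of update, whether it triggers the rule).
updates = (
    [("trip", False)] * 97
    + [("appointment", True)] * 24
    + [("appointment", False)] * 79
)


def run(static_filter, dynamic_filter):
    """Return (entries checked, entries recorded) for one method."""
    checked = recorded = 0
    for kind, triggers_rule in updates:
        if static_filter and kind != "appointment":
            continue  # site was never instrumented, so nothing executes
        checked += 1  # the inserted instrumentation code runs
        if dynamic_filter and not triggers_rule:
            continue  # run-time test fails, so nothing is stored
        recorded += 1
    return checked, recorded


results = {
    "Straightforward": run(False, False),
    "Static Approximation": run(True, False),
    "Run-time Filtering": run(False, True),
    "Combined Methods": run(True, True),
}
```

Static filtering lowers the number of checks, run-time filtering lowers the number of stored entries, and the combination gets both reductions.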
6.3 Experiment Analysis
The results of the experiment confirmed that most of the data collected with the
traditional method is irrelevant and that our high level focused monitoring and
measuring methods are useful in reducing this irrelevant data. In particular, the
experiment showed how large reductions could be made separately in the number of
events checked and in the number recorded, by static approximation and run-time
filtering respectively. Moreover, it showed that the two are complementary and that
their combination is very effective.
Current trends in programming languages are toward the use of strong typing
[Mil84, CW85, Hud89, ACPP91], modular program development, and object-oriented
programming practice. Those trends encourage partitioning applications into
loosely connected components. This makes compile-time analysis for eliminating
irrelevant data sites very effective. The experiment served as a very good example,
in which the modules dealing with appointment and the modules dealing with trip
could be identified at compile time, and thus the compile-time optimization was
effective, as illustrated by the result. The fact that the application program was
written in AP5 shows that this approach works well for languages with very high
level constructs, e.g., atomic executions and rule executions.
6.4 Summary
The experiment made three points:
• the system we built can work on real programs
• the results show that it is effective in reducing the collection of irrelevant
data
• the system works for relational paradigms (i.e., it can be applied to the high
level relational concepts introduced in AP5).
Chapter 7
Conclusions and Future Work
Current trends toward more complex programs and the use of higher level languages
make program monitoring and measuring more important, since both trends tend
to obscure the sources of execution costs. These trends also raise a new requirement
for program monitoring and measuring: supporting focused monitoring and
measuring of program execution in terms of the abstractions the programs employ.
Existing monitoring and measuring tools do not support this new requirement, and the
practice of letting programmers insert instrumentation themselves for monitoring and
measuring is difficult, tedious, and error prone.
This research has shown, by building SmartMonitor, that automatic programming
for program monitoring and measuring is feasible. In particular, this framework
provides a systematic approach to program monitoring and measuring:
• Program monitoring and measuring requirements can be expressed in a
specification language
• The process of inserting instrumentation and analyzing the collected data can
be automated
Moreover, this research has also shown that the generated instrumentation code can
be efficient in the amount of data that it collects and retains. In particular, methods
have been defined to
• select primitives for the specification language that facilitate both specifying
PMM questions and generating effective instrumentation,
• use the PMM specification and static analysis to help the system generate
instrumentation that only collects relevant data,
• use temporal dependency among events to filter out irrelevant data at run
time,
• and use a database, with its relational schema, abstraction and inferencing
capability, and query processor, to facilitate the collection, computation, and
access to the required monitoring results.
The integration and incorporation of knowledge about the PMM specification, the
source program language semantics, and the source program structures can thus
form the basis for automatically generating instrumentation that minimizes the
instrumentation overhead.
7.1 Summary
This dissertation describes a new approach to the problem of answering questions
about the performance of programs, in which an automatic tool accepts the original
program and a set of questions posed in a formal specification language and installs
instrumentation code in the program much as a human programmer would. It
determines what data must be collected, finds the places in the source program
where that data should be collected, and inserts code to collect and process that
data. Automation makes the process faster and more reliable. The specifications
are generally quite concise. It is usually much easier to write the specification than
to manually instrument the program.
In addition to the specification language, this dissertation describes an
implementation of such a tool, along with many of the methods used to produce effective
instrumentation.
7.2 Main contributions
The primary contribution of this dissertation is to provide a framework for high
level program monitoring and measuring. The key elements of this work are:
• a model of program execution and a language in which high level questions
about executions of that model can be expressed.
PMM questions can be asked in terms of a very high level specification
language that is based on the Entity-Relationship model with a rich set of
programming language specific primitives. The declarative nature of the
specification language makes it simple, as we observed in Chapter 4, to reason about
what data should be collected and where that data is available for collection.
• a demonstration that these questions can be automatically transformed into
instrumentation that computes the required answers.
By selecting primitives that characterize the primitive performance activities
of program execution and using the Entity-Relationship model to model high
level questions in terms of these primitives, the framework facilitates a tight
connection between the source program's execution model and the PMM
specification. This connection enables the use of static analysis to locate places
in the source program where the relevant data can be collected. The use of
temporal analysis enables run-time filtering to be used to further reduce the
collection of irrelevant data. As a result, the generated instrumentation code
to collect performance data can be reasonably efficient. Answers to the PMM
questions are produced by using a relational representation for the recorded
data and a query processor to derive the results.
• an understanding of many techniques for performing that transformation and
the tradeoffs inherent in their use.
The implementation made clear the costs associated with compile-time,
run-time, and analysis-time filtering. It also illustrated various ways to enforce the
filtering conditions at different stages.
In summary, the research described in this dissertation represents a big step
towards automatic programming of program monitoring and measuring capabilities.
It provides a convincing confirmation of our claim that the process of inserting
instrumentation for program monitoring and measuring can and should be automated.
The connection of this work to research in database query optimization, compiler
optimization, static analysis of computer programs, and temporal reasoning
demonstrates the wide range of capabilities required in such systems.
7.3 Limitations of the Research
The research results reported in the dissertation have implementation, testing, and
theoretical limitations:
• There is no compensation model for the instrumentation code. Because
instrumentation code is inserted into the source program, the execution of the
instrumentation code competes with the source program for system resources,
such as time and space. Currently, the framework does not take that into
consideration. Instead, it tries to reduce the instrumentation overhead by
optimizing the amount of data checked and recorded.
• There is no declarative way of extending the set of primitives. Because the
instrumentation interface for the primitives used in the specification language
is hand crafted and implementation dependent, extending the primitive set is
difficult.
• The static analysis performed is very primitive; information such as data flow in
the source program is not utilized in this work.
• The set of heuristics for selecting instrumentation algorithms is very primitive.
There is no cost model for the different instrumentation algorithms.
• Only questions expressed in conjunctive normal form are supported.
• There is no full theoretical investigation of the completeness and/or the
expressiveness of the specification language.
7.4 Future Work
Extensions of this research should be concerned with removing many of the above
limitations. More specifically, future work should include extensions to permit
more general models of program execution and refinements to admit more efficient
instrumentation.
There are several directions for the extensions to permit more general models
of program execution. First, the PMM specification language could support a
mechanism for extending its set of primitives. Second, the framework could be
applied to different computation models and languages, such as parallel program
execution and functional programming. Third, the framework developed
here could be applied in different contexts. For instance, in program animation, it is
often the case that programmers need to identify the parts of the source program
where the data used to drive some graphic animations can be collected. It should be
possible, with some suitable specification language, to describe the kinds of data
needed, and to use the framework developed here to locate only the relevant places in
the source program and to insert "instrumentation" code that automatically collects
the needed data.
One refinement to admit more efficient instrumentation could be to provide
an evaluation method to determine the cost of various instrumentation algorithms.
A second could investigate how to integrate monitoring requirements into
compiler construction tools so that primitive support for the PMM paradigm is
available directly from the language compiler, for instance by providing annotations
for a programming language compiler-compiler so that the primitives of the PMM
specification language are supported by the generated compiler. Modifications
to these language compilers would then not be necessary. A third refinement could
investigate instrumentation compensation, so that the monitoring
results more accurately reflect the execution cost of the source program.
7.5 Concluding Remarks
This research has shown that program monitoring and measuring questions can be
described in a high level declarative language and that the process of generating and
inserting instrumentation into the source program can be automated. Automatic
programming in the domain of program monitoring and measuring is not only
desirable but also feasible. This dissertation stands as a solid proof of our claim
and represents significant progress towards the goal of automatic programming of
program monitoring and measuring.
Appendix A
The PMM Specification Language
A.1 Primitives and Attributes
The following is the set of predefined primitives for the PMM specification language,
followed by their attributes.
• Interval
is a new type of entity. It is associated with a pair of points of execution,
called the starting and ending points.
- Parameters(Interval values)
relates an interval event to its parameters (a list of values).
- Parameter(Interval natural-number values)
relates an interval event to its nth parameter (values).
- Return-Values(Interval values)
relates an interval event to its return values (a list of values).
- Return-Value(Interval natural-number values)
relates an interval event to its nth return value (values).
- Duration(Interval Integer)
relates an interval event to its duration, i.e., the time spent in the event.
• Function-Invocation
is a subtype of the interval event type.
- Function-Name(Function-Invocation symbol)
relates a Function-Invocation event to the function's name.
- Function-Execution(Function-Invocation function-name)
is a derived relation defined on Function-Invocation and Function-Name.
• Relation-operation
is a subtype of the interval event type.
- Relation-Name(Relation-operation symbol)
relates a relation operation to the relation's name.
• Relation-Test
is a subtype of the Relation-operation type.
• Relation-updates
is a subtype of the Relation-operation type.
• Relation-Insert
is a subtype of the Relation-updates type.
• Relation-Delete
is a subtype of the Relation-updates type.
• Relation-Generating
is a subtype of the Relation-operation type.
- Input-pattern(Relation-Generating list)
relates a Relation-Generating event to its generation pattern.
- Generators(Relation-Generating list)
relates a Relation-Generating event to those relations that are used as
generators.
- Filters(Relation-Generating list)
relates a Relation-Generating event to those relations that are used as filters.
- Relation-Size(Relation-Generating number)
relates a Relation-Generating event to the size of the generated relation.
• Rule-Operation
is a subtype of the interval event type.
- Rule-Name(Rule-Operation symbol)
relates a Rule-Operation event to the name of the rule.
• Rule-Triggering
is a subtype of the Rule-Operation type.
• Rule-Body-Execution
is a subtype of the Rule-Operation type.
• Rule-Execution(Rule-Operation symbol)
is a derived relation defined on Rule-Operation and Rule-Name that relates a
Rule-Operation event to the name of the rule executed.
• Atomic-Execution
is a subtype of the interval event type.
- Data-gathering-time(Atomic-Execution integer)
relates an Atomic-Execution event to the data gathering time of the execution.
- A-Rule-Names(Atomic-Execution list)
relates an Atomic-Execution event to the names of the automation rules
triggered.
- C-Rule-Names(Atomic-Execution list)
relates an Atomic-Execution event to the names of the consistency rules
triggered.
- Proposed-Update(Atomic-Execution list)
relates an Atomic-Execution event to the tuples of the proposed updates.
- Update-Done(Atomic-Execution list)
relates an Atomic-Execution event to the tuples of the updates done.
• Point
is a new type of entity. A point event occurs at a single point of execution.
- Time(Point value)
relates the Point event to the time when it occurs.
• Begin(Point Interval)
relates the Point event to the Interval event. It represents that the point event
is the begin point event of the interval event.
• End(Point Interval)
relates the Point event to the Interval event. It represents that the point event
is the end point event of the interval event.
- Data-condition(Point Expression value)
relates the Point event to a source program expression and its evaluation
value. It represents that the evaluation value of the expression is the value
when the point event occurs. This attribute is defined for interval events as
well. When it is applied to an interval event, it is assumed to be applied
to the begin point event of the interval event. The Expression is an expression
defined in the source programming language whose evaluation does not have
any side effect.
A.2 Control Relations
Control relations are temporal relationships between different events during program
execution. For a point event Ei, (Time Ei) represents the time at which Ei occurred.
• (Before E1 E2)
defines a relationship between two events E1 and E2. If both E1 and E2 are
point events, then the relation is true iff (Time E1) < (Time E2). For interval
events Ei, i = 1, 2, Tsi and Tei are the starting time and ending time of Ei,
and Tsi < Tei. (Before E1 E2) is true iff Te1 < Ts2.
• (Contains E1 E2)
is true iff Ts1 < Ts2 < Te2 < Te1. E1 must be an interval event.
• (Triggers Atomic-Execution Rule-Execution)
is true if the Atomic-Execution event triggers the Rule-Execution event.
• (Updates Atomic-Execution Relation-updates)
is true if the Atomic-Execution event performs the Relation-updates event.
• (Generates Interval Relation-generating)
is true if the Interval event contains the Relation-generating event.
• (Tests Interval Relation-test)
is true if the Interval event contains the Relation-test event.
• (References Interval Relation-operation)
is true if the Interval event contains the Relation-operation event.
• (Calls Function-Execution1 Function-Execution2)
is true if Function-Execution1 directly calls Function-Execution2.
• (Calls* Function-Execution1 Function-Execution2)
is true if Function-Execution1 directly or indirectly calls Function-Execution2.
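The two purely temporal relations, Before and Contains, can be written down as predicates over timestamps. The sketch below is illustrative only: the classes `Point` and `Interval` and the integer time stamps are hypothetical, and the handling of mixed point/interval arguments (a point is treated as an interval whose start and end coincide) is an assumption, since the definitions above only spell out the point/point and interval/interval cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    time: int


@dataclass(frozen=True)
class Interval:
    start: int
    end: int  # the definitions above assume start < end


def _start(e):
    return e.time if isinstance(e, Point) else e.start


def _end(e):
    return e.time if isinstance(e, Point) else e.end


def before(e1, e2):
    """(Before E1 E2): E1 has ended strictly before E2 starts."""
    return _end(e1) < _start(e2)


def contains(e1, e2):
    """(Contains E1 E2): E2 lies strictly inside the interval E1.

    Assumption: a point event counts as contained when its time lies
    strictly between the interval's start and end.
    """
    if not isinstance(e1, Interval):
        raise TypeError("E1 must be an interval event")
    return e1.start < _start(e2) and _end(e2) < e1.end
```

For two point events, `before` reduces to (Time E1) < (Time E2), and for two intervals to Te1 < Ts2, matching the definitions above.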
A.3 Value Types and Other Relations
Besides the primitives defined above, the PMM specification language also includes
some value types and relations defined on the value types. Value types include
program data, time, etc. Relations among data objects of the value types, e.g., the
values of the parameters passed to different executions, include comparison operators
and those predicates that are defined in the source programming language.
Aggregation operators can be used to define some derived relations. Suppose
that R is a relation with arity n. The aggregation operators can be used as follows
to define a new derived relation:
(defevent name
  :definition (aggregation-operator R (IO1 IO2 ... IOn)))
where each IOi is either INPUT or OUTPUT. In general, every tuple of the original
relation corresponds to exactly one tuple of the derived relation. That tuple is
obtained by keeping only the elements of the original tuple at the "input" positions
and adding a number at the end. Depending on the aggregation operator used,
the number added has a different meaning. Suppose the operator is a count operator.
Then the integer at the end of a derived tuple is the number of tuples of the original
relation that correspond to that derived tuple. Thus, if the pattern consists entirely
of outputs, the derived relation will contain at most a single tuple: the number of
tuples in the entire relation. At the other extreme, if the pattern consists entirely of
inputs, then the derived relation will contain exactly as many tuples as the original
relation, but each tuple will have a one at the end. For instance, suppose that the
arity of relation R is three. Then
(defevent cardinality
  :definition (count R (input output output)))
defines a derived relation cardinality(type1 number), where type1 is the type of
the first slot in the original relation and number is the number of those tuples in
the original relation that have the given value in the first slot.
Similarly, other aggregation operators are used to define derived event attributes.
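The behaviour of the count operator under different input/output patterns can be checked with a short sketch. The function `count_aggregate` and the sample relation `R` are hypothetical; relations are modelled as Python sets of tuples, which is an assumption made for illustration and not the AP5 representation.

```python
from collections import Counter


def count_aggregate(relation, pattern):
    """Derived relation for (count R pattern).

    Keeps the slots marked "input" and appends the number of original
    tuples that share those input values.
    """
    keep = [i for i, io in enumerate(pattern) if io == "input"]
    counts = Counter(tuple(row[i] for i in keep) for row in relation)
    return {key + (n,) for key, n in counts.items()}


# Hypothetical relation of arity three.
R = {("a", 1, "x"), ("a", 2, "y"), ("b", 1, "x")}

cardinality = count_aggregate(R, ("input", "output", "output"))
all_outputs = count_aggregate(R, ("output", "output", "output"))
all_inputs = count_aggregate(R, ("input", "input", "input"))
```

Here `cardinality` groups by the first slot, `all_outputs` collapses to a single tuple holding the size of R, and `all_inputs` keeps every tuple with a one appended, as the text describes. An empty relation yields an empty derived relation, which is why the all-outputs case contains at most one tuple.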
A.4 PMM Language Syntax
The following is the BNF definition of the PMM language. We follow the usual BNF
conventions for meta symbols; in the grammar below, {, }, *, +, |, :: and ; play that
role. Note that "(" is not used as a meta symbol. Upcased words are key words.
PMM-Specification::
    {Relation-definition}* {Question-definition}*
Relation-definition::
    (DEFEVENT Event-Name :DEFINITION Event-specification)
Question-definition::
    (DEFQUESTION Question-name :DEFINITION Event-specification)
Activity-name::
    Event-Name | Question-name;
Event-specification::
    (E-Vars Suchthat Event-Expression) |
    (Aggregate-Operator Activity-name input-output-pattern);
Suchthat:: SUCHTHAT | S.T. | '|';
Event-Expression::
    Monitoring-Term | (EXISTS E-Vars Event-Expression) |
    (FORALL E-Vars Event-Expression) |
    (Logic-Operator Event-Expression Event-Expression+);
Monitoring-Term::
    (Event-Expression) |
    (Activity-name E-Var E-Var* value-type*)
Logic-Operator:: OR | AND | NOT
Aggregate-Operator:: COUNT | AVERAGE | SUM | MIN | MAX
input-output-pattern:: (input-output*);
input-output:: INPUT | OUTPUT;
E-Vars:: (E-Var*); E-Var:: Symbol;
Event-Name:: Symbol; Question-name:: Symbol;
value-type:: Integer | String | List | Symbol;
Event names and/or question names must be defined before they are used. A valid
PMM specification should also satisfy the constraints defined in Chapter 3.
Appendix B
The Prototype Implementation
This appendix describes the prototype implementation of SmartMonitor. We
first introduce the basic functionality supported by the implementation. We then
discuss the assumptions and implementation decisions made for it.
B.1 SmartMonitor Implementation Overview
SmartMonitor was developed at the Information Sciences Institute of the University of
Southern California. It is written in Lucid Common Lisp using AP5 on an HP9000
series 300 workstation. It has been used to answer performance questions about
nontrivial software in real use. The focus of the development is on functionality. Many
features, e.g., the user interface, are not addressed. In order to make a working system,
many compromises were made. Whenever there is a tradeoff between simplicity and
expressiveness, priority is given to simplicity.
The system provides:
1. A specification language for monitoring and measuring the execution of AP5
programs. The language is based on the set of primitive events and control
relations listed in Appendix A. The language also lets one define derived
events, derived control relations, and questions.
2. An automatic programmer for the specification language, which includes
(a) instrumentation interfaces for primitive event types
(b) instrumentation generation templates for primitive control relations
(c) mappings between primitives and their instrumentation sites in the source
program
(d) transformations from PMM specifications to queries on the static analysis
result database of the source program
(e) temporal analysis mechanisms for PMM specifications
3. A representation of the monitoring results in AP5 relational format, so that they
can be accessed directly via AP5's query language.
Compared with the ideal implementation, the system has the following limitations:
1. Only a subset of the primitives identified for the PMM specification language of
AP5 is implemented.
2. Space usage attributes of events are not supported by the implementation.
3. The temporal analysis mechanisms for PMM specifications discussed in Chapter 5
are not completely implemented. Specifically, the incremental computation
of the monitoring results is implemented only when there is an outermost
interval.
4. A straightforward representation is used for the instrumentation data, although
selecting a data representation according to how the data are used could result in
better performance.
5. Selection among multiple instrumentation algorithms is based on very
simple heuristics. For instance, the current instrumentation always tests local
filters at run time, without knowing how effective the filters are. Making
the right decision requires information such as what percentage of the data
would pass the filters, and that kind of information is usually not available.
6. Knowledge of the mappings is coded into the system so that reasonable
efficiency is achieved. As a result, it is hard to extend the primitive event types
of the PMM specification language. Similarly, knowledge of how to deal
with primitive control relations is coded directly into the system, so
primitive control relations cannot be added declaratively in the current
implementation.
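The tradeoff behind limitation 5 can be made concrete with a small sketch. It is written in Python rather than AP5, and the cost model, the function name, and the numbers are all assumptions introduced here for illustration; none of them come from SmartMonitor. The sketch only shows why the choice depends on the fraction of data that passes a filter.

```python
def filter_at_run_time_is_cheaper(pass_rate, filter_cost, record_cost):
    """Compare per-datum run-time overhead under an assumed, simplified cost model.

    pass_rate   -- fraction of data expected to pass the local filter (0.0 to 1.0)
    filter_cost -- assumed cost of testing the filter on one datum
    record_cost -- assumed cost of recording one datum
    """
    # Filter first: every datum is tested, only the survivors are recorded.
    filter_first = filter_cost + pass_rate * record_cost
    # Record everything: no test at run time; filtering is deferred to analysis.
    record_all = record_cost
    return filter_first < record_all


# Hypothetical numbers: a selective filter pays off, an unselective one does not.
selective = filter_at_run_time_is_cheaper(0.10, 1.0, 5.0)
unselective = filter_at_run_time_is_cheaper(0.95, 1.0, 5.0)
```

Under this model, testing a filter at run time helps only when the filter rejects enough data to pay for the test, which is exactly the information the text says is usually unavailable.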
The current implementation does minimal error checking and leaves most errors to
be trapped by AP5 at run time.
SmartMonitor assumes:
1. The monitored source program is loaded, because some static analysis uses
information that is only available when the program is loaded.
2. There is a natural initialization and termination of the monitoring and
measuring execution. Programmers are required to tell the system explicitly when
program monitoring and measuring should begin and end.
B.2 SmartMonitor Components
This section describes the major components of the automatic programming system,
together with the major assumptions and design decisions behind them.
B.2.1 PMM Specification Language
A PM M specification has two parts:
1. An activity model, which is represented by a relational schema
2. A set of questions, which are represented by a set of queries defined on the
above schema
They are defined by the following constructs: event definitions and question
definitions. Defining a PMM specification serves the following functions:
1. introducing the terminology
2. specifying the execution model of interest
3. specifying the questions.
It enables programmers to communicate with the automatic programming system and
tell it what is of interest. It can also be used by the system to focus on only the
relevant data named in the specification.
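As a rough illustration of the two parts, the sketch below models an activity model as one relation of recorded events and a question as a query over it. It uses Python rather than the AP5-based specification language, and `CallEvent`, its attributes, and `total_time_per_function` are hypothetical names chosen here, not constructs of the PMM language.

```python
from dataclasses import dataclass


# Hypothetical activity model: one relation whose tuples are recorded events.
@dataclass(frozen=True)
class CallEvent:
    function: str   # name of the called function
    start: float    # start time, arbitrary units
    end: float      # end time, arbitrary units


# Hypothetical question: total time spent in each function.
def total_time_per_function(events):
    totals = {}
    for event in events:
        elapsed = event.end - event.start
        totals[event.function] = totals.get(event.function, 0.0) + elapsed
    return totals


# A few made-up tuples standing in for data collected during a run.
events = [
    CallEvent("parse", 0.0, 2.0),
    CallEvent("eval", 2.0, 5.0),
    CallEvent("parse", 5.0, 6.0),
]
answers = total_time_per_function(events)
```

The point of the split is that the question mentions only attributes of the schema, so a system reading it can tell which data must be collected.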
Event Definitions
Defevent is used to define either an event type or a relationship among events.
It supports abstraction. It is mainly used in computing answers to the questions.
After the execution of the program, the tuples of the relations may or may not be
kept.
The system also supports specialization of event types. An event type is defined
by a unary definition. If event type E_sub is a subtype of E_sup, then all attributes
defined on E_sup are also defined on E_sub.
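The subtype rule can be pictured with ordinary class inheritance. This is only an analogy in Python; the class names and attributes are hypothetical and are not taken from the primitives of Appendix A.

```python
from dataclasses import dataclass, fields


@dataclass
class Event:                 # plays the role of E_sup
    start: float
    end: float


@dataclass
class FunctionCall(Event):   # plays the role of E_sub
    function: str


def attribute_names(event_type):
    """Return the set of attribute names defined on an event type."""
    return {field.name for field in fields(event_type)}


# Every attribute of the supertype is also an attribute of the subtype.
inherited = attribute_names(Event) <= attribute_names(FunctionCall)
```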
Question Definitions
Defquestion defines a query on the relational schema defined to characterize the
activities to monitor. Unlike event definitions, the data computed by the question
definitions are the answers to the questions defined in the specification. After the
execution of the program, the computed data are kept in AP5 relational form.
The specification language is an extension of the AP5 schema definition and query
language. Basing the PMM specification language on the AP5 language enables
SmartMonitor to use the tools built for AP5 to process PMM
specifications. In particular, existing AP5 tools are used to
• check whether a PMM specification is well formed
• define the event schema used for storing the collected data
• support relations other than events and control relations, such as +.
SmartMonitor uses AP5 relations to represent the relational schema of the
instrumentation data. Using AP5 relations enables SmartMonitor to simply use the AP5
query generator to generate algorithms for the queries generated by SmartMonitor.
It also enables SmartMonitor to exploit the query optimization techniques of the AP5
query generator to generate better algorithms. Another advantage of using AP5 is
the generality of the AP5 representation. In AP5, a relation is an abstraction of data:
a relation is characterized by the operations defined upon it, which separates
representation from functionality. Because data recording in program
monitoring and measuring happens during the execution of the monitored program,
its overhead should be minimized. With the separation of representation and
functionality, SmartMonitor records instrumentation data by treating tuples of
some relations as LISP data. By using simple LISP operations to store data,
reasonable efficiency is achieved. By providing the relational interface required by AP5’s
relational abstraction, the stored data can be accessed via the query language of
AP5. Hence, AP5’s semantics enables SmartMonitor to achieve its functionality
without sacrificing too much efficiency.
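The idea of cheap recording behind a relational interface can be sketched as follows. The sketch is in Python, not Lisp or AP5, and the class `RecordedRelation` and its methods are hypothetical stand-ins; AP5's actual mechanism for supplying a relation's implementation is not shown in this appendix.

```python
class RecordedRelation:
    """Hypothetical relation: cheap appends while running, queries afterwards."""

    def __init__(self, *attributes):
        self.attributes = attributes
        self._tuples = []          # plain list, so recording is a single append

    def record(self, *values):
        # Called from instrumentation code; does no indexing or checking.
        self._tuples.append(values)

    def select(self, **conditions):
        # Relational-style access, intended for use off the hot path.
        positions = {self.attributes.index(name): value
                     for name, value in conditions.items()}
        return [row for row in self._tuples
                if all(row[i] == value for i, value in positions.items())]


calls = RecordedRelation("function", "elapsed")
calls.record("parse", 2)
calls.record("eval", 3)
calls.record("parse", 1)
parse_rows = calls.select(function="parse")
```

The recording path touches only a simple data structure, while readers of the data see something that behaves like a relation. That is the separation of representation from functionality that the paragraph above relies on.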
B.2.2 Automatic Programmer
To a programmer using the system, program monitoring and measuring has the
following steps:
1. preparing a specification
2. submitting both the specification and the source program to the system
3. running the generated augmented program
4. reading the answers generated by the execution
The following commands are used between the first step and the fourth step:
• The augmented program is generated by invoking the following command
(Monitor-On PMM-specification Source-Program)
• The instrumentation of the augmented program is eliminated by invoking the
following command
(Monitor-Off Source-Program)
• Monitoring starts on an augmented program via the following command
(Monitor-Begin Source-Program)
• Monitoring ends on an augmented program via the following command
(Monitor-End Source-Program)
Monitoring and measuring a source program works in batch mode. If a programmer
changes some part of the PMM specification, the augmented program is
regenerated from scratch. No incremental generation is supported.
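The life cycle implied by the four commands can be sketched as below. This is a Python analogy, not SmartMonitor's code: the `Monitor` class, the dictionary standing in for a source program, and the wrapping technique are all assumptions made for illustration.

```python
import functools


class Monitor:
    """Hypothetical stand-in for the four commands; not SmartMonitor's code."""

    def __init__(self):
        self.originals = {}    # function name -> uninstrumented function
        self.active = False    # True only between begin and end
        self.calls = []        # recorded events (here, just function names)

    def monitor_on(self, program, names):
        # Analogue of Monitor-On: replace each named function with a wrapper.
        for name in names:
            original = program[name]
            self.originals[name] = original
            program[name] = self._wrap(name, original)

    def _wrap(self, name, original):
        @functools.wraps(original)
        def wrapper(*args, **kwargs):
            if self.active:
                self.calls.append(name)
            return original(*args, **kwargs)
        return wrapper

    def monitor_begin(self):
        self.active = True

    def monitor_end(self):
        self.active = False

    def monitor_off(self, program):
        # Analogue of Monitor-Off: restore the uninstrumented functions.
        for name, original in self.originals.items():
            program[name] = original
        self.originals.clear()


def square(x):
    return x * x


program = {"square": square}     # toy "source program"
monitor = Monitor()
monitor.monitor_on(program, ["square"])
program["square"](2)             # before begin: not recorded
monitor.monitor_begin()
program["square"](3)             # recorded
monitor.monitor_end()
program["square"](4)             # after end: not recorded
monitor.monitor_off(program)
```

In this sketch, changing what is monitored means calling `monitor_off` and then `monitor_on` again, which mirrors the batch-mode regeneration described above.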
B.2.3 Static Approximation
Static analysis is used to store all potential sites and their relationships in a static
analysis database. The relevant sites of a PMM specification are determined by
transforming the specification into a set of queries on the static analysis database.
The results of evaluating the queries are the sites of the specification.
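A small analogue of this two-step scheme is sketched below: first collect every potential site into a table, then answer site questions by querying the table. It uses Python's standard `ast` module on a toy Python program. The CLF static analyzer and its AP5 database are not modeled, and `collect_call_sites` and `sites_of` are hypothetical names.

```python
import ast

SOURCE = '''
def helper(x):
    return x + 1

def main():
    a = helper(1)
    b = helper(a)
    return len([a, b])
'''


def collect_call_sites(source):
    """Build a tiny 'static analysis database' of (caller, callee, line) rows."""
    rows = []
    tree = ast.parse(source)
    for func in ast.walk(tree):
        if isinstance(func, ast.FunctionDef):
            for node in ast.walk(func):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    rows.append((func.name, node.func.id, node.lineno))
    return rows


def sites_of(database, callee):
    """Query the database for the lines where `callee` may be called."""
    return sorted(line for _, name, line in database if name == callee)


database = collect_call_sites(SOURCE)
helper_sites = sites_of(database, "helper")
```

As in the text, the collection step knows nothing about which sites will be asked for. Only the query step depends on the specification.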
Currently, there is no declarative way for a programmer to communicate with the
static analysis tool to let it record what is of interest. Static analysis is done for the
source program without taking the PMM specification into consideration. The
implementation assumes that static analysis results are stable, and it tries to fully
utilize the static analysis tools provided in the environment, because the environment
already represents analysis results in an AP5 database and enables the use of the AP5
query language to query the results. The static analysis could be improved by
recording only the relevant information.
B.2.4 Instrumentation Interface
The instrumentation interface supports access to the run-time performance data
of the primitive events and control relations used in the specification language.
Currently, the instrumentation interface for accessing run-time performance data
is hand coded, for two reasons. First, executing instrumentation code consumes
system resources, so it should be implemented as efficiently as possible. Second, it
needs to deal with the implementation details of the programming language.
Primitives are used to set up the correspondence between the specification
language and the execution of the programming language constructs. There are several
reasons why it is difficult to extend the primitives and control relations used in the
specification language declaratively:
• There is no declarative way to tell the static analysis tool to record all sites
for a new primitive, although it would not be very difficult to extend the static
analysis tool to do so
• There is no simple way to declaratively define the code that supports the
interface of the new primitive
It is our belief that the execution of a program written in a programming language
can be described by a set of primitives and control relations. Based on the principles
stated in Chapter 3, most of the performance questions about a program’s execution
can be answered. Hence, changes in the set of primitive events and control relations
do not happen very often.
B.2.5 Generating Instrumentation
Instrumentation generation consists of the following components:
1. instrumentation templates for predefined primitives and control relations,
written in CLOS [Ste90], which supports object-oriented programming. In
particular, the templates are organized following the type hierarchy in the PMM
specification language.
2. instrumentation plans for logic operators
This separation makes it possible to divide the automatic programmer into a
language-dependent part (the first component) and a language-independent part (the
second component).
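To illustrate templates organized along a type hierarchy, the sketch below uses Python classes in place of CLOS classes and methods. The template names, the fields they record, and `generate_probe` are hypothetical; the real templates are not reproduced in this appendix.

```python
class EventTemplate:
    """Hypothetical base template: what every event's instrumentation records."""

    def fields(self):
        return ["start-time", "end-time"]


class FunctionCallTemplate(EventTemplate):
    def fields(self):
        # Extend, rather than replace, what the supertype's template records.
        return super().fields() + ["function-name"]


class RelationUpdateTemplate(EventTemplate):
    def fields(self):
        return super().fields() + ["relation-name", "tuple"]


def generate_probe(template, site):
    """Render a textual description of the probe generated for one site."""
    return f"at {site}: record " + ", ".join(template.fields())


probe = generate_probe(FunctionCallTemplate(), "site-17")
```

Because each template refines its supertype's template, a new event subtype only has to state what it adds. This is the benefit of following the type hierarchy of the specification language.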
B.3 Implementation Summary
The implementation of SmartMonitor benefits greatly from the use of the AP5
language and the CLF [Pro88] environment. The schema definition language and the
query optimizer of AP5 make the representation and the computation of the
monitoring data easier. Because AP5 is a main-memory virtual
database, the implementation of SmartMonitor can take advantage of the
convenience associated with relational representation without paying the overhead
associated with traditional database systems. The availability of a very sophisticated
static analyzer in CLF makes the implementation simpler. Furthermore, the
integration of AP5 and the CLF environment makes SmartMonitor part of the
environment, so that the intermediate data and monitoring results can be
manipulated using the tools in the environment. All of these made the implementation
possible and much easier.
Reference List
[ACPP91] M. Abadi, L. Cardelli, B. Pierce, and G. Plotkin. Dynamic typing in
statically typed language. ACM Transactions on Programming Languages
and Systems, 13(2):237-268, April 1991.
[ASU86] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles,
Techniques, and Tools. Addison-Wesley Publishing Company, 1986.
[Bal69] R. Balzer. Exdams: extendable debugging and monitoring system. In
Proceedings of the AFIPS Conference, pages 567-586, 1969.
[Bal85] R. Balzer. A 15 year perspective on automatic programming. IEEE
Transactions on Software Engineering, SE-11(11):1257-1268,
November 1985.
[Bal86] R. Balzer. Living in the next generation operating system. In
Proceedings of the IFIP 10th World Computer Congress, pages 283-291, Dublin,
Ireland, 1986.
[Bar85] D. R. Barstow. Domain-specific automatic programming. IEEE
Transactions on Software Engineering, SE-11(11):1321-1336, November 1985.
[Bat83] J. Batali. Computational introspection. A. I. Memo 701, Massachusetts
Institute of Technology, Artificial Intelligence Laboratory, February 1983.
[Bat88] P. C. Bates. Debugging heterogeneous distributed systems using
event-based models of behavior. In Proceedings of ACM SIGPLAN and
SIGOPS Workshop on Parallel and Distributed Debugging, pages 11-22,
Madison, Wisconsin, May 1988.
[BCG83] R. Balzer, T. E. Cheatham, and C. Green. Software technology in the
1990’s: Using a new paradigm. IEEE Computer, 16(11), November 1983.
[BD77] R. M. Burstall and J. Darlington. A transformation system for developing
recursive programs. Journal of the ACM, 24(1):44-67, January 1977.
[Ben88] J. Benjamin. PILOT: A prescription for program performance
measurement. In Proceedings of the 10th International Conference on Software
Engineering, pages 388-395, 1988.
[BG79] R. Balzer and N. Goldman. Principles of good software specification.
Proceedings on IEEE Conference on Specification of Reliable Software,
pages 58-67, 1979.
[BH83] B. Bruegge and P. Hibbard. Generalized path expressions: A high level
debugging mechanism. In Proceedings of the ACM SIGSOFT Software
Engineering Symposium on High Level Debugging, pages 34-44, Pacific
Grove, California, March 1983.
[BP88] B. Boehm and P. N. Papaccio. Understanding and controlling software
costs. IEEE Transactions on Software Engineering, 14(10):1462-1477,
October 1988.
[Bru85] B. Bruegge. Adaptability and portability of symbolic debuggers. PhD
thesis, Department of Computer Science, CMU, Pittsburgh, PA 15213,
September 1985.
[BW83] P. C. Bates and J. C. Wileden. High-level debugging of distributed
systems: The behavioral abstraction approach. The Journal of Systems and
Software, 3(4):255-264, 1983.
[C+83] C. Coutant et al. Measuring the performance and behavior of Icon
programs. IEEE Transactions on Software Engineering, SE-9(1), January
1983.
[CC76] J. Cohen and N. Carpenter. A language for inquiring about the run-time
behavior of programs. Software: Practice and Experience, pages 445-460,
February 1976.
[CGM90] U. S. Chakravarthy, J. Grant, and J. Minker. Logic-based approach to
semantic query optimization. ACM Transactions on Database Systems,
15(2):162-207, June 1990.
[Che76] P. P. Chen. The Entity-Relationship Model - Toward a Unified View of
Data. ACM Transactions on Database Systems, 1(1):9-36, March 1976.
[CLW90] C. C. Charlton, P. H. Leng, and D. M. Wilkinson. Program monitoring
and analysis: Software structures and architectural support. Software:
Practice and Experience, 20(9):859-867, September 1990.
[CMB91] J. Choi, B. P. Miller, and R. H. B. Netzer. Techniques for debugging
parallel programs with flowback analysis. ACM Transactions on
Programming Languages and Systems, 13(4):491-530, October 1991.
[Cod70] E. F. Codd. A relational model of data for large shared data banks.
Communications of the ACM, 13(6):377-387, 1970.
[Coh86] D. Cohen. Automatic compilation of logical specifications into efficient
programs. In Proceedings of the 5th National Conference on Artificial
Intelligence, pages 20-25, Philadelphia, PA, August 1986. AAAI.
[Coh88] D. Cohen. AP5 user’s manual. ISI/USC, 1988.
[Coh89a] D. Cohen. Compiling complex database transition triggers. In
Proceedings of the ACM SIGMOD Conference on Management of Data, pages
225-234, 1989.
[Coh89b] D. Cohen. A first order logic database suitable for real programming.
Technical report, USC/ISI, 1989.
[CW85] L. Cardelli and P. Wegner. On understanding types, data abstraction,
and polymorphism. ACM Computing Surveys, 17(4):471-522, December
1985.
[Dav80] R. Davis. Meta-rules: Reasoning about control. Artificial Intelligence,
15:179-222, 1980.
[Fea86] M. Feather. A survey and classification of some program transformation
approaches and techniques. In F.R.G, editor, Proceedings of IFIP TC2
Working Conference on Program Specification and Transformation, April
1986.
[For79] C. L. Forgy. On the Efficient Implementation of Production Systems.
PhD thesis, Department of Computer Science, Carnegie-Mellon
University, Pittsburgh, PA 15213, February 1979.
[FOW87] J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program
dependence graph and its use in optimization. ACM Transactions on
Programming Languages and Systems, 9(3):319-349, July 1987.
[G+83] S. Graham et al. An execution profiler for modular programs. Software:
Practice and Experience, 13:671-683, 1983.
[GMN84] H. Gallaire, J. Minker, and J. Nicolas. Logic and databases: A deductive
approach. ACM Computing Surveys, 16(2):153-185, 1984.
[Gol83] N. M. Goldman. Three dimensions of design development. In Proceedings
of AAAI Conference, Washington D.C., 1983.
[GYK90] G. S. Goldszmidt, S. Yemini, and S. Katz. High-level language
debugging for concurrent programs. ACM Transactions on Computer Systems,
8(4):311-336, November 1990.
[Han87] D. Hanson. Event associations in SNOBOL4 for program debugging.
Software: Practice and Experience, pages 115-129, August 1987.
[HK87] R. Hull and R. King. Semantic database modeling: Survey, applications,
and research issues. ACM Computing Surveys, 19(3):201-260, September
1987.
[HK88] W. Hseush and G. Kaiser. Data path debugging: Data-oriented
debugging for a concurrent programming language. In Proceedings of ACM
SIGPLAN and SIGOPS Workshop on Parallel and Distributed
Debugging, pages 236-247, Madison, Wisconsin, May 1988.
[HSW85] G. C. Held, M. R. Stonebraker, and E. Wong. INGRES: A Relational
Data Base System. In Proceedings of the 1975 National Computer
Conference, pages 409-416, Anaheim, California, 1985.
[Hud89] P. Hudak. Conception, evolution, and application of functional
programming languages. ACM Computing Surveys, 21(3):359-411, September
1989.
[HW90] D. Haban and D. Wybranietz. A hybrid monitor for behavior and
performance analysis of distributed systems. IEEE Transactions on Software
Engineering, 16(2):197-211, February 1990.
[JF90] W. L. Johnson and M. Feather. Building an evolution transformation
library. In Proceedings of the 12th International Conference on Software
Engineering, pages 238-248, Nice, France, 1990.
[JK84] M. Jarke and J. Koch. Query optimization in database systems. ACM
Computing Surveys, 16(2):111-152, 1984.
[Kan83] E. Kant. On the efficient synthesis of efficient programs. Artificial
Intelligence, 20:253-305, 1983.
[KB81] E. Kant and D. R. Barstow. The refinement paradigm: The interaction
of coding and efficiency knowledge in program synthesis. IEEE
Transactions on Software Engineering, SE-7:458-471, 1981.
[Kin81a] J. King. Query Optimization by Semantic Reasoning. PhD thesis,
Stanford University, 1981.
[Kin81b] J. King. QUIST: A system for semantic query optimization in relational
databases. In Proceedings of the 7th International Conference on Very
Large Data Bases, pages 510-517, Cannes, France, 1981.
[Knu71] D. Knuth. An empirical study of Fortran programs. Software: Practice
and Experience, 1:105-133, April 1971.
[LL89] B. Lazzerini and L. Lopriore. Abstraction mechanisms for event control
in program debugging. IEEE Transactions on Software Engineering,
15(7):890-901, July 1989.
[LW69] P. Lucas and K. Walk. On the Formal Description of PL/I, volume 6.
Annual Reviews of Automatic Programming, 1969.
[Mil84] R. Milner. A proposal for standard ML. In Proceedings 1984 ACM
Conference on LISP and Functional Programming, pages 184-197. ACM,
1984.
[Mod79] M. L. Model. Monitoring System Behavior In a Complex Computational
Environment. PhD thesis, Department of Computer Science, Stanford
University, Stanford, California, 1979.
[MW80] Z. Manna and R. Waldinger. A deductive approach to program
synthesis. ACM Transactions on Programming Languages and Systems,
2(1):90-121, January 1980.
[Nar89] K. Narayanaswamy. Static Analysis-Based Program Evolution Support
in the Common Lisp Framework. In Proceedings of the 11th International
Conference on Software Engineering, Singapore, Singapore, 1989.
[Nic82] J. M. Nicolas. Logic for improving integrity checking in relational data
bases. Acta Informatica, 18(3):227-253, 1982.
[OCH91] R. A. Olsson, R. H. Crawford, and W. W. Ho. A dataflow approach
to event-based debugging. Software: Practice and Experience, 21(2):209-229,
February 1991.
[OO90] K. M. Olender and L. J. Osterweil. Cecil: A sequencing constraint
language for automatic static analysis generation. IEEE Transactions on
Software Engineering, 16(3):268-280, March 1990.
[PC90] A. Podgurski and L. A. Clarke. A formal model of program dependences
and its implications for software testing, debugging, and maintenance.
IEEE Transactions on Software Engineering, 16(9):965-979, September
1990.
[PL83] M. L. Powell and M. A. Linton. A database model of debugging. In
Proceedings of the ACM SIGSOFT Software Engineering Symposium on
High Level Debugging, pages 67-70, Pacific Grove, California, March 1983.
[Pla84] B. Plattner. Real-time execution monitoring. IEEE Transactions on
Software Engineering, SE-10(6):756-764, November 1984.
[PN81] B. Plattner and J. Nievergelt. Monitoring program execution: A survey.
IEEE Computer, pages 76-93, November 1981.
[Pnu86] A. Pnueli. Specification and development of reactive systems.
Information Processing, pages 845-858, 1986.
[Pro88] CLF Project. CLF manual. USC/Information Sciences Institute, 4676
Admiralty Way, Marina del Rey, CA 90292, August 1988.
[PS83] H. Partsch and R. Steinbruggen. Program transformation systems. ACM
Computing Surveys, 15(3):199-236, 1983.
[Qia90] X. Qian. Synthesizing database transactions. In Proceedings of the 16th
International Conference on Very Large Data Bases, pages 552-565,
Brisbane, Australia, 1990.
[QW86] X. Qian and G. Wiederhold. Knowledge-based integrity constraint
validation. In Proceedings of the 12th International Conference on Very
Large Data Bases, pages 3-12, 1986.
[RD90] P. Van Roy and A. M. Despain. The Benefits of Global Dataflow Analysis
for an Optimizing Prolog Compiler. In Proceedings of the 1990 North
American Conference on Logic Programming, 1990.
[Ric86] C. Rich. A formal representation for plans in the programmer’s
apprentice. In Proceedings of AAAI, pages 1044-1052, 1986.
[RP88] B. G. Ryder and M. C. Paull. Incremental data-flow analysis algorithms.
ACM Transactions on Programming Languages and Systems, 10(1):1-50,
January 1988.
[RW88] C. Rich and R. C. Waters. Automatic programming: Myths and
prospects. IEEE Computer, pages 40-51, August 1988.
[Sam89] B. Samadi. TUNEX: A knowledge-based system for performance tuning
of the UNIX operating system. IEEE Transactions on Software
Engineering, 15(7):861-874, July 1989.
[SJGP90] M. Stonebraker, A. Jhingran, J. Goh, and S. Potamianos. On rules,
procedures, caching and views in database systems. In Proceedings of
the ACM SIGMOD Conference on Management of Data, pages 281-290,
Atlantic City, New Jersey, 1990.
[Smi84] B. C. Smith. Reflection and semantics in LISP. In Proceedings of 1984
ACM Principles of Programming Language Conference, pages 23-35, Salt
Lake City, Utah, 1984.
[Smi90] D. R. Smith. KIDS: A semiautomatic program development system.
IEEE Transactions on Software Engineering, 16(9):1024-1043,
September 1990.
[Sno82] R. Snodgrass. Monitoring Distributed Systems: A Relational Approach.
PhD thesis, Department of Computer Science, Carnegie-Mellon
University, Pittsburgh, PA 15213, December 1982.
[Sno84] R. Snodgrass. Monitoring in a software development environment: A
relational approach. In Proceedings of the ACM SIGSOFT Software
Engineering Symposium on Practical Software Development Environments,
pages 124-131, Pittsburgh, Pennsylvania, April 1984.
[Sno87] R. Snodgrass. The Temporal Query Language TQuel. ACM
Transactions on Database Systems, 12(2):247-298, June 1987.
[Sno88] R. Snodgrass. A relational approach to monitoring complex systems.
ACM Transactions on Computer Systems, 6(2):157-196, May 1988.
[SPSB91] R. W. Selby, A. A. Porter, D. C. Schmidt, and J. Berney. Metric-driven
analysis and feedback systems for enabling empirically guided software
development. In Proceedings of the 13th International Conference on
Software Engineering, pages 288-298, Austin, Texas, May 1991.
[SSS81] E. Schonberg, J. Schwartz, and M. Sharir. An automatic technique for
selection of data representations in SETL programs. ACM Transactions
on Programming Languages and Systems, 3(2):126-143, 1981.
[Ste90] G. L. Steele Jr. COMMON LISP: The Language. Digital Press, second
edition, 1990.
[Sym86] Symbolics. The Symbolics GENERA Programming Environment
Manual. Symbolics, Inc, 4 New England Tech Center, 555 Virginia Road,
Concord, MA 01742, July 1986.
[Sys86] Reasoning Systems. Refine User’s Guide. Palo Alto, CA, 1986.
[Ull88] J. D. Ullman. Principles of Database and Knowledge-Base Systems,
volume 1. Computer Science Press, 1988.
[Ull89] J. D. Ullman. Principles of Database and Knowledge-Base Systems,
volume 2. Computer Science Press, 1989.
[Wil86] J. C. Wileden. Applying event based analysis to specifications and
designs. Information Processing, pages 577-581, 1986.
[Wol90] M. Wolfe. Data dependence and program restructuring. The Journal of
Supercomputing, 4:321-344, 1990.
[Xer83] Xerox Palo Alto Research Center. Interlisp Reference Manual, October
1983.
Asset Metadata
Creator Liao, Yingsha (author) 
Core Title An automatic programming approach to high level program monitoring and measuring 
Contributor Digitized by ProQuest (provenance) 
School Graduate School 
Degree Doctor of Philosophy 
Degree Program Computer Science 
Degree Conferral Date 1992-05 
Publisher University of Southern California (original), University of Southern California. Libraries (digital) 
Tag oai:digitallibrary.usc.edu:usctheses,OAI-PMH Harvest 
Format theses (aat) 
Language English
Permanent Link (DOI) https://doi.org/10.25549/usctheses-oUC11255792 
Unique identifier UC11255792 
Identifier DP22850.pdf (filename) 
Legacy Identifier DP22850 
Document Type Dissertation 
Internet Media Type application/pdf 
Type texts
Source University of Southern California Dissertations and Theses (collection), University of Southern California (contributing entity) 
Access Conditions The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. 
Repository Name University of Southern California Digital Library
Repository Location USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email uscdl@usc.edu