AN AUTOMATIC PROGRAMMING APPROACH TO HIGH LEVEL PROGRAM MONITORING AND MEASURING

by Yingsha Liao

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Computer Science)

May 1992

Copyright 1992 Yingsha Liao

UMI Number: DP22850. All rights reserved.

INFORMATION TO ALL USERS: The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.

UMI DP22850. Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346.

UNIVERSITY OF SOUTHERN CALIFORNIA, THE GRADUATE SCHOOL, UNIVERSITY PARK, LOS ANGELES, CALIFORNIA 90007

This dissertation, written by Yingsha Liao under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

To my parents, Shengyu Liao and Songbi Wan, and my brother and sisters, Yiping Liao, Yihong Liao, Xiangyan Liao

Acknowledgments

I am very grateful to my advisor Bob Balzer for his encouragement and support throughout my years at ISI. His guidance was essential in teaching me how to conduct research. Bob helped improve the presentation of the material greatly by his careful reading and enlightening comments and taught me a great deal about good scholarship.
I would also like to thank my committee members Richard Hull and Alvin Despain for their valuable comments on the dissertation proposal, drafts, and final defense. Special thanks to Donald Cohen, Neil Goldman, and Krishnamurthy Narayanaswamy for spending a great deal of time with me and with this document. They were always there as patient listeners whenever I had questions or needed feedback. Many thanks to Dean Jacobs and Paul Rosenbloom for serving on my guidance committee and providing suggestions and comments. I wish to thank Dennis Allard, Kevin Benner, Martin Feather, Edward Ipser Jr., Lewis Johnson, Surjatini Widjojo, Dave Wile, and Lorna Zorman and the research and support staff at the software division who made my time at ISI enjoyable and productive.

CONTENTS

Acknowledgments
Abstract
1 Introduction
1.1 Program Monitoring and Measuring at High Level
1.2 Thesis and Problem Statement
1.3 Overview of SmartMonitor
1.3.1 A High Level PMM System
1.3.2 A High Level PMM Scenario
1.4 Contributions
1.5 Scope of the Research
1.6 Organization of the Dissertation
2 Background and Related Work
2.1 Specifying PMM Requirements
2.1.1 Data-Based Program Monitoring and Measuring
2.1.2 Model-Based Program Monitoring and Measuring
2.1.3 Very High Level Languages and PMM
2.2 Data Collection
2.3 Data Processing
2.4 Related Research
2.4.1 Program Transformation
2.4.2 Static Analysis of Programs
2.4.3 Incremental Generation of Derived Data
2.5 AP5 Computation Model - A PMM View
3 A Program Monitoring and Measuring Specification Language
3.1 PMM Specification Language Objectives
3.2 A PMM Specification Language
3.2.1 Program Monitoring and Measuring Event Model
3.2.2 Predefined Event Types and Control Relations
3.2.3 PMM Specifications
3.3 Syntax and Semantics of the Language
3.3.1 Syntax of the PMM Specification Language
3.3.2 Semantics of the PMM Specification Language
3.3.3 Properties of Valid PMM Specifications
3.4 Summary
4 An Automatic Programming System for PMM Specification
4.1 Aspects of Instrumentation Generation
4.1.1 Event Schema Generation
4.1.2 Instrumentation Site Selection
4.1.3 Instrumentation Code Generation
4.1.4 Approach Summary
4.2 Static Approximation
4.2.1 Representing Sites and Their Relationships
4.2.2 Static Approximation Mapping
4.2.3 Static Approximation Transformation Algorithms
4.2.4 A Static Approximation Example
4.3 Instrumentation for Primitive Events
4.3.1 An Abstract Instrumentation Interface
4.3.2 Instrumentation Generation for Primitives
4.4 Instrumentation Generation Algorithms
4.5 Summary
5 On Incremental Computation of Monitoring Results
5.1 Issues of Incremental Computation
5.1.1 Determining Temporal Dependency
5.1.2 Determining the Data to Record
5.1.3 Generating Instrumentation
5.2 Representing Temporal Relationship
5.2.1 Event Dependent Graph
5.2.2 Building Event Dependent Graphs
5.3 Run Time Filtering
5.3.1 Intra-Event Conditions
5.3.2 Inter-Event Conditions
5.3.3 Scope Analysis
5.4 Incremental Instrumentation Generation
5.4.1 Dealing With Derived Relations
5.4.2 Time and Space Compromises in PMM
5.4.3 Other Optimizations
5.5 Summary
6 SmartMonitor Evaluation
6.1 Application Program and Question
6.2 The Experiment
6.3 Experiment Analysis
6.4 Summary
7 Conclusions and Future Work
7.1 Summary
7.2 Main contributions
7.3 Limitations of the Research
7.4 Future Work
7.5 Concluding Remarks
Appendix A The PMM Specification Language
A.1 Primitives and Attributes
A.2 Control Relations
A.3 Value Types and Other Relations
A.4 PMM Language Syntax
Appendix B The Prototype Implementation
B.1 SmartMonitor Implementation Overview
B.2 SmartMonitor Components
B.2.1 PMM Specification Language
B.2.2 Automatic Programmer
B.2.3 Static Approximation
B.2.4 Instrumentation Interface
B.2.5 Generating Instrumentation
B.3 Implementation Summary
Reference List

LIST OF FIGURES

1.1 A High Level Program Monitoring and Measuring System
4.1 A Piece of Source Program
4.2 A PMM Specification
4.3 A Source Program Representation
5.1 An Event Dependent Graph

Abstract

Program monitoring and measuring (PMM) is the activity of collecting empirical data of a program's execution to answer questions about the program's performance. PMM is usually done by altering the program to collect interesting data as it runs. Unfortunately, this is itself an arduous task involving all the difficulties of program construction and maintenance.
Generally there are three tasks involved in altering the program to answer performance questions: determining what data has to be collected, determining where in the program that data can be collected, and adding code to the program to collect that data and to process it to produce the desired results. As programs become larger and as the computational structures employed by languages become more complicated, performing those tasks becomes a challenge for human programmers.

This dissertation presents a system that automates each of these tasks. Its input is a high level specification of PMM questions and the source program. Its output is an augmented version of the program whose execution produces both the results of the original program and the answers to the PMM questions.

This system has proved to be very effective. It exploits several techniques not previously used. First, PMM questions are specified in a specification language that facilitates both question specification and automatic program augmentation. The language is based on an Entity-Relationship model and a set of programming language dependent primitives, which allow the data that must be collected to be determined from the primitives used in the questions. Determining where to insert instrumentation is done by relating run time program behaviors with the source program constructs that produce that behavior, and by a static analysis which locates those program constructs in the source program. Adding code to collect that data can be done efficiently by instantiating generic instrumentation templates associated with each of the programming language primitives. Second, the instrumentation for computing the answers to the questions is constructed automatically by filtering, combining and merging the data collected by these primitive templates using the formalism and power of a relational query processor.
Finally, to minimize the collection of extraneous data, static analysis is used to filter out irrelevant sites at compile time and temporal analysis is used to filter out data at run time.

Chapter 1
Introduction

Program monitoring and measuring (PMM) is the activity of collecting information about the performance characteristics of a program. Such data is useful for finding performance bottlenecks and understanding performance tradeoffs, which in turn can be used to improve the performance of the program. In the absence of special purpose hardware, monitoring and measuring is done by altering the program to collect interesting data as it runs. Unfortunately, this is itself an arduous task involving all the difficulties of program construction and maintenance. Generally the programmer starts with some performance questions. In order to answer these questions he has to do the following tasks:

• determine what data has to be collected
• determine where in the program that data can be collected
• add code to the program to collect that data and to process it to produce the desired results

The programmer then runs the instrumented program on some data of interest to obtain the answer to his question. This generally leads either to changes in the program or further questions, and so the process iterates. As programs become larger and as the computational structures employed by languages become more complicated, correctly and efficiently fulfilling each of the tasks becomes a challenge for human programmers.

This dissertation presents an automatic programming approach to monitoring and measuring program execution that automates each of the three tasks mentioned above. This approach exhibits how programming language features, source programs, and program monitoring and measuring needs can be incorporated into an automatic programming system to monitor and measure the performance of program execution.
The system's input is the source program and a high level specification of what the programmer wants to know. The output is an instrumented version of the program whose execution produces both the results of the original program and the answers to the specified questions.

1.1 Program Monitoring and Measuring at High Level

High level program monitoring and measuring means that the specification is written in terms of what the programmer wants to know, rather than in terms of how a machine can compute what he wants to know; that is, it is expressed in terms of high level abstractions. Notice that he may be interested in "low level" activities, such as how much time the program spends paging, but he can ask such questions without having to describe how to figure out the answers.

Program monitoring and measuring requires knowledge of the monitored program, the semantics of the programming language, and the semantics of monitoring and measuring questions. Because the specifications are oriented towards human understanding rather than machine execution, those specifications are declarative and must be translated by the program monitoring and measuring system into procedural code. Usually performance data of a program's execution can only be collected at a low level and its volume is huge. Simply collecting all low level data is not only inefficient but also impractical. Some performance data is not directly accessible, for instance, the history of program execution. In order to get the required results, directly accessible performance data needs to be recorded so that the required results can be derived from it. However, collecting low level data requires dealing with the implementation of some high level constructs of the programming language. Because the collected low level data is not oriented towards human understanding, it needs to be translated and aggregated into forms that are easy for human consumption.
The translation is even more important when very high level languages are used because the gap between the level where the data is collected and the level where the required results are specified is very large. Without complete understanding of the semantics of the programming language and of the organization of the source program, correct and efficient translation would not be possible. Requiring the programmer to manually do the monitoring and measuring of program execution not only creates a burden for him but also carries a high risk of introducing additional maintenance problems[BP88]. Hence, the process involved in program monitoring and measuring should be automated. In addition, applying high level program monitoring and measuring by using our automatic programming approach makes program monitoring and measuring easier, faster, and less error prone.

Despite major difficulties met by general purpose automatic programming systems, progress has been made in domain specific automatic programming[KB81, Kan83, Bar85, RW88, Qia90]. There are several reasons why automatic programming for program monitoring and measuring is feasible. First, program monitoring and measuring is an information processing process[Sno88]. Development in data retrieval techniques makes it possible to automatically transform low level data into higher level forms. Second, collecting performance data of program execution can be decomposed into collecting performance data of the execution of the particular syntactic constructs that compose the program. In a programming language, there are a few constructs that are the building blocks of programs. Based on the semantics of the programming language, instrumentation plans can be constructed beforehand for monitoring and measuring the execution of those constructs[Bal69, Knu71]. Third, both the PMM specification and the source program are available before program execution.
They can be used to determine where in the program the relevant data can be collected and how to compute the answers to the questions defined in the specification from the collected data.

1.2 Thesis and Problem Statement

The thesis of this dissertation is that the task of installing software performance instrumentation can and should be automated. The problem we address is as follows:

Given:
- A source program
- A set of performance questions about the run-time behaviors of the program

Domain: Programs written in the AP5 language[Coh88]

Generate: An augmented program that satisfies the following constraints:
- The augmented source program preserves the functionality of the original source program
- The execution of the augmented program produces answers to the questions.

There are two reasons for choosing AP5 programs as this research domain: First, AP5 is a very high level programming language; second, it is in daily use in my research environment[Bal85]. In other words, the ideas of this research can be practically implemented and demonstrated with the available facilities.

1.3 Overview of SmartMonitor

The following issues are addressed in this dissertation:

• The development of a specification language for stating PMM questions.
• The identification of the necessary reasoning processes for generating the augmented program from a set of PMM questions in this language and a source AP5 program.
• The architecture of an automatic programming system that performs that reasoning and generates the required instrumentation to answer the PMM questions.

They are addressed through building an automatic programming system, called SmartMonitor, for high level PMM of AP5 programs. SmartMonitor is a high level program monitoring and measuring system.
As illustrated in figure 1.1, SmartMonitor takes both a source program and a set of PMM questions about the performance of the program as inputs and produces an augmented source program. The augmented program is then compiled and executed to produce both the results of the original program execution and the answers to the questions. The set of questions is written in a specification language, called the PMM specification language. SmartMonitor is an automatic programming system for program monitoring and measuring of AP5 program execution. By automatic programming, we mean that the answers to performance questions are computed without any human assistance after the set of questions is given.

[Figure 1.1: A High Level Program Monitoring and Measuring System. The source program and the PMM spec are inputs to the PMM compiler, which produces the augmented source program; compiling and execution then yield the answers and results.]

1.3.1 A High Level PMM System

SmartMonitor requires the programmer to provide a declarative description of what he wants to know about the execution of his program. This is done via a specification language which embodies a model of program execution. At the most primitive level, the activity of measuring and monitoring involves interrupting the program at relevant points during its execution and gathering data. The data model supported by SmartMonitor is similar to an Entity-Relationship model[Che76] in which entities and relationships are categorized as follows:

• Point events: These are events which occur at a single point of execution. Examples are the entry of a function, or the arrival of an interrupt.
• Interval events: These are events associated with a pair of execution points, called the starting and ending points. An example is the execution of a function.
• Attributes of events: These associate events with data that can be observed when a point event occurs or at the starting or ending points of interval events.
Examples include the time at which the event occurs, the name of the function of which this event is an execution, the value of program variables at that time, etc.
• Control relations: These are relations among events. An example is the calls relation which indicates that one function execution calls another.
• Other relations among non-events: These include any other relations among data objects, e.g., the values of the parameters passed to different executions. An example is the fact that one value is greater than another.

From these entities and relationships one can define further relationships of interest by composing expressions in the relational calculus. These expressions define abstractions and aggregations. Chapter 3 discusses the specification language and issues that arise in its design.

The PMM questions about the execution of a program are asked before its execution. SmartMonitor modifies the source program to collect only the relevant data, i.e., the data that are needed to compute the answers to these questions. This is important for two reasons. First, directly collectible performance data is at a very low level and has a very large volume. Collecting irrelevant data takes not only extra space but also extra time. Hence, there is a twofold reason to reduce recording irrelevant data. Second, the cost of the instrumentation code is part of what is ultimately measured. Because the programmer is interested in the performance of the original code, not the instrumentation code, the cost of the instrumentation code is a perturbation of the data of interest, and should therefore be minimized. Of course, this cost cannot be totally eliminated as long as we restrict ourselves to software monitoring. Rather our goal is to provide substantial automation of an extremely useful, albeit imperfect, activity that programmers now do manually.
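The event model of Section 1.3.1 can be illustrated with a minimal sketch, written here in Python rather than AP5. The names `PointEvent`, `IntervalEvent`, `event_id`, and `time_in_callee` are hypothetical and are not taken from SmartMonitor; the sketch only assumes that events carry attributes and that the calls control relation can be represented as pairs of event identifiers.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class PointEvent:
    """An event that occurs at a single point of execution (hypothetical type)."""
    event_id: int
    kind: str                      # e.g. "function-entry"
    time: float
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class IntervalEvent:
    """An event delimited by a starting and an ending point (hypothetical type)."""
    event_id: int
    kind: str                      # e.g. "function-execution"
    start: float
    end: float
    attributes: Dict[str, Any] = field(default_factory=dict)

    def duration(self) -> float:
        return self.end - self.start


def time_in_callee(
    events: List[IntervalEvent],
    calls: List[Tuple[int, int]],  # control relation: (caller id, callee id)
    caller_name: str,
    callee_name: str,
) -> float:
    """A derived relationship: total time of callee executions called by caller."""
    by_id = {e.event_id: e for e in events}
    return sum(
        by_id[callee].duration()
        for caller, callee in calls
        if by_id[caller].attributes.get("function") == caller_name
        and by_id[callee].attributes.get("function") == callee_name
    )
```

Here the `function` attribute and the `calls` pairs play the roles of an event attribute and a control relation, and `time_in_callee` stands in for an aggregation that the text would express in the relational calculus.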
However, selectively collecting the required data requires dealing with data collection freedom, i.e., there are many ways to collect the same data. Furthermore, performance data of program execution are computation state dependent, i.e., whether or not the data collected at one place is relevant depends on the data collected at some other places. These difficulties raise challenges as to how to generate the augmented program taking advantage of the source program structures, the semantics of the programming language, and the semantics of the questions. In particular, given a set of questions about a program's execution, the following questions must be answered:

• What data must be collected?
• Where in the source program can/should that data be collected?
• How can/should code be added to the program to collect that data and to process it to produce the desired answers?
• How can the dynamic aspects of testing conditions and collecting data be managed?

In implementation, SmartMonitor uses AP5 relations to model a program execution activity. In particular, it uses a relational schema to model the execution activity described in the PMM specification. This schema is expressed in terms of stored relations (ones whose tuples are explicitly asserted) and derived relations. SmartMonitor only needs to generate instrumentation to collect data for the stored relations because the derived relations, including the answers to the PMM questions, can be automatically computed from the stored relations using database query evaluation techniques[Coh89b, Ull89]. The process of recording data for the stored relations is called data collection. The process of computing data for the questions is called data processing.
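The split between stored and derived relations, and between data collection and data processing, can be sketched as follows. This is a Python analogy under stated assumptions, not the AP5 mechanism: the stored relation `EXECUTION`, the decorator `monitored`, and the queries `call_count` and `total_time` are hypothetical names introduced only for illustration.

```python
import functools
import time
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple

# Stored relation (assumed shape): one tuple (function name, start, end)
# is explicitly asserted for every monitored execution.
EXECUTION: List[Tuple[str, float, float]] = []


def monitored(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Data collection: record a tuple in EXECUTION around each call of fn."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            EXECUTION.append((fn.__name__, start, time.perf_counter()))
    return wrapper


def call_count() -> Dict[str, int]:
    """Data processing: a derived relation computed from the stored tuples."""
    return dict(Counter(name for name, _, _ in EXECUTION))


def total_time() -> Dict[str, float]:
    """Another derived relation: total recorded time per function name."""
    totals: Dict[str, float] = {}
    for name, start, end in EXECUTION:
        totals[name] = totals.get(name, 0.0) + (end - start)
    return totals
```

Only `monitored` touches the program being measured; `call_count` and `total_time` need no instrumentation of their own, which mirrors the point that only the stored relations require generated collection code.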
Because data collection progresses gradually as the program execution proceeds, SmartMonitor needs to determine whether or not it should do data processing incrementally, i.e., processing the data as soon as they are available, as discussed in Chapter 5.

SmartMonitor works as follows. First, it checks the PMM specification to ensure it is well defined. Second, it checks the specification against the source program to find out where instrumentation for data collection should go and to check whether the instrumented source program could provide answers to the questions. The places in the source program where instrumentation for data collection will be placed are called instrumentation sites. Next, the PMM specification is translated into instrumentation code. Finally, that instrumentation code is merged into the source program at the selected sites.

How can the system generate instrumentation for various PMM questions? Based on the techniques to be discussed in Chapters 3, 4, and 5, a set of instrumentation templates for primitives and control relations is presented together with ways of filling them in and combining them. These building blocks characterize the primitives, control relations, and relationships between source programs and performance questions so that a high level PMM question can be decomposed into a set of primitives, control relations, and the abstractions built on them. Using a small set of simple primitives and control relations, the system can combine the instrumentation for these primitives to answer any PMM question.

1.3.2 A High Level PMM Scenario

Suppose the programmer is considering the installation of a local software cache to record the result of function G when it is directly called by function F. In order to estimate the value of the change he would like to know how often such a cache would hit, and how much time would be saved.
In the absence of a tool like SmartMonitor, a programmer would install instrumentation manually. First he must determine what data to collect. Then he must determine where in his program the data can be collected. Next, he must insert the appropriate pieces of code into the places identified, ensuring that in each case the correct data is collected. Finally he needs to insert code to report results. All of this is what SmartMonitor does automatically from a specification of what the programmer wants to know.

In this example, the question is how much time is spent in how many calls to G from F that have the same parameters as previous calls to G from the same invocation of F. The system first analyzes the question to figure out what data must be collected. It determines that data of the executions of G and F are needed. It then uses static analysis to find out where in the source program this data can be recorded. Here there are some choices. Data of the executions of G can be collected by inserting instrumentation code either into the definition of function G or around the function calls of G inside the source program. Similarly there are choices in collecting data of the executions of F. Because the answers to the question depend only on those G executions that are directly called by F executions, only those G executions are relevant. Since there is a very close relationship between the PMM specification language and the semantics of the programming language in which the source program is written, as is discussed in Chapter 3, the system can figure out that only those function calls of G that appear inside the definition of function F are relevant, and they are chosen as the instrumentation sites of G executions. Because all F executions are needed to compute the answers, all function calls of F are relevant.
Instead of inserting instrumentation code for all function calls of F, the system inserts instrumentation code into the definition of function F. The system then creates a relation for each kind of data so that data of the G and F executions can be separately recorded and referenced. Let us assume that relation G-execution and relation F-execution are used to record the data of the executions of G and F respectively. Two attributes, Parameters and Caller, are used for recording attributes of function executions. Relation G-execution-counter is used to record the number of times G is executed. Finally, the system generates instrumentation code for the question using instrumentation generation plans to combine these function executions and the control relation calls that groups together all G executions that occur within a single F execution. The instrumentation code is merged into the source program to generate an augmented source program. Because the question only asks for the number of the required G executions, the collected data of G and F executions are only used internally to compute that answer. As soon as the answer is computed, they are no longer needed. The system generates instrumentation code that only records the number of the required G executions based on the conditions mentioned above. Because of these conditions, the system also generates some filters to test those conditions at run time to ensure that the data collected is relevant. For the example question, the filter is that there is a previous G execution having the same actual parameter values as the current one. In order to check the condition, actual parameter values of G executions must be collected and recorded so that they can be referenced later.
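As an illustration only, the scenario can be sketched in Python rather than AP5. The sketch assumes that relations can be modeled as ordinary sets and lists; the class Monitor, its methods enter_F and before_G, and the bodies of F and G are hypothetical names introduced here, not part of SmartMonitor. It counts cache hits only and does not measure the time that would be saved.

```python
import itertools


class Monitor:
    """Hypothetical stand-in for the relations used in the scenario."""

    def __init__(self):
        self.g_execution = set()       # relation G-execution: (caller, parameters)
        self.f_execution = []          # relation F-execution: one tuple per F invocation
        self.g_execution_counter = 0   # relation G-execution-counter
        self._ids = itertools.count(1)

    def enter_F(self):
        # Instrumentation placed in the definition of F.
        caller = next(self._ids)
        self.f_execution.append((caller,))
        return caller

    def before_G(self, caller, params):
        # Instrumentation placed around each call of G inside F.
        key = (caller, params)
        if key in self.g_execution:    # run-time filter: same caller, same parameters
            self.g_execution_counter += 1
        else:
            self.g_execution.add(key)


def G(x):
    # Placeholder body; any function of its parameters would do.
    return x * x


def F(monitor, values):
    caller = monitor.enter_F()
    total = 0
    for v in values:
        monitor.before_G(caller, (v,))  # only call sites inside F are instrumented
        total += G(v)
    return total
```

Calls to G made outside F are never seen by the monitor, which mirrors the choice of instrumenting only the call sites of G that appear inside the definition of F.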
The instrumentation code generated for function calls of function G inside the definition of F to answer the posed question uses a relation called G-execution to keep track of the actual parameters and the caller for each invocation of function G, and a relation G-execution-counter to keep track of the number of executions of G that had the same actual arguments and caller as some previous invocation. This instrumentation, paraphrased in English to make it more readable, is:

    IF there is a tuple in the G-execution relation with the same actual parameters and the same caller
    THEN increase the G-execution-counter by one
    ELSE insert a tuple in the G-execution relation with the current actual parameters and caller

The condition of the IF statement is used as a run-time filter to filter out those G executions that do not satisfy the condition required in the question. The instrumentation code generated for the definition of function F is:

    Insert a tuple into the F-execution relation

The augmented source program will be compiled and executed. After the execution of the augmented program, the number recorded in the relation G-execution-counter is the answer to the posed question.

1.4 Contributions

The primary contributions of the research are the design and implementation of the automatic programming system that supports high level program monitoring and measuring and the demonstration of the viability of this approach.
They include:

• a model of program execution and a language in which high level questions about executions can be expressed
• a demonstration that these questions can be automatically transformed into instrumentation that computes the required answers
• an understanding of the techniques for performing that transformation and the tradeoffs inherent in their use

More specifically, they include:

• using a specificational approach to high level program monitoring and measuring
• designing a specification language for specifying PMM questions of AP5 program execution
• applying automatic programming for high level program monitoring and measuring
  - the principles of choosing monitoring and measuring primitives for a very high level language
  - the methodology of computing answers to PMM questions by merging PMM specifications into the monitored source program so that only relevant data are collected
  - the design and implementation of the instrumentation for primitives of the AP5 language
  - the methodology of applying static analysis to specification merging
  - the methodology of incrementally computing temporally dependent derived performance data.

1.5 Scope of the Research

The main focus of the research is to provide a framework and the required underlying support for a high level PMM. The framework is based on program executions on single instruction flow and single data flow machines with a central clock. The source programs are written in a very high level imperative programming language, in particular, the AP5 language. The framework is not intended for monitoring real-time programming systems[Pla84]. Many issues related to PMM are not addressed here. Such issues include:

- PMM compensation: the current framework does not take the overhead introduced by software monitoring and measuring into consideration when it does program monitoring and measuring.
  In order to accurately monitor and measure program execution, compensation should be provided for this overhead.
- user interface: the current system does not address user interface issues.
- choosing representation for intermediate data: the current framework does not try to choose representations for the intermediate data to enhance the performance of the augmented (i.e., instrumented) program.
- sharing: the approach described in this dissertation performs optimization only on the basis of local analysis of data dependency. Thus, while determining which tuples might be needed to answer the question, the method examines data flow through each of the questions separately.

1.6 Organization of the Dissertation

The rest of the dissertation is organized as follows: Chapter 2 gives a brief review of related work and the AP5 computation model; Chapter 3 describes in detail our PMM language and the ideas behind it; Chapter 4 describes our automatic programming system, SmartMonitor, and the transformation techniques for the PMM language; Chapter 5 describes our techniques for incrementally computing monitoring results; Chapter 6 reports some experimental results and experiences of the SmartMonitor system; and Chapter 7 summarizes our accomplishments and describes some future work.

Chapter 2
Background and Related Work

Program Monitoring and Measuring (PMM) has been an essential activity since programmers came to deal with the difficulties of programming[PN81]. Conceptually, PMM involves determining what to monitor and measure, collecting necessary data, and processing the collected data. There are many aspects of a program's execution that PMM can focus on. First, there is a control aspect which consists of points of control in program execution. Second, there is a data aspect which consists of all input data and internal data of the program's execution.
Model[Mod79] pointed out that programmers want to monitor and measure their programs' execution in a way that reflects the nature of the high-level control regime. Furthermore, they want the monitoring and measuring results to be expressed in the terms in which they have structured their programs. However, traditional PMM facilities are oriented towards the control structure of the computation and the state of data elements at too low a level. This results in an overwhelming flood of details that obscure rather than illuminate the activities of program execution that are relevant to the programmers. Moreover, the programmers may not understand the implementation level details that would be obtained with traditional systems, as they may know little about the implementation or about the underlying programming language details. This is especially true for very high level programming languages, such as AP5[Coh88], REFINE[Sys86] and SETL[SSS81]. Furthermore, data structures and control structures in programs written in high level languages may have so much information associated with them that the programmers would not want to see it all, even if it were completely comprehensible. It is important that high-level PMM facilities be able to answer questions about data structures and operations instead of simply showing them in full detail. It is also helpful for the system to compile low-level bits of information into higher-level structures that provide answers to the programmers' question at the level on which it was conceived.

In this chapter, we discuss the various areas that PMM touches and provide an overview of related research in those areas. Specifically, we discuss the issues of specifying PMM requirements, collecting data during program executions, and processing the collected data.
For each of them, we discuss various aspects of the issue, related research in the area, and the impact of using very high level programming languages.

2.1 Specifying PMM Requirements

Specifying PMM requirements means telling a PMM system what to monitor and measure. There is a spectrum of ways for programmers to communicate with a monitoring system about what they want to know. At one extreme, a system provides a fixed set of options with predefined meanings about what data can be collected and lets programmers choose from them. Usually, the data includes the number of times a statement or function is executed or the amount of time spent in executing a statement or function. At the other extreme, a system provides a language to let programmers state what they want to monitor and measure. Within the spectrum, various intermediate positions are possible. For instance, some systems are capable of collecting a set of data of program execution and provide a very simple language that enables programmers to tell the system under what conditions that data should be collected [Sym86, Ben88].

Like any other specifications, program monitoring and measuring specifications include three aspects[Gol83]: coverage, which concerns what activities of program execution to monitor and measure; extent, which concerns the level of detail at which PMM should occur; and time, which concerns how frequently the monitoring and measuring of a program execution should be done. The existing approaches are described from those aspects.

There are two basic schemes for PMM[CC76, PN81].

1. A set of data is collected while a program is being executed without knowing any specific question. The data is used to answer general questions about the program's execution.
2. Given a program and a set of questions, data is collected to answer the set of specific questions.
We call the first scheme data-based PMM and the second scheme model-based PMM.

2.1.1 Data-Based Program Monitoring and Measuring

Data-based PMM systems routinely collect data about the performance of program execution to satisfy PMM requirements. In data-based PMM systems, there is only a trivial language, if any, in which programmers state their PMM requirements. Usually, programmers can only choose from a few fixed PMM categories of different performance aspects of a program's execution. For example, those categories cover time or space usages of program execution.

Profile based systems are typical data-based PMM systems. An execution profile apportions the execution cost of a profiled program to its component parts[Knu71]. The cost is usually the time spent in executing the component parts or the number of times the components are executed. The level of program decomposition (the extent of PMM) for profiling depends on the language in which the program is written. For languages with explicit control-flow, statement level profiling is appropriate[Knu71]. If the language encourages small routines, then routine level profiling may provide as much information as statement oriented profiling[G+83]. Higher-level languages (those with control-flow implicit in operators or generators) may require profiling on individual operators[C+83]. In those systems, very limited information, e.g., the execution time of program components, is collected on all program components. Hence, not all information of program execution, e.g., the values passed to the parameters of functions, is available for programmers to derive what they need. Little or no support is provided for programmers to selectively tell the systems which components to monitor.
Therefore, it is hard or impossible to monitor the executions of the selected components that satisfy some higher level conditions, because there is no proper way to convey those conditions to the PMM systems.

There are several shortcomings in profile based systems. First, some desired behaviors which programmers seek and high level concepts used in their programs cannot be explicitly expressed; they are implicitly represented in the processing phase and are compiled into their code (sometimes only in their minds). In order to get what they want, programmers are forced to translate their PMM requirements, which could be naturally expressed in high level concepts, into the implementation of these concepts so as to use the systems, and later translate the results back. Second, it is hard or impossible for programmers to specify requirements that focus on only a particular part of their programs' executions. Finally, since no data about the control aspect of programs' execution and no data about some of the data aspects of program execution (e.g., the actual parameters to some functions) are recorded, they are not available to the programmers.

2.1.2 Model-Based Program Monitoring and Measuring

Many researchers [Mod79, Sno84, OCH91] realized the problems of the data-based PMM systems mentioned in the previous section. They figured out that if a system lets programmers state what they want to know, then the system can do much better in avoiding collecting irrelevant lower level data. In model-based PMM systems, there is a PMM language, which is different from the programming language in which the monitored program is written, to represent what programmers want to monitor and measure. Explicitly representing programmers' PMM requirements enables the system to focus on relevant data and to filter out irrelevant data.
Unlike data-based program monitoring and measuring tools, model-based tools only collect data that are used in the programmers' PMM specification, i.e., their coverage is selective. The extent of model-based tools is toward programmer defined program activities. Those activities are usually at a higher level than that of data-based PMM tools. Conditions are defined on when monitoring and measuring should happen, thus monitoring and measuring is more selective in the sense that not all collected data is recorded.

Relational database based PMM [PL83, Sno84] is a model-based PMM. Powell and Linton[PL83] realized that the relational representation[Cod70, Ull88] is a very general representation. Given some high level PMM questions, the answers to the questions can be computed from the collected data via relational query evaluation techniques once the collected data is represented in relations. Snodgrass[Sno84] applied that idea to computer system monitoring. In his system, programmers provide primitive data descriptions of their program execution (e.g., the schema of the primitive data) and insert sensors manually to collect that data in computer systems. Primitive data collected by the sensors is recorded in relations. Programmers then state what they want to know in terms of the primitive data before program execution via a query language called TQuel[Sno87], which is an extension to the relational query language Quel[HSW85]. Queries can only be made on the primitive data which has been installed. The high level data is computed from the low level data as the execution of the program progresses.

Model-based approaches have been used in program debugging as well. In program debugging, many researchers recognized the need to let programmers state what they want to know about their program's behaviors at a higher level of abstraction.
The event based view of program behavior[Wil86, Pnu86], originally developed for concurrent or distributed systems, has been used in many debugging systems to describe program behaviors [BH83, BW83, Bru85, Bat88, HK88]. In this view, an event models something occurring, and any behavior of a system is considered to be an event. An interesting activity consists of a sequence of events, where the particular events in the sequence may be at any appropriate level of abstraction. Since any interesting programming system is capable of producing a vast number of different behaviors, depending upon the different inputs it receives, the event-based view can be very expressive in describing possible behaviors of a system with a set of sequences of events[BW83, Bat88]. This approach provides a set of predefined primitive events and a language based on EBBA upon which high level events can be defined [BW83, Bat88]. Typical primitive events are function executions, file operations, and data references. Relationships between different events are captured by the order in which they occur. Path expressions [Bru85] are used to represent that order. Information available to define an event is the data defined in users' programs, the calling sequence of functions, and primitive operations on data. Programmers specify what they want to monitor by defining events in terms of the predefined primitive events and other defined events. A monitoring system only collects data for the predefined events. Defined events are recognized by using finite state automata techniques with syntactic pattern matching [BW83, Bru85, Bat88].

Up to now, we have discussed those PMM languages that are used as separate languages. There have been some efforts to enhance programming languages to include some monitoring and measuring facilities.
These enhancements explicitly represent the information which was traditionally left implicit in program execution and make the information accessible within the programming languages. Many languages include some monitoring statements inside the programming language itself [Han87, LW69, Smi84, Dav80, Bat83]. For instance, SNOBOL[Han87] has facilities that support three kinds of monitoring. First, it supports accessing program internal parameters through keywords. Second, it supports using pattern matching to invoke monitoring and measuring actions. Third, it supports some condition testing operators so that programmers can do monitoring and measuring conditionally. Accessing program internal parameters is done by tracing. Tracing can be applied to different parameters. First, value tracing accesses the value of a variable whenever a value is assigned to a traced variable by an assignment statement or as a result of value assignment in pattern matching. Second, function tracing traces the entry point and return point of a function. Third, label tracing causes a trace printout whenever a transfer is made to a label, and only under some condition. Fourth, keyword tracing causes a trace printout when the value of a named keyword is changed. Programmers can define trace functions to be invoked when any of the events mentioned above occurs. Special variables and functions are introduced in the language to facilitate program monitoring and measuring. For example, a special variable $stno is used to record the number of the statement currently being executed, and a special function arg(fn,n) is used to access the nth parameter value of the function fn.

The facilities described above make PMM more convenient. However, in order to monitor program execution, programmers must explicitly write monitoring code using these facilities and insert the code into their programs.
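To make the idea of in-language function tracing concrete, the following is a rough Python analogue, not SNOBOL and not taken from any of the cited systems. The decorator traced, the list trace_log, and the example function fact are hypothetical names used only for illustration; the optional condition plays the role of conditional monitoring.

```python
import functools

trace_log = []  # hypothetical trace output, kept in memory instead of printed


def traced(condition=lambda *args: True):
    """Record entry and return of a function whenever condition(*args) holds."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            active = condition(*args)
            if active:
                trace_log.append(("call", fn.__name__, args))
            result = fn(*args)
            if active:
                trace_log.append(("return", fn.__name__, result))
            return result
        return wrapper
    return decorate


@traced(condition=lambda n: n > 1)  # trace only the invocations with n > 1
def fact(n):
    return 1 if n <= 1 else n * fact(n - 1)
```

Note that the monitoring code is written by hand and lives inside the monitored program, which is exactly the burden that the following paragraphs describe.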
Many found that explicitly enhancing a programming language to do program monitoring and measuring has a lot of restrictions[PN81, BH83]. Moreover, directly changing a program for monitoring purposes could introduce some additional maintenance problems. Hence, a separate language is often used to describe PMM requirements[Bru85, Sno82, Sno84, Sno88, SPSB91].

2.1.3 Very High Level Languages and PMM

The gap between very high level programming languages and the underlying computation environment upon which the programs written in the languages are executing is larger than that of most conventional high-level programming languages and environments. Using very high level programming languages has many impacts on specifying PMM requirements. First, the costs of program execution are not apparent from the source code. Hence, PMM systems must provide lower level details to describe some execution activities that are not part of the high level languages' execution model. Moreover, those systems must provide a language for specifying the details mentioned above. Second, because those details useful for describing the performance of program execution are not part of the languages' execution model, it is more important to let programmers state them declaratively and let systems figure out how to get them. It is more efficient and less error prone for a monitoring system to do that. Third, the data structures and control structures in programs written in very high level languages may have so much information associated with them that it is more important for programmers to ask high level questions which use information of both the data state and the control data of programs. Doing so enables the programmers to focus on what they want to know without being overwhelmed by details.
Fourth, the large gap also makes it more important for the PMM language to support abstractions so that programmers can introduce their own terms and some idiomatic usages in specifying what they want. Supporting abstractions, e.g., aggregations and specializations, also makes it easy for programmers to tell monitoring systems what they want. That is especially true if what they want to know includes those execution details introduced for PMM purposes.

2.2 Data Collection

In program monitoring and measuring, the data collected is usually the time and space usages of program execution. Data collection is very important in PMM because it gathers the data upon which the information needed by programmers is based, and because the range of data that can be collected determines what questions can be answered.

Data collection can be done by either hardware or software. Systems using special hardware to collect data can be found in [Ben88, HW90, CLW90]. Using special hardware to collect data of program execution has the advantage that data collection does not affect the performance of program execution. The disadvantages relative to software data collection are that it requires special hardware, the data collected is very massive and at a very low level, it is very inflexible, and the collected data is very expensive to analyze. There are many systems that collect data of program execution via software[Knu71, CC76, Sno82, G+83, C+83, Pla84, Sno88]. Some systems do data collection by altering the compiler of the programming language in which the monitored program is written[CC76, G+83, C+83]. Other systems directly alter the monitored source programs to collect data about them[Knu71, Sno82, Pla84, Sno88].

With either hardware data collection or software data collection there is an issue as to how much data to collect. There are various ways of determining how much data to collect.
Methods range from collecting all data that can be collected[CC76, G+83] to collecting only the specified data[Sno82, Pla84, Sno88, LL89]. Systems also differ in what data to collect. Some systems[Bal69, Sam89, HW90] record execution traces. Other systems[Knu71, G+83, C+83] collect less data. Profile based systems collect only the time spent on program components and/or the number of times these components were executed during program execution. Most of those systems do not have the flexibility of controlling what to monitor. Usually only a few predefined forms of measurement are supported (e.g., how many times each procedure is called and the time it consumes). In data-based systems, because it is hard or impossible for programmers to communicate with monitoring and measuring systems to tell what they want, the systems usually collect too much data and most of that data is not relevant to what the programmers want. Being told what to collect, model-based systems collect all primitive data upon which other data are based. When processing PMM requirements of a program's execution, these systems depend on processing techniques to filter out irrelevant data after it is collected.

How to tell the system what to monitor at the primitive level has a significant effect on how data are collected[PN81]. Various methods are used.

• Labeling: Programmers are required to label the statements that they want to monitor. The system will collect execution data of those labeled statements [LL89].
• Position: Programmers specify the statements that they want to monitor by their positions in the source program[BH83]. For instance, main > Q > P is used to tell the system that what they want to know about is the execution of statement P, which is inside the Q statement of the function main.
• Index: Some systems give each line of the source program a line number and use it to locate statements of interest[Sno82, GYK90].
• Category: Some systems categorize the syntactic constructs into some categories and use the names of the constructs to name the statements of interest. For example, if P is an assignment statement inside function main then it can be referred to as the assignment statement inside function main. If there are several assignment statements inside main then the textual order of the statements is used to specify the required statement. Some methods mentioned above could be used together with this method[PN81].

Requiring programmers to figure out what data to collect and where that data can be collected is a burden for them. That is especially bad when very high level languages are used, because determining what to collect and where to collect it might require knowledge that is either not available to ordinary programmers or very difficult to master. When very high level languages are used, some execution activities may not correlate to statements. Hence, only using source level statements to describe monitoring and measuring requirements is not good enough. New terms must be introduced to describe the activities of program execution that are specific to the computation model used by the programming languages but not correlated to the executions of statements.

2.3 Data Processing

Data processing transforms the low level data collected into derived forms to satisfy programmers' PMM requirements. In data-based systems, because the systems do not let programmers describe higher level questions, there is little or no data processing explicitly supported. For those that collect all primitive data available, programmers usually do the data processing either by reading the data or by writing programs to do it.
Cohen[CC76] provided a procedure based language for programmers to ask questions of a database which contains data about program execution. Samadi[Sam89] described a system that uses an expert system to interpret the collected data.

In model-based systems, relational query evaluation and optimization methods are used to derive high level data from primitives. Snodgrass[Sno82] uses relational algebra to compute the required high level data from the collected data. His system first represents the collected primitive data in relations. It then computes answers to programmer questions, which are represented as TQuel queries, using a database query processor that is based on relational algebra. Using relational algebra to derive required high level data from the collected data enables the PMM system to use the optimization techniques used in database query evaluation. Because low level data is collected incrementally, Snodgrass realized that it is not very efficient to collect all low level data first and then process that data. He introduced an incremental derived data computation algorithm that computes derived data incrementally. In his algorithm, incremental computation is achieved by generating low level data and sending them to an event processor. The event processor first builds a match network that defines how derived forms can be computed from the low level data and then dynamically interprets the match network on the low level data.

In the EBBA based framework of Bates[Bat88] and in the path expression based frameworks of Bruegge[Bru85] and Hseush[HK88], the computation of high level data from low level data is done using finite state automata methods plus pattern matching. In those frameworks, low level data are first collected and then sent to an event recognizer to be processed.
The event recognizer first uses the pattern of the low level data to filter out those data that do not match the required patterns. It then uses finite state automata methods to compute the derived data.

Using very high level programming languages makes PMM data processing more difficult because of the large gap between very high level programming languages and the underlying computing environment upon which programs written in the languages are executing. Because the PMM data collected directly is usually at low levels that reflect the implementation of the languages, the large gap requires much more complicated computation to compute the required high level forms from the low level data.

2.4 Related Research

Our PMM system is a model-based software monitoring system. It lets programmers specify what they want to monitor and measure before program execution in a specification language. It then merges the PMM specification with the monitored source program using program transformation techniques. In doing so it uses static analysis to figure out where in the monitored source program the data needed to satisfy PMM requirements can be collected. It then transforms the requirements into instrumentation code so that data can be collected and processed into the derived data of interest to the programmers. In the following subsections, we briefly discuss program transformation, static analysis, and incremental derived data computation.

2.4.1 Program Transformation

Program transformation[BD77] is a means to formally develop efficient programs from specifications. Program transformations are widely used in automatic programming [Bal85, Bar85, Bal86, Ric86, RW88, Smi90] and program synthesis [MW80, Qia90].
A transformation system accepts a source program and advice on how to do the transformation (e.g., choices of data representations); it then generates an efficient program based on a set of correctness-preserving transformation rules under the guidance of the advice. Surveys and studies comparing different existing transformation systems can be found in [PS83, Fea86].

Evolution Transformations[JF90] are transformations whose purpose is to elaborate and change specifications in specific ways. They are used to support program evolution. Unlike traditional program transformations, evolution transformations are not correctness-preserving. Instead, they add new semantics into the modified specifications on purpose. An example is adding a new parameter to a function, which changes the semantics of the program. In program development such a change might be exactly what one wants.

Our system supports a set of domain specific transformations that preserve the functionality of the source program, while, like evolution transformations, it generates instrumentation code that collects and computes data for the PMM specification. It uses knowledge of the source program, knowledge of the PMM specification, and knowledge of the relationship between the PMM specification and the source program to optimize the instrumentation code needed for computing answers to the PMM specification. Unlike other transformation systems, it generates different instrumentation for the same PMM specification for different source programs.

2.4.2 Static Analysis of Programs

Static analysis is the analysis of source programs to find syntactic and/or semantic relationships of program components, for instance, what functions call other functions or where variables are bound, set, or referenced.
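A minimal sketch of one such analysis, written in Python with the standard ast module rather than for AP5, is shown below. It finds the calls of one function that appear textually inside the definition of another, which is the kind of relationship used to select instrumentation sites in the scenario of Section 1.3.2. The function call_sites and the sample program SOURCE are hypothetical and only illustrate the idea; calls inside nested definitions are also reported.

```python
import ast


def call_sites(source, caller, callee):
    """Line numbers of calls to `callee` written inside the definition of `caller`."""
    tree = ast.parse(source)
    sites = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == caller:
            for inner in ast.walk(node):
                if (isinstance(inner, ast.Call)
                        and isinstance(inner.func, ast.Name)
                        and inner.func.id == callee):
                    sites.append(inner.lineno)
    return sorted(sites)


# Hypothetical monitored program used only to exercise the analysis.
SOURCE = """\
def G(x):
    return x + 1

def F(a, b):
    return G(a) + G(b)

def H(c):
    return G(c)
"""
```

Here only the two calls of G on the line inside F would be chosen as instrumentation sites for a question about G executions called directly by F; the call inside H can be ruled out without running the program.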
Static analysis has been used in many applications [PC90, OO90, Xer83, RD90, Nar89], for instance, in global optimization by compilers [ASU86, FOW87, RP88, RD90, Wol90], programming error checking [OO90, CMB91], program maintenance [PC90], and program evolution support [Nar89].

In our PMM system, static analysis is used to select the relevant instrumentation sites for a PMM specification. It is used differently than in other applications in that both the relationships among program execution activities and the relationships between program execution activities and syntactic constructs in a source program are used to find the instrumentation sites of the specification. Relationships among program execution activities, such as one function execution calling another function, are used as filtering conditions for selecting instrumentation sites, so as to eliminate those sites that can be proven irrelevant at compile time. Both the types of program execution activities and the instances of program execution activities can be used to describe the sequence of instances, so that the performance of a program's execution can be specified more accurately than by using the types of program execution activities alone, as in other work, e.g., Cecil [OO90]. This is because we focus on dynamic properties of computer programs while others, like Cecil, focus on static properties.

2.4.3 Incremental Generation of Derived Data

Incremental generation of derived data is very much like database query evaluation, by which derived data is generated from stored data [Kin81a, Kin81b, CGM90, Ull89, JK84, GMN84, Coh86]. However, unlike in database query evaluation, in program monitoring and measuring the data upon which the derived data is based is collected incrementally and is not available all at once. Cohen [Coh89a] studied incremental generation of derived data in compiling database transition triggers.
In his system, triggers are compiled into a match network so that efficiency can be achieved by sharing partial results. In the study of integrity constraints in database systems, incremental generation is used to compute derived data that violates database constraints [Nic82, QW86, SJGP90]. Incremental generation of derived data is also studied by Forgy [For79] for efficiently implementing production systems.

In Cohen's work [Coh89a] a language is used to specify conditions becoming true in databases. The language is a temporal extension of first order logic that enables references to both the state before and the state after a database transition. The required derived data is specified as triggers in this language. Triggers are compiled into a network of matchnodes, each of which has an associated description and a program and is connected to other nodes, its predecessors and successors. Data satisfying a description is computed at the node with which the description is associated and is used as input to the successors of that node. The output of the nodes without successors is the derived data needed. This system assumes that there is no relationship among the data upon which the derived data is based other than the relationships specified in defining the derived data. In our PMM system, derived data is usually based on many primitive data items, and there are relationships among the primitive data (e.g., temporal relationships) that can be used to make the computation more efficient.

2.5 AP5 Computation Model: A PMM View

An AP5 program uses relations to represent the state of the data manipulated by a program. Program execution changes these relations and thus moves from one data state to another. Furthermore, these transitions are atomic. Within a state, data is accessed via associative retrieval.
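The state model just described can be pictured with a small sketch. It is written in Python, not AP5, and is only an analogy: the class `Database`, its methods `atomic` and `retrieve`, and the sample tuples are all assumptions of this sketch rather than AP5 constructs.

```python
class Database:
    """A data state: named relations, each a set of tuples."""

    def __init__(self):
        self.relations = {}

    def atomic(self, inserts=(), deletes=()):
        """Apply a batch of updates as a single state transition.

        The next state is built aside and swapped in at the end, so no
        intermediate state is ever visible to a reader of `relations`.
        """
        new_state = {name: set(rows) for name, rows in self.relations.items()}
        for name, row in deletes:
            new_state.setdefault(name, set()).discard(row)
        for name, row in inserts:
            new_state.setdefault(name, set()).add(row)
        self.relations = new_state

    def retrieve(self, name, pattern):
        """Associative retrieval: None in the pattern matches any value."""
        return {
            row
            for row in self.relations.get(name, set())
            if len(row) == len(pattern)
            and all(p is None or p == v for p, v in zip(pattern, row))
        }


# Illustrative data only.
db = Database()
db.atomic(inserts=[("work-on-%", ("John", "FSD", 50)),
                   ("work-on-%", ("Mary", "FSD", 100))])
print(db.retrieve("work-on-%", ("John", None, None)))
```

Retrieval is by content (a pattern over tuple positions) rather than by address, which is the sense of "associative" used here.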
The consistency of the states is automatically maintained relative to a set of user defined consistency rules. Whenever a transition is proposed, the consistency conditions of these rules are checked. If no condition is violated, the transition is made to the proposed state. If there are violations, the rules may propose additional updates to restore their conditions. The augmented transition is then attempted just like the original. This in turn may lead to more violations and repairs. The database is updated only if a consistent transition can be found. Otherwise the transition is aborted (i.e., no change is made to the database) and an exception is raised.

The success of an atomic transition may also trigger user defined automation rules, which fire on every successful transition that satisfies their predicates. These predicates are expressed in a two-state relational logic enabling reference to the before and after states of a transition. The body of an automation rule is an AP5 program whose execution may cause further state changes to occur. More information about AP5 can be found in [Coh88].

Our program monitoring and measuring system uses AP5 both as the source programming language and as its implementation language (to simplify the recording and access of instrumentation data and the computation of answers to the PMM questions from the primitive data collected).

Chapter 3

A Program Monitoring and Measuring Specification Language

High level program monitoring and measuring requires an explicit specification of the program execution activities to monitor and a set of questions about those activities. The purpose of the explicit specification (including the PMM questions) is to describe the activities of interest, i.e., to tell the PMM system what to monitor and measure.
By doing so the PMM system can focus only on the execution activities specified and avoid collecting data that is irrelevant to the questions. Explicitly specifying the program execution activities to monitor and the PMM questions about them requires a language. Although a number of fundamental principles of good specification languages [BG79] have been proposed and a few specification languages have been used successfully [SSS81, Sys86], there is currently no specification language designed specifically for program monitoring and measuring. This chapter presents such a language, designed for specifying program execution activities to monitor and questions about those activities.

The rest of the chapter describes a PMM specification language for the program monitoring and measuring of AP5 programs. First, we discuss the objectives of the PMM specification language. Second, we present a model of program execution activities, called the PMM event model. Third, we introduce a vocabulary for specifying execution activities that are specific to the AP5 computation model. Finally, we define the syntax and semantics of this specification language and the properties of valid PMM specifications.

3.1 PMM Specification Language Objectives

The goals of the PMM specification language are analogous to those of very high level languages [BCG83]: to allow a programmer to describe in the most natural terms what he wants. The specification language is an interface between a programmer and an automatic programming system. It plays both an external and an internal role. The external role is as a specification language for stating PMM requirements. It enables a programmer to state what he wants to know instead of how it is to be determined. It also allows him to state his PMM requirements in terms of his problem domain. It thus makes specifying PMM requirements easier.
To satisfy these objectives, the PMM language must be high level, declarative, and expressive. Being expressive requires that the language be able to describe the performance of program execution accurately. More specifically, the language must be able to identify the events of interest in the program execution and the relevant portions of the control and data states needed to define those events, including access to dynamic program execution information, such as the contents of the runtime stack or the history of data state changes, which is not directly accessible in conventional programming languages. These abilities enable the programmer to focus on the portions of the program execution in which he is interested. Finally, the language should support abstraction so that a programmer can build higher level terms from lower level ones and avoid irrelevant details.

The internal role of the PMM specification language is to tell the system what to monitor and measure. In particular, besides the specification requirements mentioned above, the PMM specification language needs to satisfy internal practicality and efficiency requirements. First, the transformation from a PMM specification to the instrumentation used to collect and process data during program execution must be doable automatically. We call this the operational requirement. Second, the instrumentation generated must be efficient in both execution time and space. We call this the efficiency requirement.

There is a conflict between these two roles played by the specification language. On one hand, program monitoring and measuring needs to transform the PMM specifications into a set of data to collect and a method of computing the answers from this data. Hence, the closer the level of the specification of execution activities is to the computation environment, the easier the mapping will be. On the other hand,
specification languages attempt to move the specifications closer to the problem domain and farther from the details of the computing environment. Thus, the higher the level of the specification language, the more complex the mapping that the automatic programming system must perform.

The PMM specification language is designed to incorporate the two roles mentioned above within a coherent framework. The specification language provides a vocabulary of primitive program execution activities and their relationships to model the features of the AP5 computation model. They enable a programmer to specify program execution activities such as atomic execution and rule triggering, as well as relationships such as the atomic execution that triggered some rule. High level activities are defined in terms of primitive activities and their relationships. The principles for composing these primitives into high level abstractions in ways that facilitate monitoring and measuring are:

• Primitives in the activity model correspond to the execution of some specific syntactic structures.

• Behavior composition allows simplification and abstraction.

The expressivity and efficiency requirements indicate several desirable properties of the specification language. Activities should be represented as a hierarchy in which activities are nodes and the leaves correspond to the observable primitive activities, so that the mapping between the information collected during program execution and these primitive activities is simple. Other nodes in the hierarchy are defined in terms of the leaves and of nodes that have already been defined. Not only does this simplify computing high level mappings from lower level ones, represented by the leaves, but it also greatly simplifies observing the behavior of a complex program from the details of its statements by specifying the intended activities at multiple levels of abstraction.
The nodes in the hierarchy are activities which can be computed from the nodes below them. This also facilitates behavior abstraction and program monitoring and measuring, since the system can use information about the hierarchy to filter out some irrelevant activities.¹ Hence, a programmer can build up high level terms and specify PMM requirements with them. Simultaneously, the automatic programming system can use these terms to build an efficient implementation of the mapping from low level data of program execution to a level that is comprehensible to the programmer.

¹ How that will facilitate program monitoring and measuring will be discussed in Chapter 4.

3.2 A PMM Specification Language

Intuitively, in the specification language the execution of a program is modeled as a sequence of states. The transitions from one state to the next are called events. Each event is of some type and occurs at a different time. For monitoring purposes we instantiate this model with a very fine grain size of states. Most of these execution events are related to syntactic constructs of the programming language that appear in the program being monitored, i.e., they are part of some statement. However, they generally correspond much more strongly to an activity model that is not part of the programming language semantics, but is nevertheless specific to the programming language.

3.2.1 Program Monitoring and Measuring Event Model

Our specification language is based on an extended Entity-Relationship data model [Che76], in which program execution activities (called events) are modeled by entities, and relationships among them are modeled as relations (called control relations). The primary components of our PMM language are the explicit definition of events, attributes of and relationships among events, constructors for building derived events, and performance questions about those events.
To describe the model in more detail we proceed in four steps. We begin by describing event types that model the classes of program execution activities. Second, we describe control relations that model the relationships among events. Third, we describe derived relations and event types, which are built from event types and control relations. Finally, we describe PMM specifications.

Events and Event Types

Events are used to model activities in AP5 program execution. Each event has a type. Information useful for describing the performance aspects of the activities is abstracted into a set of attributes defined for each event type. In this model, we distinguish event types and value types. Value types are types whose meaning can be universally understood, while the meaning of event types can only be understood through their relationships to other event types or value types. For example, Integer and List are value types and function-execution is an event type. Event types are organized as a hierarchy by using specialization.

There are two kinds of primitive event types: point event types and interval event types. A point event type models a set of points on the execution trace of a program. For example, during program execution, entering or leaving a function is a point event. Each member of a point event type is called a point event. An interval event type models a set of pairs of point events on the execution path of a program. Each member of an interval event type is called an interval event. An interval event takes place over a consecutive segment of time. For example, a function execution is an interval event.

Relationship Among Events

Events are introduced to model program execution activities. Control relations are introduced to model the relationships among events.
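The distinction between point events and interval events described under "Events and Event Types" can be sketched as two record types, with a function execution recovered as the pair of its entering and leaving points. This is a Python illustration under stated assumptions: the type names, the `"enter"`/`"leave"` kinds, and the well-nested sample trace are hypothetical and are not part of the PMM language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointEvent:
    kind: str   # "enter" or "leave" (names chosen for this sketch)
    name: str   # function being entered or left
    time: int   # clock value when the event occurs


@dataclass(frozen=True)
class IntervalEvent:
    name: str
    begin: PointEvent
    end: PointEvent

    @property
    def duration(self) -> int:
        return self.end.time - self.begin.time


def function_executions(trace):
    """Pair each "leave" point with the most recent unmatched "enter" point."""
    stack, finished = [], []
    for point in trace:
        if point.kind == "enter":
            stack.append(point)
        else:
            begin = stack.pop()
            finished.append(IntervalEvent(begin.name, begin, point))
    return finished


# Hypothetical trace: f calls g, g returns, f returns.
TRACE = [
    PointEvent("enter", "f", 0),
    PointEvent("enter", "g", 2),
    PointEvent("leave", "g", 5),
    PointEvent("leave", "f", 9),
]

for execution in function_executions(TRACE):
    print(execution.name, execution.duration)
```

The stack-based pairing assumes that entries and exits are properly nested, which holds for ordinary function calls on a single thread of control.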
In the PMM specification language we distinguish three kinds of relationships: first, temporal relations model the temporal relationships between two events, for instance, the execution order of two atomic executions in AP5 programs; second, attributes model events and their execution environment, for example, the duration of an atomic execution; and third, static relations are used to define conditions on values, for instance, that the value passed to the first parameter of a function execution is not equal to zero.

A relation can be stored or derived. A stored relation consists of relationships that are explicitly asserted. A derived relation consists of relationships that are derived from stored relations or other derived relations. In the following sections we present a general temporal relation among events and define some computation model specific relationships among events for the AP5 computation model. We assume that the AP5 programs to be monitored are running on a SISD machine with a centralized clock.

The execution order of events is a temporal relationship in program execution. In the PMM language, a time attribute is defined on each of the point event types. The value of the time attribute is the value of the clock when the event occurs. Temporal relationships of events are defined on the values of their time attributes. Because there is a natural order among the values of the time attribute, these values are used to describe the execution order of point events. Hence, for two different points P1 and P2 the following relation is true: (OR (Before P1 P2) (Before P2 P1)), where (Before P1 P2) means that P1 was executed earlier than P2. The execution order among interval events is defined in terms of the point events associated with them.
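A small sketch may help show how the ordering behaves. It is written in Python with hypothetical names (`Point`, `Interval`, `before_points`, `before_intervals`), and it takes one interval to precede another when the first ends before the second begins. Under that reading the order is total on distinct points but only partial on intervals, since nested or overlapping intervals are unordered.

```python
from collections import namedtuple

Point = namedtuple("Point", "time")
Interval = namedtuple("Interval", "begin_time end_time")


def before_points(p1, p2):
    """P1 is before P2 when its clock value is smaller."""
    return p1.time < p2.time


def before_intervals(x, y):
    """X is before Y when X finishes earlier than Y starts."""
    return x.end_time < y.begin_time


# Two distinct points are always ordered one way or the other.
p1, p2 = Point(3), Point(7)
print(before_points(p1, p2) or before_points(p2, p1))

# Hypothetical intervals: g runs inside f, h runs after f.
f, g, h = Interval(0, 9), Interval(2, 5), Interval(10, 12)
print(before_intervals(f, g), before_intervals(g, f), before_intervals(f, h))
```

The nested pair `f` and `g` is exactly the case that a separate relation such as a call relationship has to describe, because temporal order alone says nothing about it.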
For example, we call the starting point event of an interval event its begin-point and the finishing point event its end-point. Correspondingly, we call the time attributes of the two point events begin-time and end-time. Suppose X and Y are interval events; then (Before X Y) is defined to be true if the finishing point of X is earlier than the starting point of Y. More precisely,

(Before x y) = ((x y) (∃ (tx ty) (AND (end-time x tx) (begin-time y ty) (< tx ty))))

Derived Event Types and Derived Relations

The control relations and event types in the specification language allow a complex program activity to be described, but it would be more convenient to monitor and measure a large program if programmers could define their own terms. It would also be more convenient if they could define high level concepts in terms of low level ones. Derived event types enable them to do that.

Intuitively, higher level activities are modeled by clustering and filtering. Clustering aggregates low level events and previously defined high level events into higher level events. Such higher level events specify a collection of events and describe how they relate to each other. Filtering serves to eliminate from consideration events that are not relevant to the activity model being investigated. Filtering is effected by specifying required relationships among cluster members. The relationships of the events forming an activity model are expressed in terms of temporal constraints designating acceptable orderings, and relational constraints among attributes defined by different events.

A derived relation is defined on event types and control relations. It enables a programmer to build up application specific terms (derived events). A derived relation has two parts:

- Name, specifying the name of the defined relation.

- A specification of the events or relations the defined relation is based on and the relationships they must satisfy.
It is of the form

$\{(x_1, x_2, \ldots, x_n) \mid F(x_1, x_2, \ldots, x_n)\}$

where $F(x_1, x_2, \ldots, x_n)$ is a first order predicate formula with event types and control relations used as predicates. A derived event type is a unary derived relation.

3.2.2 Predefined Event Types and Control Relations

Primitive events (primitives) are the basic building blocks for describing the execution activities of a program. It is therefore crucial in designing a PMM specification language to choose an appropriate level of detail for these primitive building blocks. This choice makes three fundamental commitments. First, it makes some activities unobservable; e.g., in the AP5 language, if we choose atomic execution as a primitive activity then the intermediate state of the database is unobservable. Second, it makes some activities describable only as part of composite activities; e.g., again in the AP5 language, if we choose rule triggering as a primitive activity then it can only be part of an atomic execution. Third, the lower the level of the primitive activities, the more work is needed to define execution activities in terms of these primitives. Thus there is a tradeoff to be made between the level of detail the primitives describe and the efficiency of reasoning with them: more detail makes observation more accurate but results in more work.

We use the following principles to choose the primitive event types for the PMM specification language. The main purpose of monitoring and measuring is to understand the performance tradeoffs of program execution. Therefore, the execution data collected should help a programmer focus on specific portions of his program.
The guideline for choosing a set of primitives for a computation model is that there be a very close correspondence between those primitives and source programming language constructs, so that data about those primitives can be easily associated with the parts of the source program written in the language.² Because a programming language has only a small set of primitive constructs and a source program is constructed from them, choosing those constructs as primitives makes it easy to associate instrumentation data with those building blocks. This guideline can be realized by the following steps:

1. Identify a set of important language constructs in the source language. For the AP5 language, this set includes function, relation, rule, and atomic statement.

2. If a construct identified is a data structure used in program execution, all of the operations on it are also selected as primitives. For example, operations on relations in AP5, such as relation insertion and relation deletion, are primitives.

3. If a construct identified is a piece of executable program, the execution of the construct is selected as a primitive. For example, the execution of a function is a primitive.

4. Make the temporal relationships among the selected primitives, and their idiomatic usages, predefined control relations. For example, the relation before is used to specify that one event occurs before another.

The input and output parameters of a primitive are chosen as attributes of its event type. In addition, we define the time attribute for each point event type, which is the value of the clock when the event occurs. Finally, the data state of the monitored program is defined as an attribute for primitive events.
For example, in AP5 programs, the truth or falsity of predicates on their data state can be used as constraints on an event. This enables programmers to use the data state of their program to describe monitoring activities.³

² This will also help the automatic programmer for the PMM language to select the places in the source program to insert instrumentation (see Chapter 4).

As examples of applying these principles, we next describe how some of the predefined primitives are selected. See Appendix A for a complete list of AP5 primitives and their attributes.

Relation

In the AP5 language, Relations⁴ are used to represent data and the relationships among them. There are two kinds of Relations: stored relations and defined relations. Stored Relations are primitive, and Defined Relations are defined in terms of Stored Relations or other Defined Relations. The operations defined on Relations are testing, inserting, deleting, generating, and triggering. There is a set of parameters associated with each of these operations. For example, the inserting and deleting operations require a Relation name as one of their parameters. Based on the guidelines stated above we have the following primitives and attributes. The primitive event types defined on relations are: relation-insertion, relation-deletion, relation-test, relation-generation, and relation-triggering. The attributes defined for relation-insertion and relation-deletion are the name of a relation and a tuple of values to be inserted or deleted. In addition to the attributes mentioned above, there is an attribute Test-result for relation-test. Because more parameters are used in relation generation, there are a few additional attributes for relation-generation. They are generation-pattern, whose value is the generation pattern used in generating the relation, and generated-tuples, whose value is the set of tuples generated for the relation.
More detailed descriptions of the attributes and primitive types defined on Relations can be found in the appendixes.

Given the terms defined above we can describe program execution activities about Relations. For example, suppose we have a ternary relation work-on-% with the first parameter of type person, the second parameter of type project, and the third of type integer; (work-on-% 'John 'FSD 50) means John works on the FSD project 50% of the time. The primitives defined on Relation can be used to describe activities like testing the relation Work-on-% with the last parameter equal to 50, or generating relation Work-on-% with generation-pattern (input input output), as follows:

{X | (relation-test X 'Work-on-%) ∧ (parameter X 3 50)}

{Y | (relation-generation Y 'Work-on-%) ∧ (Generation-pattern Y '(input input output))}

where parameter and generation-pattern are attributes of event types relation-test and relation-generation respectively.

³ It also enables them to access application specific terms defined in their program. For example, if both person and manager are types in a program then it is possible to check if a person is a manager.

⁴ The relation used here is a data representation of the AP5 language. It is different from the relations used in the PMM specification language.

Atomic Statement

In the AP5 language, an atomic execution moves program execution from one data state to another. Although in the AP5 language semantics an atomic execution appears to be atomic, it is really made up of many steps and performs many internal tasks. An atomic execution makes changes to the AP5 database, checks the consistency of the database, and triggers demonic actions. The outcome of an atomic execution depends on the set of updates proposed, the consistency conditions specified by the consistency rules defined in the program, and the demonic conditions specified by the automation rules.
Syntactically there is an atomic statement whose execution is an atomic execution. Because atomic execution corresponds to the execution of an atomic statement we choose it as a primitive, called atomic-execution. Its attributes are: (1). Data-gathering-time, whose value is the time spent in computing the proposed updates of the atomic execution; (2). Proposed-updates, whose value is the proposed updates for relations; (3). Updates-done, whose value is the updates done by the atomic execution; (4). C-rules-triggered, whose value is the list of consistency rules triggered; (5). A-rules-triggered, whose value is the list of automation rules triggered; (6). Rules-triggered, whose value is the list of rules triggered.

These primitives can be combined to describe high level activities, such as an atomic execution that triggers the rule one-person-with-two-offices and updates the relation office-of-person, as follows:

{X | (atomic-execution X) ∧ (Rules-triggered X '(ONE-PERSON-WITH-TWO-OFFICES)) ∧ (Updates-done X '(OFFICE-OF-PERSON))}

where both Rules-triggered and Updates-done are attributes defined for event type atomic-execution.

Rule

In the AP5 language, rules are used to maintain the consistency of the AP5 database and to support demonic actions. Rules enable an AP5 program to react to states of the AP5 database. Rule execution occurs inside an atomic execution. It has two steps. First, the AP5 system checks the rules against the changes in the database to see whether any rule is triggered (violations). If some rules are triggered, the action part of each is executed using the values that triggered the rule as parameters. The execution of the action part of a rule may suggest repairs for the violations of the database or introduce further changes to the database. Using the principles mentioned above, the primitives chosen are: (1).
Rule-execution, which represents both the triggering of a rule and the execution of its action; (2). Rule-triggering, which represents the rule triggering part of the execution; (3). Rule-body-execution, which represents the execution of the action part of a rule. Some of their defined attributes are: Values-triggered, whose value is the list of values that triggered the rule, and Proposed-updates, whose value is the list of updates proposed by the rule.

For example, activities like triggering a rule one-person-with-two-offices that proposed some updates to relation office-of-person can be expressed using the primitives defined on rule as follows:

{X | (Rule-triggering X 'ONE-PERSON-WITH-TWO-OFFICES) ∧ (Proposed-updates X '(OFFICE-OF-PERSON))}

where Proposed-updates is a defined attribute for event type Rule-triggering.

Predefined Control Relations of AP5 Primitives

There are some common control relationships in the AP5 computation model, such as one function execution calling another. It is both convenient and efficient to make them part of the specification language. Suppose X and Y are function-execution events; then (Calls X Y) means that Y is an execution of some function which is directly called during the execution of X. Triggers is another example of a commonly used control relation in the AP5 computation model. Assuming that X is an atomic-execution and Y is a rule-triggering, (triggers X Y) means that the execution of X triggered Y. Introducing these idiomatic descriptions of the relationships among events not only makes it easy to describe the program activities to monitor but also makes the implementation more efficient.⁵

Aggregate Operators

In specifying program monitoring and measuring requirements, sometimes a programmer needs operators that are defined on a set of events.
Aggregate operators such as count, average, sum, min, and max are introduced to make it easy to specify aggregate information about program execution activities. Count is defined on any event type. It counts the number of times events of the type occurred. Average, sum, min, and max are defined on numerical attributes of event types. They compute the average value, the sum, the minimum value, and the maximum value of attribute values respectively.

3.2.3 PMM Specifications

A PMM specification specifies a programmer's monitoring and measuring requirements about the execution of his program. It has two parts. The first part defines high level events. The second part specifies what data to record for those events. For example, if the activity of interest is the execution of a function then the data to record might be the values passed to it or the time spent in executing it. We call the second part of a PMM specification the PMM questions. They can be defined on the primitive events and control relations or on the high level events defined in the first part. PMM questions are modeled as queries over these primitive events, control relations, and higher level events. Thus, a programmer can not only write a PMM specification in terms of the predefined vocabulary but can also define his own terms using abstractions and aggregations.

⁵ Making the idiomatic usages part of the PMM specification language enables the automatic programming system to build special implementations for them, and thus makes the implementation more efficient. This is discussed further in Chapter 4.

Behavior Abstraction

Derived event types can support behavior abstraction. There are two dimensions of abstraction supported. First, the level of detail at which portions of an event can be viewed; e.g., sometimes we only pay attention to whether an event occurred, while at other times we also pay attention to its attributes.
Second, the derived event mechanism enables a programmer to define derived information in terms of more primitive events. Abstraction permits details about the attributes of derived events to be ignored. For example, if we are only interested in whether or not relation FOO is deleted and are not interested in its attributes, we can define a new event type, say delete-FOO, as follows:

{X | (relation-deletion X 'FOO)}

Once the new term is defined, it can be used in the specification in the same way as the predefined primitives.

Behavior Aggregation

Aggregation clusters low level activities together to constitute high level activities. Aggregation is realized by treating several different events as components of a derived event. For example, suppose we are only interested in whether or not relation work-on-% is updated and do not care whether that update is an insertion or a deletion. We can define a new event type, say update-work-on-%, as follows:

{X | (relation-insertion X 'work-on-%) ∨ (relation-deletion X 'work-on-%)}

Again, this new term can be used to define other terms in the same way as the predefined primitives.

3.3 Syntax and Semantics of the Language

The PMM language extends the AP5 relation definition and query language with predefined primitive event types, predefined control relations, and derived event type and monitoring question definitions.

3.3.1 Syntax of the PMM Specification Language

As described in previous sections, a PMM specification has two parts: a PMM event model and a question specification. A PMM event model consists of a set of derived relation or derived event type definitions. A derived relation definition has the following syntax:

(defevent Name :definition (V s.t. WFF))

where V is a list of variables and WFF is a relational-calculus-like formula in which the variables in V are free.
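The two derived event types above (delete-FOO and update-work-on-%) and the count operator can be illustrated with a small Python model. This is only a sketch under stated assumptions: the event log, its tuple layout, and the function names are hypothetical and are not part of AP5 or of the PMM language.

```python
# Hypothetical log of primitive events: (event id, event type, relation name).
# The relation names FOO and work-on-% come from the text; the log is invented.
events = [
    (1, "relation-insertion", "work-on-%"),
    (2, "relation-deletion", "FOO"),
    (3, "relation-deletion", "work-on-%"),
    (4, "relation-insertion", "FOO"),
]

def delete_foo(log):
    """Abstraction: {X | (relation-deletion X 'FOO)}."""
    return {eid for eid, etype, rel in log
            if etype == "relation-deletion" and rel == "FOO"}

def update_work_on(log):
    """Aggregation: insertions or deletions on work-on-% form one event type."""
    return {eid for eid, etype, rel in log
            if rel == "work-on-%"
            and etype in ("relation-insertion", "relation-deletion")}

def count(event_set):
    """Aggregate operator count: how many events of a type occurred."""
    return len(event_set)

print(delete_foo(events), update_work_on(events), count(update_work_on(events)))
```

In this model a derived event type is simply a set comprehension over the primitive events, which mirrors the set-builder notation used in the definitions.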
A question definition has the following form:

(defquestion Question-Name :definition (U s.t. WFF))

where U is the desired set of data and WFF is a relational calculus formula that relates these data to the defined events. See Appendix A for the complete syntax of the PMM specification language.

3.3.2 Semantics of the PMM Specification Language

This section has two parts: first, an operational semantics of the PMM specification language; second, some constraints on specifications written in the language that ensure the specified activities are finite.

The Relation Defined by an Event Type Definition

An event type definition defines a set of values denoted by

$\{(x_1, x_2, \ldots, x_n) \mid F(x_1, x_2, \ldots, x_n)\}$

i.e., it defines the set of values that satisfy the formula $F$. More specifically, suppose that $p_1, \ldots, p_n$ is the list of all predicates (see footnote 6) used in the event definition $ED$, and suppose $P_1, \ldots, P_n$ are relations, where $P_i$ consists of all those tuples $(a_1, \ldots, a_k)$ such that $p_i(a_1, \ldots, a_k)$ is known to be true. An event $E$ of event definition $ED$ is made true by this substitution if the following hold:

1. If $E$ is an ordinary event type or control relation, then $E$ becomes $p(b_1, \ldots, b_k)$ under this substitution, and $(b_1, \ldots, b_k)$ is a tuple in the relation $P$ corresponding to $p$.

2. If $E$ is a static relation, then under this substitution $E$ becomes $(\theta\ b\ c)$ (see footnote 7) and the relation $(\theta\ b\ c)$ is true.

Syntactic Constraints for the PMM Specification Language

In the PMM specification language, if we consider all primitive event types and control relations as stored relations and derived event types as derived relations, we need to put some constraints on the way a derived relation or event type is defined, so that the derived events of the type can be obtained by applying some operations to the stored relations.
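The operational reading of an event type definition given above (the set of value tuples that make the formula true against the stored relations) can be sketched for the special case of a conjunction of predicates. This is an illustrative Python model, not the AP5 query processor; the `?` prefix for variables, the function names, and the sample data are assumptions made for the example.

```python
def match(atom_args, tup, binding):
    """Extend binding so that atom_args equals tup, or return None."""
    new = dict(binding)
    for arg, val in zip(atom_args, tup):
        if isinstance(arg, str) and arg.startswith("?"):   # a variable
            if new.setdefault(arg, val) != val:
                return None
        elif arg != val:                                    # a constant
            return None
    return new

def satisfy(atoms, relations, binding=None):
    """Yield every binding that makes the conjunction of atoms true."""
    binding = binding or {}
    if not atoms:
        yield binding
        return
    (pred, *args), rest = atoms[0], atoms[1:]
    for tup in relations.get(pred, ()):
        if len(tup) == len(args):
            ext = match(args, tup, binding)
            if ext is not None:
                yield from satisfy(rest, relations, ext)

def define(variables, atoms, relations):
    """The relation {variables | atom_1 AND ... AND atom_m}."""
    return {tuple(b[v] for v in variables) for b in satisfy(atoms, relations)}

# Invented stored relations, using the rule-triggering example of Section 3.2.
relations = {
    "rule-triggering": {("e1", "ONE-PERSON-WITH-TWO-OFFICES"),
                        ("e2", "SOME-OTHER-RULE")},
    "proposed-updates": {("e1", ("OFFICE-OF-PERSON",)),
                         ("e2", ("OFFICE-OF-PERSON",))},
}
result = define(["?x"],
                [("rule-triggering", "?x", "ONE-PERSON-WITH-TWO-OFFICES"),
                 ("proposed-updates", "?x", ("OFFICE-OF-PERSON",))],
                relations)
print(result)
```

Because every variable is bound by matching it against a finite stored relation, the result of this sketch is always finite, which is the property the syntactic constraints are meant to guarantee.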
One property we would like to have is finiteness, i.e., given a set of finite stored relations, the derived relations defined on them should be finite as well.

One simple approach [Ull88] to avoiding event definitions that create infinite relations from finite ones is to insist that each definition used in a PMM specification satisfy a set of rules. Satisfying the set of rules ensures that the events defined by the definition are finite. The set of rules is as follows:

1. There are no uses of the universal quantifier $\forall$.

2. Whenever an OR operator is used, the two formulas connected, say $F_1 \lor F_2$, have the same set of free variables; i.e., they are of the form $F_1(X_1, \ldots, X_n) \lor F_2(X_1, \ldots, X_n)$.

3. Consider any maximal subformula consisting of the conjunction of one or more formulas $F_1 \land \cdots \land F_m$. Then all variables appearing free in any of these $F_i$'s must be limited in the following sense.
(a) A variable is limited if it is free in some $F_i$, where $F_i$ is not an arithmetic comparison and is not negated.
(b) Any variable $X$ that appears in a formula $(\theta\ X\ a)$ or $(\theta\ a\ X)$, where $a$ is a constant, is limited.
(c) Variable $X$ is limited if it appears in an event $(\theta\ X\ Y)$ or $(\theta\ Y\ X)$, where $Y$ is a variable already known to be limited.

4. A negation operator may only apply to a term in a conjunction of the type discussed in (3).

We require every definition used in the PMM specification to be safe in this way.

Footnote 6: Both event types and control relations can be considered predicates of the specification language, in the sense that they can be used to tell whether an instance is of the specified type or is part of the specified relation.

Footnote 7: $\theta$ is used to represent a static relation like EQ.

3.3.3 Properties of Valid PMM Specifications

There are several restrictions on constructing a derived event type from other event types:

IC1 There is no recursive definition.

IC2 There are no temporal dependency cycles.
Suppose that $P_1, P_2, \ldots, P_n$ are point events. If $P_1 \prec P_2 \prec \cdots \prec P_n$, where $P_1 \prec P_2$ means that $P_1$ occurred earlier than $P_2$ did, then $i \neq j \Rightarrow P_i \neq P_j$. Because of the linear time model we use, it does not make sense for an event to occur both earlier and later than another event.

IC3 Every definition in a PMM specification is safe.

IC4 Attributes and the events on which the attributes are based are compatible, i.e., only those attributes defined by an event type can be accessed.

The first and third restrictions are adopted because of the operational requirement and the efficiency requirement [HK87]. The second and fourth restrictions are adopted because they prevent PMM specifications from denoting empty behaviors, i.e., behaviors that no event will ever satisfy.

3.4 Summary

The PMM specification language is tailored for high level program monitoring and measuring. The framework of events, control relations, and abstractions is designed to keep the syntax and semantics of this specification language very simple. There are three key reasons that it can be so simple and still allow the specification of monitoring and measuring of complex program behaviors. First, there is a rich vocabulary of primitive program behaviors to describe AP5 program behaviors and their relationships. Second, there are principles by which complex behavior specifications can be built from the specifications of simple behaviors and their relationships. Third, there is a very important relationship between the vocabulary used to describe program behaviors and the syntactic structures used to write source programs, which facilitates program monitoring and measuring.
Chapter 4

An Automatic Programming System for PMM Specification

This chapter presents an automatic programming system (called SmartMonitor) that accepts a source program and a valid specification of program monitoring and measuring requirements, written in the PMM specification language defined in Chapter 3, automatically generates the instrumentation required to satisfy the specification, and merges this instrumentation into the source program. The generated instrumentation collects data during program execution and computes the specified monitoring results without changing the functionality of the original source program.

The goal of instrumentation generation is not merely to generate instrumentation that records data so that the specified results can be computed, but to do so efficiently by recording only relevant data. The instrumentation code is generated at compile time, while determining whether a particular piece of data is relevant and should be recorded in general requires run time information. That determination must therefore act as a run time filter that restricts the recording to data that is truly relevant. Recording irrelevant data takes not only space but also time. Moreover, because subsequent analysis must process all the recorded data, irrelevant data incurs an additional computation cost as well. Hence, there is a twofold reason to record as little irrelevant data as possible.

(defun BOTH (C)
  (when (> C 0)
    (FOO C))
  (BAR C))

(defun IR (D)
  (BAR D))

Figure 4.1: A Piece of Source Program

4.1 Aspects of Instrumentation Generation

Because SmartMonitor needs to generate instrumentation code and merge it with the source program before program execution, it has to answer the following questions. First, what data is relevant to the events specified in the specification? Second, how should the collected data be mapped into answers for the queries?
Third, where in the source program should instrumentation be inserted? Finally, how can the dynamic aspects of testing conditions and collecting data be managed?

SmartMonitor has three components:

• Event Schema Generator, which determines what instrumentation data to collect and how that instrumentation data will be stored and later retrieved. Its function is to answer the first two questions.

• Instrumentation Site Generator, which determines where in the source program the generated instrumentation should go. Its function is to answer the third question.

• Instrumentation Code Generator, which generates instrumentation code for each place identified by the Instrumentation Site Generator and merges it into the source program. Its function is to answer the fourth question.

The following subsections explain how each of the three parts works and how they work together. We have selected a single example which we will use throughout this chapter. Suppose the source program to monitor is as in Figure 4.1 (only relevant parts are shown). Suppose we have a PMM specification as in Figure 4.2.

(defevent fns-call-FOO-and-BAR
  :definition ((x) | (Exist (f FOO BAR p)
                       (AND (Function-execution FOO 'FOO)
                            (Function-execution BAR 'BAR)
                            (Function-execution x f)
                            (Parameter x 1 p)
                            (Not (= p 0))
                            (Calls x FOO)
                            (Calls x BAR)))))

(defquestion Monitor-fns-dura-params
  :definition ((x y) | (Exist (z)
                         (and (fns-call-FOO-and-BAR z)
                              (duration z x)
                              (parameters z y)))))

Figure 4.2: A PMM Specification

The paraphrase of the specification is as follows. The event type definition defines a set of events to be monitored. The events, in this case, are function executions that directly call both function FOO and function BAR, and whose first actual parameter is nonzero. The question definition specifies which data about these events to record, in this case, the duration of the defined events and the parameters of the function calls.
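To make the paraphrase concrete, here is a hand-written Python analogue of the example program with the kind of instrumentation the specification calls for. It is a sketch only: it is not output of SmartMonitor, the bodies of FOO and BAR are placeholders (Figure 4.1 does not show them), and the `recorded` list and `called` set are hypothetical stand-ins for the stored relations and the run time filter state.

```python
import time

recorded = []  # stands in for the stored relations filled by instrumentation

def FOO(c):
    return c + 1   # placeholder body; Figure 4.1 does not show FOO

def BAR(c):
    return c * 2   # placeholder body; Figure 4.1 does not show BAR

def BOTH(c):
    start = time.perf_counter()
    called = set()                      # callees reached in this execution
    if c > 0:
        called.add("FOO")
        FOO(c)
    called.add("BAR")
    result = BAR(c)
    # Run time filter: first parameter nonzero, and both FOO and BAR called.
    if c != 0 and {"FOO", "BAR"} <= called:
        recorded.append({"duration": time.perf_counter() - start,
                         "parameters": (c,)})
    return result

def IR(d):
    return BAR(d)   # left uninstrumented: IR never calls FOO

for arg in (3, 0, -1):
    BOTH(arg)
IR(5)
print([r["parameters"] for r in recorded])
```

Only the execution BOTH(3) is recorded. BOTH(0) and BOTH(-1) reach only BAR, and IR is never a candidate, so the original results of the program are unchanged while irrelevant executions leave no data behind.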
4.1.1 Event Schema Generation

The event schema generator determines what data to collect. It creates a relational schema for storing instrumentation data collected during program execution so that answers to the questions defined in the specification can be computed at the end of the program's execution using query evaluation techniques [Coh89b, Ull89]. PMM specifications are written in the PMM specification language, a relational calculus language based on a set of primitive event types, attributes, and control relations. By representing those event types, attributes, and control relations as relations, the task of the automatic programming system is reduced in two ways. First, it only needs to generate the instrumentation to collect and store the required data. Once this data is stored as relations, the questions can be answered by the relational calculus query processor. Furthermore, because this query processor automatically computes derived relations, the automatic programming system only needs to collect and store the primitive events, attributes, and control relations upon which the defined derived relations are based.

The data to be collected include the primitive events, their attributes, and the primitive control relations used in the specification. The PMM specification specifies not only the attributes of primitive events to collect but also the conditions which those primitive events must satisfy to be relevant. Because these conditions are defined in terms of the attribute values of events, the generator needs to determine which attributes of the events are needed to test the specified conditions.

Being able to collect instrumentation data is not enough, however. Instrumentation data must be stored so as to be retrievable when needed.
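The reduction described above (instrumentation only fills stored relations, and a query processor computes the derived relations and the answers) can be sketched with an ordinary relational database. The example below uses Python's built-in sqlite3 module as a stand-in for the AP5 query processor; the table and column names, the event identifiers, and the sample rows are all hypothetical, and the `parameters` attribute is simplified to one row per positional parameter.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Stored relations for primitives, plus a view for the derived event type.
db.executescript("""
CREATE TABLE function_execution (event TEXT, fn TEXT);
CREATE TABLE calls (caller TEXT, callee TEXT);
CREATE TABLE parameter (event TEXT, pos INTEGER, value INTEGER);
CREATE TABLE duration (event TEXT, seconds REAL);
CREATE VIEW fns_call_foo_and_bar AS
  SELECT x.event FROM function_execution AS x
  JOIN parameter AS p
    ON p.event = x.event AND p.pos = 1 AND p.value <> 0
  WHERE EXISTS (SELECT 1 FROM calls AS c
                JOIN function_execution AS f ON f.event = c.callee
                WHERE c.caller = x.event AND f.fn = 'FOO')
    AND EXISTS (SELECT 1 FROM calls AS c
                JOIN function_execution AS f ON f.event = c.callee
                WHERE c.caller = x.event AND f.fn = 'BAR');
""")

# Invented trace: e1 = BOTH(3) calling e2 (FOO) and e3 (BAR); e4 = IR(5) calling e5 (BAR).
db.executemany("INSERT INTO function_execution VALUES (?, ?)",
               [("e1", "BOTH"), ("e2", "FOO"), ("e3", "BAR"),
                ("e4", "IR"), ("e5", "BAR")])
db.executemany("INSERT INTO calls VALUES (?, ?)",
               [("e1", "e2"), ("e1", "e3"), ("e4", "e5")])
db.executemany("INSERT INTO parameter VALUES (?, ?, ?)",
               [("e1", 1, 3), ("e4", 1, 5)])
db.executemany("INSERT INTO duration VALUES (?, ?)",
               [("e1", 0.002), ("e4", 0.001)])

# The question: duration and parameter of every event in the derived relation.
rows = db.execute("""
SELECT d.seconds, p.value FROM fns_call_foo_and_bar AS z
JOIN duration AS d ON d.event = z.event
JOIN parameter AS p ON p.event = z.event
""").fetchall()
print(rows)
```

The instrumentation side of this sketch consists only of the INSERT statements; everything about the derived event and the question lives in the view and the final query.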
The event schema generator creates a relational schema that provides such an accessible storage structure for the instrumentation data collected during program execution. One of the advantages of representing instrumentation data via a relational schema is that computing the answers to the questions can be done by a sophisticated and already available query processor, as discussed above. In particular, primitive event types are represented as stored relations, derived event types and control relations are represented as derived relations, and events, or groups of events that satisfy some control relation, are tuples in the relation used to store data for the event type or control relation respectively. Attributes of an event type are represented as binary relations between those event types and attribute values.

For the example we used, the system creates a relation for each of the primitives, attributes, and control relations used in the sample specification, such as function-execution, calls, parameters, and duration. It defines derived relations for the derived event fns-call-FOO-and-BAR and the question monitor-fns-dura-params. For simplicity, it gives the created relations the same names as the primitives or control relations whose data they store. After the execution of the augmented source program, the data specified by the PMM specification will be available in the relation monitor-fns-dura-params. For convenience, the process of recording primitive events and storing them into stored relations is called data collection.

4.1.2 Instrumentation Site Selection

As described in the previous section, the system needs to collect data on primitive events and control relations. The instrumentation site generator determines at compile time where in the source program the specified data can be collected.
It locates those program constructs in the source program whose execution may contribute data to the events used in the PMM specification.

Given a primitive event type, there are many constructs whose execution is associated with data relevant to the events used in the PMM specification. For example, for primitive event type function-execution of FOO there are two kinds of constructs whose execution is associated with this type of event: first, the definition of function FOO; second, the function calls to function FOO. By inserting instrumentation code in either kind, the system can record data about function execution events. We call the places (definitions, function calls) where event instances of primitive event types can be recorded instrumentation sites of the primitive event types. Thus, each of the function calls to function FOO is an instrumentation site of the primitive event type function-execution of FOO.

A PMM specification may use many primitive events. For each of the primitive events there may be many instrumentation sites. To compute answers to the questions, the system uses events recorded in all or some of the sites. For the example PMM specification and source program, only the definition of function BOTH is a relevant site of event f (some function that calls both FOO and BAR) in the event definition of fns-call-FOO-and-BAR, in the sense that instances of that event might satisfy the control conditions used in the event definition. The reason that this is the only such site is that BOTH is the only function in the example program that directly calls both FOO and BAR, and hence only executions of it can possibly satisfy the stated conditions. We call the events that satisfy the conditions specified in a specification the relevant events of the specification. We call the sites where the relevant events can be recorded the relevant sites of the specification.
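The argument that BOTH is the only relevant definition site can be sketched in a few lines of Python. The `direct_calls` table below is a hypothetical summary of Figure 4.1 (which function names are called lexically inside each definition); how such facts might be produced is not shown here, and the function name is an assumption of this example.

```python
# Hypothetical static facts: each function definition with the names of the
# functions called lexically inside it (read off Figure 4.1 by hand).
direct_calls = {"BOTH": ["FOO", "BAR"], "IR": ["BAR"]}

def relevant_definition_sites(direct_calls, required):
    """Keep definitions whose body contains a call to every required function."""
    return [name for name, callees in direct_calls.items()
            if set(required) <= set(callees)]

print(relevant_definition_sites(direct_calls, ["FOO", "BAR"]))
```

The check is purely syntactic, so it can only rule sites out; whether a particular execution of a retained site is relevant still has to be decided while the program runs.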
We should note that not all of the events occurring at relevant sites necessarily satisfy the specified control conditions. Whether they are really relevant may depend on the state of the computation of the source program. For example, for function BOTH the events are relevant only when (> C 0) is true. Run time filters are included in the generated instrumentation to dynamically check the computation state as needed, to determine which of the occurring events are relevant, and to record only those.

Selecting relevant instrumentation sites of a PMM specification requires knowledge of the source program. As discussed in Chapter 3, the primitives of the PMM specification language are selected in such a way that there is a very strong correspondence between them and the execution of particular syntactic constructs in the programming language. Furthermore, run time relationships among primitive events are strongly affected by the syntactic relationships among their sites. For example, in order to satisfy an event specification of a function-execution of BOTH which directly calls a function-execution of FOO, there must be at least one function call statement of function FOO lexically inside the definition of function BOTH. The syntactic relationships among the instrumentation sites of some events are necessary conditions of the control conditions defined on the events. Using the correspondence between primitive events and their sites, the system can find all potential sites for the primitive events used in the specification. Using the relationships between control relations on primitive events and the syntactic relationships among their sites, the system can rule out some irrelevant sites. For instance, in the example PMM specification in Figure 4.2, the desired derived events are function-execution events that directly call both FOO and BAR.
Since function IR in the example source program in Figure 4.1 directly calls only BAR, the instrumentation site at the function definition of IR is irrelevant. Hence, the knowledge has two parts:

1. A mapping between primitive event types of the PMM specification language and the syntactic constructs of the programming language which are their instrumentation sites.

2. A mapping between control relations among primitive events and the syntactic relationships between their instrumentation sites.

The instances of these syntactic constructs and their relationships are obtained by statically analyzing the source program using a static analysis program [Pro88]. This static analysis information is stored in a program representation to be discussed in the next section.

This compile time elimination of irrelevant instrumentation sites is achieved by retaining only those that are potentially relevant, that is, those forms for which some run time state exists in which they would be part of an event (primitive or compound) used to answer a PMM question. The selected sites are a first cut approximation of the relevant instrumentation sites of the PMM specification, which is later refined by dynamic run time filters.

The essential idea of eliminating irrelevant sites is to prove at compile time, using static analysis of the source program, that some potential sites cannot satisfy the specification. Because that step is done at compile time, it cannot eliminate all irrelevant events or their instrumentation sites. The reason is that whether an event is relevant might depend on the execution state of the source program. For instance, in the example source program in Figure 4.1, whether or not the execution of BOTH is relevant depends on the run time value of one of its parameters, C, which cannot in general be determined at compile time.
Therefore, in order to ensure that the answers computed are correct, the system must make a very conservative approximation: if there is a possibility that a site is relevant, it is retained.

4.1.3 Instrumentation Code Generation

The instrumentation code generator generates the instrumentation code for the instrumentation sites that have been identified and merges that code with the monitored program. The instrumentation code collects the monitoring data needed and stores it in the relations created by the event schema generator.

Once the sites for instrumentation and the relations to store the instrumentation data have been determined, the source program must be modified to collect the data. Moreover, even when the instrumentation sites are determined, whether or not the required data should be collected depends on run time conditions. Therefore, the instrumentation code generator also needs to generate filters that test these conditions at run time.

Sometimes the data required to answer particular questions, such as the paging done by the operating system or the garbage collection done by the run time system in Lisp, depends upon the internal structure of the execution environment and is beyond the knowledge or control of ordinary programmers. Even if the run time system does provide some sort of hook for collecting performance data, translating this data into something that is meaningful at the higher level in which the program is written requires specialized knowledge that is not available to most programmers. Ideally, one programmer with specialized knowledge of such facilities should provide it to a tool, which then makes it accessible to everyone by doing the translation automatically. SmartMonitor provides such an interface for accessing arbitrary run time performance data available within an execution environment.
As a result, SmartMonitor remains independent of these performance collection mechanisms, but can generate instrumentation code that interfaces to and uses them.

4.1.4 Approach Summary

Our goal of instrumentation generation is not merely to generate instrumentation that records data so that the specified answers can be computed, but to minimize the data recorded. That goal is realized in two steps. First, the system selects only relevant sites for the PMM specification. Second, the system provides a filter for each potential instrumentation site to eliminate irrelevant events at run time.

We first discuss what knowledge about source programs can help the system to find the relevant instrumentation sites. Then we discuss what instrumentation is needed to record events of a primitive event type given a specific instrumentation site. Finally, we discuss transforming a specification into instrumentation to record relevant data using the knowledge and the instrumentation mentioned above.

4.2 Static Approximation

As described above, static analysis can be used to find all the sites in the source program where a particular event could possibly occur. However, it is often possible with additional analysis to prove at compile time that some of these sites cannot satisfy the specification. The focus of this section is to provide a framework for using static analysis of the source program to select relevant instrumentation sites of a PMM specification. The essential idea is to map control relations of the PMM specification language that are defined on the primitive events used in a specification into the necessary syntactic conditions on the instrumentation sites corresponding to them. Those sites that cannot satisfy the necessary syntactic conditions are eliminated from this static approximation.
Hence, the effectiveness of the static analysis in eliminating irrelevant sites depends on its ability to analyze the source program. In general, the more sophisticated the static analysis program is, the more effectively it can check the necessary syntactic conditions, and the better the approximation will be.

SmartMonitor uses a static analyzer to identify potential sites in the source program and the syntactic relationships among those sites that are implied by the control relations used in the specification language. This analysis is recorded in a database (called the static analysis database). Like the PMM specification language, the static analysis program representation uses an Entity-Relationship model to represent the monitored source program. Entities are used to model those syntactic constructs of the programming language (such as function calls, atomic transitions, etc.) whose execution corresponds to primitive events of the PMM specification language. These syntactic constructs are the instrumentation sites in the source program. Relations are used to model the syntactic relationships between these constructs (such as directly calls, updates, etc.). These relationships are necessary conditions for the PMM control relations among the primitive events that occur at these sites.

For the AP5 programming language, relation, function, rule, and atomic have been chosen as primitive syntactic constructs. The execution of these constructs and the operations defined on them have also been chosen as primitive constructs (e.g., function-execution, atomic-execution, and relation-insertion). For each primitive control relation used in the PMM specification language there is a corresponding relation in the representation language. For convenience, the relation name is the name used for the control relation in the specification language followed by the suffix "-static". For example, Calls is a primitive control relation.
Its corresponding relation in the representation language is Calls-static, which holds between two sites when the latter, a function call site, appears lexically inside the former. Similarly, the suffix is used for the entity types of the representation language. Hence, function-static represents sites corresponding to function definitions and function-execution-static represents sites corresponding to function calls.

(function-static f1 'BOTH)
(function-static f2 'IR)
(function-execution-static fe1 'FOO)
(function-execution-static fe2 'BAR)
(function-execution-static fe3 'BAR)
(calls-static f1 fe1)
(calls-static f1 fe2)
(calls-static f2 fe3)

where f1, f2, fe1, fe2, fe3 are identifiers assigned to instrumentation sites of the source program:

f1  ----> (defun BOTH (C)
            (when (> C 0)
fe1 ---->     (FOO C))
fe2 ---->   (BAR C))
f2  ----> (defun IR (D)
fe3 ---->   (BAR D))

Figure 4.3: A Source Program Representation

4.2.1 Representing Sites and Their Relationships

Given a source program, the system uses a static analyzer to assign a unique identifier to each occurrence of the syntactic constructs (sites) defined in the representation language. Hence, the system can distinguish among these occurrences (the potential instrumentation sites). For instance, the identifiers for the two function calls to BAR in the example source program are different. The static analyzer analyzes the source program and records the relationships among different constructs in the relations of the representation language. Because there is a unique identifier for each of the constructs, given an identifier the system can identify the particular instance of the syntactic construct to which the identifier corresponds.
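The assignment of site identifiers and the recording of their relationships can be sketched with a toy analyzer. The Python code below is an illustration under stated assumptions, not the static analysis program cited in the text: it parses the Lisp-like text of Figure 4.1 into nested lists, treats only the names in `known_functions` as function calls, and numbers sites in textual order so that the identifiers happen to match Figure 4.3.

```python
import re

def parse(text):
    """Parse Lisp-like text into nested Python lists of atoms."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    def read(pos):
        items = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                sub, pos = read(pos + 1)   # pos now points at the closing ")"
                items.append(sub)
            else:
                items.append(tokens[pos])
            pos += 1
        return items, pos
    return read(0)[0]

def analyze(forms, known_functions):
    """Assign site identifiers and emit the static facts of Figure 4.3."""
    facts, counters = [], {"f": 0, "fe": 0}
    def new_id(prefix):
        counters[prefix] += 1
        return f"{prefix}{counters[prefix]}"
    def walk(expr, def_site):
        if not isinstance(expr, list) or not expr:
            return
        if expr[0] in known_functions:          # a function call site
            call_site = new_id("fe")
            facts.append(("function-execution-static", call_site, expr[0]))
            facts.append(("calls-static", def_site, call_site))
        for sub in expr[1:]:
            walk(sub, def_site)
    for form in forms:
        if form and form[0] == "defun":         # a function definition site
            site = new_id("f")
            facts.append(("function-static", site, form[1]))
            for body in form[3:]:
                walk(body, site)
    return facts

source = """
(defun BOTH (C) (when (> C 0) (FOO C)) (BAR C))
(defun IR (D) (BAR D))
"""
facts = analyze(parse(source), known_functions={"FOO", "BAR"})
for fact in facts:
    print(fact)
```

Because every occurrence gets its own identifier, the two calls to BAR become the distinct sites fe2 and fe3, which is what lets later steps keep one and discard the other.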
Figure 4.3 is the representation of our sample source program, where function-static, function-execution-static, and calls-static are relations in the representation language representing function definitions, function calls, and the direct call of one function by another respectively. For instance, (function-static f1 'BOTH) represents that f1 is the site of function BOTH, (function-execution-static fe1 'FOO) represents that fe1 is the site of a function call to FOO, and (calls-static f1 fe1) represents that the site of the latter, the function call fe1, is lexically inside the site of the former, the definition of BOTH (f1).

The system uses this representation of the source program to find the instrumentation sites of a PMM specification. The system can do this because the instrumentation sites of all primitive events are explicit in this program representation, it knows how to select among these (described in Section 4.2.3), and it knows how to combine these primitive events to detect derived ones.

4.2.2 Static Approximation Mapping

Selecting relevant instrumentation sites for the events used in a specification is done by transforming the specification into a set of queries over the static analysis database. The results of the queries are the relevant instrumentation sites of the specification. The system uses a transformation language, called Syntactic MAPping (SMAP), to represent a set of source-to-target correspondences from the PMM specification language to the static analysis result representation language. A correspondence has the following form:

Source-wff => Target-wff

where Source-wff is a well formed formula in the PMM specification language and Target-wff is a formula of the representation language.
The Source-wff is defined on the events of the PMM specification language while the Target-wff is defined on the instrumentation sites of the events. By making the correspondence, the system can select instrumentation sites for primitive events and transform the relationships among events into necessary syntactic relationships among the instrumentation sites of the events. For example, the following is one of the correspondences for a primitive event:

(function-execution *) => (function-execution-static *)

where * represents the tail of a list. Given a WFF of the PMM specification language, for instance (function-execution x 'FOO), the system makes the following transformation using the correspondence above:

(function-execution x 'FOO) => (function-execution-static x 'FOO)

The result is (function-execution-static x 'FOO), where function-execution-static is a binary relation of the static analysis result. Its first argument is an instrumentation site of type function-execution, and its second argument is a function name. The correspondence transforms the event instance of type function-execution in the PMM specification language into a formula that represents all function execution sites of function FOO in the source program.

Similarly, there is a mapping between control relations and the necessary conditions on the sites corresponding to the events involved in the control relations. For example, the following is one of the correspondences for a control relation:

(contains *) => (OR (calls-static *) (updates-static *) (triggers-static *))

where * represents the tail of a list and calls-static, updates-static, and triggers-static are static analysis binary relations.
The first argument of those relations is an instrumentation site of type function-static; the second argument is an instrumentation site of type function-execution-static, relation-update-static, or rule-triggering-static, respectively. The correspondence transforms the control relation contains on event instances of type function-execution in the PMM specification language into a formula that represents all function execution sites where the second parameter is a site inside the site represented by the first parameter. All of the transformations are driven by syntactic knowledge of the correspondence between constructs in the PMM language and syntactic constructs in the programming language.

4.2.3 Static Approximation Transformation Algorithms

Static Approximation Transformation (SAT) algorithms accept a PMM specification and generate a set of queries to the static analysis result database of the source program. The results of those queries are the instrumentation sites of the primitive events used in the PMM specification. Because PMM specifications are written in the PMM specification language, which is a relational calculus language, and the queries to the static analysis database are also expressed in a relational calculus language (the query language), a question in the PMM specification language can be transformed into a set of queries to the static analysis database by recursively transforming each of the logic constructs into the corresponding construct in the query language. Hence, the SAT algorithms consist of one algorithm for each logical construct. Each of the following subsections describes how to transform a PMM specification into queries over the static analysis database so that relevant instrumentation sites can be selected.
Primitive WFFs

The SMAP provides a mapping between primitive WFFs in the PMM specification language and formulae in the representation language. The SMAP mapping is used to transform the primitive WFFs of the PMM specification language into the formulae of the representation language. There is a mapping in SMAP for each primitive event type of the specification language.

However, for efficiency reasons, the SMAP does not provide a mapping for every formula used in the specification language. For instance, there is no mapping for parameters, because utilizing the data information of the parameters of functions requires some very complicated flow analysis. So for transforming primitive formulae of the PMM specification language into formulae of the representation language, there are two cases to consider. First, a mapping between the primitive WFF and a formula in the representation language exists. In this case, the mapping is used to transform the primitive WFF into the corresponding formula in the representation language. Second, no such mapping exists. This case only arises for formulae that are either an attribute or a control relation. If the formula is an attribute, the system simply ignores it. This is safe because the mapping between primitive events and instrumentation sites does not depend on attributes of the events. If the formula is a control condition and there is no mapping for it, the system ignores it as well. The resulting approximation might not eliminate as many irrelevant instrumentation sites as the approximation obtained by taking the control condition into consideration could. However, since the control conditions are used as run time filtering conditions to eliminate irrelevant events, they can be safely ignored during static approximation.

Conjunctions

$\{V \mid W_1 \land W_2 \land \dots \land W_n\}$ is transformed into $\{V \mid W_{static_1} \land W_{static_2} \land \dots \land W_{static_n}\}$, where each $W_i$, $1 \le i \le n$, is transformed into $W_{static_i}$. A conjunct $W_i$ for which there is no static approximation will be tested at run time, and the system drops it from the conjunction together with the variables solely used in it.

Disjunctions

$\{V \mid W_1 \lor W_2 \lor \dots \lor W_n\}$ is transformed into $\{V \mid W_{static_1} \lor W_{static_2} \lor \dots \lor W_{static_n}\}$, where each $W_i$, $1 \le i \le n$, is transformed into $W_{static_i}$.

Existential Quantification

As above, $\{V \mid \exists U\; W\}$ is transformed into $\{V \mid \exists U\; W_{static}\}$, in which $W$ is transformed into $W_{static}$. However, the system needs to determine instrumentation sites for all primitive events, including those primitive events that are existentially quantified. Therefore, the system needs to determine instrumentation sites for each of the primitive events that could instantiate the variables in $U$. For instance, for the example question used in Figure 4.2, the system needs to determine instrumentation sites not only for x but also for FOO and BAR. The system first transforms the form $\{V \mid \exists U\; W\}$ into the form $\{V \mid \exists U\; W_{static}\}$ and uses that query to find instrumentation sites for $V$. The system then uses these instrumentation sites to select instrumentation sites for each of the variables in $U$ with some primitive event instantiation, as follows. Suppose that $U$ is $\{u_1, u_2, \dots, u_n\}$; the system generates the following query for each $1 \le i \le n$:

$\{u_i \mid \exists (u_1, \dots, u_{i-1}, u_{i+1}, \dots, u_n, V)\; W_{static}\}$

The reason that this transformation is correct is that $\{u_i \mid \exists (u_1, \dots, u_{i-1}, u_{i+1}, \dots, u_n, V)\; W_{static}\}$ is the set of sites that make the formula $W_{static}$ true, while in order to determine the instrumentation sites of $V$ only a subset is needed. Hence, this transformation is very conservative.
It selects a superset of the possible instrumentation sites for each of the existentially quantified variables that could be instantiated by a primitive event.

Negations

The PMM specification language is designed in such a way that negation of a primitive event cannot violate the safety constraints of the language. If a negation is followed by a WFF then it must be a part of a larger formula. The system must be able to test the WFF using the bindings of the other parts of the formula. The safety conditions (discussed in Chapter 3) ensure this. In particular, if the negated WFF is existentially quantified then the system needs to select instrumentation sites for those existentially quantified events so that they can be recorded and used in testing the WFF when computing the answers to the questions. Selecting instrumentation sites for them uses the algorithm for existentially quantified WFFs described above.

Universal Quantification

The safety conditions of the PMM specification language rule out the possibility that an event will be defined in terms of universal quantification. Hence, the system does not need to deal with this case.

4.2.4 A Static Approximation Example

The example PMM specification in Figure 4.2 is used to show how the transformation from PMM questions to queries over the static analysis database is done. The arrow => is used to show this process: the formula on the left hand side is the formula to be transformed and the formula on the right hand side is the resulting formula.

Existential Quantifier Transformation

The system first transforms the definition into a set of queries so that instrumentation sites for all primitive events (including those existentially quantified) will be selected.
    ((x) | (Exist (f FOO BAR p)
             (AND (Function-execution FOO 'FOO)
                  (Function-execution BAR 'BAR)
                  (Function-execution x f)
                  (Parameter x 1 p)
                  (Not (= p 0))
                  (Calls x FOO)
                  (Calls x BAR))))
    =>
    ((x) | (Exist (f FOO BAR p)
             (AND (Function-execution FOO 'FOO)
                  (Function-execution BAR 'BAR)
                  (Function-execution x f)
                  (Parameter x 1 p)
                  (Not (= p 0))
                  (Calls x FOO)
                  (Calls x BAR))))
    ((FOO) | (Exist (f x BAR p)
               (AND (Function-execution FOO 'FOO)
                    (Function-execution BAR 'BAR)
                    (Function-execution x f)
                    (Parameter x 1 p)
                    (Not (= p 0))
                    (Calls x FOO)
                    (Calls x BAR))))
    ((BAR) | (Exist (f x FOO p)
               (AND (Function-execution FOO 'FOO)
                    (Function-execution BAR 'BAR)
                    (Function-execution x f)
                    (Parameter x 1 p)
                    (Not (= p 0))
                    (Calls x FOO)
                    (Calls x BAR))))

Testing Condition Elimination

As explained earlier, some attributes and control relations are not explicitly represented in the static analysis database. A query on that database cannot include any of these attributes and control relations. Such attributes and control relations, together with the variables solely used by them, are eliminated from the query. In the example, (Parameter x 1 p) and (Not (= p 0)) are dropped from the conjunction together with the variable p.

    ((x) | (Exist (f FOO BAR p)
             (AND (Function-execution FOO 'FOO)
                  (Function-execution BAR 'BAR)
                  (Function-execution x f)
                  (Parameter x 1 p)
                  (Not (= p 0))
                  (Calls x FOO)
                  (Calls x BAR))))
    =>
    ((x) | (Exist (f FOO BAR)
             (AND (Function-execution FOO 'FOO)
                  (Function-execution BAR 'BAR)
                  (Function-execution x f)
                  (Calls x FOO)
                  (Calls x BAR))))

Similarly, the same transformation applies to the other two formulae.
Transformation for Primitives

Event transformation uses the SMAP to map primitive events and control relations into queries over the static analysis database as follows:

    (function-execution x 'FOO) => (function-execution-static x 'FOO)
    (function-execution z 'BAR) => (function-execution-static z 'BAR)
    (function-execution z f)    => (function-execution-static z f)
    (Calls x FOO)               => (Calls-static x FOO)

Using the generated queries for each of the primitives, the derived formula is transformed into a query over the static analysis database as follows (here only one of the final queries is listed):

    ((x) | (Exist (f FOO BAR)
             (AND (Function-execution FOO 'FOO)
                  (Function-execution BAR 'BAR)
                  (Function-execution x f)
                  (Calls x FOO)
                  (Calls x BAR))))
    =>
    ((x) | (Exist (f FOO BAR)
             (AND (Function-execution-static FOO 'FOO)
                  (Function-execution-static BAR 'BAR)
                  (Function-execution-static x f)
                  (Calls-static x FOO)
                  (Calls-static x BAR))))

The result of the static approximation for our example is the following three queries:

    ((x) | (Exist (f FOO BAR)
             (AND (Function-execution-static FOO 'FOO)
                  (Function-execution-static BAR 'BAR)
                  (Function-execution-static x f)
                  (Calls-static x FOO)
                  (Calls-static x BAR))))
    ((FOO) | (Exist (f x BAR)
               (AND (Function-execution-static FOO 'FOO)
                    (Function-execution-static BAR 'BAR)
                    (Function-execution-static x f)
                    (Calls-static x FOO)
                    (Calls-static x BAR))))
    ((BAR) | (Exist (f FOO x)
               (AND (Function-execution-static FOO 'FOO)
                    (Function-execution-static BAR 'BAR)
                    (Function-execution-static x f)
                    (Calls-static x FOO)
                    (Calls-static x BAR))))

They are used to select instrumentation sites for x, FOO, and BAR respectively. Evaluating these generated queries against the static analysis database yields {f1}, {fe1}, and {fe2} respectively, where f1, fe1, and fe2 are defined as in Figure 4.3.
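To make the evaluation step concrete, the sketch below runs the three static queries over a small fact base using a naive conjunctive-query evaluator in Python. The fact base is invented for illustration and is not a transcription of Figure 4.3; in particular, it assumes that the definition site f1 is also listed under function-execution-static, since a function definition can serve as a site for a function-execution event. The names FACTS, match, and evaluate are hypothetical.

```python
# Assumed static facts (not a transcription of Figure 4.3).
FACTS = [
    ("function-static", "f1", "'BOTH"),
    ("function-execution-static", "f1", "'BOTH"),  # assumption, see lead-in
    ("function-execution-static", "fe1", "'FOO"),
    ("function-execution-static", "fe2", "'BAR"),
    ("calls-static", "f1", "fe1"),
    ("calls-static", "f1", "fe2"),
]

VARIABLES = {"x", "f", "FOO", "BAR"}

QUERY = [
    ("function-execution-static", "FOO", "'FOO"),
    ("function-execution-static", "BAR", "'BAR"),
    ("function-execution-static", "x", "f"),
    ("calls-static", "x", "FOO"),
    ("calls-static", "x", "BAR"),
]


def match(conjunct, fact, binding):
    """Extend binding so that conjunct equals fact, or return None."""
    if conjunct[0] != fact[0] or len(conjunct) != len(fact):
        return None
    extended = dict(binding)
    for term, value in zip(conjunct[1:], fact[1:]):
        if term in VARIABLES:
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to a different site
        elif term != value:
            return None      # constant mismatch
    return extended


def evaluate(answer_var, conjuncts, facts):
    """Return every value of answer_var that satisfies all conjuncts."""
    bindings = [{}]
    for conjunct in conjuncts:
        bindings = [extended
                    for binding in bindings
                    for fact in facts
                    if (extended := match(conjunct, fact, binding)) is not None]
    return {binding[answer_var] for binding in bindings}


for variable in ("x", "FOO", "BAR"):
    print(variable, evaluate(variable, QUERY, FACTS))
```

Because the three queries differ only in which variable is returned, the sketch evaluates the same conjunction three times and projects a different variable each time.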
Because there are several ways of recording data for a primitive event, a primitive event might have several potential instrumentation sites, each of which can be used as an instrumentation site of the primitive in the PMM specification. For instance, for the function-execution primitive event, the instrumentation sites could be either the definition of the function or its function calls. Given a source program, there are many function calls. The system uses a very simple heuristic to choose among instrumentation sites that are equivalent with regard to functionality: if all of the function calls of a function are selected as the instrumentation sites of a primitive, then the function definition is chosen as the instrumentation site; otherwise, the selected function calls of the function are used as the instrumentation sites.

4.3 Instrumentation for Primitive Events

Having identified an instrumentation site of a primitive event used in the PMM specification, the system needs to generate instrumentation code for it and to merge the generated instrumentation code into the site. The generated instrumentation code records events and their attribute values and tests any specified filtering conditions as soon as possible, so as to record only relevant events. In the following subsections, we first present an abstract interface for the instrumentation that characterizes it with six parameters. Properly setting these parameters enables the system to record the appropriate data and test conditions. Next, we present an abstract interface generator that transforms requirements on what attributes to record for a primitive event into parameters of this abstract interface, so that the proper data can be recorded and the proper tests performed.
4.3.1 An Abstract Instrumentation Interface

The instrumentation code for an instrumentation site of a primitive event tests conditions defined on the event's data and records data about the event. In general, it must record data and test conditions both before and after the event happens. For instance, in order to record the duration of a function execution, the system needs to access the time before and after the function execution. It also needs to test conditions on data recorded by other sites and/or recorded before and after the event's execution. Because instrumentation data must be recorded to be accessible, space must be allocated to hold it. To avoid conflict with the execution of the source program, local variables with names that are disjoint from those used in the source program must be introduced. This ensures that the instrumentation code can only access but not change data in the original program, and that it operates on a data space which is disjoint from that of the original program. Finally, actions must be taken to transform the recorded data into the proper form so that they are accessible by other instrumentation code or by PMM queries.

We provide an interface with six parameters, which we call interface parameters, to fulfill the roles discussed above:

• Pre-condition: This is a relevance test based on data that is available before the event. For instance, the precondition of a function call could depend on the values of its parameters.

• Local-variables: These are used to store local data as described above. For instance, if the duration of an event is needed, a local variable is needed to record the starting time.

• Before-form: This is code to collect data available before the event. For instance, this could read the clock and store the result into a variable allocated for the start time.
• After-form: This is code to collect data after the event. For instance, this could store the return values of a function invocation.

• Post-condition: This is a relevance test based on data that is available after the event. For instance, the postcondition of a function call could compare the return value to a parameter.

• Action: This is code that stores data in the instrumentation database. It is also used in some cases to remove data that is no longer needed.

The reason for having both a Before-form and an After-form is that the system may need to record data and test conditions both before and after a function-execution event has happened. Pre-condition and Post-condition are used for testing conditions based on the data recorded at other sites and/or based on the data recorded by the Before-form and After-form. Local-variables is introduced so that the instrumentation will not have any data conflicts with the execution of the original source program. Finally, Action is used to record instrumentation data when both the Pre-condition and the Post-condition are true.

Having abstracted the instrumentation for a site into the above six categories, generating the instrumentation of a function execution event for an instrumentation site amounts to generating the code for each of these categories. To understand the interface, let us consider a function definition as our instrumentation site. The six parameters for a function definition can be characterized as follows:

1. Pre-condition, which is used to determine whether the event recorded at the site is relevant based on the computation state of the program before the event occurs. This code can access the actual parameters of the function call.

2. Local-variables, which is a list of variables to be declared locally for instrumentation. Each can be initialized by a form with access to the function call's actual parameters.

3. Before-form, which is a form to be executed before the execution of the function's body. Its purpose is to collect data about the computation environment before the function execution. It can access both the actual input parameters to the function call and any local variables introduced.

4. After-form, which is a form to be executed after the execution of the function's body. Its purpose is to record data about the computation environment after the execution of the function. It can access the actual input parameters to the function call, any local variables introduced, and the results of the function's body.

5. Post-condition, which is used to determine whether the event recorded at the site is relevant based on the computation state of the program after the event has happened. It can access the actual parameters of the function call and the evaluation results of Pre-condition, Before-form, and After-form.

6. Action, which is a form to be executed after After-form. This form moves the collected data to a location where they can be used either for determining other conditions or for computing answers to the specified questions. The Action form can access the actual input parameters to the function call, any local variables introduced, the return values of the function execution, and the evaluation results of both Before-form and After-form.

Merging instrumentation into a program is done by wrapping the identified site within a template containing the code for each of the parameter categories.
Suppose that the system has already generated instrumentation for the definition of function FOO. The instrumentation could be merged into the definition of function FOO as follows:

    (defun FOO <lambda-list>
      (let (<local-variables> return-values)
        (if <Pre-condition>
            (unwind-protect
                (progn <Before-form>
                       (values-list
                        (setq return-values
                              (multiple-value-list
                               (funcall FOO-original <lambda-list>)))))
              (progn <After-form>
                     (when <Post-condition> <Action>)))
            (funcall FOO-original <lambda-list>))))

where FOO-original is the original definition of function FOO. If the instrumentation site is a function call of FOO, the following slightly different template is used to merge the instrumentation into the site:

    (let (<local-variables> return-values (p1 a1) (p2 a2) ... (pk ak))
      (if <Pre-condition>
          (unwind-protect
              (progn <Before-form>
                     (values-list
                      (setq return-values
                            (multiple-value-list (FOO p1 p2 ... pk)))))
            (progn <After-form>
                   (when <Post-condition> <Action>)))
          (FOO p1 p2 ... pk)))

where (FOO a1 a2 ... ak) is the original function call to function FOO. The unwind-protect statement used in both of the templates ensures that both After-form and Post-condition are always executed; p1, p2, ..., pk are introduced to enable the instrumentation to reference actual parameter values without computing the parameter expressions multiple times.

The essence of the two templates is to insert filters and forms at the beginning and at the end of a statement so that the state of the computation environment before and after the execution of the statement can be recorded. Whether or not the recorded data will be relevant depends on the filter test outcomes. But no matter what the outcome of the filter conditions tested and the data recorded, the statement of the original program is executed in the augmented program if it is executed in the original program.
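For readers more familiar with Python than Lisp, the shape of the function-definition template can be approximated with a wrapper function, using try/finally in the role of unwind-protect. This is a rough analogue under stated assumptions rather than a translation of the system's code: the helper instrument, the relation stand-in DURATION, and the example function foo are all hypothetical, and the six interface parameters are passed as ordinary callables.

```python
import time


def instrument(original, pre_condition, local_variables, before_form,
               after_form, post_condition, action):
    """Wrap `original` with the six interface parameters (hypothetical helper)."""
    def wrapped(*args):
        # Instrumentation-only storage, kept apart from the program's own data.
        local = dict(local_variables)
        local["return_values"] = None
        if not pre_condition(args):
            return original(*args)        # irrelevant event: run it untouched
        try:
            before_form(local, args)
            local["return_values"] = original(*args)
            return local["return_values"]
        finally:
            # Runs even if the original raises, like the cleanup form of
            # unwind-protect in the Lisp template.
            after_form(local, args)
            if post_condition(local, args):
                action(local, args)
    return wrapped


DURATION = []  # stands in for a stored relation of (arguments, elapsed time)


def foo(n):
    return n * 2


foo_instrumented = instrument(
    foo,
    pre_condition=lambda args: args[0] != 0,
    local_variables={"t1": None, "t2": None},
    before_form=lambda local, args: local.update(t1=time.perf_counter()),
    after_form=lambda local, args: local.update(t2=time.perf_counter()),
    post_condition=lambda local, args: True,
    action=lambda local, args: DURATION.append(
        (args, local["t2"] - local["t1"])),
)

print(foo_instrumented(3), foo_instrumented(0), len(DURATION))
```

In both branches the original function runs exactly once with its original arguments, which is the property the templates rely on to preserve the program's functionality.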
Hence, although the transformations which introduce these templates change both the data flow and control flow of the source program, they do not alter the functionality of the source program.

4.3.2 Instrumentation Generation for Primitives

With these templates, instrumentation generation for primitives is carried out by generating values for the categories that fill the template and thus act as parameters to it. The system has knowledge of how to transform requirements on attribute values of a primitive event into proper values of the template parameters for the event's instrumentation site. The system also has knowledge of how to generate and/or test a primitive control relation by instrumenting the instrumentation sites of the events involved so that the proper data can be recorded and the proper test can be performed. This knowledge is represented by a set of correspondences, called the Instrumentation Generation Mapping (IGM), between requirements on primitive events and control relations of the PMM specification language and the appropriate values of the six template parameters for the instrumentation sites of the events.

Each primitive event used in a derived definition is an event of some particular type. Each of these event types in the PMM specification language has a set of attributes defined on it. Its IGM specifies the correspondence between requirements on these attribute values and the appropriate values for the six instrumentation template parameters for events of that type.

Generating proper values for these six parameters depends on the types of the primitive events, the sites where the instrumentation is going to be inserted, the attribute names whose values are needed, and the relations used to store the collected data.
The IGM for primitive events has the following format:

    Type Site Attribute Relation

where Type is the type of the event, Site is an instrumentation site of the event, Attribute is an attribute defined on the event's type, and Relation is the relation used to hold the recorded attribute value. For instance, in the example PMM specification in Figure 4.2, the parameters of event x of type function-execution are needed. The IGM for getting the parameters of type function-execution is:

    function-execution f1 parameters parameters
    =>
    Site             f1
    Local-variables  (id params)
    Pre-condition    T
    Before-form      (progn (setq id (get-id function-execution))
                            (setq params (get-parameters)))
    After-form       nil
    Post-condition   T
    Action           (++ parameters id params)

where f1 is the instrumentation site of event x, id and params are local variables used to store collected data, and get-id and get-parameters are two functions supplied by the PMM system to, respectively, create an event of the type and collect the actual values of the formal parameters. Paraphrased in English, this transformation records the parameter values of events of type function-execution in the relation parameters by creating an identifier for the instance of the event type and capturing its parameter values in the Before-form of the site, and then inserting this collected data into the relation parameters. In this simple example no condition had to be tested in the pre- or post-conditions and no action had to be taken in the After-form.

The IGMs for primitive control relations are different from those of primitive events. Because control relations specify relationships among events that occur at different times, the IGM needs to specify what data to record in earlier events, how they should be stored, and what conditions to check to ensure that only relevant data are recorded, for later use.
The IGM also specifies how to reference the data from these earlier events when the later events occur, so that the conditions specified by the control relations can be checked at that point. More specifically, the IGM for primitive control relations specifies what attributes of the events involved are needed, where they should be stored, and what condition checking on the stored data should be performed. They have the following format:

    Type (Site_1 Type_1) (Site_2 Type_2) ... (Site_n Type_n) Relation
    =>
    Site_1 (Local-Variables_1 ... Action_1)
    ...
    Site_n (Local-Variables_n ... Action_n)

where Type is the control relation used in the specification, and Relation is the relation defined to record data of that control relation. Site_1, Site_2, ..., Site_n are instrumentation sites of the events used in the control relation. Type_1, Type_2, ..., Type_n are the types of the events used in the control relation. The right hand side of the transformation defines the instrumentation template parameter values for each of the sites. For example, the IGM for the primitive control relation Calls (which records when one event instance invokes another) is as follows:

    Calls (f1 function-execution) (fe1 function-execution) Calls
    =>
    Site             f1                                       fe1
    Local-variables  (id Callee)                              nil
    Pre-condition    T                                        T
    Before-form      (setq id (get-id function-execution))    (setq Callee (get-id function-execution))
    After-form       nil                                      nil
    Post-condition   Callee                                   T
    Action           nil                                      (++ Calls id Callee)

where f1 and fe1 are the instrumentation sites of events x and FOO respectively, id and Callee are local variables used to store collected data, get-id is a function supported by the PMM system to create an event of the specified type, and Calls is a relation that records the two events, one of which calls the other. Paraphrasing it in English: First, the caller event is captured in the Before-form of the caller event site. Second, the callee event is captured in the Before-form of the callee event site.
Third, the relationship between this pair of events is recorded in the binary relation Calls.

With the IGMs, one can simply state what conditions to test and what attributes are needed when generating instrumentation for primitive events. The system will automatically transform each of them into proper instrumentation code, abstracted into the six template parameters, and merge them into the source program. This also ensures that the generated instrumentation code does not introduce any maintenance problems. For the example specification in Figure 4.2, one simply needs to specify that the attributes needed for primitive event x at instrumentation site BOTH are duration and parameters. The system transforms these requirements into proper settings of the six template parameters at the instrumentation site.

4.4 Instrumentation Generation Algorithms

Once the instrumentation sites for each of the events are determined, the system determines what attributes of those events are used in the specification, and generates instrumentation for recording those attribute values using the IGM. Finally, the system merges the generated instrumentation into the source program using the transformations described in section 4.3.1.

The instrumentation of a PMM specification is generated by generating instrumentation for each of the questions one by one. The Instrumentation Generation Algorithm (IGA) can be understood at three levels, as described below.

Primitives and Primitives with Local Filters

Generating instrumentation for a primitive is done by directly using the IGMs of the primitive type. This level also handles primitives with a local filter, that
is, a condition that can be tested using only the attribute values of the primitive event being tested. For example, testing whether or not the first parameter value of a function-execution event is equal to zero can be handled with a local filter. Local filters defined on an event are transformed into filtering conditions at the instrumentation sites of the event.

Primitives with Control Relations

Generating instrumentation for a conjunct of primitive events with some control relation defined on them is done by first applying the IGMs of the control relation to determine the data needed for the primitive events involved, and then applying the IGMs of the primitives with the attributes used in the conjunct to determine the setting of the six parameters of the event sites.

General Derived Events

Generating instrumentation for a question whose definition is not in the format for either of the first two levels is done by transforming the definition into Disjunctive Normal Form (DNF) and applying the above algorithms to each of the conjuncts.

In general, the instrumentation generation algorithm for a PMM question is as follows:

Algorithm 1 (Instrumentation Generation): generating instrumentation for a PMM question.

Input: a PMM question and the static analysis database of the source program.

Output: instrumentation for computing the answers to the question.

Method:

1. Create a schema to hold the instrumentation data required by creating a stored relation for each of the primitive events, attributes, and control relations used in the PMM specification. Create derived relations for derived events, derived control relations, and questions using the stored relations.
2. Apply the static approximation algorithm to select instrumentation sites for each of the primitive events used in the question.

3. Generate instrumentation code for each instrumentation site of a primitive event using the IGM to collect only data of the event used in the specification or to test filtering conditions defined on them.

The output of the algorithm is the instrumentation code, in the form of the six template parameters, to be merged into each of the instrumentation sites. For the sample source program in Figure 4.1 and the example PMM specification in Figure 4.2, using the algorithm mentioned above leads to the following:

    Event Name   Sites   Attributes
    x            (f1)    (id parameters duration (parameter 1))
    FOO          (fe1)   (id)
    BAR          (fe2)   (id)

where id is an attribute of function-execution whose value is the function-execution event.

Applying the Instrumentation Generation algorithm to the example PMM specification in Figure 4.2, the six forms of each of the instrumentation sites are generated as in Table 4.1.

    FOO
      Local-variables  nil
      Pre-condition    T
      Entry-form       (setq FOO-id (get-id function-execution))
      Exit-form        nil
      Post-condition   T
      Action           (++ Calls f-id FOO-id)

    BAR
      Local-variables  nil
      Pre-condition    T
      Entry-form       (setq BAR-id (get-id function-execution))
      Exit-form        nil
      Post-condition   T
      Action           (++ Calls f-id BAR-id)

    BOTH
      Local-variables  (f-id FOO-id BAR-id params t1 t2)
      Pre-condition    T
      Entry-form       (setq f-id (get-id function-execution)
                             t1 (time)
                             params (get-parameters))
      Exit-form        (setq t2 (time))
      Post-condition   (AND FOO-id BAR-id)
      Action           (progn (++ Parameters f-id params)
                              (++ Duration f-id (- t2 t1)))

    Table 4.1: Generated Instrumentation
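One way to picture how several attribute requirements on the same site end up in a single set of template parameters, as in the BOTH entry of Table 4.1, is to treat each IGM entry as a partial setting and merge the entries slot by slot. The Python sketch below illustrates that idea only. The parameters entry follows the IGM quoted in section 4.3.2; the duration entry is a hypothetical reconstruction modeled on Table 4.1, and the dictionary layout and the name merge_settings are assumptions. Pre-condition and Post-condition are omitted because both are T in these entries.

```python
# Each entry lists Lisp forms as text; only the slots needed here are modeled.
IGM = {
    ("function-execution", "parameters"): {
        "local_variables": ["id", "params"],
        "before_form": ["(setq id (get-id function-execution))",
                        "(setq params (get-parameters))"],
        "after_form": [],
        "action": ["(++ parameters id params)"],
    },
    # Hypothetical entry, modeled on the BOTH row of Table 4.1.
    ("function-execution", "duration"): {
        "local_variables": ["id", "t1", "t2"],
        "before_form": ["(setq id (get-id function-execution))",
                        "(setq t1 (time))"],
        "after_form": ["(setq t2 (time))"],
        "action": ["(++ duration id (- t2 t1))"],
    },
}


def merge_settings(event_type, attributes):
    """Combine the IGM entries for several attributes of one event site."""
    merged = {"local_variables": [], "before_form": [],
              "after_form": [], "action": []}
    for attribute in attributes:
        entry = IGM[(event_type, attribute)]
        for slot, forms in entry.items():
            for form in forms:
                if form not in merged[slot]:  # share forms such as the event id
                    merged[slot].append(form)
    return merged


settings = merge_settings("function-execution", ["parameters", "duration"])
for slot, forms in settings.items():
    print(slot, forms)
```

Sharing the identifier form means the event is created once per execution even when several attributes are recorded for it.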
4.5 Summary

We have presented an automatic programming system that transforms a source program and a PMM specification into an augmented source program that incorporates the needed instrumentation. The augmentation is introduced only for monitoring purposes and does not interfere with or change the computation of the source program with regard to its functionality. By executing the augmented program, all and only the relevant data for the PMM specification is recorded.

Since the PMM specification is written in the PMM specification language, which is a relational calculus language based on a set of primitive building blocks, the required results can be obtained by collecting the data of the primitives and computing the results from the collected data. The system first determines what data to record by analyzing which primitives the PMM specification is defined upon. It then defines a relational schema to store the collected data of the primitives used in the specification.

Next, the system determines where in the source program instrumentation should go. The static analysis database is used to select relevant instrumentation sites for the PMM specification. By using knowledge of the mapping between the building blocks of the specification language and the syntactic constructs and relationships among the constructs, the system eliminates many irrelevant instrumentation sites at compile time, thus reducing the recording of irrelevant data at run time.

Finally, the system generates instrumentation and merges it into the source program. The system contains knowledge of how to instrument the source program so that the needed data of the building blocks can be recorded. The knowledge is represented via a set of transformations.
Having determined what primitive data to record, the system applies the transformations to generate instrumentation at the identified instrumentation sites so as to record the necessary data. The answers to the PMM questions are computed from this recorded data via database query evaluation algorithms after the program's execution.

Chapter 5

On Incremental Computation of Monitoring Results

In this chapter, we examine the problem of generating instrumentation that incrementally computes the required monitoring results as soon as the data is available, so that irrelevant data is not collected and data that is no longer needed is removed as quickly as possible. The instrumentation computes these results at run time as the data is collected. It tests conditions defined on the collected data and eliminates the data that does not satisfy the stated conditions, thus avoiding the recording of irrelevant data. We examine this incremental computation problem in terms of generating instrumentation for computing the answers to monitoring questions. Our solution is based on the use of temporal conditions defined on the events in the questions, both to determine which data to record and as run-time relevancy filters on the events producing that data. Central to this solution is a representation that we have defined that makes the temporal dependency among events explicit, and an algorithm that uses this representation to generate instrumentation that incrementally computes the answers to the posed PMM questions.

In order to simplify our description of the incremental computation problem and our solution to it, we assume that monitoring questions in our relational calculus PMM specification language are defined as a conjunction of events and control relations.
This assumption does not affect the generality of the approach, since more general questions can always be transformed into Disjunctive Normal Form (DNF) and then each of the disjuncts (which is a conjunction) can be dealt with individually. Hence, this chapter addresses the problem of generating instrumentation that incrementally computes

  {x_1, x_2, ..., x_k | ∃ (x_{k+1}, ..., x_n) (AND E_1, E_2, ..., E_n, C_1, C_2, ..., C_m)}

where x_1, x_2, ..., x_n are events of the types E_1, E_2, ..., E_n, and C_1, C_2, ..., C_m are conditions defined over those events. We first discuss the case where E_1, E_2, ..., E_n are all primitives and then discuss the case where some of E_1, E_2, ..., E_n are derived event types.

Furthermore, because multiple questions can be defined on the primitive event types, we assume that there is a separate relational schema for each of the questions used in a PMM specification. As a result of this assumption, we need only deal with one question at a time, without worrying about interference among questions.

5.1 Issues of Incremental Computation

The data upon which the answers to a monitoring question are computed are collected at the sites of the primitive events used in the question. In order for the data collected at a primitive event's site to be relevant, it must satisfy the conditions (relationships) defined on the primitive event. There are two kinds of relationships (conditions) that can be defined in a PMM question: comparisons among event attribute values and temporal relationships among events. Because all attributes referenced in a comparison have to be accessible to make the comparison, referenced attribute values of earlier events must be recorded so that the comparisons can be made when the later events occur. Hence, a precondition to generating instrumentation that incrementally computes answers to the question is to first determine the temporal order among the involved events.
Then the system can determine which event attributes need to be recorded and create a relational schema to record that data. Finally, it can generate the instrumentation to test conditions that depend only on attributes of the event in which the test is performed and/or the events that occurred before this event (and whose needed attribute values have been recorded in the relational schema). Only when this condition is satisfied are the required attributes of the current event recorded, for use by later condition testers or as part of the answer to the posed PMM questions.

Relation Name   Format           Temporal Relation (a, b)
Before          (Before f g)     (≺ f_e g_b)
Calls           (Calls f g)      (≺ f_b g_b) ∧ (≺ g_e f_e)
Calls*          (Calls* f g)     (≺ f_b g_b) ∧ (≺ g_e f_e)
Contains        (Contains f g)   (≺ f_b g_b) ∧ (≺ g_e f_e)
Triggers        (Triggers f g)   (≺ f_b g_b) ∧ (≺ g_e f_e)
Updates         (Updates f g)    (≺ f_b g_b) ∧ (≺ g_e f_e)

a. ≺ is a temporal control relation defined on two point events. Given two point events A and B, (≺ A B) is true iff A occurred before B.
b. A point event with a subscript is used to represent the beginning and ending point events associated with an interval event. For instance, f_b and f_e represent the begin point event and end point event of interval event f, respectively.

Table 5.1: Temporal Relationships Specified by Control Relations

5.1.1 Determining Temporal Dependency

Relationships among events are specified by control relations. Control relations define or imply temporal relationships among the events. For instance, if F calls G, then the begin event of F must precede the begin event of G. In other words, the begin event of F must occur before that of event G to satisfy F calls G. If no F has begun (and not yet ended) when a G occurs, then this G is not being called by F. Hence, knowing these timing relationships between events is very important.
It helps the system determine which data to store about earlier events and how to use this stored data to filter later events. Since the conditions are usually defined on the attribute values of events, and those attribute values are associated with point events (i.e., attribute values are available at particular points of program execution), the system needs to generate instrumentation for each of those point events to store data and test conditions. Therefore, temporal relationships need to be represented in terms of point events. Because the computation model on which the system is based is a sequential model, the temporal relationships among point events can be represented by a single temporal relation, precede, which we represent as ≺. The temporal relationships among interval events are represented in terms of the temporal relationships of the point events associated with them.

For each of the primitive control relations defined for a computation model, the system must be given the temporal relationships among the events implied by those control relations. Table 5.1 lists the temporal relationship knowledge for the control relations of the AP5 computation model, represented by ≺. For instance, given two interval events, the function executions of F and G, F calls G implies that the begin point of F occurred earlier than that of G and the end point of F occurred later than that of G.

5.1.2 Determining the Data to Record

Attribute values of the primitive events used in the PMM question are recorded either for computing final answers or for testing conditions. A relational schema is created to store this collected data so that it can be retrieved and combined with other data to compute the answers. Our goal is to generate instrumentation that computes the answer to the PMM question while minimizing the instrumentation space and time overhead.
This is achieved by recording only relevant data. There are two dimensions in recording only relevant data:

• only record data of primitive events that satisfy the conditions defined on them in the question

• incrementally compute partial results of the answers and eliminate the collected data when they are no longer needed.

First, since the question is defined as a conjunction of conditions over a set of primitive events, each condition in the conjunction must be satisfied for the primitive events to be relevant. Hence, for each primitive event used in the question, instrumentation must be generated to check the conditions defined on it. If there is a single violation, the primitive event cannot be relevant, and thus the data for this event should not be collected or stored. Because the comparisons in conditions can only be tested when the data they depend on are available, the conditions that can be checked are those that depend only on the data of the current event or of events that occurred earlier. Conditions that depend on data of later events cannot yet be used as filtering conditions; they must wait until those later events occur.

Second, since the data is collected incrementally, the answers can also be computed and stored incrementally. As discussed in Chapter 3, the specification language is designed in such a way that there is a very strong relationship between the primitive event types used in the PMM specification language and the data and control constructs of the programming language in which the source program is written. Primitive events usually correspond to the executions of syntactic constructs of the source program. Control relations correspond to the run-time relationships of events. In program execution, both data and control constructs of the monitored program have their scope [Ste90]. Scope refers to the spatial or textual region of the program within which references may occur.
Temporal conditions can be used to determine the scope of the events used in PMM questions, to reduce the recording of irrelevant data. Instrumentation data can be eliminated whenever their scope is over. For instance, if there is an interval event in a question such that every point event in the definition is contained in the interval event (we call such an interval event the outmost interval), then the answers to the question can be computed at the end of the interval. Furthermore, the data collected during the interval is no longer useful for computing answers that are outside the interval. That is because once a new outmost interval starts, the data collected during the previous interval would not satisfy the condition that every point event is contained in the new interval. [1]

[1] If there is a possibility of the two outmost intervals overlapping, then some additional condition is needed to ensure this is true.

5.1.3 Generating Instrumentation

Generating instrumentation for incremental processing is achieved by sorting the primitive events according to temporal order. Based on the conditions to be tested in the events and on the data needed for computing the answers, instrumentation is generated to record data of the earlier events and to test conditions as soon as the data on which the conditions depend is available. Generating this instrumentation includes the following steps:

1. explicitly represent the temporal relationships among the primitive events referenced in a PMM question,

2. for each point event in this set, determine what other events in this set will occur before it,

3. determine what data should be collected and stored at each point event,

4. for each point event in this set, determine what conditions can be tested based on the data collected at the earlier events,

5. for each point event, install as its filtering conditions those conditions testable on the data of earlier events.

These steps ensure that the data collected and stored at each point of program execution are relevant based on the data available at that point. There are some cases where irrelevant data cannot be eliminated until analysis time, as is discussed later.

5.2 Representing Temporal Relationship

The temporal relationships among the events referenced in a PMM question are explicitly represented as a graph called the Event Dependent Graph (EDG), which explicitly represents the temporal order among the point events used in the question. By representing the temporal relationships as a graph, the system can use graph algorithms to ensure that no temporal dependency cycle exists in a question definition. The graph representation also enables the system to use a graph traversal algorithm when generating instrumentation for the question.

5.2.1 Event Dependent Graph

Given a PMM question definition, its EDG is defined as follows:

Definition 1 (EDG) An EDG is a directed graph whose nodes are the point events used in the definition and whose edges indicate the temporal order among the point events. Given two nodes N_1 and N_2, there is an edge e from N_1 to N_2 iff N_1 ≺ N_2 and there is no node N_m such that N_1 ≺ N_m ≺ N_2.

An EDG of a valid PMM question is acyclic, [2] although it is not necessarily connected. Since an EDG is an acyclic graph, for convenience we call those nodes without an incoming edge roots and those nodes without an outgoing edge leaves. All other nodes are called inner nodes. Given a node in the graph, those nodes with an edge leading to the node are called the predecessors of the node; those nodes with an edge leading from the node are called the successors of the node. An edge in the graph represents a temporal constraint on the nodes involved.
If there is no edge from N_i to N_j or from N_j to N_i, then any temporal order could exist among N_i and N_j. Because ≺ is a transitive relation, if N_i ≺ N_j and N_j ≺ N_k, then N_i ≺ N_k. Hence, for any point event, all of the point events on the paths from any root to it should occur before it; all of the point events on the paths from it to any leaf should occur after it. Those points that cannot reach or be reached from the point could occur either before or after the event.

[2] The validation constraints of the PMM specification language require that no cyclic temporal dependency exists.

5.2.2 Building Event Dependent Graphs

The goal of building an EDG for a question definition is to make the temporal dependency among point events explicit. The graph is built by examining each of the control relations to check what temporal constraints are implied among the events upon which the control relation is defined. The knowledge of what temporal constraints are implied by primitive control relations on events is explicitly represented in the system and is used to build the graph. In addition, the system also knows that the end event of an interval event always occurs after the begin event of the interval event. The following algorithm builds an EDG for a PMM question definition.

Algorithm 2 (Build an EDG) Suppose question Q = (AND E_1, E_2, ..., E_n, C_1, C_2, ..., C_m). Let M be a set of conditions to be tested, initially C_1, C_2, ..., C_m. Let CLUSTERS be a set of connected graph components, initially all point events in Q's definition. Repeat the following steps until M = ∅.

1. Select a C from M; suppose C has two parameters E_i, E_j.

2. If the point events associated with E_i and E_j are not in CLUSTERS, report an error; otherwise connect the point events with directed edges if their temporal order can be deduced from the semantics of C.

3. Set M = M - {C}.

Finally, for all E_i and E_j that are the begin and end point events of some interval event, if E_i is a leaf then connect an edge from E_i to E_j.

The algorithm first makes each point event a node. It then examines one control relation at a time and connects edges among the events involved in the control relation, using the explicitly represented temporal knowledge of the control relation. For instance, the EDG for the following event definition (which is the example PMM specification defined in Chapter 4) is shown in Figure 5.1.

  (defevent fns-call-FOO-and-BAR
    :definition ((x) | (Exist (f FOO BAR p)
                         (AND (Function-execution FOO 'FOO)
                              (Function-execution BAR 'BAR)
                              (Function-execution x f)
                              (Parameter x 1 p)
                              (Not (= p 0))
                              (Calls x FOO)
                              (Calls x BAR)))))

Figure 5.1: An Event Dependent Graph

Here the node set of the EDG is V = {f_b, f_e, FOO_b, FOO_e, BAR_b, BAR_e} and the edge set is E = {(f_b FOO_b), (f_b BAR_b), (FOO_b FOO_e), (BAR_b BAR_e), (FOO_e f_e), (BAR_e f_e)}. There is one leaf, f_e, and one root, f_b.

5.3 Run Time Filtering

Minimizing instrumentation overhead is achieved by inserting instrumentation code that tests conditions at run time to eliminate irrelevant data, and that computes and stores partial results and eliminates the no longer useful data upon which those partial results were based. Based on what data is needed to test a condition, there are two cases to consider: first, conditions that depend only on the data associated with the event; second, conditions that depend on the data associated with one or more other events. We call the first case intra-event (or local) conditions and the second case inter-event conditions. In the following subsections, we discuss the two kinds of conditions and examine the conditions under which partial results can be computed.
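As a reference point for the filtering discussion that follows, the EDG construction of Section 5.2.2 can be sketched in a few lines. The sketch below is a hedged Python illustration rather than the system's implementation: the names `TEMPORAL` and `build_edg` and the string encoding of point events (`"f_b"`, `"f_e"`) are assumptions, and only the Calls and Before rows of Table 5.1 are encoded.

```python
# Temporal knowledge per control relation, following Table 5.1 (subset only).
TEMPORAL = {
    "Calls":  lambda f, g: [(f + "_b", g + "_b"), (g + "_e", f + "_e")],
    "Before": lambda f, g: [(f + "_e", g + "_b")],
}

def build_edg(interval_events, conditions):
    """Return (nodes, edges) for a question given as control relations.

    interval_events: names of interval events, e.g. ["f", "FOO"]
    conditions: triples (relation, first event, second event)
    """
    # Every interval event contributes a begin and an end point event.
    nodes = {ev + suffix for ev in interval_events for suffix in ("_b", "_e")}
    edges = set()
    for rel, f, g in conditions:
        for a, b in TEMPORAL[rel](f, g):
            if a not in nodes or b not in nodes:
                raise ValueError(f"unknown point event in {rel} condition")
            edges.add((a, b))
    # Final step: a begin event that is still a leaf precedes its own end event.
    has_outgoing = {a for a, _ in edges}
    for ev in interval_events:
        if ev + "_b" not in has_outgoing:
            edges.add((ev + "_b", ev + "_e"))
    return nodes, edges

nodes, edges = build_edg(
    ["f", "FOO", "BAR"],
    [("Calls", "f", "FOO"), ("Calls", "f", "BAR")],
)
```

For the example question this yields the six edges listed for Figure 5.1, with `f_b` as the only root and `f_e` as the only leaf.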
5.3.1 Intra-Event Conditions

Intra-event conditions are conditions that can be tested by using the local data available at certain execution points. The essential idea is to test a condition before data is stored. For example, if the programmer is interested in a function-execution event whose first parameter value is not equal to zero, then the system could generate instrumentation that first tests whether the first parameter value of a function invocation equals zero. Only if the first parameter value is not zero will the data representing the occurrence of the event be recorded.

5.3.2 Inter-Event Conditions

Inter-event conditions are conditions defined on the data associated with one or more events other than the one in which the condition is defined. In this case, the required data from the earlier events needs to be recorded when those events occur so that the conditions can be tested later. Rather than testing these inter-event conditions as a unit after all the referenced events have occurred, they can be used as filtering conditions for each of those events by decomposing them into conditions testable at each of the instrumentation sites of the referenced events. Whenever one of those events occurs, the portion of the inter-event condition apportioned to it will ensure that it is a relevant event. Depending on whether or not there is a temporal relationship specified among the events, there are two cases to consider: first, there is some temporal order among the involved events; second, there is no temporal order among the involved events.

If there is some temporal order among the involved events, then the events can be sorted according to the temporal order. Based on the temporal order, the system can generate instrumentation to store the data needed from the earlier events and to test conditions on that stored data in the later events.
By the time the later events occur, at least some earlier events should have occurred; otherwise those events are not relevant ones that could satisfy the condition.

If there is no temporal order among the involved events, then there are no constraints on the sequence in which the involved events can occur. Hence, when such an event occurs and no other relevant events have yet occurred, it is still possible that they could occur later. The system must therefore generate instrumentation to record the needed data of the involved events and do the condition checking either when all the involved events have occurred or at analysis time.

5.3.3 Scope Analysis

As discussed previously, the temporal constraints in a question definition can help the system determine the scope of the data collected during program execution. The scope is used to determine when to compute partial results and when to eliminate data that are no longer useful after computing the partial results.

Definition 2 (Outmost Interval Event) A question definition is said to have an outmost interval iff there is an interval event e_outmost in the definition that satisfies the following condition: for any event e used in the definition other than e_outmost itself, (Contains e_outmost e) is true.

Given the conditions about the execution order of the point events used in the question definitions, the outmost interval for each question definition can be computed using the following algorithm:

Algorithm 3 (Outmost Interval) This algorithm computes the outmost interval event of a question definition if there is one.

• Input: point events p_1, p_2, ..., p_n used in a question definition and the conditions C_1, C_2, ..., C_m on them. Suppose each C_i is of the form (≺ p_{i1} p_{i2}).

• Output: the outmost interval if there is one.

• Method: Apply topological sort on p_1, p_2, ..., p_n using C_i for 1 ≤ i ≤ m.
Let S be the earliest point events and ℰ be the latest point events. If both S and ℰ contain one point event and there is an interval event O whose begin event and end event are in S and ℰ respectively, then output O; otherwise there is no outmost interval event.

If there is an outmost interval event and there is no recursion, then data collected for the events of the type can be released at the end of the interval. If recursion is possible, then we must ensure that the collected data is released only at the end of the last interval of the type.

5.4 Incremental Instrumentation Generation

There are three steps in the incremental computation:

1. Temporal Analysis, which determines the temporal dependency among the primitive data. It is based on the Event Dependent Graph.

2. Schema Generation, which creates a relational schema to record the instrumentation data collected during the program's execution. It creates a relation for each of the primitive events used in a question definition to avoid data conflict among questions.

3. Instrumentation Code Generation, which generates instrumentation code for each instrumentation site using the schema and temporal order from the two previous steps.

Because the EDG of a question is a directed acyclic graph, the instrumentation generation algorithm is described in terms of a graph traversal of the EDG. In the following, we first assume that the EDG is a connected graph. We then discuss how to deal with the case in which the EDG is not a connected graph.

Algorithm 4 (Dynamic Filtering) The input of the algorithm is the EDG of a question and the relational schema generated for the question. The output of the algorithm is the instrumentation generated for each of the primitive events used in the definition. Traverse the graph and do the following, starting at root nodes:

1. Root nodes: generate instrumentation for recording the data needed by its successors and store it in the relations for each of the root nodes.

2. Inner nodes: if all of its predecessors have been visited, then generate instrumentation to collect the data needed by its successors and to test the conditions that depend only on the data collected at its predecessors and the data collected by itself. Store the collected data only when the conditions are satisfied.

3. Leaf nodes: if all of its predecessors have been visited, then generate instrumentation to collect the data needed for computing final answers and to test conditions that depend on data collected at its predecessors and the data collected by itself. Compute partial results based on the data of its predecessors and the data collected by itself, and store them in the relations defined for it. If this is an end point event of an outmost interval event, eliminate the data stored at its predecessors.

This algorithm starts from root nodes, generating instrumentation for recording the attribute values used for either condition testing or computing final results. It then visits the immediate successors of the nodes that have been visited and generates instrumentation code for those nodes whose immediate predecessors have all been visited. The instrumentation code generated tests the conditions that depend only on the data collected at its predecessors and the data collected at this site, and collects the required data for this site. The collected data will not be stored unless the conditions tested are true. If the node is a leaf node and an end point event of an outmost interval, partial results are computed and the stored temporal data is eliminated.

If an EDG is not a connected graph, then the above algorithm is applied to each of its connected components and the required monitoring results are computed at analysis time.
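The traversal order used by Algorithm 4 can be sketched as below. This is a minimal Python illustration under stated assumptions, not the system's generator: `traversal_plan` is a hypothetical helper that only decides at which node each condition first becomes testable (all the point events it reads are the node itself or its ancestors); it does not emit instrumentation code, and the condition names in the example are invented for illustration.

```python
from collections import deque

def traversal_plan(nodes, edges, conditions):
    """Visit an acyclic EDG from its roots and list, per node, the
    conditions that first become testable there.

    conditions maps a condition name to the set of point events whose
    data the condition reads.
    """
    preds = {n: set() for n in nodes}
    succs = {n: set() for n in nodes}
    for a, b in edges:
        preds[b].add(a)
        succs[a].add(b)

    unvisited_preds = {n: len(preds[n]) for n in nodes}
    queue = deque(sorted(n for n in nodes if not preds[n]))  # root nodes
    ancestors, assigned, plan = {}, set(), []

    while queue:
        n = queue.popleft()
        # Everything that must have occurred before n.
        anc = set()
        for p in preds[n]:
            anc |= ancestors[p] | {p}
        ancestors[n] = anc
        available = anc | {n}
        tests = sorted(c for c, needs in conditions.items()
                       if c not in assigned and needs <= available)
        assigned.update(tests)
        plan.append((n, tests))
        # A successor is visited only after all of its predecessors.
        for s in sorted(succs[n]):
            unvisited_preds[s] -= 1
            if unvisited_preds[s] == 0:
                queue.append(s)
    return plan

edg_nodes = {"f_b", "f_e", "FOO_b", "FOO_e", "BAR_b", "BAR_e"}
edg_edges = {("f_b", "FOO_b"), ("f_b", "BAR_b"), ("FOO_b", "FOO_e"),
             ("BAR_b", "BAR_e"), ("FOO_e", "f_e"), ("BAR_e", "f_e")}
plan = traversal_plan(edg_nodes, edg_edges, {
    "param-nonzero": {"f_b"},
    "calls-FOO": {"f_b", "FOO_b"},
    "calls-BAR": {"f_b", "BAR_b"},
})
```

In this sketch the parameter test is attached to the root `f_b`, each Calls condition is attached to the begin event of the callee, and the leaf `f_e` is visited last, which is where partial results would be computed and stored data released.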
5.4.1 Dealing With Derived Relations

Derived events or derived relations are defined on either primitive events and control relations or other derived events and relations. Hence, they cannot be collected directly during the program's execution. Instead, they are computed from the primitive events and control relations upon which they are defined, just like the PMM questions described above. If derived events or relations are used in a PMM question, there are two alternatives. In the first alternative, the definitions of the derived events or relations are expanded in the question definition. In that case, the above algorithm applies unchanged. In the second alternative, the derived events and relations are treated as if they were primitive events and control relations, by generating instrumentation for computing them and by letting other events access only the derived data, not the primitives and control relations they depend upon.

The first approach, which expands the derived events and control relations, needs to deal only with the primitive events and control relations; hence, it is very simple. However, by expanding the definitions it may lose some opportunity for sharing common derived information. The second approach, on the other hand, makes sharing easy. The PMM system takes the first approach because the focus is on incremental computation of the answers to PMM questions.

5.4.2 Time and Space Compromises in PMM

As described above, incremental computation depends upon extra instrumentation to do run-time filtering and partial result computation. Because of this extra instrumentation, the augmented program's execution has some additional overhead with regard to both time and space. One natural question to ask is: when is the incremental algorithm effective?
That is, when does this extra effort result in an overall reduction in the cost of collecting and analyzing the instrumentation? We first examine several special cases before considering the general case.

Depending on the temporal relationships existing among the events used in a question definition, there are three cases of interest. First, there is no order among the events in the definition. Second, there is a total order among the events of the definition. Third, there is an outmost interval event in the event definition.

If there is no temporal order among the events used in a question definition, when an event occurs the instrumentation generated will simply record the needed data of the event, because there are no other events to check against and the event might be needed later. In other words, even if the event is not relevant now, it still might be relevant later. Therefore, only at analysis time can the system determine if it is relevant. So, this is a case where run-time testing is not very effective.

If there is a total order among the events used in a question definition, when an event occurs, only the immediate predecessor needs to be checked. If the event and its immediate predecessor satisfy the conditions defined on them, the event and its data are recorded. The recorded data represents the partial result of the event and all of its predecessors. Hence, each of the events represents the partial results of the event and the events before it. When the last event occurs, the final result is computed. Because the conditions defined in the question are used to filter out irrelevant events at each of the point events, and no event is recorded unless it passes all of the conditions defined on itself and all of its predecessors, this is a case where run-time testing is most effective.
In general, determining the effectiveness of run-time testing requires the ability to estimate the costs of alternative choices. This in turn relies on estimates (or measurements) of such things as data volume and the likelihood of branch conditions, which are not readily deduced from the program itself or readily available from the programmer. The system uses simple heuristics to make such choices. As an example, the specification may talk about invocations of function F with various properties, such as an argument being less than zero, that can easily be checked when F is called. In this case the argument will be compared with zero at run time. If the condition fails, then no further monitoring activity is performed. On the other hand, if the specification talks about pairs of invocations of the functions F and G with the same arguments, the system simply records all the arguments of all invocations and delays the comparisons until analysis time.

All other cases fall between these two cases. In those cases, only some of the point event's filtering conditions can be tested. Among those cases, a question with an outmost interval event is of interest. Because there is an outmost event, by definition every event in the question is contained in that event. Hence, when the end event of the interval event occurs, all of the instrumentation data recorded is released once the answers are computed.

5.4.3 Other Optimizations

In our approach, questions about a program's execution have been transformed into queries to a database whose schema is dynamically generated based on the PMM questions that have been asked. Once data is collected, answers to the questions are computed from the stored data. Because query evaluation is used in computing both partial results and final results, there are issues of how to compute the derived relations quickly, and various query optimization techniques [Ull89, Coh86, Coh89a] could be used.
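The two heuristic cases described in Section 5.4.2 (an argument test that can be filtered when F is called, and an F/G pairing that must wait for analysis time) can be contrasted in a small Python sketch. The handler names `on_call_f` and `on_call_g`, the two lists, and `analyze_pairs` are hypothetical; they stand in for generated instrumentation and for the analysis-time query.

```python
recorded_f = []   # filled only when the run-time filter on F passes
all_calls = []    # no temporal order between F and G: record everything

def on_call_f(arg):
    """Instrumentation at F's call site (illustrative)."""
    if arg < 0:                       # intra-event filter, tested at run time
        recorded_f.append(arg)
    all_calls.append(("F", arg))      # needed for the deferred comparison

def on_call_g(arg):
    """Instrumentation at G's call site (illustrative)."""
    all_calls.append(("G", arg))

def analyze_pairs():
    """Analysis-time comparison: arguments seen by both F and G."""
    f_args = {a for name, a in all_calls if name == "F"}
    g_args = {a for name, a in all_calls if name == "G"}
    return sorted(f_args & g_args)
```

The first list grows only with relevant invocations, whereas the second grows with every invocation of F or G, which is the space cost the text attributes to questions without a temporal order.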
5.5 Summary

Incremental computation improves the efficiency of monitoring instrumentation using the following techniques: (1) filter promotion, which tests the conditions that an event needs to satisfy before the data is recorded; (2) scope analysis, which uses the scope of the instrumentation sites to eliminate some irrelevant intermediate data; and (3) incremental partial result computation, which incrementally computes the answers for the questions.

Dynamic filtering applies filter promotion techniques to the domain of program monitoring and measuring. Dynamic filtering does the following things:

1. Testing intra-event filters, e.g., comparing the values of the attributes of some event against a set of constants.

2. Collecting data and testing inter-event filters, e.g., comparing the attribute values of two different events.

3. Computing answers to the PMM questions and eliminating data that is not useful after certain points, e.g., computing and storing a partial result so that less data needs to be stored.

The first step integrates intra-event filters with the primitive sources to test whether an event is of interest before it is recorded. The second step integrates filter testing against recorded partial results. The third step integrates techniques that enable the answers to the questions to be computed as soon as the data those answers depend on is available, so that data that is no longer useful can be discarded as soon as possible.

Chapter 6

SmartMonitor Evaluation

This chapter focuses on the evaluation of the SmartMonitor system. It tries to answer the following question: how well does the system perform? The performance of the system can be measured by determining how effectively it can filter out irrelevant data. To show the effectiveness of the methods introduced in the previous sections, we apply them to a real program used in the CLF environment.
We compare the amount of data collected using the following methods:

1. Profile based tools that record all data of interest.
2. Our method using only static filtering, which employs compile time optimization to eliminate irrelevant sites.
3. Our method using only dynamic filtering, which uses run time filtering to eliminate irrelevant data during program execution.
4. Our method using both static and dynamic filtering, which combines the second and the third methods.

6.1 Application Program and Question

We present an AP5 example to illustrate how SmartMonitor supports very high level languages. The example is drawn from a calendar application that keeps track of people’s schedules. This application defines two application specific types, Appointment and Trip. Each such object is related to a start time, a duration, and a set of people. In addition to these types and relations, AP5 supports constraints. In particular, the calendar application contains a constraint that prohibits scheduling a person for two different appointments with overlapping times (the constraint is written here in a pseudo notation to simplify understanding).

    PROHIBIT Conflicting-appointments
      ∃(person, appointment1, appointment2)
        (appointment1 ≠ appointment2)
        ∧ appointment-participant(person, appointment1)
        ∧ appointment-participant(person, appointment2)
        ∧ OVERLAP(appointment1, appointment2)

Constraints such as the one above provide an example of high level constructs with performance consequences that are not readily apparent. In particular, if we try to schedule an appointment, how long does it take for the AP5 runtime system to determine whether there is a conflict? What choices in the source code affect this time and how?
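To make the meaning of the constraint concrete, the following Python sketch checks it by brute force. It is an illustration only and says nothing about how the AP5 runtime actually evaluates constraints. The function names, the data layout, and the half-open interval reading of OVERLAP are all assumptions made for this example.

```python
from itertools import combinations


def overlap(a, b):
    """True if two appointments, each given as (start, duration), overlap.

    Assumption: intervals are half-open, [start, start + duration).
    """
    a_start, a_duration = a
    b_start, b_duration = b
    return a_start < b_start + b_duration and b_start < a_start + a_duration


def conflicting_appointments(appointments, participants):
    """Return every (person, appointment1, appointment2) violating the constraint.

    appointments: dict mapping appointment name -> (start, duration)
    participants: set of (person, appointment name) pairs
    """
    conflicts = []
    for (p1, a1), (p2, a2) in combinations(sorted(participants), 2):
        if p1 == p2 and a1 != a2 and overlap(appointments[a1], appointments[a2]):
            conflicts.append((p1, a1, a2))
    return conflicts


# Times are minutes after midnight; the data is made up for illustration.
appointments = {"standup": (540, 30), "review": (555, 30), "lunch": (720, 60)}
participants = {
    ("ann", "standup"), ("ann", "review"),
    ("bob", "standup"), ("bob", "lunch"),
}
conflicts = conflicting_appointments(appointments, participants)
```

This naive check compares every pair of participation tuples, which is exactly the kind of hidden cost the performance questions above are meant to expose.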
As described in Chapter 3, the AP5 version of SmartMonitor provides event types that correspond to such high-level concepts as atomic executions and rule executions (including constraints). Similarly it also provides a relation between atomic executions and the rule firings that result from them (establishing this relation required a small change to the AP5 implementation). As an example, the following specification asks for the atomic executions (and their durations) that cause the above rule (whose name is “conflicting-appointments”) to run:

    appointment-update-duration(x, t) is defined as
      ∃z atomic-execution(x) ∧ duration(x, t) ∧ triggers(x, z)
         ∧ rule-execution(z, “CONFLICTING-APPOINTMENTS”)

6.2 The Experiment

The experiment consisted of two hundred random atomic updates to either an appointment or a trip. The execution results are summarized in the following table.

    Item Name                 Number of entries
    Total updates             200
    Updates to trip            97
    Updates to appointment    103
    Relevant updates           24

where Total updates is the number of atomic executions that update either Appointment or Trip, Updates to trip is the number of atomic executions that update Trip, Updates to appointment is the number of atomic executions that update Appointment, and Relevant updates is the number of updates that actually triggered the rule conflicting-appointments, which is the data we were looking for (actually the duration of these atomic executions).

The effectiveness of the four different optimization methods, in terms of the number of entries checked and recorded on the monitoring example, is summarized in the following table:

    Item Name                  Number of entries checked   Number of entries recorded
    Straightforward (a)        200                         200
    Static Approximation (b)   103                         103
    Run-time Filtering (c)     200                          24
    Combined Methods (d)       103                          24

    (a) Profile-like tools which record all events of a specified type.
    (b) Use compile time analysis alone.
    (c) Use run time test alone.
    (d) Use both compile-time and run-time analysis.

where number of entries checked represents the number of times that the inserted instrumentation code is executed to see if the event meets the specified criteria, and number of entries recorded represents the number of events that satisfied these criteria.

6.3 Experiment Analysis

The results of the experiment confirmed that most of the data collected with the traditional method are irrelevant and that our high level focused monitoring and measuring methods are useful in reducing this irrelevant data. In particular, they showed how large reductions could be made separately in the number of events checked (from 200 to 103, by static approximation) and in the number of events recorded (from 200 to 24, by run-time filtering). Moreover, they showed that the combination of the two was complementary and very effective.

Current trends in programming languages are toward the use of strong typing [Mil84, CW85, Hud89, ACPP91], modular program development, and object oriented programming practice. Those trends encourage partitioning applications into loosely connected components. This makes compile time analysis for eliminating irrelevant data sites very effective. The experiment served as a very good example in which the modules dealing with appointment and the modules dealing with trip could be identified at compile time, and thus the compile time optimization was effective, as illustrated by the result. The fact that the application program was written in AP5 shows that this approach works well for languages with very high level constructs, e.g., atomic executions and rule executions.
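The checked and recorded counts reported for the four methods follow directly from the update counts, as the small Python simulation below shows. It is a sketch for illustration, not the instrumentation used in the experiment. The function `simulate` is hypothetical, and the event list assumes that all 24 relevant updates are appointment updates, which is consistent with the reported results.

```python
def simulate(events, static_filter, dynamic_filter):
    """Return (entries checked, entries recorded) for one monitoring method.

    events: list of (kind, triggers_rule) pairs, one per atomic update
    static_filter: if True, only appointment sites are instrumented
    dynamic_filter: if True, an event is recorded only if it triggered the rule
    """
    checked = recorded = 0
    for kind, triggers_rule in events:
        if static_filter and kind != "appointment":
            continue  # site eliminated at compile time, no code is executed
        checked += 1  # the inserted instrumentation code runs
        if dynamic_filter and not triggers_rule:
            continue  # run-time test fails, nothing is recorded
        recorded += 1
    return checked, recorded


# 97 trip updates and 103 appointment updates, 24 of which trigger the rule.
events = (
    [("trip", False)] * 97
    + [("appointment", True)] * 24
    + [("appointment", False)] * 79
)

results = {
    "Straightforward": simulate(events, static_filter=False, dynamic_filter=False),
    "Static Approximation": simulate(events, static_filter=True, dynamic_filter=False),
    "Run-time Filtering": simulate(events, static_filter=False, dynamic_filter=True),
    "Combined Methods": simulate(events, static_filter=True, dynamic_filter=True),
}
```

The static filter reduces how often instrumentation code runs, while the dynamic filter reduces how much data is kept; only the combination reduces both.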
6.4 Summary

The experiment made three points:

• the system we built can work on real programs
• the results show that it is effective in reducing the collection of irrelevant data
• the system works for relational paradigms (i.e., it can be applied to the high level relational concepts introduced in AP5).

Chapter 7
Conclusions and Future Work

Current trends toward more complex programs and the use of higher level languages make program monitoring and measuring more important, since both trends tend to obscure the sources of execution costs. These trends also raise a new requirement for program monitoring and measuring: supporting focused monitoring and measuring of program execution in terms of the abstractions the programs employ. Existing monitoring and measuring tools do not support this new requirement, and the practice of letting programmers insert instrumentation by themselves for monitoring and measuring is difficult, tedious, and error prone.

This research has shown, by building SmartMonitor, that automatic programming for program monitoring and measuring is feasible. In particular, this framework provides a systematic approach to program monitoring and measuring:

• Program monitoring and measuring requirements can be expressed in a specification language
• The process of inserting instrumentation and analyzing the collected data can be automated

Moreover, this research has also shown that the generated instrumentation code can be efficient in the amount of data that it collects and retains.
In particular, methods have been defined to

• select primitives for the specification language that facilitate both specifying PMM questions and generating effective instrumentation,
• use the PMM specification and static analysis to help the system generate instrumentation that only collects relevant data,
• use temporal dependency among events to filter out irrelevant data at run time,
• and use a database, with its relational schema, abstraction and inferencing capability, and query processor to facilitate the collection, computation, and access to the required monitoring results.

The integration and incorporation of knowledge about the PMM specification, the source program language semantics, and the source program structures can thus form the basis of automatically generating instrumentation that minimizes the instrumentation overhead.

7.1 Summary

This dissertation describes a new approach to the problem of answering questions about the performance of programs in which an automatic tool accepts the original program and a set of questions posed in a formal specification language and installs instrumentation code in the program much as a human programmer would. It determines what data must be collected, finds the places in the source program where that data should be collected, and inserts code to collect and process that data. Automation makes the process faster and more reliable. The specifications are generally quite concise. It is usually much easier to write the specification than to manually instrument the program.

In addition to the specification language, this dissertation describes an implementation of such a tool, along with many of the methods used to produce effective instrumentation.

7.2 Main contributions

The primary contribution of this dissertation is to provide a framework for high level program monitoring and measuring.
The key elements of this work are:

• a model of program execution and a language in which high level questions about executions of that model can be expressed.

PMM questions can be asked in terms of a very high level specification language that is based on the Entity-Relationship model with a rich set of programming language specific primitives. The declarative nature of the specification language makes it simple, as we observed in Chapter 4, to reason about what data should be collected and where that data is available for collection.

• a demonstration that these questions can be automatically transformed into instrumentation that computes the required answers.

By selecting primitives that characterize the primitive performance activities of program execution and using the Entity-Relationship model to model high level questions in terms of these primitives, the framework facilitates a tight connection between the source program’s execution model and the PMM specification. This connection enables the use of static analysis to locate places in the source program where the relevant data can be collected. The use of temporal analysis enables run time filtering to be used to further reduce the collection of irrelevant data. As a result, the generated instrumentation code to collect performance data can be reasonably efficient. Answers to the PMM questions are produced by using a relational representation for the recorded data and a query processor to derive the results.

• an understanding of many techniques for performing that transformation and the tradeoffs inherent in their use.

The implementation made clear the costs associated with compile time, run time, and analysis time filtering. It also illustrated various ways to enforce the filtering conditions at different stages.
In summary, the research described in this dissertation represents a big step towards automatic programming of program monitoring and measuring capabilities. It provides a convincing confirmation of our claim that the process of inserting instrumentation for program monitoring and measuring can and should be automated. The connection of this work to research in database query optimization, compiler optimization, static analysis of computer programs, and temporal reasoning demonstrates the wide range of capabilities required in such systems.

7.3 Limitations of the Research

The research results reported in the dissertation have implementation, testing, and theoretic limitations:

• There is no compensation model for the instrumentation code. Because instrumentation code is inserted into the source program, the execution of the instrumentation code competes with the source program for system resources, such as time and space. Currently, the framework does not take that into consideration. Instead, it tries to reduce the instrumentation overhead by optimizing the amount of data checked and recorded.
• There is no declarative way of extending the set of primitives. Because the instrumentation interface for the primitives used in the specification language is hand crafted and implementation dependent, extending the primitive set is difficult.
• The static analysis performed is very primitive; information such as data flow in the source program is not utilized in the work.
• The set of heuristics for selecting instrumentation algorithms is very primitive. There is no cost model for different instrumentation algorithms.
• Only questions expressed in conjunctive normal form are supported.
• There is no full theoretical investigation of the completeness and/or the expressiveness of the specification language.
7.4 Future Work

Extensions of this research should be concerned with removing many of the above limitations. More specifically, future work should include extensions to permit more general models of program execution and refinements to admit more efficient instrumentation.

There are several directions for the extensions to permit more general models of program execution. First, supporting a mechanism for extending the set of primitives supported by the PMM specification language. Second, applying the framework to different computation models and languages, such as parallel program executions and functional programming. Third, applying the framework developed here in different contexts. For instance, in program animation, it is often the case that programmers need to identify the parts of the source program where the data used to drive some graphic animations can be collected. It should be possible, with some suitable specification language, to describe the kinds of data needed, and use the framework developed here to locate only those relevant places in the source program and to insert “instrumentation” code that automatically collects the needed data.

One refinement to admit more efficient instrumentation could include providing an evaluation method to determine the cost of various algorithms for instrumentation. A second could investigate how to integrate monitoring requirements into compiler construction tools so that the primitive support for the PMM paradigm is available directly from the language compiler. For instance, annotations could be provided for a programming language compiler-compiler so that the primitives of the PMM specification language are supported by the language compiler. Hence, modifications to these language compilers would not be necessary.
A third refinement could include an investigation of instrumentation compensation so that the monitoring results could more accurately reflect the execution cost of the source program.

7.5 Concluding Remarks

This research has shown that program monitoring and measuring questions can be described in a high level declarative language and that the process of generating and inserting instrumentation into the source program can be automated. Automatic programming in the domain of program monitoring and measuring is not only desirable but also feasible. This dissertation stands as a solid proof of our claim and represents significant progress towards the goal of automatic programming of program monitoring and measuring.

Appendix A
The PMM Specification Language

A.1 Primitives and Attributes

The following is the set of predefined primitives for the PMM specification language, followed by their attributes.

• Interval is a new type of entity. It is associated with a pair of points of execution, called the starting and ending points.
- Parameters(Interval values) relates an interval event to its parameters (a list of values).
- Parameter(Interval natural-number values) relates an interval event to its nth parameter (values).
- Return-Values(Interval values) relates an interval event to its return values (a list of values).
- Return-Value(Interval natural-number values) relates an interval event to its nth return value (values).
- Duration(Interval Integer) relates an interval event to its duration, i.e., the time spent on the event.
• Function-Invocation is a subtype of the interval event type.
- Function-Name(Function-Invocation symbol) relates a Function-Invocation event to the function’s name.
- Function-Execution(function-invocation function-name) is a derived relation defined on Function-Invocation and Function-Name.
• Relation-operation is a subtype of the interval event type.
- Relation-Name(Relation-operation symbol) relates a relation operation to the relation’s name.
• Relation-Test is a subtype of the Relation-operation type.
• Relation-updates is a subtype of the Relation-operation type.
• Relation-Insert is a subtype of the Relation-updates type.
• Relation-Delete is a subtype of the Relation-updates type.
• Relation-Generating is a subtype of the Relation-operation type.
- Input-pattern(Relation-Generating list) relates a Relation-Generating event to its generation pattern.
- Generators(Relation-Generating list) relates a Relation-Generating event to those relations that are used as generators.
- Filters(Relation-Generating list) relates a Relation-Generating event to those relations that are used as filters.
- Relation-Size(Relation-Generating number) relates a Relation-Generating event to the size of the generated relation.
• Rule-Operation is a subtype of the interval event type.
- Rule-Name(Rule-Operation symbol) relates a Rule-Operation event to the name of the rule.
• Rule-Triggering is a subtype of the Rule-Operation type.
• Rule-Body-Execution is a subtype of the Rule-Operation type.
• Rule-Execution(Rule-Operation symbol) is a derived relation defined on Rule-Operation and Rule-Name that relates a Rule-Operation event to the name of the rule executed.
• Atomic-Execution is a subtype of the interval event type.
- Data-gathering-time(Atomic-Execution integer) relates an Atomic-Execution event to the data gathering time of the execution.
- A-Rule-Names(Atomic-Execution list) relates an Atomic-Execution event to the names of the automation rules triggered.
- C-Rule-Names(Atomic-Execution list) relates an Atomic-Execution event to the names of the consistency rules triggered.
- Proposed-Update(Atomic-Execution list) relates an Atomic-Execution event to the tuples of the proposed updates.
- Update-Done(Atomic-Execution list) relates an Atomic-Execution event to the tuples of the updates done.
• Point is a new type of entity. Point events occur at a single point of execution.
- Time(Point value) relates the Point event to the time when it occurs.
• Begin(Point Interval) relates the Point event to the Interval event. It represents that the point event is the begin point event of the interval event.
• End(Point Interval) relates the Point event to the Interval event. It represents that the point event is the end point event of the interval event.
- Data-condition(Point Expression value) relates the Point event to a source program expression and its evaluation value. It represents that the evaluation value of the expression is the value when the point event occurs. This attribute is defined for interval events as well. When it is applied to an interval event it is assumed that it is applied to the begin point event of the interval event. The Expression is an expression defined in the source programming language whose evaluation does not have any side effect.

A.2 Control Relations

Control relations are temporal relationships between different events during program execution. For a point event $E_i$, (Time $E_i$) represents the time $E_i$ occurred.

• (Before $E_1$ $E_2$) defines a relationship between two events $E_1$ and $E_2$. If both $E_1$ and $E_2$ are point events then the relation is true iff (Time $E_1$) < (Time $E_2$). For interval events $E_i$, $i = 1, 2$, $T_{si}$ and $T_{ei}$ are the starting time and ending time of $E_i$, and $T_{si} < T_{ei}$. (Before $E_1$ $E_2$) is true iff $T_{e1} < T_{s2}$.
• (Contains $E_1$ $E_2$) is true iff $T_{s1} < T_{s2} < T_{e2} < T_{e1}$. $E_1$ must be an interval event.
• (Triggers Atomic-Execution Rule-Execution) is true if the Atomic-Execution event triggers the Rule-Execution event.
• (Updates Atomic-Execution Relation-updates) is true if the Atomic-Execution event updates the Relation-updates event.
• (Generates Interval Relation-generating) is true if the Interval event contains the Relation-generating event.
• (Tests Interval Relation-test) is true if the Interval event contains the Relation-test event.
• (References Interval Relation-operation) is true if the Interval event contains the Relation-operation event.
• (Calls Function-Execution1 Function-Execution2) is true if Function-Execution1 directly calls Function-Execution2.
• (Calls* Function-Execution1 Function-Execution2) is true if Function-Execution1 directly or indirectly calls Function-Execution2.

A.3 Value Types and Other Relations

Besides the primitives defined above, the PMM specification language also includes some value types and relations defined on the value types. Value types include program data, time, etc. Relations among data objects of the value types, e.g., the values of the parameters passed to different executions, include comparison operators and those predicates that are defined in the source programming language.

Aggregation operators could be used to define some derived relations. Suppose that R is a relation with arity n. The aggregation operators could be used as follows to define a new derived relation

    (defevent name :definition (aggregation-operator R (IO1 IO2 ... IOn)))

where each IOi is either INPUT or OUTPUT. In general, every tuple of the original relation corresponds to exactly one tuple of the derived relation. That tuple is obtained by keeping only the elements of the original tuple at the “input” positions and adding an integer at the end. Depending on the aggregation operator used, the number added has a different meaning. Suppose the operator is a count operator. The integer at the end of a derived tuple is then the number of tuples of the original relation that correspond to that derived tuple. Thus, if the pattern consists entirely of outputs, the derived relation will contain at most a single tuple: the number of tuples in the entire relation.
At the other extreme, if the pattern consists entirely of inputs then the derived relation will contain exactly as many tuples as the original relation, but each tuple will have a one at the end. For instance, suppose that the arity of relation R is three:

    (defevent cardinality :definition (count R (input output output)))

defines a derived relation cardinality(type1 number), where type1 is the type of the first slot in the original relation and number is the number of those tuples in the original relation with the first slot value explicitly fixed. Similarly, other aggregation operators are used to define derived event attributes.

A.4 PMM Language Syntax

The following is the BNF definition of the PMM language. We follow the usual BNF conventions for meta symbols. Note that “(” is not used as a meta symbol. Upcased words are key words.

    PMM-Specification::   {Relation-definition}* {Question-definition}*
    Relation-definition:: (DEFEVENT Event-Name :DEFINITION Event-specification)
    Question-definition:: (DEFQUESTION Question-name :DEFINITION Event-specification)
    Activity-name::       Event-Name | Question-name;
    Event-specification:: (E-Vars Suchthat Event-Expression)
                        | (Aggregate-Operator Activity-name input-output-pattern);
    Suchthat::            SUCHTHAT | S.T. | ‘|’;
    Event-Expression::    Monitoring-Term
                        | (EXISTS E-Vars Event-Expression)
                        | (FORALL E-Vars Event-Expression)
                        | (Logic-Operator Event-Expression Event-Expression+);
    Monitoring-Term::     (Event-Expression)
                        | (Activity-name E-Var E-Var* value-type*)
    Logic-Operator::      OR | AND | NOT
    Aggregate-Operator::  COUNT | AVERAGE | SUM | MIN | MAX
    input-output-pattern:: (input-output*);
    input-output::        INPUT | OUTPUT;
    E-Vars::              (E-Var*);
    E-Var::               Symbol;
    Event-Name::          Symbol;
    Question-name::       Symbol;
    value-type::          Integer | String | List | Symbol;

Event names and/or question names must be defined before they are used.
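The count aggregation of Section A.3 can be illustrated with a short Python sketch. It models a relation as a set of tuples and is not the AP5 implementation; the function name `count_aggregate` and the sample relation are hypothetical.

```python
from collections import Counter


def count_aggregate(relation, pattern):
    """Derive a relation by counting tuples that agree on the input positions.

    relation: set of equal-length tuples
    pattern: sequence of "input" / "output", one entry per slot
    Each derived tuple keeps the input-position values and appends the count.
    """
    keep = [i for i, io in enumerate(pattern) if io == "input"]
    counts = Counter(tuple(row[i] for i in keep) for row in relation)
    return {key + (n,) for key, n in counts.items()}


# A made-up relation R of arity three.
R = {("a", 1, 2), ("a", 1, 3), ("b", 2, 2)}

cardinality = count_aggregate(R, ("input", "output", "output"))
all_outputs = count_aggregate(R, ("output", "output", "output"))
all_inputs = count_aggregate(R, ("input", "input", "input"))
```

With an all-output pattern the result is the single tuple holding the size of the relation, and with an all-input pattern every original tuple reappears with a one appended, matching the two extremes described in A.3.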
A valid PMM specification should also satisfy the constraints defined in Chapter 3.

Appendix B
The Prototype Implementation

This appendix describes the prototype implementation of SmartMonitor. We first introduce the basic functionalities supported by the implementation. We then discuss the assumptions and implementation decisions made for it.

B.1 SmartMonitor Implementation Overview

SmartMonitor was developed at the Information Sciences Institute of the University of Southern California. It is written in Lucid Common Lisp using AP5 on an HP9000 series 300 workstation. It has been used to answer performance questions about nontrivial software in real use. The focus of the development is on functionality. Many features are not addressed, e.g., the user interface. In order to make a working system, many compromises were made. Whenever there is a tradeoff between simplicity and expressiveness, priority is given to simplicity. The system provides:

1. A specification language for monitoring and measuring the execution of AP5 programs. The language is based on a set of primitive events and control relations listed in Appendix A. The language also lets one define derived events, derived control relations, and questions.
2. An automatic programmer for the specification language which includes
   (a) instrumentation interfaces for primitive event types
   (b) instrumentation generation templates for primitive control relations
   (c) mappings between primitives and their instrumentation sites in the source program
   (d) transformations from PMM specifications to queries on the static analysis result database of the source program
   (e) temporal analysis mechanisms for PMM specifications
3. The monitoring results are represented in AP5 relational format and can be accessed directly via AP5’s query language.

Compared with the ideal implementation, the system has the following limitations:

1. Only a subset of the primitives identified for the PMM specification language of AP5 is implemented.
2. Space usage attributes of events are not supported by the implementation.
3. The temporal analysis mechanisms for PMM specifications discussed in Chapter 5 are not completely implemented; specifically, the incremental computation of the monitoring results is implemented only when there is an outmost interval.
4. A straightforward representation for the instrumentation data is used, although selecting the data representation depending on how the data is used can result in better performance.
5. Selection among multiple instrumentation algorithms is done based on very simple heuristics. For instance, the current instrumentation always tests local filters at run time without knowledge of how effective the filters are. That is because making the right decision needs such information as what percentage of the data would pass the filters. That kind of information is usually not available.
6. Knowledge of the mappings is coded into the system so that reasonable efficiency is achieved. However, it is hard to extend the primitive event types of the PMM specification language because of that. Similarly, knowledge of dealing with primitive control relations is coded directly into the system, and adding primitive control relations cannot be done declaratively in the current implementation.

The current implementation does minimal error checking and leaves most errors to be trapped by AP5 at runtime. SmartMonitor assumes:

1. The monitored source program is loaded, because some static analysis uses information that is only available when the program is loaded.
2. There is a natural initialization and termination of the monitoring and measuring execution. The system requires the programmers to explicitly tell it when the program monitoring and measuring should begin and end.
B.2 SmartMonitor Components

This section describes the major components of the automatic programming system, and the major assumptions and design decisions used.

B.2.1 PMM Specification Language

A PMM specification has two parts:

1. An activity model which is represented by a relational schema
2. A set of questions which are represented by a set of queries defined on the above schema

They are defined by the following constructs: event definitions and question definitions. Defining a PMM specification has the following functions:

1. introducing the terminology
2. specifying the execution model of interest
3. specifying the questions.

It enables programmers to communicate with the automatic programming system to tell it what is of interest. It can also be used by the system to focus on only the relevant data specified in the specification.

Event Definitions

Defevent is used to define either an event type or a relationship among events. It supports abstractions. It is mainly used in computing answers to the questions. After the execution of the program, the tuples of the relations may or may not be kept. The system also supports specialization of event types. An event type is defined by a unary definition. If event type $E_{sub}$ is a subtype of $E_{sup}$ then all attributes defined on $E_{sup}$ are defined on $E_{sub}$.

Question Definitions

Defquestion defines a query to the relational schema defined to characterize the activities to monitor. Unlike event definitions, the data computed by the question definitions are the answers to the questions defined in the specification. After the execution of the program, the computed data are kept in the AP5 relation form.

The specification language is an extension to the AP5 schema definition and query language.
Basing the PMM specification language on the AP5 language enables SmartMonitor to use the tools built for the AP5 language to process PMM specifications. In particular, existing AP5 tools are used to

• check whether a PMM specification is well formed
• define the event schema used for storing the collected data
• support relations other than events and control relations, such as +, etc.

SmartMonitor uses AP5 relations to represent the relational schema of the instrumentation data. Using AP5 relations enables SmartMonitor to simply use the AP5 query generator to generate algorithms for the queries generated by SmartMonitor. It also enables SmartMonitor to utilize the query optimization techniques of the AP5 query generator to generate better algorithms.

Another advantage of using AP5 is the generality of the AP5 representation. In AP5, a relation is an abstraction of data. A relation is characterized by the operations defined upon it. In other words, it separates representation from functionality. Because data recording in program monitoring and measuring happens during the execution of the monitored program, its overhead should be minimized. With the separation of representation and functionality, SmartMonitor records instrumentation data by treating tuples of some relations as LISP data. By using simple LISP operations to store data, reasonable efficiency is achieved. By providing the relational interface required by AP5's relational abstraction, the stored data can be accessed via the query language of AP5. Hence, AP5's semantics enables SmartMonitor to achieve its functionality without sacrificing too much efficiency.

B.2.2 Automatic Programmer

To a programmer using the system, program monitoring and measuring has the following steps:

1. preparing a specification
2. submitting both the specification and the source program to the system
3. running the generated augmented program
4. reading the answers generated by the execution

The following commands are used between the first step and the fourth step:

• The augmented program is generated by invoking the following command
  (Monitor-On PMM-specification Source-Program)
• The instrumentation of the augmented program is eliminated by invoking the following command
  (Monitor-Off Source-Program)
• Monitoring starts on an augmented program via the following command
  (Monitor-Begin Source-Program)
• Monitoring ends on an augmented program via the following command
  (Monitor-End Source-Program)

Monitoring and measuring a source program works in batch mode. If a programmer changes some part of the PMM specification, the augmented program is regenerated from scratch. No incremental generation is supported.

B.2.3 Static Approximation

Static analysis is used to store all potential sites and their relationships in a static analysis database. The relevant sites of a PMM specification are determined by transforming the specification into a set of queries to the static analysis database. The evaluation results of the queries are the sites of the specification. Currently, there is no declarative way for a programmer to communicate with the static analysis tool to let it record what is of interest. Static analysis is done for the source program without taking the PMM specification into consideration. The system assumes that static analysis results are stable. It tries to fully utilize the static analysis tools provided in the environment, because the environment already represents analysis results in an AP5 database and enables the use of the AP5 query language to query the results. The static analysis could be improved by recording only the relevant information.
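The site-selection step described in this section can be pictured as a query over a table of statically discovered sites. The sketch below is in Python for illustration only. The Site record, its fields, and relevant_sites are assumptions made for this example; they are not the interface of the CLF static analyzer or of AP5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical record for one potential instrumentation site."""
    kind: str       # e.g. "call" or "assignment"
    target: str     # function called or variable assigned
    enclosing: str  # function whose body contains the site


# Static analysis database: built once for the source program,
# independently of any particular specification.
static_db = [
    Site("call", "parse", "main"),
    Site("call", "emit", "main"),
    Site("assignment", "count", "parse"),
    Site("call", "parse", "emit"),
]


def relevant_sites(db, kind, target):
    """Query derived from a specification: sites where the event may occur."""
    return [site for site in db if site.kind == kind and site.target == target]


# A specification interested in calls to `parse` selects the sites to instrument.
sites = relevant_sites(static_db, "call", "parse")
```

In this picture, changing the specification changes only the query, not the stored sites, which is consistent with the assumption that static analysis results are stable.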
B.2.4 Instrumentation Interface

The instrumentation interface supports access to the run-time performance data of the primitive events and control relations used in the specification language. Currently, the instrumentation interface for accessing run-time performance data is hand coded for several reasons. First, executing instrumentation code consumes system resources; hence, it should be implemented as efficiently as possible. Second, it needs to deal with the implementation details of the programming language.

Primitives are used to set up the correspondence between the specification language and the execution of the programming language constructs. There are several reasons why it is difficult to extend the primitives and control relations used in the specification language declaratively:

• There is no declarative way to inform the static analysis tool to record all sites for a new primitive, although it would not be very difficult to extend the static analysis tool to do so
• There is no simple way to declaratively define the code that supports the interface of the new primitive

It is our belief that the execution of a program written in a programming language can be described by a set of primitives and control relations. Based on the principles stated in Chapter 3, most of the performance questions about a program's execution can be answered. Hence, changes in the set of primitive events and control relations do not happen very often.

B.2.5 Generating Instrumentation

Instrumentation generation consists of the following components:

1. instrumentation templates for predefined primitives and control relations, written in CLOS [Ste90], which supports object-oriented programming. In particular, the templates are organized following the type hierarchy in the PMM specification language.
2. instrumentation plans for logic operators

This separation makes it possible to divide the automatic programmer into a language-dependent part, which is the first component, and a language-independent part, which is the second component.

B.3 Implementation Summary

The implementation of SmartMonitor benefits greatly from the use of the AP5 language and the CLF [Pro88] environment. The schema definition language and the query optimizer of AP5 make the representation and the computation of the monitoring data easier. Because AP5 is a main-memory virtual database, the implementation of SmartMonitor can take advantage of the convenience associated with relational representation without paying the overhead associated with traditional database systems. The availability of a very sophisticated static analyzer in CLF makes the implementation simpler. Furthermore, the integration of AP5 and the CLF environment makes SmartMonitor part of the environment, so that the intermediate data and monitoring results can be manipulated using the tools in the environment. All of these made the implementation possible and much easier.

Reference List

[ACPP91] M. Abadi, L. Cardelli, B. Pierce, and G. Plotkin. Dynamic typing in a statically typed language. ACM Transactions on Programming Languages and Systems, 13(2):237-268, April 1991.
[ASU86] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers-Principles, Techniques, and Tools. Addison-Wesley Publishing Company, 1986.
[Bal69] R. Balzer. Exdams-extendable debugging and monitoring system. In Proceedings of the AFIPS Conference, pages 567-586, 1969.
[Bal85] R. Balzer. A 15 year perspective on automatic programming. IEEE Transactions on Software Engineering, SE-11(11):1257-1268, November 1985.
[Bal86] R. Balzer. Living in the next generation operating system. In Proceedings of the IFIP 10th World Computer Congress, pages 283-291, Dublin, Ireland, 1986.
[Bar85] D. R. Barstow. Domain-specific automatic programming. IEEE Transactions on Software Engineering, SE-11(11):1321-1336, November 1985.
[Bat83] J. Batali. Computational introspection. A. I. Memo 701, Massachusetts Institute of Technology, Artificial Intelligence Laboratory, February 1983.
[Bat88] P. C. Bates. Debugging heterogeneous distributed systems using event-based models of behavior. In Proceedings of ACM SIGPLAN and SIGOPS Workshop on Parallel and Distributed Debugging, pages 11-22, Madison, Wisconsin, May 1988.
[BCG83] R. Balzer, T. E. Cheatham, and C. Green. Software technology in the 1990's: Using a new paradigm. IEEE Computer, 16(11), November 1983.
[BD77] R. M. Burstall and J. Darlington. A transformation system for developing recursive programs. Journal of the ACM, 24(1):44-67, January 1977.
[Ben88] J. Benjamin. PILOT: A prescription for program performance measurement. In Proceedings of the 10th International Conference on Software Engineering, pages 388-395, 1988.
[BG79] R. Balzer and N. Goldman. Principles of good software specification. Proceedings on IEEE Conference on Specification of Reliable Software, pages 58-67, 1979.
[BH83] B. Bruegge and P. Hibbard. Generalized path expressions: A high level debugging mechanism. In Proceedings of the ACM SIGSOFT Software Engineering Symposium on High Level Debugging, pages 34-44, Pacific Grove, California, March 1983.
[BP88] B. Boehm and P. N. Papaccio. Understanding and controlling software costs. IEEE Transactions on Software Engineering, 14(10):1462-1477, October 1988.
[Bru85] B. Bruegge. Adaptability and portability of symbolic debuggers. PhD thesis, Department of Computer Science, CMU, Pittsburgh, PA 15213, September 1985.
[BW83] P. C. Bates and J. C. Wileden. High-level debugging of distributed systems: The behavioral abstraction approach. The Journal of Systems and Software, 3(4):255-264, 1983.
[C+83] C. Coutant et al. Measuring the performance and behavior of Icon programs. IEEE Transactions on Software Engineering, SE-9(1), January 1983.
[CC76] J. Cohen and N. Carpenter. A language for inquiring about the run-time behavior of programs. Software-Practice and Experience, pages 445-460, February 1976.
[CGM90] U. S. Chakravarthy, J. Grant, and J. Minker. Logic-based approach to semantic query optimization. ACM Transactions on Database Systems, 15(2):162-207, June 1990.
[Che76] P. P. Chen. The Entity-Relationship Model - Toward a Unified View of Data. ACM Transactions on Database Systems, 1(1):9-36, March 1976.
[CLW90] C. C. Charlton, P. H. Leng, and D. M. Wilkinson. Program monitoring and analysis: Software structures and architectural support. Software-Practice and Experience, 20(9):859-867, September 1990.
[CMB91] J. Choi, B. P. Miller, and R. H. B. Netzer. Techniques for debugging parallel programs with flowback analysis. ACM Transactions on Programming Languages and Systems, 13(4):491-530, October 1991.
[Cod70] E. F. Codd. A relational model of data for large shared data banks. Communications of the ACM, 13(6):377-387, 1970.
[Coh86] D. Cohen. Automatic compilation of logical specifications into efficient programs. In Proceedings of the 5th National Conference on Artificial Intelligence, pages 20-25, Philadelphia, PA, August 1986. AAAI.
[Coh88] D. Cohen. AP5 user's manual. ISI/USC, 1988.
[Coh89a] D. Cohen. Compiling complex database transition triggers. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 225-234, 1989.
[Coh89b] D. Cohen. A first order logic database suitable for real programming. Technical report, USC/ISI, 1989.
[CW85] L. Cardelli and P. Wegner. On understanding types, data abstraction, and polymorphism. ACM Computing Surveys, 17(4):471-522, December 1985.
[Dav80] R. Davis. Meta-rules: Reasoning about control. Artificial Intelligence, 15:179-222, 1980.
[Fea86] M. Feather. A survey and classification of some program transformation approaches and techniques. In F.R.G, editor, Proceedings of IFIP TC2 Working Conference on Program Specification and Transformation, April 1986.
[For79] C. L. Forgy. On the Efficient Implementation of Production Systems. PhD thesis, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA 15213, February 1979.
[FOW87] J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 9(3):319-349, July 1987.
[G+83] S. Graham et al. An execution profiler for modular programs. Software-Practice and Experience, 13:671-683, 1983.
[GMN84] H. Gallaire, J. Minker, and J. Nicolas. Logic and databases: A deductive approach. ACM Computing Surveys, 16(2):153-185, 1984.
[Gol83] N. M. Goldman. Three dimensions of design development. In Proceedings of AAAI Conference, Washington D. C., 1983.
[GYK90] G. S. Goldszmidt, S. Yemini, and S. Katz. High-level language debugging for concurrent programs. ACM Transactions on Computer Systems, 8(4):311-336, November 1990.
[Han87] D. Hanson. Event associations in SNOBOL4 for program debugging. Software-Practice and Experience, pages 115-129, August 1987.
[HK87] R. Hull and R. King. Semantic database modeling: Survey, applications, and research issues. ACM Computing Surveys, 19(3):201-260, September 1987.
[HK88] W. Hseush and G. Kaiser. Data path debugging: Data-oriented debugging for a concurrent programming language. In Proceedings of ACM SIGPLAN and SIGOPS Workshop on Parallel and Distributed Debugging, pages 236-247, Madison, Wisconsin, May 1988.
[HSW85] G. C. Held, M. R. Stonebraker, and E. Wong. INGRES-A Relational Data Base System. In Proceedings of the 1975 National Computer Conference, pages 409-416, Anaheim, California, 1985.
[Hud89] P. Hudak. Conception, evolution, and application of functional programming languages. ACM Computing Surveys, 21(3):359-411, September 1989.
[HW90] D. Haban and D. Wybranietz. A hybrid monitor for behavior and performance analysis of distributed systems. IEEE Transactions on Software Engineering, 16(2):197-211, February 1990.
[JF90] W. L. Johnson and M. Feather. Building an evolution transformation library. In Proceedings of the 12th International Conference on Software Engineering, pages 238-248, Nice, France, 1990.
[JK84] M. Jarke and J. Koch. Query optimization in database systems. ACM Computing Surveys, 16(2):111-152, 1984.
[Kan83] E. Kant. On the efficient synthesis of efficient programs. Artificial Intelligence, 20:253-305, 1983.
[KB81] E. Kant and D. R. Barstow. The refinement paradigm: The interaction of coding and efficiency knowledge in program synthesis. IEEE Transactions on Software Engineering, SE-7:458-471, 1981.
[Kin81a] J. King. Query Optimization by Semantic Reasoning. PhD thesis, Stanford University, 1981.
[Kin81b] J. King. QUIST: A system for semantic query optimization in relational databases. In Proceedings of the 7th International Conference on Very Large Data Bases, pages 510-517, Cannes, France, 1981.
[Knu71] D. Knuth. An empirical study of Fortran programs. Software-Practice and Experience, 1:105-133, April 1971.
[LL89] B. Lazzerini and L. Lopriore. Abstraction mechanisms for event control in program debugging. IEEE Transactions on Software Engineering, 15(7):890-901, July 1989.
[LW69] P. Lucas and K. Walk. On the Formal Description of PL/I, volume 6. Annual Reviews of Automatic Programming, 1969.
[Mil84] R. Milner. A proposal for standard ML. In Proceedings 1984 ACM Conference on LISP and Functional Programming, pages 184-197. ACM, 1984.
[Mod79] M. L. Model. Monitoring System Behavior In a Complex Computational Environment. PhD thesis, Department of Computer Science, Stanford University, Stanford, California, 1979.
[MW80] Z. Manna and R. Waldinger. A deductive approach to program synthesis. ACM Transactions on Programming Languages and Systems, 2(1):90-121, January 1980.
[Nar89] K. Narayanaswamy. Static Analysis-Based Program Evolution Support in the Common Lisp Framework. In Proceedings of the 11th International Conference on Software Engineering, Singapore, Singapore, 1989.
[Nic82] J. M. Nicolas. Logic for improving integrity checking in relational data bases. Acta Informatica, 18(3):227-253, 1982.
[OCH91] R. A. Olsson, R. H. Crawford, and W. W. Ho. A dataflow approach to event-based debugging. Software-Practice and Experience, 21(2):209-229, February 1991.
[OO90] K. M. Olender and L. J. Osterweil. Cecil: A sequencing constraint language for automatic static analysis generation. IEEE Transactions on Software Engineering, 16(3):268-280, March 1990.
[PC90] A. Podgurski and L. A. Clarke. A formal model of program dependences and its implications for software testing, debugging, and maintenance. IEEE Transactions on Software Engineering, 16(9):965-979, September 1990.
[PL83] M. L. Powell and M. A. Linton. A database model of debugging. In Proceedings of the ACM SIGSOFT Software Engineering Symposium on High Level Debugging, pages 67-70, Pacific Grove, California, March 1983.
[Pla84] B. Plattner. Real-time execution monitoring. IEEE Transactions on Software Engineering, SE-10(6):756-764, November 1984.
[PN81] B. Plattner and J. Nievergelt. Monitoring program execution: A survey. IEEE Computer, pages 76-93, November 1981.
[Pnu86] A. Pnueli. Specification and development of reactive systems. Information Processing, pages 845-858, 1986.
[Pro88] CLF Project. CLF manual. USC/Information Sciences Institute, 4676 Admiralty Way, Marina del Rey, CA 90292, August 1988.
[PS83] H. Partsch and R. Steinbruggen. Program transformation systems. ACM Computing Surveys, 15(3):199-236, 1983.
[Qia90] X. Qian. Synthesizing database transactions. In Proceedings of the 16th International Conference on Very Large Data Bases, pages 552-565, Brisbane, Australia, 1990.
[QW86] X. Qian and G. Wiederhold. Knowledge-based integrity constraint validation. In Proceedings of the 12th International Conference on Very Large Data Bases, pages 3-12, 1986.
[RD90] P. Van Roy and A. M. Despain. The Benefits of Global Dataflow Analysis for an Optimizing Prolog Compiler. In Proceedings of the 1990 North American Conference on Logic Programming, 1990.
[Ric86] C. Rich. A formal representation for plans in the programmer's apprentice. In Proceedings of AAAI, pages 1044-1052, 1986.
[RP88] B. G. Ryder and M. C. Paull. Incremental data-flow analysis algorithms. ACM Transactions on Programming Languages and Systems, 10(1):1-50, January 1988.
[RW88] C. Rich and R. C. Waters. Automatic programming: Myths and prospects. IEEE Computer, pages 40-51, August 1988.
[Sam89] B. Samadi. TUNEX: A knowledge-based system for performance tuning of the UNIX operating system. IEEE Transactions on Software Engineering, 15(7):861-874, July 1989.
[SJGP90] M. Stonebraker, A. Jhingran, J. Goh, and S. Potamianos. On rules, procedures, caching and views in database systems. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 281-290, Atlantic City, New Jersey, 1990.
[Smi84] B. C. Smith. Reflection and semantics in LISP. In Proceedings of 1984 ACM Principles of Programming Language Conference, pages 23-35, Salt Lake City, Utah, 1984.
[Smi90] D. R. Smith. KIDS: A semiautomatic program development system. IEEE Transactions on Software Engineering, 16(9):1024-1043, September 1990.
[Sno82] R. Snodgrass. Monitoring Distributed Systems: A Relational Approach. PhD thesis, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA 15213, December 1982.
[Sno84] R. Snodgrass. Monitoring in a software development environment: A relational approach. In Proceedings of the ACM SIGSOFT Software Engineering Symposium on Practical Software Development Environments, pages 124-131, Pittsburgh, Pennsylvania, April 1984.
[Sno87] R. Snodgrass. The Temporal Query Language TQuel. ACM Transactions on Database Systems, 12(2):247-298, June 1987.
[Sno88] R. Snodgrass. A relational approach to monitoring complex systems. ACM Transactions on Computer Systems, 6(2):157-196, May 1988.
[SPSB91] R. W. Selby, A. A. Porter, D. C. Schmidt, and J. Berney. Metric-driven analysis and feedback systems for enabling empirically guided software development. In Proceedings of the 13th International Conference on Software Engineering, pages 288-298, Austin, Texas, May 1991.
[SSS81] E. Schonberg, J. Schwartz, and M. Sharir. An automatic technique for selection of data representations in SETL programs. ACM Transactions on Programming Languages and Systems, 3(2):126-143, 1981.
[Ste90] G. L. Steele Jr. COMMON LISP: The Language. Digital Press, second edition, 1990.
[Sym86] Symbolics. The Symbolics GENERA Programming Environment Manual. Symbolics, Inc, 4 New England Tech Center, 555 Virginia Road, Concord, MA 01742, July 1986.
[Sys86] Reasoning Systems. Refine User's Guide. Palo Alto, CA, 1986.
[Ull88] J. D. Ullman. Principles of Database and Knowledge-Base Systems, volume 1. Computer Science Press, 1988.
[Ull89] J. D. Ullman. Principles of Database and Knowledge-Base Systems, volume 2. Computer Science Press, 1989.
[Wil86] J. C. Wileden. Applying event based analysis to specifications and designs. Information Processing, pages 577-581, 1986.
[Wol90] M. Wolfe. Data dependence and program restructuring. The Journal of Supercomputing, 4:321-344, 1990.
[Xer83] Xerox Palo Alto Research Center. Interlisp Reference Manual, October 1983.
Asset Metadata
Core Title
00001.tif
Tag
OAI-PMH Harvest
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC11255792
Unique identifier
UC11255792
Legacy Identifier
DP22850