Alias analysis for Java with reference-set representation in high-performance computing (USC Thesis Other)
INFORMATION TO USERS

This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer.

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction.

In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps.

ProQuest Information and Learning
300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA
800-521-0600

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

ALIAS ANALYSIS FOR JAVA WITH REFERENCE-SET REPRESENTATION IN HIGH-PERFORMANCE COMPUTING

by

Jongwook Woo

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
in Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER ENGINEERING)

August 2001

Copyright 2001 Jongwook Woo

UMI Number: 3054828

UMI Microform 3054828
Copyright 2002 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.
ProQuest Information and Learning Company
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346

UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90007

This dissertation, written by Jongwook Woo under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Dean of Graduate Studies

Date: August 7, 2001

DISSERTATION COMMITTEE

Chairperson

Dedication

To my father Joonki Woo and mother Sookja Yoon, without whom I would not have succeeded in this endeavor.

Acknowledgments

I would like to thank my advisor, Professor Jean-Luc Gaudiot, who provided his guidance, encouragement, and knowledge throughout my Ph.D. journey. His belief in my ability launched me on this research and discovery. I thank my committee members, Professor Sandeep Gupta, for asking the questions that helped organize my work, and Professor Douglas Ierardi, for advice and feedback that significantly improved this dissertation. My sincere thanks and deep gratitude go to Professor Jehak Woo, Andrew L. Wendelborn, Denis Caromel, and Isabelle Attali. Without their continuous support, this work would never have been completed. Special thanks to the Yonsei Alumni at USC and the '85 class of Electronic Engineering, Chulho Shin and Duckdong Hwang, for providing me with the warmest friendship during hard times.
I would also like to thank all the members of the Parallel and Distributed Processing Center, Seongwon Lee, Dongsoo Kang, Jungyup Kang, Wonwoo Ro, Stephen Jenks, and Manil Makhija, for many interesting discussions and suggestions that contributed to the improvement of this work.

On a personal note, I would like to thank my family, Hye Jin Woo, Hye Seon Woo, An Jae Sung, Jae Do Kim, Eun Ji Sung, Hyung Keun Kim, and the newborn baby, Tae Won Kim. In particular, I would like to thank my wife, Jihyun Kim, for everything. Without my family's continuous encouragement, love, help, support, patience, and understanding, I would not have been able to survive graduate school.

Table of Contents

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
1. Introduction
1.1. Problem Statement
1.2. Motivation
2. Background
2.1. Inter-procedural Analysis
2.1.1. Inter-procedural Problems
2.1.2. Problem statements of alias analysis
2.2. Inter-procedural Alias Analysis
2.2.1. Data-Flow Analysis
2.2.2. Calling Graph and Control-Flow Graph
2.2.2.1. Flow-Sensitive Context-Insensitive Graph
2.2.2.2. Problems in Flow-Sensitive Context-Insensitive Graph
2.2.3. Alias-Set Computing
2.3. Related Work
2.3.1. Alias-Set Representation
2.3.2. Type-Inference for Objects
2.3.3. Other Java Issues
2.3.4. Other Related Work
3. Alias Analysis for Java with Reference-Set Representation
3.1. Differences between C++ and Java
3.2. Reference-Set Representation of Alias Set
3.3. Data Structure
3.3.1. Calling Graph and Control-Flow Graph
3.3.2. Type Table
3.3.3. Class Structure Table
3.4. Propagation Rules of Alias Set
3.4.1. Rules for Intra-procedural Analysis
3.4.2. Rules for Inter-procedural Analysis
3.5. Extended Propagation Rules for Exceptions
3.5.1. Rules for Intra-procedural Analysis
3.5.2. Rules for Inter-procedural Analysis
3.6. Alias Analysis Algorithm
3.7. Complexity of the Algorithm
3.8. Regarding Multithreading Issue
4. Methodology
4.1. JavaCC
4.2. JTB
4.3. Benchmarks
4.3.1. Dynamic CG
4.3.2. Binary Tree
4.3.3. Ray Tracer
4.3.4. Exception Block
4.3.5. Recursive Call
4.4. Framework of Alias Detector
5. Performance Evaluation
5.1. Simulation Environment
5.2. Simulation Results
5.2.1. Kottos
5.2.2. Ceng
5.2.3. Asadal
5.2.4. Comparison on Architectures
6. Conclusion
6.1. Summary
6.2. Future Work
Bibliography

List of Tables

Table 1. Characteristics of Benchmark
Table 2. Characteristics of hosts
Table 3. The Comparison of Dynamic CG in Figure 30
Table 4. The Comparison of Ray Tracer in Figure 31
Table 5. The Comparison of Exception Block in Figure 32
Table 6. The Comparison of Binary Tree: Depth 0 in Figure 33
Table 7. The Comparison of Binary Tree: Depth 1 in Figure 34
Table 8. The Comparison of Binary Tree: Depth 2 in Figure 35
Table 9. The Comparison of Binary Tree: Depth 3 in Figure 36
Table 10. The Comparison of Binary Tree: Depth 6 in Figure 37
Table 11. The Comparison of Rec Call: Depth 0 in Figure 38
Table 12. The Comparison of Rec Call: Depth 1 in Figure 39
Table 13. The Comparison of Rec Call: Depth 2 in Figure 40
Table 14. The Comparison of Rec Call: Depth 3 in Figure 41
Table 15. The Comparison of Rec Call: Depth 6 in Figure 42

List of Figures

Figure 1. Control-Flow Graphs and Data-Flow equations [ASU86]
Figure 2. Inter-procedural analysis on graphs
Figure 3. An example Code and its CFG Categorization
Figure 4. An example Code and its Calling Graph Categorization
Figure 5. An example path problems
Figure 6. Example Programs for Related Work
Figure 7. Multi-threading example code [DD97]
Figure 8. Relation between a pointer and an object in C++
Figure 9. Difference between a pointer in C++ and a reference in Java
Figure 10. The relationships between references and objects
Figure 11. An example source code and its type inference
Figure 12. An example Java code of shadowed variables and overridden methods for single inherited classes
Figure 13. CSTs for an example in Figure 12
Figure 14. Class Structure Table
Figure 15. Example of an inter-procedural analysis
Figure 16. An exception handling example code [Flan97]
Figure 17. CFGs of example classes in Figure 16
Figure 18. CG of example classes in Figure 16
Figure 19. Example of an inter-procedural analysis
Figure 20. Alias Analysis Algorithm
Figure 21. Example Code for The Algorithm of Figure 20
Figure 22. CG and CFG of Figure 7
Figure 23. Alias detection for Java codes
Figure 24. Execution Time of Benchmark (RS 6000)
Figure 25. Execution Time of Benchmark: Binary Tree (RS 6000)
Figure 26. Execution Time of benchmark (Sun4)
Figure 27. Execution Time of benchmark: Binary Tree (Sun4)
Figure 28. Execution Time of benchmark (Windows 2000)
Figure 29. Execution Time of benchmark: Binary Tree (Windows 2000)
Figure 30. Execution Time of Dynamic CG for Architectures
Figure 31. Execution Time of Ray Tracer for Architectures
Figure 32. Execution Time of Exception Block for Architectures
Figure 33. Execution Time of Binary Tree at Depth 0 for Architectures
Figure 34. Execution Time of Binary Tree at Depth 1 for Architectures
Figure 35. Execution Time of Binary Tree at Depth 2 for Architectures
Figure 36. Execution Time of Binary Tree at Depth 3 for Architectures
Figure 37. Execution Time of Binary Tree at Depth 6 for Architectures
Figure 38. Execution Time of Rec Call Depth 0 for Architectures
Figure 39. Execution Time of Rec Call Depth 1 for Architectures
Figure 40. Execution Time of Rec Call Depth 2 for Architectures
Figure 41. Execution Time of Rec Call Depth 3 for Architectures
Figure 42. Execution Time of Rec Call Depth 6 for Architectures

Abstract

As one of the optimization techniques used in the static analysis of a compiler, alias analysis has been developed to detect aliased variables in a code and to support reordering instructions to enhance performance. In high-performance computing, alias analysis is also useful for instruction-level parallelism and for distributed and cluster computing, because aliases have side effects such as context-switch overhead, communication delay, and race conditions. There are two solutions to these issues. One is to avoid aliases in a code by using a functional language such as Sisal, and the other is to detect aliases in the code. In this thesis, the second solution is adopted in order to maintain Java's syntax. Previous work has focused on improving the efficiency and precision of analyses while maintaining their safety. In this thesis, a flow-sensitive context-insensitive alias analysis for Java is proposed that is more efficient and precise than previous analyses for C++, without harming the safety of aliased references.
To achieve this, first, a reference-set alias representation is presented that is more efficient for Java than the existing pairs of object relations used for C++, which lead to complicated and inefficient analysis for Java. The reference-set alias representation binds all aliased references for one object. Second, data-flow equations based on propagation rules for the reference-set alias representation are introduced. The equations compute alias information more efficiently and precisely by using an alias set of reference-sets and by removing redundant reference-sets at a call statement. The rules are also extended to analyze aliases in exception statements. Third, for the type determination of overridden methods and shadowed variables, a type table is built with the reference variables and all possible types of each reference variable. Type determination then takes constant time, which is more efficient than the object-based type searching of existing studies. Fourth, the Class Structure Table (CST) is presented for building CFGs and CGs without increasing space. Fifth, an alias analysis algorithm is proposed that uses the popular iterative loop method together with a structural traversal of a CFG to improve its efficiency. Finally, execution times of benchmark codes are compared for the reference-set and the existing representation on three USC hosts with different architectures. The results show that the reference-set representation is more efficient than the object-pair representation for benchmark codes that generate many objects and aliases. In addition, a possible multithreading solution is proposed.

Chapter 1 Introduction

1.1. Problem Statement

An alias is defined as two or more reference (pointer) variables that may point to the same memory location.
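As a minimal illustration of this definition and of the reference-set representation summarized in the Abstract, the Java sketch below records the same alias information in two ways. It is a hypothetical example, not the dissertation's implementation: the class AliasRepresentations, the reference names a through e, and the objects o1 and o2 are assumptions, and unordered pairs of references stand in, in simplified form, for the object-pair encoding used by the C++ analyses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration: AliasRepresentations and its methods are not taken
// from the dissertation. Pairs of references are a simplified stand-in for the
// object-pair encoding of the C++ analyses.
class AliasRepresentations {

    // Reference-set representation: one set per object, holding every
    // reference that may point to that object.
    static List<Set<String>> referenceSets() {
        List<Set<String>> aliasSet = new ArrayList<>();
        aliasSet.add(new TreeSet<>(List.of("a", "b", "c", "d"))); // a, b, c, d may all point to object o1
        aliasSet.add(new TreeSet<>(List.of("e")));                // only e points to object o2
        return aliasSet;
    }

    // Simplified pair encoding: every unordered pair of references that may alias.
    static Set<String> toPairs(List<Set<String>> aliasSet) {
        Set<String> pairs = new TreeSet<>();
        for (Set<String> refs : aliasSet) {
            List<String> list = new ArrayList<>(refs);
            for (int i = 0; i < list.size(); i++) {
                for (int j = i + 1; j < list.size(); j++) {
                    pairs.add("<" + list.get(i) + ", " + list.get(j) + ">");
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Set<String>> sets = referenceSets();
        System.out.println("reference-sets: " + sets);          // two sets, one per object
        System.out.println("alias pairs:    " + toPairs(sets)); // one pair per aliased couple
    }
}
```

With n references to one object, the pair encoding needs n(n-1)/2 entries (6 for the four references to o1), while the reference-set encoding keeps a single set of n references, which is one intuition behind the efficiency argument made in the Abstract.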
Aliases can be created by assignment statements among reference variables and by call statements among procedures, through global variables and reference parameters (call-by-reference). Aliases complicate the data-flow analysis that is performed during the optimization and parallelization of a program. They cause side effects that make the code error-prone and unsuitable for parallel execution. As one of the optimization techniques used in the static analysis of a compiler, alias analysis has been developed to detect aliased variables and to support reordering instructions to enhance performance. Aliases are detected at assignment statements through intra-procedural analysis within a procedure, using its control flow graph (CFG), and at call statements through inter-procedural analysis among procedures, using the calling graph (CG).

In object-oriented languages such as C++ and Java, the static determination of run-time object types is a critical factor in optimization. It makes inlining and cloning optimizations possible, which resolve indirect function calls by limiting the number of functions that may be invoked and by converting indirect function calls to direct function calls. It also improves the precision of inter-procedural analyses and transformations. In alias analysis, the static determination of run-time object types is needed to build a safe calling graph for function calls, including virtual function calls. Integrating alias analysis with type information increases the precision of alias detection, particularly with regard to inheritance among objects.
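The two ways of creating aliases described at the start of this section, assignment and procedure calls, can be seen in a few lines of Java. This is a hypothetical sketch (AliasCreation and Node are illustrative names). One terminology note, stated as an assumption about how the text maps onto Java: Java passes references by value rather than by reference, but the formal parameter still becomes an alias of the caller's object for the duration of the call, which is the effect the analysis has to model.

```java
// Hypothetical example: AliasCreation and Node are illustrative names only.
class AliasCreation {
    static class Node {
        int value;
    }

    // Call statement: the reference is passed by value, so inside this method
    // the formal parameter p and the caller's argument name the same object.
    static void setValue(Node p, int v) {
        p.value = v;
    }

    static int demo() {
        Node x = new Node();      // x -> object o1
        Node y = x;               // assignment: x and y are now aliases of o1
        y.value = 1;              // the write through y is visible through x
        setValue(x, 2);           // call: formal p aliases x and y during the call
        return x.value + y.value; // both read o1.value, which is now 2
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```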
Although previous work [PR95, CSH95, CR97, DMM98] has proposed such integrating methods to improve the precision and efficiency of alias analysis for virtual functions in C++ (overridden methods in Java), these approaches have difficulty achieving the desired precision in Java because their alias representations are designed for pointer- and object-based languages. Also, for constructors and class inheritance, the type information used by their integrating methods may lose information regarding shadowed variables in Java. A shadowed variable is a variable defined in a class with the same name as a variable of its superclass; an overridden method is a method defined in a class with the same name and the same argument types as a method of its superclass. In addition, some of their type inference methods may not be safe for dynamic type determination [PR95], and indirect type inference through objects may negatively affect performance [PR95, CSH95, CR97].

In this thesis, we present a compile-time flow-sensitive context-insensitive alias analysis algorithm with type information for Java. It addresses these issues of the C++ approaches while applying them to Java. Our alias analysis algorithm adapts an existing alias analysis algorithm for C++ [CSH95] to Java by adding our type inference algorithm and data-flow equations for the proposed reference-set representation. In this scheme, the type information of references is inferred during alias analysis by our type inference operation. The inferred type information is used to increase the precision of subsequent alias analyses, not only for overridden methods but also for shadowed variables in Java. The inference is efficient because each type entry is accessed directly. Our algorithm also provides data-flow rules for the possible statement kinds, treating constructors as functions.
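The definitions of shadowed variables and overridden methods above matter because Java resolves them differently: a field access is bound by the declared (static) type of the reference, while an instance method call is dispatched on the run-time type of the object. The hypothetical classes Base and Derived below (illustrative names, not from the dissertation) show a reference whose declared type is Base but which points to a Derived object.

```java
// Hypothetical example: ShadowingDemo, Base, and Derived are illustrative names.
class ShadowingDemo {
    // Field access: bound by the declared type of r (Base), so Base.name is read.
    static String fieldThroughStaticType() {
        Base r = new Derived();
        return r.name;
    }

    // Instance method call: dispatched on the run-time type (Derived).
    static String methodThroughDynamicType() {
        Base r = new Derived();
        return r.who();
    }

    public static void main(String[] args) {
        System.out.println(fieldThroughStaticType() + " " + methodThroughDynamicType());
    }
}

class Base {
    String name = "Base";
    String who() { return "Base"; }
}

class Derived extends Base {
    String name = "Derived";                     // shadows Base.name
    @Override String who() { return "Derived"; } // overrides Base.who()
}
```

Because a Derived object carries both Base.name and Derived.name, and both are initialized when the Derived constructor runs, an analysis that tracks only the run-time type of the object cannot tell which name a field access reaches; it also needs the declared type of the reference.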
Compared to other algorithms [PR95, CSH95, CR97, DMM98], the precision of our analysis is improved by adding type information for shadowed variables and by treating constructors as functions. It also computes safe aliases and improves efficiency by using the reference-set representation. The equations are further extended to compute aliases in exception statements, and a possible approach to alias detection in multithreaded programs is proposed.

In this thesis, Chapter 1 introduces our work and the motivation for developing an alias analysis algorithm for Java. Chapter 2 describes the background of alias analysis: the trade-off between precision and efficiency on control flow graphs and calling graphs; the existing data structures, data-flow equations for aliases, and control flow/calling graphs that are applicable to our alias analysis for solving forward and backward path problems; and the type inference problems of hierarchical objects and inheritance in Java. Chapter 3 presents our proposed type inference and data-flow rules with the reference-set representation, which improve the precision and efficiency of alias detection in Java. It also extends the rules to exceptions and proposes a possible alias analysis solution for multithreading. Chapter 4 explains the methodology used to implement our algorithm and presents the benchmark codes. Chapter 5 explains the simulation environment and analyzes the simulation results. Chapter 6 presents a summary of the work and future work.

1.2. Motivation

There are three main motivations for developing an alias detection algorithm for Java. First, aliases cause problems not only for optimizing sequential compilers but also for parallelizing compilers when Java is used in high-performance computing.
Second, some mechanisms used in inter-procedural analysis are not adequate for object-oriented languages, because such languages are organized around objects. Third, conventional alias analysis with type information for C++ needs to be adapted, with a new alias representation, additional type information, and rules covering exceptions, when it is applied to Java. These motivations are described more specifically below.

High-performance computing on distributed computing systems has been widely studied [CCINSW97, CKV98]. It has been shown that high-performance computation can be achieved without extra cost by using existing computers, connected via networks, to execute applications concurrently, as in distributed computing systems. Since Java is a platform-independent language, it can be used to integrate different computational platforms into a distributed computing system at no additional cost. Java is an object-oriented language that allows easy maintenance, update, and reuse of application programs. An application program can be implemented as applets or servlets that are executed on the World Wide Web. Thus, we can create a distributed or cluster computing system that is more scalable and allows access to arbitrary hosts with the owner's permission. These properties are moving Java toward the forefront of parallel languages for the distributed computing world.

However, Java is an object-oriented language with reference variables that refer to instantiated objects. Two reference variables become possible aliases if they appear as the left-hand and right-hand sides of an assignment statement. Such aliased references may cause race conditions if each reference is assigned to a different process, and they may also cause context-switch overhead and communication delay.
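Why aliases block reordering and parallel execution can be seen in a small Java method. In this hypothetical example (AliasReorder is an illustrative name), the two stores may be swapped or executed concurrently only if the analysis proves that a and b never refer to the same array. If they are aliases and the two stores were assigned to different threads or hosts, the result would depend on their timing, which is the race condition described above.

```java
// Hypothetical example: AliasReorder is an illustrative name.
class AliasReorder {
    // A compiler may reorder or parallelize the two stores only if it can
    // prove that a and b never refer to the same array.
    static int storeThenLoad(int[] a, int[] b) {
        a[0] = 1;
        b[0] = 2;     // if b aliases a, this store overwrites a[0]
        return a[0];  // 1 when a and b are distinct, 2 when they are aliases
    }

    public static void main(String[] args) {
        int[] p = new int[1];
        int[] q = new int[1];
        System.out.println(storeThenLoad(p, q)); // distinct arrays
        System.out.println(storeThenLoad(p, p)); // aliased arrays
    }
}
```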
Previous studies have sought to avoid or detect aliases among pointers in C and C++ [LR92, LRZ92, CBC93, BCC97, DGC97, PR95, CSH95, CR97, DMM98]. Since a Java reference denotes either the pointer itself or the pointed-to object, we can draw on these studies for Java in order to detect aliases among reference variables.

Inter-procedural alias analysis has been developed to detect aliases between a calling procedure and a called procedure. In C or C++, pointers cause side effects if they are aliased. Several authors [LR92, LRZ92, CBC93, BCC97, DGC97] have proposed algorithms that detect aliases among pointers. However, these algorithms are not directly applicable to object-oriented languages, since an object is not a procedure but rather a collection of procedures and variables. An object-oriented language such as Java needs additional algorithms that determine how to refer to the data (variables) and operations (procedures) of a receiver by recognizing its type. Type information is needed to construct safe calling graphs and to facilitate subsequent data- or control-flow analysis in an object-oriented language, because an object has its own class type, an object's instance variables may themselves refer to other objects, and classes form inheritance hierarchies. Regarding hierarchies, a subclass contains all instance members of its superclass even though those members are not declared within the subclass. Inheritance affects flow analysis through type casting, assignments between subclasses and superclasses, and parameter passing. Studies [PR95, CSH95, CR97, DMM98] have integrated type information and alias analysis in order to facilitate safe type inference or precise alias detection.
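One way to hold the type information discussed above is the type table summarized in the Abstract: a table keyed by reference variable that lists every possible run-time type of that reference. The sketch below is a minimal approximation under stated assumptions, not the dissertation's data structure: TypeTableSketch and its methods are hypothetical, references and classes are identified by plain strings, and types are only accumulated (a may-style approximation), never removed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch (names are illustrative, not the dissertation's code):
// a type table mapping each reference variable to all classes it may refer to.
class TypeTableSketch {
    private final Map<String, Set<String>> table = new HashMap<>();

    // Record that reference ref may point to an object of class type,
    // e.g. after the statement  ref = new type();
    void addType(String ref, String type) {
        table.computeIfAbsent(ref, k -> new TreeSet<>()).add(type);
    }

    // Statement  dst = src;  dst may now also have every possible type of src.
    // Types are only accumulated, so the table stays a safe ("may") approximation.
    void assign(String dst, String src) {
        for (String t : new ArrayList<>(typesOf(src))) { // copy guards against dst == src
            addType(dst, t);
        }
    }

    // Possible run-time types of ref: a single hash lookup, expected constant time.
    Set<String> typesOf(String ref) {
        return table.getOrDefault(ref, Set.of());
    }

    public static void main(String[] args) {
        TypeTableSketch tt = new TypeTableSketch();
        tt.addType("r1", "A");  // A r1 = new A();
        tt.addType("r2", "B");  // A r2 = new B();   where class B extends A
        tt.assign("r1", "r2");  // r1 = r2;
        System.out.println(tt.typesOf("r1")); // r1 may refer to an A or a B
    }
}
```

A virtual call r1.m() must therefore consider the implementations of m in both A and B, whereas a reference whose entry holds a single type lets the call be converted into a direct call, the optimization mentioned in Section 1.1. Because the lookup goes directly from the reference to its types rather than searching through objects, it costs one hash-table access.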
Although these works gave us the basic direction of alias analysis in object-oriented languages and showed how to handle virtual functions in C++, their object-pair alias representations do not carry over well to Java with respect to efficiency and safety. Their subsequent analysis is incomplete for three reasons: a constructor, the kind of method (procedure) that initializes an object, was not considered in their work; they did not consider shadowed variables, which also cause imprecise construction of calling graphs, particularly with regard to constructors; and some of their type inference mechanisms are not safe for dynamic type determination and are less efficient when applied to Java. Our proposed work adds type information and data-flow rules for exceptions in order to handle both shadowed variables and overridden methods simultaneously and to maintain the safety of the analysis, and it improves efficiency through the reference-set representation and direct type inference of references, while adapting an existing alias analysis algorithm for C++ to Java.

Chapter 2 Background

2.1. Inter-procedural Analysis

An inter-procedural analysis gathers information across all procedures in the entire program instead of within a single procedure. It determines the variables that may be modified as a side effect of a procedure call by finding pairs of variables that might be aliased to one another on entry to a given procedure.

2.1.1. Inter-procedural Problems

Previous work [Bann79, ASU86, Kenn97, Much97] has defined the following terminology for inter-procedural problems:

Alias Set ALIAS(s, x): For a procedure p and its formal parameter x, the set of all variables that may refer to the same memory location as x on entry to p at the calling site s that calls p.
Call Graph G = (N, E): A graph that models the calling relationships between procedures in a program. Vertices in N represent the procedures in the program, and edges in E represent their possible calls.

In addition, inter-procedural problems have been classified into may and must problems, and by the kind of flow analysis they require:

May Problems: Compute sets of variables such as MOD (may be modified), REF (may be referenced), and USE (may be used before being defined). At a procedure p, variables may refer to the same storage location in some execution instances of p, that is, on some path through the flow graph. Example: if, in a procedure p, one path contains x=y and another path contains x=z, then x may refer to y or z.

Must Problems: Compute sets of variables such as KILL (must be killed). At a procedure p, variables must refer to the same storage location in all execution instances of p, that is, on all paths through the flow graph. Example: if every path of a procedure p contains only x=y, then x must refer to y.

Flow-Sensitive Problem: Examines the body of the called procedure with regard to the individual control-flow paths of the body, making use of the intra-procedural control flow information associated with individual procedures (the control flow graph). Example: KILL; x refers to y in block B (procedure p).

Flow-Insensitive Problem: Examines the body of the called procedure without regard to the individual control-flow paths of the body and without using the intra-procedural control flow information (the control flow graph). Example: MOD, REF; x may refer to y since there is a statement x=y somewhere in p.

Context-Sensitive Call Graph: The data-flow analysis proceeds path-sensitively, so each procedure may be analyzed separately for different calling contexts.
Context-Insensitive Call Graph: Each procedure is represented by a single node in the call graph, so data-flow information is computed efficiently, but the computed data are approximate.

2.1.2. Problem statements of alias analysis

Data-flow analysis computes information that is used for compiler optimization, in particular for parallelizing code written as a sequential program. Alias analysis computes the alias set, whose elements are aliased variables. There is a trade-off between precision and efficiency in data-flow analysis, so previous work [LR92, LRZ92, CBC93, EGH94, WL94, CSH95, PR95, BCC97, CR97] has proposed data-flow analyses for aliases that are flow-sensitive or flow-insensitive within a procedure, combined with context-sensitive or context-insensitive analysis among procedures. Thus, to maintain reasonable efficiency without losing too much precision, the safety of the computed aliases has been considered. Precise aliases can be defined as aliases that must occur at the statements along the execution path of a code. Safe aliases can be defined as aliases that are computed approximately at the statements along the execution path of a code and that form a superset of the precise aliases along that path. Myers [Myers81] shows that inter-procedural data-flow problems become NP-complete in the presence of aliasing, so computing precise aliases may require exponential time and space. Therefore, researchers [LR92, LRZ92, CBC93, CSH95, PR95, CR97, BCC97] have worked to reduce the cost of their analyses to polynomial time by computing approximate, that is, safe, aliases.

2.2. Inter-procedural Alias Analysis

Inter-procedural alias analysis computes aliases among procedures by using the Control Flow Graph and the Calling Graph.
Intra-procedural analysis uses the Control Flow Graph to analyze the control flow of each statement within a procedure. The Calling Graph is used in inter-procedural analysis to relate procedures, such as a caller and its callees, so that they can be analyzed together.

2.2.1. Data-Flow Analysis

Data-flow analysis is defined as a process that collects information about the way variables are used in a program [ASU86]. It is much harder to perform data-flow analysis across procedures in traditional procedural languages than within a single procedure, because procedures are related to each other as callers and callees. Therefore, such analysis uses the Control Flow Graph and the Calling Graph to collect data-flow information among procedures.

[Figure 1. Control-Flow Graphs and Data-Flow equations [ASU86]: (a) Assignment Statement, (b) Concatenating Assignment Statement, (c) conditional statement, (d) loop statement. For an assignment d to a variable a, where $D_a$ is the set of all definitions of a in the program, panel (a) gives $\mathit{gen}[s] = \{d\}$, $\mathit{kill}[s] = D_a - \{d\}$, and $\mathit{out}[s] = \mathit{gen}[s] \cup (\mathit{in}[s] - \mathit{kill}[s])$.]

Data-flow information is computed by a typical equation of the form

$$\mathit{out}[s] = \mathit{gen}[s] \cup (\mathit{in}[s] - \mathit{kill}[s]) \qquad (2.2.1)$$

Equation 2.2.1 is called the data-flow equation. A statement or a basic block of a program is represented as s. gen[s] is the set of definitions generated by s, that is, the definitions of variables within s that reach the end of s. kill[s] is the set of definitions killed by s, that is, the definitions that never reach the end of s [ASU86]. in[s] is the set of definitions that reach the beginning of s.
out[s] is the set of definitions after s has been analyzed. Figure 1 shows the data-flow equations for the possible statements of a program on control-flow graphs. These equations are substantially extended for the proposed rules of our algorithm in Sections 3.3 and 3.4.

2.2.2. Calling Graph and Control-Flow Graph

Callahan [Much98] and Choi [CBC93] compute flow-sensitive inter-procedural alias information based on Control Flow Graphs with the equations of Figure 1. Intra-procedural analysis is performed on the control-flow graph G of a procedure, which consists of four elements, G = (N, E, entry, exit), where N is the set of nodes, each standing for a statement, E is the set of edges between nodes, and entry and exit are the basic nodes of the procedure. Inter-procedural information is collected by integrating the intra-procedural analyses of the procedures. In addition to these nodes, the nodes in[s] and out[s] and the edges E must be considered for each statement, as in Figure 1. in[s] stands for the alias set before statement s, and out[s] represents the alias set after statement s. An edge in E represents the transfer of control between statement s and its previous or next statements. The entry node collects the alias set of the global variables and, in the case of call-by-reference, of the formal parameters of the procedure P. The exit node collects the alias set of the global variables, the formal parameters, the local variables, and the return variable of the procedure P.

[Figure 2. Inter-procedural analysis on graphs: (a) example code of procedures, in which P() calls Q() and Q() calls R(); (b) Calling Graph; (c) Control-Flow Graph; (d) Inter-procedural Control-Flow Graph, in which the CFGs of P, Q, and R are linked based on the calling graph.]

The in[s] and out[s] nodes are associated with each calling site in a procedure P.
At a calling site, an edge connects the in[s] node of the calling procedure P to the entry node of the called procedure Q, and another edge connects the exit node of Q to the out[s] node of P. Figure 2 shows the calling graph and the control flow graphs for the procedures P, Q, and R. While the example code (a) of Figure 2 is read and parsed, the control flow graph G of each procedure is constructed from its nodes N, its entry and exit nodes, and its edges E. During static analysis, a procedure may invoke procedures whose control flow graphs have not been created yet; those graphs are built later, by subsequent parsing or by the algorithm that builds the calling graph, and the control flow graphs of the caller and its callees are then connected by linking edges. The Calling Graph represents the links between a caller and its callees, so that a control-flow graph can consult the calling graph for information about the callees. Figure 2 (d) shows an inter-procedural analysis that builds a CG with the connected CFGs of the caller and the callee at a calling statement [Much98, LR92, LRZ92]. Choi [CBC93] builds a CG that links callers and callees by iterative analysis of all the procedures in the application program, and applies the data-flow equations on the CG shown in Figure 2 (b) and the CFG shown in Figure 2 (c).

2.2.2.1. Flow-Sensitive Context-Insensitive Graph

Intra-procedural analysis has two approaches: flow-sensitive and flow-insensitive analysis. Figure 3 presents an example code and its CFGs. Figure 3 (b) is the flow-sensitive CFG of the example code (a); it is path-sensitive, with a node for each statement and an edge for each flow between statements. A flow-sensitive analysis computes more precise aliases but is slower than a flow-insensitive analysis. Figure 3 (c) is the flow-insensitive CFG, which is path-insensitive. Although a flow-insensitive analysis computes safe aliases with high efficiency, it is too imprecise for the data-flow analysis needed to parallelize code, since it ignores the control flow of a procedure.

[Figure 3. An example code and its CFG categorization. (a) Example code: main() { stmt1; if (true) stmt2; else stmt3; stmt4; }. (b) Flow-sensitive CFG. (c) Flow-insensitive CFG.]

[Figure 4. An example code and its calling graph categorization. Example code: main() { s1: Q1(); s2: Q2(); ... sn: Qn(); }, where each of Q1(), Q2(), ..., Qn() calls R(). (a) Context-insensitive calling graph: main() calls Q1() through Qn() at the sites s1 through sn, and all of them share a single node R(). (b) Context-sensitive calling graph: each Qi() calls its own copy of R().]

For inter-procedural analysis, the program call graph is used: a directed graph that represents the calling relationships among the procedures of a code. The program call graph can be analyzed in a context-sensitive or a context-insensitive way. In a context-sensitive call graph, as in Figure 4 (b), each procedure may be analyzed separately for different calling contexts. A context-sensitive analysis can also be carried out on the graph of Figure 4 (a) if the data flow analysis proceeds path-sensitively. Precise aliases can be computed along all paths in the context-sensitive graph of Figure 4 (b), or path-sensitively on Figure 4 (a). However, this approach has a performance problem, because its computing time is exponential in the number of edges in the call graph [Myers81]. In a context-insensitive call graph, as in Figure 4 (a), each procedure is represented by a single node in the graph [GDDC97].
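The difference between the two call-graph treatments of Figure 4 can be sketched with a hypothetical procedure R(x) that simply returns its argument, called from Q1() with object o1 and from Q2() with object o2. The context-insensitive version keeps one summary for the single node R, so every caller receives the union of all arguments; the context-sensitive version keeps one result per call site. The class, site, and object names are illustrative assumptions.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch: argsBySite maps a call site of R to the objects passed there.
// R(x) is assumed to return x, so its result equals its argument set.
final class ContextSensitivity {
    // One node for R: all call sites share (and receive) a single summary.
    static Map<String, Set<String>> insensitive(Map<String, Set<String>> argsBySite) {
        Set<String> summary = new TreeSet<>();
        argsBySite.values().forEach(summary::addAll);
        Map<String, Set<String>> result = new TreeMap<>();
        for (String site : argsBySite.keySet()) result.put(site, summary);
        return result;
    }

    // One copy of R per calling context: results are kept apart.
    static Map<String, Set<String>> sensitive(Map<String, Set<String>> argsBySite) {
        Map<String, Set<String>> result = new TreeMap<>();
        argsBySite.forEach((site, objs) -> result.put(site, new TreeSet<>(objs)));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> argsBySite = Map.of("Q1", Set.of("o1"), "Q2", Set.of("o2"));
        System.out.println(insensitive(argsBySite)); // {Q1=[o1, o2], Q2=[o1, o2]}
        System.out.println(sensitive(argsBySite));   // {Q1=[o1], Q2=[o2]}
    }
}
```

The context-insensitive answer is safe but approximate: Q1 is told it may receive o2. With deeper call chains, the context-sensitive version would need one copy per distinct chain, which is where the exponential cost comes from.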
Data-flow information is computed more efficiently on a context-insensitive graph, but the computed data are approximate. Researchers have worked on improving the trade-off between precision and efficiency in inter-procedural data flow analysis. Context-sensitivity is used by Emami [EGH94] and Wilson [WL94] for pointer analysis in C; they concentrate on obtaining precise aliases while improving efficiency. Context-insensitivity is used by Landi [LR92, LRZ92], Choi [CBC93], and Burke [BCC94, BCC97], because context-sensitive analysis has a performance limitation. They proposed efficient algorithms that use context-insensitivity with tags to narrow the safe aliases toward the precise aliases.

[Figure 5. An example of path problems. (a) Forward path problem in the CFG and CG: invalid alias combining A13, A14, A23, A24. (b) Backward path problem in the CG: invalid alias combining (where to return back?).]

The alias analysis in this thesis computes aliases on a flow-sensitive, context-insensitive CG, as in Landi [LR92, LRZ92], Choi [CBC93], and Burke [BCC97], because we want to keep the efficiency of a context-insensitive CG and the safety of a flow-sensitive CFG, and to improve the precision of the safe approximate aliases by applying the solutions to the path problems given by Choi [CBC93] and Burke [BCC97].

2.2.2.2. Problems in Flow-Sensitive Context-Insensitive Graph

By computing aliases on the CFG and CG of the previous section, we want to keep the efficiency and improve the precision of the safe approximate aliases. Thus, our alias analysis computes aliases on a flow-sensitive CFG and a context-insensitive CG, as in Landi [LR92, LRZ92], Choi [CBC93], and Burke [BCC97]. Several issues must be solved to compute precise aliases on a flow-sensitive, context-insensitive CG: the forward path problem in a flow-sensitive, context-insensitive CG, and the backward path problem in a context-insensitive CG. Several authors [LR92, LRZ92, CBC93, BCC97] have proposed their own answers to these issues. Our alias analysis follows the proposal of Choi [CBC93] and Burke [BCC97], since their data structure is easy to follow and they argued that their solution to the forward path problem may be more precise than that of Landi [LR92, LRZ92].

In the CG and CFG, invalid aliases arise when aliases that come from different execution paths, and therefore should not be combined, are combined. This invalid combining problem is called the forward path problem and is shown in Figure 5 (a). $A_i$ and $A_j$ stand for aliases, and $A_{ij}$ is defined as a new alias generated by combining $A_i$ and $A_j$. The aliases $A_{13}$, $A_{14}$, $A_{23}$, and $A_{24}$ are invalid combinations, because their components come from different execution paths. A newly generated alias $A_{ij}$ is valid at a node N only if (1) $A_i$ and $A_j$ both hold immediately before the node N, and (2) the semantics of the node N requires the combination $\otimes$ of $A_i$ and $A_j$ [CBC93, BCC97]:

$A_{ij} \Leftarrow A_i \otimes A_j$

Choi [CBC93] and Burke [BCC97] proposed an alias data structure named alias instance that reduces the occurrence of the invalid combining problem:

$AI = [A, \mathit{BirthSite}]$

An alias instance AI consists of an alias A and a BirthSite. The BirthSite of an alias instance is the statement that generates the alias A. Combination is then performed on alias instances:

$AI_{ij} \Leftarrow AI_i \otimes AI_j$

Thus, the alias instance $AI_{ij}$ is valid if and only if the combination of the corresponding aliases of $AI_i$ and $AI_j$ is valid.
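The invalid combinations of Figure 5 (a) can be reproduced with a small sketch. Each alias is tagged with the branch on which it was generated; this tag is a deliberately simplified stand-in for the BirthSite of an alias instance, not the exact validity test of [CBC93, BCC97]. The class, field, and tag names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Aliases A1 and A2 hold on one branch, A3 and A4 on the other. Combining
// every pair after the branches merge produces the invalid A13, A14, A23,
// and A24; combining only aliases that hold on the same path does not.
final class ForwardPath {
    // An alias tagged with the branch that generated it.
    static final class Alias {
        final int id;
        final String path;

        Alias(int id, String path) {
            this.id = id;
            this.path = path;
        }
    }

    // Combines every pair of aliases that reach the merge node.
    static Set<String> combineAll(List<Alias> merged) {
        return combine(merged, false);
    }

    // Combines only pairs whose tags show they hold on the same path.
    static Set<String> combineSamePath(List<Alias> merged) {
        return combine(merged, true);
    }

    private static Set<String> combine(List<Alias> merged, boolean samePathOnly) {
        Set<String> result = new TreeSet<>();
        for (int i = 0; i < merged.size(); i++) {
            for (int j = i + 1; j < merged.size(); j++) {
                Alias a = merged.get(i), b = merged.get(j);
                if (!samePathOnly || a.path.equals(b.path)) {
                    result.add("A" + a.id + b.id);   // A_ij <= A_i (x) A_j
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Alias> merged = List.of(new Alias(1, "then"), new Alias(2, "then"),
                                     new Alias(3, "else"), new Alias(4, "else"));
        System.out.println(combineAll(merged));      // [A12, A13, A14, A23, A24, A34]
        System.out.println(combineSamePath(merged)); // [A12, A34]
    }
}
```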
In addition, to address the inter-procedural forward path problem, an alias instance is associated with the most recent call site of the execution path along which the alias A is propagated:

$AI = [A, \mathit{BirthSite}, \mathit{CallSite}]$

The CallSite of an alias instance is determined as follows: (1) for an alias instance propagated along a call site C to the entry of the callee, C is its call site; (2) the call site is null when an alias instance is generated without combining any aliases; (3) an alias instance $AI_{ij}$ is valid if and only if the call sites of $AI_i$ and $AI_j$ are identical or at least one of them is null [BCC97].

Another source of inaccuracy in alias analysis, the unrealizable path problem [LR92], occurs when invalid aliases are returned backward from the callee to a caller, as in Figure 5 (b). This backward path problem, another name for the unrealizable path problem, is handled with the call site information by the following rule of Choi [CBC97]: (1) an alias instance with a null call site at the exit of a callee must be propagated to all callers; (2) an alias instance with a call site C at the exit of a callee should be propagated back only to the caller at C. However, this solution is limited when the callee in turn calls further procedures, since only the most recent call site is recorded. If more call-site path information is added to avoid this limitation, the space complexity becomes exponential [CBC93, BCC97]. If the solution of Choi [CBC93] and Burke [BCC97] is applied to the rules of sections 3.3 and 3.4, alias instances AI can be computed while partially avoiding the forward and backward path problems. However, to keep the rules simple, these solutions are not used in our rules.

2.2.3. Alias-Set Computing

An alias occurs when two or more different variables point to the same memory location. In procedural or object-oriented languages such as C or C++, pointers may cause aliases.
To compute aliases among statements, a transfer function describes the effect of each statement on aliases in the data-flow framework, especially among pointers in C or C++. Within a CFG, out(s) of a statement s is derived from in(s) by the transfer function $\mathit{trans}_s$ [ASU86, CBC93], where out(s) is the alias set after s, in(s) is the alias set before s, and p ranges over the predecessors of s:

$\mathit{out}(s) = \mathit{trans}_s(\mathit{in}(s)), \qquad \mathit{in}(s) = \bigcup_{p \in \mathit{pred}(s)} \mathit{out}(p)$   (2.2.3)

2.3. Related Work

The computation of pointer aliases has been studied by many researchers since C and C++ were developed. C is a procedural language in which memory is handled through pointer variables. C++ extends C with object-oriented concepts, although it is not a pure object-oriented language. Since pointers imply many hidden aliases, researchers [LR92, CBC93, BCC94, EGH94, CSH95, PR95, BCC97, CR97, WWG00] have studied the computation of pointer-induced aliases. They focused on how to improve the efficiency, precision, and safety of their analyses, which trade off against each other. Thus, existing studies have their own alias-set representations and type inference methods.

Java has a syntax similar to that of C and C++, so an alias analysis for Java can be an extension of existing alias analyses for C/C++. Unfortunately, Java is a pure object-oriented language with several distinctive properties: object references, invocation of methods of other classes, late dynamic binding, class hierarchies, exceptions, and multithreading. Therefore, conventional studies [LR92, CBC93, BCC94, EGH94, CSH95, PR95, BCC97, CR97] are not sufficient to compute aliases among objects in Java. Nonetheless, conventional inter-procedural analyses provide much of the background for alias analysis in Java.
(a) A code in C++

    void func() {
        A **x, *y, z;
        x = &y;   // statement 1
        y = &z;   // statement 2
    }

(b) A code in Java

    void bar() {
        A x, y, z;
        z = new A();   // statement 1
        y = z;         // statement 2
        x = y;         // statement 3
    }

Figure 6. Example Programs for Related Work

2.3.1. Alias-Set Representation

Pande presented the first algorithm that solves type determination and pointer aliasing simultaneously, using a points-to alias set representation for C++ programs [PR95]. A points-to pair has the form <loc, obj>, where obj is an object and loc is the memory location that points to obj. The points-to pair is essentially the points-to relation introduced by Emami [EGH94], who proposed it to reduce the extraneous alias pairs that the alias pairs of Landi [LR92] generate in certain cases.

Carini proposed a flow-sensitive alias analysis for C++ with a compact representation [CSH95]. The compact representation is an alias relation whose elements are object names or object names with one level of dereferencing. It was introduced by Choi [CBC93, BCC94, BCC97] to eliminate redundant alias pairs.

Chatterjee presented a flow-sensitive alias analysis for object-oriented languages with a points-to alias set representation for C++ [CR97]. He improved the efficiency and safety of the points-to representation compared with Pande's.

The compact and points-to alias representations are highly similar, but the points-to representation also contains may or must alias information [BCC94, BCC97].

Woo introduced a flow-sensitive alias analysis for Java with a referred-set alias representation, which is a variant of the approach in this thesis [WWG00]. A referred-set is the set of objects that may be pointed to by a reference variable, and an alias set is a collection of referred-sets. The representation aims to reduce the extraneous alias pairs that arise when the compact and points-to alias representations of C/C++ are applied to Java.
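To make the claim about extraneous alias pairs concrete, the sketch below builds the referred-sets for the Java code of Figure 6 (b) and compares, for n references aliased to one object, the n entries that referred-sets need with the nC2 = n(n-1)/2 entries needed to enumerate every pair of aliased references. The class and method names are hypothetical, and objects are named by strings such as obj_A.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Referred-set representation: reference -> set of objects it may point to.
final class ReferredSets {
    // Number of alias pairs among n references to one object: nC2.
    static long pairCount(int n) {
        return (long) n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> referred = new TreeMap<>();
        referred.put("z", Set.of("obj_A"));   // statement 1: z = new A()
        referred.put("y", referred.get("z")); // statement 2: y = z
        referred.put("x", referred.get("y")); // statement 3: x = y
        System.out.println(referred);         // {x=[obj_A], y=[obj_A], z=[obj_A]}

        for (int n : new int[] {3, 10, 100}) {
            // n = 3 -> 3 pairs, n = 10 -> 45 pairs, n = 100 -> 4950 pairs
            System.out.println(n + " references: " + n + " set entries vs "
                    + pairCount(n) + " pairs");
        }
    }
}
```

The two representations are the same size for three references, but the pair count grows quadratically while the referred-set entries grow linearly.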
For example, in the C++ code of Figure 6 (a), the alias pairs <*x, y>, <**x, *y>, <*y, z>, and <**x, z> are generated. The compact representation is <*x, y>, <*y, z>. The points-to pairs are <x, y, D>, <y, z, D>, where D denotes a must alias relation. The referred-set alias relation is A(statement 2) = null, because none of the variables refers to an object.

In the Java code of Figure 6 (b), the alias pairs <z, obj_A>, <y, z>, <x, y>, <y, obj_A>, and <x, obj_A> are generated, where obj_A is the object created at statement 1. The compact representation is <z, obj_A>, <y, z>, <x, y>. The points-to pairs are <z, obj_A>, <y, z, D>, <x, y, D>, where D denotes a must alias relation. The referred-set alias relations are $A_z = \{obj\_A\}$, $A_y = \{obj\_A\}$, $A_x = \{obj\_A\}$, and $A(\text{statement 3}) = \{A_x, A_y, A_z\}$.

The existing alias relations for C++, the compact and points-to representations, are highly similar, and both become pairs of the form <object name, object name> when applied to Java, where an object name is either a created object or a reference variable that points to the object. In this thesis, we call the compact and points-to representations the object-pair representation.

The space complexity of the object-pair representation is $O(N_r + N_o \times {}_{n}C_{2})$, where $N_r$ is the number of reference variables, $N_o$ is the number of objects, and n is the number of reference variables that refer to an object. The space complexity of the referred-set alias relations is $O(N_r + N_o \times A_o)$, where $A_o$ is the maximum number of references aliased to an object. As n grows, ${}_{n}C_{2} = n(n-1)/2$ becomes larger than $A_o$; thus $O(N_r + N_o \times {}_{n}C_{2}) > O(N_r + N_o \times A_o)$. Therefore, compared with object pairs, collecting the objects of each reference variable saves space, and it may also reduce the time needed to compute alias sets.

2.3.2. Type Inference for Objects

As object-oriented languages, C++ and Java make heavy use of inheritance, which causes dynamic invocation. Dynamic invocation, such as a virtual function call, makes it hard to build the calling graph statically.

Type inference statically computes the possible run-time types of every expression in a program's source code. Static determination of run-time types is a key issue for compile-time optimization, because the dynamic calls of an object-oriented language make it hard to build the calling graph statically. Static type determination has several advantages. First, function inlining or cloning becomes applicable if static type determination resolves indirect function calls by limiting the number of functions that may be invoked and converting indirect calls into direct calls. Second, static type determination improves the precision of inter-procedural analyses and transformations in object-oriented languages. Finally, static type determination addresses the problems that virtual function calls cause for compile-time optimization; for example, it enables inlining, which converts indirect calls into direct calls, and pipelining optimizations, which eliminate pipeline delays because the call targets are known. For an alias analysis in an object-oriented language, the second and third benefits improve the precision of the analysis or, at least, maintain its safety.

Pande integrates alias analysis and dynamic type information for C++ [PR95]. Its points-to alias relations contain type information. However, it loses the safety of the alias relations, because it is not clear how the type information is maintained for conditional statements.
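The first benefit of static type determination can be illustrated with a sketch that assumes a hypothetical hierarchy in which class B inherits A.m() and class C overrides it. If every inferred run-time type of a receiver selects the same implementation, the virtual call r.m() can be replaced by a direct call and then inlined. The map implOf is an assumed input that records which class's m() each class uses; it is not part of any cited system.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of devirtualization driven by statically inferred receiver types.
final class Devirtualize {
    // For a call r.m(), collect the implementations selected by the
    // possible run-time types of r.
    static Set<String> targets(Set<String> possibleTypes, Map<String, String> implOf) {
        Set<String> result = new TreeSet<>();
        for (String t : possibleTypes) result.add(implOf.get(t) + ".m");
        return result;
    }

    // A single target means the indirect call can become a direct call.
    static boolean canCallDirectly(Set<String> possibleTypes, Map<String, String> implOf) {
        return targets(possibleTypes, implOf).size() == 1;
    }

    public static void main(String[] args) {
        // class A { void m() {} }  class B extends A { }  class C extends A { void m() {} }
        Map<String, String> implOf = Map.of("A", "A", "B", "A", "C", "C");
        System.out.println(canCallDirectly(Set.of("A", "B"), implOf)); // true: only A.m
        System.out.println(canCallDirectly(Set.of("A", "C"), implOf)); // false: A.m or C.m
    }
}
```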
Carini proposed a type table to infer the possible types of an object in C++ [CSH95]. The type table is a set of <object, type> pairs. The possible types of an object can be inferred statically from its alias relations and the type table. Its space complexity is $O(N_o + N_r)$, where $N_o$ is the number of objects and $N_r$ is the number of references. With the compact alias representation, its time complexity is $O((N_o + N_r) \times A_r)$, where $A_r$ is the maximum number of references aliased to an object.

Chatterjee computed the types of pointer variables with referenced and modified type relations and alias relations at each program point in C++ [CR97]. Its space complexity is the same as that of the compact representation. Its points-to alias relations contain type information, and it maintains the safety of the possible types for conditional statements by adding may or must flags. Its time complexity is the same as that of the compact alias representation.

The time complexity of type inference with the referred-set alias relation is $O(R \times A_o)$, where R is the maximum number of references accessible at a program point and $A_o$ is the maximum number of objects pointed to by a reference. Carini's type table is used, and each object in a referred-set is looked up to collect the possible types of the reference, because a referred-set contains all possible objects to which the reference points. Since R is normally smaller than $N_o + N_r$, the referred-set alias relation is expected to be more efficient than the others.
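Type inference from a referred-set and an <object, type> table reduces to one table lookup per object in the reference's referred-set, which is where the O(R x A_o) bound comes from: at most A_o lookups for each of R references. A minimal sketch with hypothetical object and type names, assuming the referred-sets have already been computed:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Possible run-time types of a reference = types of the objects in its
// referred-set, looked up in a type table of <object, type> pairs.
final class TypeFromReferredSet {
    static Set<String> possibleTypes(String ref,
                                     Map<String, Set<String>> referredSet,
                                     Map<String, String> typeTable) {
        Set<String> types = new TreeSet<>();
        for (String obj : referredSet.getOrDefault(ref, Set.of())) {
            types.add(typeTable.get(obj));   // one lookup per referred object
        }
        return types;
    }

    public static void main(String[] args) {
        // After "if (cond) r = new A(); else r = new B(); s = ...;" (objects o1, o2)
        Map<String, Set<String>> referred = Map.of("r", Set.of("o1", "o2"), "s", Set.of("o1"));
        Map<String, String> typeTable = Map.of("o1", "A", "o2", "B");
        System.out.println(possibleTypes("r", referred, typeTable)); // [A, B]
        System.out.println(possibleTypes("s", referred, typeTable)); // [A]
    }
}
```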
    public class PrintTest {
        public static void main(String[] args) {
            PrintThread thread1, thread2;
            PrintThread thread3, thread4;

            thread1 = new PrintThread("1");
            thread2 = new PrintThread("2");
            thread3 = new PrintThread("3");
            thread4 = new PrintThread("4");

            thread1.start();
            thread2.start();
            thread3.start();
            thread4.start();
        }
    }

    class PrintThread extends Thread {
        private int sleepTime;

        public PrintThread(String id) {
            super(id);
            // sleep between 0 and 5 seconds
            sleepTime = (int) (Math.random() * 5000);
            System.out.println("Name: " + getName() + "; sleep: " + sleepTime);
        }

        // execute the thread
        public void run() {
            // put thread to sleep for a random interval
            try {
                sleep(sleepTime);
            } catch (InterruptedException exception) {
                System.err.println("Exception: " + exception.toString());
            }
            // print thread name
            System.out.println("Thread " + getName());
        }
    }

Figure 7. Multi-threading example code [DD97]

2.3.3. Other Java Issues

The previous section described fundamental object-oriented language issues for alias analysis. This section presents other Java issues, such as multithreading and exception handling, that make alias analysis more difficult.

Multiple threads, as lightweight processes, have been used in C and C++ through lightweight-process libraries. Java provides built-in language support for threads, which makes programming with lightweight processes much easier. An important benefit is that threads improve not only scientific computing but also the interactive performance of graphical user interfaces and audio clips [Flan97]. Figure 7 shows multithreading example code in which each thread sleeps for a random interval before printing, so the order in which the four threads complete is determined only at run time. Such scheduling makes alias analysis more complicated when variables are shared among threads and modified by each thread.
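The difficulty that thread scheduling adds can be made concrete by enumerating every interleaving of two threads while keeping each thread's own statements in order. The statement format "x=y" and the class name are hypothetical simplifications, and the sketch ignores the Java memory model. Because the number of interleavings grows combinatorially with the number of statements, this brute-force approach only illustrates the problem and is not a practical analysis.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Statements have the form "x=y", where y is an object name starting with
// "obj" or another reference variable. Collects the objects that `var` may
// refer to after all statements of both threads have run.
final class Interleavings {
    static Set<String> possibleReferents(List<String> t1, List<String> t2, String var) {
        Set<String> results = new TreeSet<>();
        explore(t1, 0, t2, 0, new HashMap<>(), var, results);
        return results;
    }

    private static void explore(List<String> t1, int i, List<String> t2, int j,
                                Map<String, String> env, String var, Set<String> results) {
        if (i == t1.size() && j == t2.size()) {        // one complete schedule
            results.add(env.getOrDefault(var, "null"));
            return;
        }
        if (i < t1.size()) explore(t1, i + 1, t2, j, step(env, t1.get(i)), var, results);
        if (j < t2.size()) explore(t1, i, t2, j + 1, step(env, t2.get(j)), var, results);
    }

    // Executes one statement on a copy of the environment.
    private static Map<String, String> step(Map<String, String> env, String stmt) {
        String[] parts = stmt.split("=");
        String target = parts[1].startsWith("obj") ? parts[1] : env.getOrDefault(parts[1], "null");
        Map<String, String> next = new HashMap<>(env);
        next.put(parts[0], target);
        return next;
    }

    public static void main(String[] args) {
        List<String> thread1 = List.of("shared=obj1", "local=shared");
        List<String> thread2 = List.of("shared=obj2");
        System.out.println(possibleReferents(thread1, List.of(), "local")); // [obj1]
        System.out.println(possibleReferents(thread1, thread2, "local"));   // [obj1, obj2]
    }
}
```

Run alone, thread1 makes local an alias of obj1 only; once thread2 may write shared in between, a safe analysis must report both objects.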
In addition to multithreading, Java provides exception handling. An exception is a signal indicating that some sort of error condition has occurred. Exceptions propagate up through the lexical block structure of a method and then up the method call stack. If an exception is not caught by the block of code that throws it, it propagates to the next enclosing block, and then to the calling method, until a block that catches the exception is found. If it is not caught by any method, it propagates all the way up to the main() method from which the program started, and the interpreter exits after printing an error message and a stack trace [Flan97].

Java provides an exception handling mechanism with the try/catch/finally construct. The try block establishes a block of code whose exceptions and abnormal exits are handled by zero or more catch blocks. The catch clauses catch and handle the specified exceptions. Run-time exceptions such as NullPointerException are unchecked: a method is not required to catch or declare them, and if no catch clause handles them, they propagate up as described above and end with an error message and a stack trace. The finally block is executed whether or not an exception is thrown or caught. Programmers raise their own exceptions with the throw statement.

Choi analyzed exceptions in Java statically. He presented a model of exceptions by analyzing exceptional, stack-based control flow in bytecode [CGH99]. However, exceptions remain a separate issue for a statement-based alias analysis of source code.

2.3.4. Other Related Work

The following papers contribute to alias analysis, parameterized analysis, and type inference separately. Even though these papers do not integrate alias analysis and type inference, we have drawn on parts of them to implement our alias analysis on inherited classes in Java.

Landi [LR92] presented an algorithm that safely approximates inter-procedural may-aliases in the presence of pointers, as in the C language.
He used a k-limiting scheme for recursive data structures to bound the representation of all possible aliases to a finite number. He also applied inter-procedural control flow graphs, which interconnect the control flow graphs of procedures. The approach was fully designed and implemented in Landi's [LRZ92]. He argued that his program-point-specific pointer aliasing information was quite precise, and that it could not be compared directly with Choi's [CBC93].

Choi [CBC93] implemented an efficient flow-sensitive alias analysis algorithm for pointers based on iterative analysis of loops. He argued that it was more precise and efficient than the best inter-procedural method known at that time. For the intra-procedural phase, the traditional control-flow graph was transformed into a sparse evaluation graph for efficiency. A backward bind transferred the resulting aliases to the caller. For the inter-procedural phase, a forward bind transferred the aliases from the caller to the callee by taking the union of the aliases at the call site and the mapping of formal and actual parameters.

Agesen [APS93] designed and implemented a type inference algorithm for dynamic inheritance in SELF. It was the first algorithm to handle dynamic inheritance, multiple inheritance, object-based inheritance, and blocks with non-local returns. It is applicable to the inlining of virtual function calls.

Plevyak [PC94, PC95] presented an incremental constraint-based type inference that produces more precise type information and can type many previously untypable object-oriented programs. They also presented a cloning algorithm that removes dynamic dispatches arising from parametric polymorphism while minimizing the number of clones.

Bacon [BS96] compared three static analysis algorithms that resolve virtual function calls to improve C++ programs: Unique Name (UN), Class Hierarchy Analysis (CHA), and Rapid Type Analysis (RTA).
They showed that, in their benchmarks, RTA resolves the most virtual function calls, followed by CHA and then UN.

Dean [DDGLC96, CDG96] developed the Vortex compiler infrastructure, a language-independent optimizing compiler for object-oriented languages. It translates the input language into Vortex's intermediate language and produces high-quality code. Its intermediate code, a series of three-address statements, can carry the types of message sends and instance variable accesses. It is applicable to deciding the types of shadowed variables and overridden methods in Java as well as of virtual functions in C++.

Burke [BCC97] presented an approximation of inter-procedural alias analysis for languages that contain pointers, reference parameters, and recursion. It improved the precision of alias analysis with respect to the unrealizable path and non-distributive combining problems, for both compact and graph-based representations. It includes the algorithm of Choi [CBC93].

Christiansen [CCINSW97] presented Javelin, an infrastructure for global computing on Java-enabled web browsers. It is an infrastructure for implementing coarse-grained parallel applications, whose units are applets (that is, classes), on numerous anonymous machines.

Grove [GDDC97] presented a parameterized algorithmic framework for call graph construction in the presence of dynamic dispatch and first-class functions. It provided a common framework for describing existing call graph construction algorithms and evaluated their precision and cost on programs. It is more general than conventional parameterized algorithms and extends to cross-algorithm precision comparisons in an optimizing compiler.

Diwan [DMM98] presented Type-Based Alias Analysis (TBAA), which infers aliases from types instead of detecting them through alias analysis combined with additional type inference.
Their algorithm is a type-based mechanism rather than a conventional instruction-based alias analysis, so it is fast but imprecise.

DeFouw [DGC98] implemented a general parameterized algorithm for inter-procedural control flow analysis of higher-order object-oriented languages. By integrating propagation-style and unification-style analysis, it produced more precise results than previous algorithms at the same complexity.

Caromel [CKV98] proposed Java//, a library written in 100% Java, together with the metaobject protocol on which Java// is built. To preserve Java's platform independence, the library was written in pure Java, and it accesses the compiled representation of classes, not their sources, to distribute Java objects.

Chapter 3

Alias Analysis for Java with Reference-Set Representation

Alias analyses have been studied for C and C++. Since Java has become one of the candidate languages for high-performance computing, there is a need to apply existing analyses to Java. In this chapter, the object-oriented languages C++ and Java are compared, and then a suitable alias representation for Java is proposed.

3.1. Differences between C++ and Java

The naming of objects must be considered to represent aliases in C++ and Java. In C++, declarations of static objects give object names, and dynamic objects and pointer-valued objects (pointer variables) also have their own names for alias analysis. A pointer variable name names a pointer-valued object that contains the address of a pointed-to object. In Figure 8 (a), the pointer variable name p names a pointer-valued object that contains an address value, and the dereferenced pointer *p names the object pointed to by p.
A variable name that is not a pointer names the object located at the address of that variable. Alias relations exist among pointer-valued objects because of pointer-to-pointer relationships. Therefore, in previous work [CBC93, CS95, BCC97], when a pointer p points to an object v, the alias relation is represented as <*p, v>. Figure 8 (b) shows a pointer that points to another pointer variable, which complicates alias analysis; a box depicts a pointer-valued object and a circle a non-pointer object. These alias relations are represented as <*p, q> and <*q, v>.

[Figure 8. Relation between a pointer and an object in C++. (a) A pointer p and the object *p that it points to. (b) A pointer p that points to another pointer q, which points to v.]

[Figure 9. Difference between a pointer in C++ and a reference in Java. (a) C++ alias relations <*p, r>, <*q, r>, <*r, v>. (b) Java alias relations <p, r>, <q, r>, <r.d, v>.]

The existing alias relations for C++ are the closely related compact [CBC93, CS95, BCC97] and points-to [PR96, CR97] representations. These relations save space by representing all alias relations without an exhaustive set of pairs, and they can also be used for Java. However, using them in Java raises some problems, because only references are used to name objects in Java. A reference is a variable that refers to an object, like a pointer in C++, but Java has no pointer-to-pointer concept and no pointer operations. An object in Java is created dynamically, so it is an anonymous object without a name of its own. Thus, each object needs a name, obtained by binding a reference name and an object name in an alias relation. However, a more efficient binding is possible.

In Figure 9 (a), if there is an assignment statement *p = w, the value of the address-valued object named by *p is changed to the value of the address-valued object bound to w.
Thus, through <*p, r>, the object pointed to by r changes. Therefore, <*r, v> can be killed and a new alias relation <*r, w> is generated.

Figure 9 (b) shows alias relations in Java using the compact representation. If there is an assignment statement p.d = w, the address value of the reference p.d is changed, so <p.d, r.d> is inferred, and p.d and r.d are considered aliases of the same address-valued object. However, if v and r.d are also treated as aliases of the same address-valued object because of <r.d, v>, the object referred to by v appears to change as well: the relations <r.d, w> and <v, w> are both generated, the latter wrongly, while <r.d, v> survives. For the correct result, <r.d, v> should be killed and only <r.d, w> should be generated. The wrong result comes from the fact that, in Java, a reference name is used to name an object without a dereferencing operator such as * in C++. To obtain the right result in this example, r.d must be recognized not only as a memory location that contains an address value, as in <p.d, r.d>, but also as the object referred to by the reference r.d. To solve this problem, either the alias relation for an address-valued object must be extended beyond the compact representation, or the data flow equations for aliases must recognize the difference. In other words, the reference names in an alias relation should be interpreted as dereferenced, and the <p.d, r.d> alias relation should be analyzed differently. In particular, the alias computation must take into account that the reference names in the l-value and r-value of an assignment statement are used to assign the address value of a memory location.

3.2. Reference-Set Representation of Alias Set

The previous section showed that a compact representation can be used for alias relations in Java, even though it causes imprecision.
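The Java semantics behind Figure 9 (b) can be checked directly. In this illustration (the class Node, its field d, and the labels are hypothetical), p and r refer to the same object and r.d and v refer to the same object; after p.d = w, r.d refers to w's object while v still refers to its original object, so <r.d, v> must be killed and no relation <v, w> may be generated.

```java
// Reproduces the situation of Figure 9 (b) with ordinary Java references.
final class FieldAssignment {
    static final class Node {
        Node d;
        final String label;

        Node(String label) {
            this.label = label;
        }
    }

    // Executes the example statements and returns {r, v, w}.
    static Node[] run() {
        Node r = new Node("objR");
        Node v = new Node("objV");
        r.d = v;              // alias relation <r.d, v>
        Node p = r;           // alias relation <p, r>
        Node w = new Node("objW");
        p.d = w;              // alias effect: <r.d, v> killed, <r.d, w> generated
        return new Node[] {r, v, w};
    }

    public static void main(String[] args) {
        Node[] n = run();
        System.out.println(n[0].d.label); // objW: r.d now refers to w's object
        System.out.println(n[1].label);   // objV: v still refers to its own object
    }
}
```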
A more precise alias analysis in object-oriented languages needs type information about the accessed objects, and alias information allows this information to be collected more safely [PR95]. This is known as type inference. The type information can be used to resolve virtual functions, and the more precise the type information, the more precise the alias analysis becomes. Method resolution is executed whenever an iteration of the algorithm reaches the method. As a result, the type inference that collects all the possible type information affects the efficiency of the alias analysis algorithm. In this section, the reference-set representation is proposed to improve the efficiency of both the alias computation and the type inference. The reference-set representation collects the set of all references that refer to an object, and this set is called the reference-set.

Reference-set: a set of two or more alias references that refer to an object i, written $R_i = \{r_1, r_2, \ldots, r_j\}$. Initially $j \ge 2$, and each $r_j$ is a reference to the object. When $r_j$ and $r_k$ are qualified expressions with a field f, they can be represented as $R_l.f$ using the reference-set $R_l$ of an object l. During data-flow computation in an alias analysis, $j \ge 1$ is allowed when references are passed forward and backward at a call site.

```java
A a = new A();   // object 1 is created
A b;
X x, c, d;
b = a;
a.e = new X();   // object 2 is created
c = a.e;
d = c;
F f;
G g;
f = new F();     // object 3 is created
g = new G();     // object 4 is created
f.h = g;         // the statement s
```
Figure 10. The relationships between references and objects

All reference-sets that are accessible from the current statement of a program should be computed, and the set of these reference-sets is defined as an alias set.
The alias set contains the entire alias information at the statement.

Alias set: a set of reference-sets at a statement s; $A_s = \{R_1, R_2, \ldots, R_n\}$.

At the statement s of the program in Figure 10, the reference-sets and the alias set are represented as follows. Only reference-sets with two or more elements are considered, because a reference-set with one element is redundant for alias analysis.

$R_1 = \{a, b\}$, $R_2 = \{R_1.e, c, d\}$, $R_4 = \{f.h, g\}$
$A_s = \{R_1, R_2, R_4\}$

The space complexity of the reference-set representation is $O(R_n \times A_r)$, where $R_n$ is the number of objects and $A_r$ is the maximum number of aliased references for an object. This is less than the $O(R_t \times A_r)$ of the existing compact representation [CBC93, CS95, BCC97], where $R_t$ is the maximum number of objects in the program. In practice, $R_n$ is less than $R_t$ because $R_n$ counts only the objects referred to by two or more references. The space complexity directly affects the time complexity of the data-flow computation in an alias analysis, so the reference-set representation also has a lower time complexity. The reference-set representation also makes type inference cheaper: its time complexity is $O(C)$, where the constant C is the number of reference variables in the Java code. For the compact representation, type inference takes $O(R_t \times A_r)$, and $R_t \times A_r$ is much larger than C.

An alias analysis algorithm computes the alias sets in a program. Each statement collects an alias set from its predecessor(s), updates it according to the statement itself, and passes the resulting alias set to its successor(s).
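The alias set of Figure 10 at statement s can be modeled in Java as a map from a reference-set identifier to its references. The sketch below builds that map and applies the assignment a = g that is discussed in the next paragraph. The class names `AliasSet` and `ReferenceSetDemo` and the string keys such as "R1" are hypothetical. The sketch only handles plain assignments between simple references, and only when the right-hand side already belongs to a reference-set.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Minimal sketch of an alias set in the reference-set representation.
// Keys such as "R1" stand for the object a reference-set refers to (hypothetical naming);
// a qualified reference through that object is stored relative to the set, e.g. "R1.e".
class AliasSet {
    final Map<String, Set<String>> sets = new LinkedHashMap<>();

    // Adds the references refs to the reference-set id, creating the set if absent.
    void add(String id, String... refs) {
        sets.computeIfAbsent(id, k -> new LinkedHashSet<>()).addAll(Arrays.asList(refs));
    }

    // Plain assignment "lhs = rhs" between simple references: lhs is killed from its
    // current reference-set and copied into the reference-set that contains rhs.
    // (If rhs belongs to no reference-set, this sketch only performs the kill.)
    void assign(String lhs, String rhs) {
        Set<String> target = null;
        for (Set<String> r : sets.values()) {
            if (r.contains(rhs)) target = r;     // found before the kill, so "a = a" is safe
        }
        for (Set<String> r : sets.values()) r.remove(lhs);
        if (target != null) target.add(lhs);
    }
}

public class ReferenceSetDemo {
    public static void main(String[] args) {
        AliasSet as = new AliasSet();            // alias set A_s at statement s of Figure 10
        as.add("R1", "a", "b");
        as.add("R2", "R1.e", "c", "d");
        as.add("R4", "f.h", "g");
        System.out.println(as.sets);             // {R1=[a, b], R2=[R1.e, c, d], R4=[f.h, g]}
        as.assign("a", "g");                     // a = g
        System.out.println(as.sets);             // {R1=[b], R2=[R1.e, c, d], R4=[f.h, g, a]}
    }
}
```

Because R1.e is named through the reference-set R1 rather than through a, it needs no update when a leaves R1. Object 1 is still reached via b.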
The alias computation must be repeated until the alias sets and the calling graph of the program converge, so it affects the efficiency of the whole algorithm. Suppose that there is an assignment statement a = g in Figure 10. It means that the reference a now refers to the object that the reference g refers to. Therefore, the element a of the reference-set $R_1$ is killed and a is copied to the reference-set $R_4$. The time complexity of this computation depends on the space complexity of each representation. Thus, the reference-set representation improves the efficiency of the whole algorithm.

3.3. Data Structure

Aliases can be computed with data-flow equations. For the computation of the aliases, this section defines our data-flow equation rules, calling graph (CG), and control flow graph (CFG). A type table contains all possible types of reference variables.

3.3.1. Calling Graph and Control Flow Graph

A CG is needed to compute the alias set in an inter-procedural analysis between a calling method and a called method at a call statement. Our CG is a directed graph defined as $\langle N_{CG}, E_{CG}, n_{main} \rangle$, with the following components:
- $N_{CG}$ is a set of nodes. Each node is a method that appears only once in the CG, even though it may be called many times.
- $E_{CG}$ is a set of directed edges from caller(s) to callee(s). Only one edge is connected even though a caller may invoke the callee many times, and all edges are connected when many callers invoke one callee.
- $n_{main}$ is the main method that executes first in a Java program.

As our algorithm proceeds, our CG grows, as in previous works
[CBC93, CS95, PR96, CR97, BCC97, WWG00, WACGW01]. Nodes are added as the resolution of overridden methods determines the precise method invocations.

CFGs are used to compute the alias sets of the intra-procedural propagation. Our CFG is a directed graph defined for each method as $\langle N_{CFG}, E_{CFG}, n_{entry}, n_{exit} \rangle$, with the following components:
- $N_{CFG}$ is a set of nodes consisting of $n_{entry}$, $n_{exit}$, and each statement of the method.
- $E_{CFG}$ is the set of directed edges that represent the control and alias-set information between predecessor and successor statements.
- $n_{entry}$ is the entry node of the method.
- $n_{exit}$ is the exit node of the method.

In our CFG, seven node types are proposed based on their purposes:
- Entry, the $n_{entry}$ of the CFG.
- Exit, the $n_{exit}$ node.
- Assignment statement.
- Call statement.
- Return statement.
- Flow construct (if, while, etc.).
- Merging node.

A flow construct node marks the start of an if or while clause. For an if node, each clause branches from the node, and all the branched clauses are merged into a merging node. For a while node, a merging node is not necessary. Instead, a directed edge connects the last node of the while loop back to the flow construct node.

3.3.2. Type Table

As an object-oriented language, Java makes extensive use of inheritance, which causes dynamic invocation. Dynamic invocation, such as the invocation of overridden methods, makes it hard to build the calling graph statically. We can infer the types of objects in statements that involve inheritance and dynamic invocation through aliases. Type inference statically computes the possible run-time types of every expression in the program's source code. Determining run-time types statically is a key issue for compile-time optimization, because the dynamic calls of an object-oriented language make it hard to build the calling graph statically.
Our alias analysis needs static type inference for two reasons. First, static type determination improves the precision of inter-procedural analyses and transformations in object-oriented languages. Second, static type determination addresses the problems that virtual function calls pose for compile-time optimization. In our alias analysis, a type is defined as a set of classes. Depending on when it is decided, a type is either static or dynamic. For our alias analysis, a static type is the declared type of an object in the source code. A dynamic type is not established in the source code. Instead, it is created while the application runs, by assignment and casting statements between objects in the class hierarchy.

Constraint-based type inference safely approximates run-time types in object-oriented languages [PS91, PC94, APS95, HA96]. It consists of type variables and constraints. It computes a type for every expression in the program, so it overestimates the exact types. It operates in three steps:
1. Type variables are defined for the run-time type information.
2. Constraints on these variables are derived.
3. The resulting constraints are solved to obtain the desired information.

```java
{ ...
    A a = new A();    // constraint 1 and 2
    B b = new B();    // constraint 1 and 2, where class B is a subclass of class A
    a = b;            // (1) constraint 3
    a = (A) b;        // (2) constraint 4
    a = (B) b;        // (3) constraint 5
    a = new B();      // (4) constraint 2
    ((A) b).func1();  // (5) constraint 6
    ((A) b).varx;     // (6) constraint 7
    func2(a);         // (7) constraint 8; func2 is defined as 'void func2(B p)'
}
```
(a) Example code in Java

(b) Type tables of the variables in (a):
1. Assigning statement, cases (1), (2), (3), (4): $[a] \supseteq [b]$, so $[a] \supseteq \{A, B\}$. Type table of a: initial type A, overridden method B, shadowed variable A.
2. Object casting, cases (5), (6): $[(A)b] \supseteq [b]$, so $[(A)b] \supseteq \{A, B\}$. Type table of (A)b: initial type N/A, overridden method B, shadowed variable A.
3. Call site, case (7): $[a] \supseteq \{A, B\}$. Type table of a: initial type A, overridden method B, shadowed variable A.

Figure 11. An example source code and its type inference

Figure 11 shows the possible run-time type determinations for assignment statements, object casting expressions, and call statements in Java source code. Applying constraint-based type inference yields the type information of each statement. The type of a shadowed variable is determined by Java syntax at compile time. Even so, conventional approaches [PR95, CSH95, CR97, DMM98] lose this information, because the overridden-method type of an object overwrites its declared type. This makes the type of the object ambiguous, since, with inheritance, the object can imply three alternative types depending on how it is used. Therefore, to apply the type information efficiently and safely in our alias analysis, we propose a type table structure with three elements, as shown in Figure 11:
- the initial declared class type,
- the overridden-method type, and
- the shadowed-variable type.

Reference variables refer to objects dynamically in Java. Thus, the types of a reference variable are determined statically at the type declaration and dynamically during the processing of the algorithm, as in Figure 20. Type inference can be processed with a type table that contains the declared and dynamic types of each reference variable.
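The three-element type table of Figure 11 can be sketched in Java as shown below. It covers case (1), an assignment, and cases (5) and (6), a cast. The class `TypeEntry`, its fields, and the methods `declare`, `assignFrom` and `cast` are hypothetical names, not the thesis's TypeTable class. The sketch assumes that an assignment accumulates dispatch types, following the constraint $[a] \supseteq [b]$. It also assumes that field selection keeps the declared or cast type.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical type-table entry with the three elements of Figure 11.
class TypeEntry {
    final String initialType;                              // declared class type; null if N/A
    final Set<String> methodTypes = new LinkedHashSet<>(); // possible types for overridden-method dispatch
    final String shadowedVarType;                          // type that selects a shadowed field

    TypeEntry(String initialType, String shadowedVarType) {
        this.initialType = initialType;
        this.shadowedVarType = shadowedVarType;
    }

    // "T v = new U();": declared type T, dispatch type U.
    static TypeEntry declare(String declared, String allocated) {
        TypeEntry e = new TypeEntry(declared, declared);
        e.methodTypes.add(allocated);
        return e;
    }

    // "lhs = rhs": the dispatch types of rhs flow into lhs ([lhs] contains [rhs]);
    // field accesses through lhs still use its declared (shadowed-variable) type.
    void assignFrom(TypeEntry rhs) {
        methodTypes.addAll(rhs.methodTypes);
    }

    // "(T) e": dispatch types come from e, while fields are selected by the cast type T.
    static TypeEntry cast(String target, TypeEntry e) {
        TypeEntry c = new TypeEntry(null, target);
        c.methodTypes.addAll(e.methodTypes);
        return c;
    }
}

public class TypeTableDemo {
    public static void main(String[] args) {
        TypeEntry a = TypeEntry.declare("A", "A");    // A a = new A();
        TypeEntry b = TypeEntry.declare("B", "B");    // B b = new B();
        a.assignFrom(b);                              // case (1): a = b;
        System.out.println(a.methodTypes);            // [A, B]
        TypeEntry castB = TypeEntry.cast("A", b);     // cases (5), (6): (A) b
        System.out.println(castB.methodTypes + " " + castB.shadowedVarType); // [B] A
    }
}
```

The three fields are kept separate on purpose. `((A) b).func1()` dispatches on B, while `((A) b).varx` reads the field declared in A. Collapsing both into one type would lose this distinction.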
The declared type represents the static type information of a reference variable. The shadowed-variable type represents the type that determines which shadowed variable a reference variable accesses. The dynamic types represent the possible overridden-method types of the reference variable. The types of each reference variable can be computed in constant time. For an assignment statement LHS = RHS, the type of LHS is determined by the type of RHS. Our type table structure is defined as the class TypeTable in Figure 14, and its expression in the algorithm of Figure 20 is:

$TYPES_{table}(LHS) = TYPES_{table}(RHS)$, where table is a type table.

For a call statement, RHS can be $E_c.M_c$, $M_c$, or new $M_c$. The type set of $E_c.M_c$ is determined from the reference-set of $E_c$ and its method $M_c$ in the class hierarchy. The type of $M_c$ or new $M_c$ is simply determined by the function and type declarations. If the call statement has a LHS, the type of LHS is the type of RHS, as for an assignment statement:

$TYPES_{table}(LHS) = TYPES_{table}(RHS)$

Our CG is built dynamically with the type information, and the structure of each of its nodes is defined as the class CallingGraphNode in Figure 20.

3.3.3. Class Structure Table

To apply our alias analysis algorithm, we need to construct control flow graphs, a calling graph, and type tables. The CFG of each procedure is constructed during parsing. Aliases are computed at each statement, which is a node of a CFG.
```java
class Shape {
    tmpObj r = new tmpObj(10);
    int x, y;                               // location
    double area() {
        return (r.r * r.r);
    }
}

class Circle extends Shape {
    tmpObj r = new tmpObj("test", 2);       // shadows Shape.r
    double area() {                         // overrides Shape.area()
        return (3.14 * r.r);
    }
}

class tmpObj {
    int r;
    String str;
    tmpObj(int i) {
        r = i;
    }
    tmpObj(String s, int i) {
        r = i * i;
        str = s;
    }
    void fn() { }
}

class Foo {
    public static void main(String[] args) {
        Circle c = new Circle();
        Shape s;
        tmpObj t;
        ((Shape) c).area();                 // dispatches to the overridden Circle.area()
        t = ((Shape) c).r;                  // reads the shadowed field Shape.r
        t.fn();
    }
}
```
Figure 12. An example Java code of shadowed variables and overridden methods for single inherited classes

Based on the CFGs, our alias analysis algorithm examines the body of the called procedure along its individual control-flow paths, using the intra-procedural control-flow information associated with each procedure. For efficiency, our algorithm uses a context-insensitive CG. The CG is constructed among the methods within a class and among classes while our alias analysis algorithm executes. By referring to the CG, the algorithm can use the CFG information at a call statement. Unlike a procedural language, an object-oriented language needs a Type Table for each object, which presents the class types of the object. A Type Table records an object and its class type so that a complete CG can be constructed. This keeps the subsequent analysis complete, especially for overridden methods and shadowed variables. A Type Table consists of four elements:
- the object reference variable,
- its initial class type,
- its shadowed-variable type, and
- its overridden-method type.

[Figure 13. CSTs for an example in Figure 12: class structure tables for class types Foo and Circle/Shape, containing the CFGs of main(), area(), and tmpObj()]
The shadowed-variable type can be merged with the initial class type, except in a cast expression of a reference variable. A Class Structure Table (CST) is proposed for each class. It saves memory for the potentially unbounded number of objects created at run time, and it stores the information of the class, namely its CFGs and Type Tables. We copy the instances of each class, that is, its data and the operations on that data, to a CST. In Java, the data and the operations on the data are called fields and methods, respectively. We use CSTs as the information for all objects to be created at run time.

```java
import java.util.Vector;

class ClassStructure {
    String className;
    Vector method;                  // collection of CFGs
    ClassStructure superClass, sub; // CSTs of the super- and subclass
    Vector field;
}

class CallingGraphNode {
    Vector parent;                  // callers
    Vector child;                   // callees
    // information of this node
    String className;
    String methodName;
    ClassStructure classType;
    Vector actualParameter;
}

class ControlFlowGraph {
    String methodName;
    String returnType;
    Vector formalParameter;
    Vector variable;
    // node for entry, exit, and each statement
    Node node;
}

class Node {                        // a node of a CFG
    Vector predecessor;
    Vector successor;
    Vector aliasSet;
}

class AliasInstance {
    TypeTable nameLHS;
    TypeTable nameRHS;
    String BirthSite;
    String CallSite;
}

class TypeTable {
    String className;
    ClassStructure classType;
    ClassStructure fieldType;
    Vector methodType;
    TypeTable invokedName;          // field or method invoked
}
```
Figure 14. Class Structure Table

Figure 12 presents example code with constructors and a class hierarchy, which involves shadowed variables and overridden methods in Java. Figure 13 shows the CSTs of the example in Figure 12. A CST includes the CFGs of each method of a class, built during parsing and during our algorithm. Type Tables are constructed during parsing to infer the type of each object. A CG is constructed among methods and among classes during our algorithm.
Memory is saved for the objects created in unbounded numbers at run time, because each class is represented by one CST for its data and operations. In a CFG, aliased elements are computed with the type information of each element. Figure 14 represents the data structure of a CST. Since it is implemented in Java, each data structure is a class.

The class ClassStructure, which stands for a CST, mainly consists of four parts:
- The class name is the name of the class for the CST.
- The super- and subclass CSTs are the CSTs of the superclass and the subclass of this class, respectively.
- The fields are the collection of the data variables of this class.
- The methods are the collection of the procedures of this class.

The class CallingGraphNode mainly consists of parents, which are callers, and children, which are callees. The parent and child procedures are vectors so that an alias set can be analyzed in a context-insensitive way. The other members of CallingGraphNode are as follows:
- The methodName is the name of this procedure.
- The className is the class where this procedure is defined.
- The classType is the reference to the class className.
- The actualParameter represents all the actual parameters of this procedure.

The class ControlFlowGraph is a set of nodes that builds the CFG of a procedure. Its members are as follows:
- Its methodName is the name of the procedure for this CFG.
- Its returnType is the return class type of the procedure.
- The formalParameter is the collection of formal parameters of the procedure.
- The variable is the collection of references defined in the procedure.
- The node is the collection of statements of the procedure, including the entry and exit nodes.

The class Node represents a node of a CFG and has three data members: predecessor, successor, and aliasSet. The predecessor and the successor are the collections of the predecessors and successors of the Node, respectively. The aliasSet stores the alias set of a statement, as AliasInstance objects, for the Node.
The class AliasInstance stores an alias set of a node in a CFG. It consists of three parts:
- the aliased elements nameLHS and nameRHS, which are stored in a Node,
- the BirthSite of the aliased elements, and
- the CallSite of the aliased elements, which can be used for backward and forward path problems.

The class TypeTable represents all possible types of a reference, which are needed to build a complete CG. Its data members are as follows:
- The className is the name of a reference.
- The classType is the initial type of the reference.
- The fieldType is the shadowed-variable type of the reference.
- The methodType is the collection of all possible overridden-method types of the reference.
- The invokedName represents the field or method invoked through the reference.

CSTs drive our alias analysis algorithm: CFGs, a CG, and Type Tables are built from them to compute the alias sets. Our parser reads Java class files as input. During parsing, a CST is created for each input class and the CFGs of its procedures are built. The CG is interleaved with the CFGs and is built while our algorithm runs.

3.4. Propagation Rules of Alias Set

In a flow-sensitive alias analysis, the alias information of each statement should include all alias relations that hold at that point. The information is propagated to the next statement, and the alias results affected by that statement are then computed. This propagation and computation of alias information proceeds through the nodes in the CFG of each method. The CFG has the structure of the class ControlFlowGraph in Figure 14. The computation of an alias set for each node can be modeled by a data-flow equation that computes the effect of the node on the alias set. A node n has the structure of the class Node in Figure 14.
Let in(n) be the input alias set of a node n, transferred from its predecessor nodes, and let out(n) be the output alias set held on exit from n. The alias sets in(n) and out(n) consist of aliased elements defined as the class AliasInstance in Figure 14. The effect of a node n is computed by the following equations:

$$in(n) = \bigcup_{p \in pred(n)} out(p)$$
$$out(n) = Trans(in(n)) = Mod_{gen}[Mod_{kill}(in(n))]$$

In these equations, pred(n) represents the predecessor nodes of n. $Mod_{kill}$ denotes the alias set obtained after some reference-sets of in(n) are killed. $Mod_{gen}$ is the subsequent alias set after the new reference-sets are generated on $Mod_{kill}$. The alias analysis algorithm computes the alias sets of all nodes in the CFG iteratively until they converge. After convergence, the resulting alias set out(n) of each node n is related to its input alias set in(n) as described by the above data-flow equations.

In our reference-set representation, the operator $\cup$ works both on the alias sets themselves and within the reference-sets of the same object across alias sets. It is defined as follows.

Union operator $\cup$: if $A_{12} = A_1 \cup A_2$, where $A_1$ and $A_2$ are alias sets, then $A_{12}$ is an alias set that contains all the elements of $A_1$ and $A_2$ without duplication.

For example, let $A_1 = \{R_1, R_2\}$ with $R_1 = \{a, b\}$ and $R_2 = \{b, c\}$, and let $A_2 = \{R_1, R_3\}$ with $R_1 = \{a, c\}$ and $R_3 = \{b, d\}$. Then $A = A_1 \cup A_2$ is

$$A = \{R_1, R_2, R_3\}, \quad R_1 = \{a, b, c\},\ R_2 = \{b, c\},\ R_3 = \{b, d\}$$

because $R_1 = \{a, b\} \cup \{a, c\}$.

The propagation of an alias set within a method is achieved by an intra-procedural analysis through the nodes in the CFG of the method. This analysis starts with the alias set holding at the entry node $n_{entry}$, traverses all nodes, and computes the out alias set of each node in flow-sensitive order. The analysis ends at the exit node $n_{exit}$.
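The union operator defined above can be sketched in Java. The sketch assumes, as a modeling choice rather than something the thesis specifies, that reference-sets of the same object share an identifier such as "R1". `AliasUnion.union` is a hypothetical name. Sorted collections are used only to make the printed result deterministic.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch of the union operator on alias sets. An alias set is modelled as a map from a
// reference-set identifier (e.g. "R1", an assumed naming) to its references.
public class AliasUnion {
    // A1 u A2: reference-sets with the same identifier are merged element-wise,
    // the others are copied, and nothing is duplicated. The inputs are not modified.
    static Map<String, Set<String>> union(Map<String, Set<String>> a1,
                                          Map<String, Set<String>> a2) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map<String, Set<String>> a : List.of(a1, a2)) {
            for (Map.Entry<String, Set<String>> e : a.entrySet()) {
                result.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> a1 = Map.of("R1", Set.of("a", "b"), "R2", Set.of("b", "c"));
        Map<String, Set<String>> a2 = Map.of("R1", Set.of("a", "c"), "R3", Set.of("b", "d"));
        System.out.println(union(a1, a2)); // {R1=[a, b, c], R2=[b, c], R3=[b, d]}
    }
}
```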
When the intra-procedural analysis reaches a call statement node, the alias set must be propagated into the called method, because it can affect the aliases computed there. Similarly, the resulting alias set must be propagated back from the called method, because it can affect the aliases after the call statement. Thus, aliases are propagated inter-procedurally, from a caller to its callee and from the callee back to its caller. This propagation goes through the entry and exit nodes of the called method. The calling method computes the input alias set of the callee's entry node from its own input alias set. It then reads the output alias set of the callee's exit node to compute its own output alias set at the call statement node.

This section describes the propagation rules for each CFG node type. The rule for each node type modifies an alias set by a kill-and-generate rule, which first kills and then generates reference-sets at the node. Thus, the output alias set of a node can be computed by the data-flow equation.

3.4.1. Rules for Intra-procedural Analysis

The propagation rules for intra-procedural analysis are described below for every CFG node type except the entry and call statement node types. Each rule consists of premises and conclusions separated by a horizontal line. The premises are a set of equations that define the input alias set, information about the node, and intermediate sets. A premise can take the form of a conditional implication: when the given condition holds, the implied equation is meaningful and can be solved. The conclusions define the equations that compute the alias set out(n) of a node n. When all premises hold, the equations in the conclusions are solved for out(n).
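The iteration of in(n) and out(n) until convergence can be sketched as a simple round-robin solver in Java, independently of the individual node rules. A worklist would be a common alternative; the thesis does not specify either scheme. The four-node loop CFG and its transfer functions in `main` are hypothetical. The sketch assumes every transfer function is monotone, so that the iteration terminates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

// Round-robin iterative solver for in(n) = union of out(p) over pred(n) and
// out(n) = Trans(in(n)). Alias sets are maps from a reference-set id to its references.
public class AliasFixpoint {
    static void unionInto(Map<String, Set<String>> target, Map<String, Set<String>> src) {
        src.forEach((k, v) -> target.computeIfAbsent(k, x -> new TreeSet<>()).addAll(v));
    }

    // preds.get(n): predecessors of node n; trans.get(n): transfer function Trans of node n.
    // Each transfer receives a freshly built in(n) and may modify and return it.
    static List<Map<String, Set<String>>> solve(List<List<Integer>> preds,
            List<UnaryOperator<Map<String, Set<String>>>> trans) {
        List<Map<String, Set<String>>> out = new ArrayList<>();
        for (int i = 0; i < preds.size(); i++) out.add(new TreeMap<>());
        boolean changed = true;
        while (changed) {                          // repeat until no out(n) changes
            changed = false;
            for (int n = 0; n < preds.size(); n++) {
                Map<String, Set<String>> in = new TreeMap<>();
                for (int p : preds.get(n)) unionInto(in, out.get(p));
                Map<String, Set<String>> newOut = trans.get(n).apply(in);
                if (!newOut.equals(out.get(n))) {
                    out.set(n, newOut);
                    changed = true;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical CFG: 0 entry -> 1 while -> 2 loop body -> back to 1; 1 -> 3 exit.
        List<List<Integer>> preds = List.of(List.of(), List.of(0, 2), List.of(1), List.of(1));
        List<UnaryOperator<Map<String, Set<String>>>> trans = new ArrayList<>();
        trans.add(m -> {                           // entry: a and b refer to object 1
            m.computeIfAbsent("R1", k -> new TreeSet<>()).addAll(List.of("a", "b"));
            return m;
        });
        trans.add(m -> m);                         // while node (receives the back edge)
        trans.add(m -> {                           // loop body: c = a, so c joins a's set
            Set<String> r1 = m.get("R1");
            if (r1 != null && r1.contains("a")) r1.add("c");
            return m;
        });
        trans.add(m -> m);                         // exit
        System.out.println(solve(preds, trans).get(3)); // {R1=[a, b, c]}
    }
}
```

The alias c = a generated in the loop body reaches the exit only on the second pass, through the back edge. This is why the equations are iterated rather than evaluated once.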
First, we define the flow construct node rule as follows:

$$\frac{in(n) = out(n_{pred}), \quad n_{pred}: \text{predecessor node of } n}{out(n) = in(n)} \; [\text{Flow Construct Node}]$$

Here $n_{pred}$ is the predecessor of node n. A flow construct node has several outgoing edges that carry the same out information. A merging node has several incoming edges. Given its predecessors $n_{pred}$, its out(n) is the union of the out sets of all predecessor nodes. The merging node rule is defined as follows:

$$\frac{in(n) = \bigcup_{p \in n_{pred}} out(p), \quad n_{pred}: \text{predecessor nodes of } n}{out(n) = in(n)} \; [\text{Merging Node}]$$

The next rule concerns the assignment statement node type.

$$\frac{\begin{array}{l} in(n) = out(n_{pred}), \quad n_{pred}: \text{predecessor node of } n \\ x = LHS, \quad y = RHS \\ \forall i, j\ R_i, R_j \in in(n) \rightarrow [Mod_{kill}(in(n)) = \{R_i \mid \text{kill } x \in R_i\} \cup \{R_i \mid \text{kill } R_j.f \in R_i \text{ when } q \in R_j \text{ and } x = q.f\}] \\ \quad \wedge\ [in(n) = in(n) - Mod_{kill}(in(n))] \wedge [KILL(in(n)) = \{x, R_j.f\}] \\ \forall k\ R_k \in in(n) \rightarrow [Mod_{gen}(in(n)) = \{R_k \mid R_k = R_k \cup KILL(in(n)) \text{ when } y \in R_k\}] \\ \quad \wedge\ [in(n) = in(n) - Mod_{gen}(in(n))] \end{array}}{out(n) = Mod_{kill}(in(n)) \cup Mod_{gen}(in(n)) \cup in(n)} \; [\text{Assignment Node}]$$

The symbols in this rule are as follows:
- LHS and RHS stand for the left-hand side and the right-hand side of the assignment statement, respectively.
- $Mod_{kill}(in(n))$ is the set of reference-sets in in(n) that are modified by killing the LHS element.
- $KILL(in(n))$ is the reference-set of the references killed by $Mod_{kill}(in(n))$.
- $Mod_{gen}(in(n))$ is the set of reference-sets in in(n) that are modified with the RHS reference-set and $KILL(in(n))$.

The out(n) of node n is the union of $Mod_{kill}(in(n))$, $Mod_{gen}(in(n))$, and in(n).
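The kill and generate steps of the Assignment Node rule can be sketched in Java on the statement a.e = f.h, which the text analyzes next. The class `AssignmentRule` and its helpers are hypothetical. The sketch makes two assumptions that go beyond the printed rule. First, only one level of qualification (q.f) is resolved to $R_j.f$. Second, when the RHS belongs to no reference-set yet, a new two-element reference-set {x, y} is created under the invented name "Rnew1"; the printed rule only covers the case where y is already in some $R_k$.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the Assignment Node rule (kill, then generate) on a reference-set alias set.
// Assumptions: one level of qualification ("q.f") is handled, and a new two-element
// reference-set is created when the RHS belongs to no reference-set yet.
public class AssignmentRule {
    final Map<String, Set<String>> sets = new LinkedHashMap<>();
    private int fresh = 0;                     // counter for the hypothetical "Rnew" sets

    void add(String id, String... refs) {
        sets.put(id, new LinkedHashSet<>(Arrays.asList(refs)));
    }

    // Identifier of the reference-set containing name, or null.
    String setOf(String name) {
        for (Map.Entry<String, Set<String>> e : sets.entrySet()) {
            if (e.getValue().contains(name)) return e.getKey();
        }
        return null;
    }

    // "q.f" becomes "Rj.f" when q is in Rj; other names are returned unchanged.
    String canonical(String ref) {
        int dot = ref.lastIndexOf('.');
        if (dot < 0) return ref;
        String owner = setOf(ref.substring(0, dot));
        return owner == null ? ref : owner + ref.substring(dot);
    }

    void assign(String lhs, String rhs) {
        String x = canonical(lhs);             // x itself, or Rj.f when x = q.f and q is in Rj
        String y = canonical(rhs);
        String target = setOf(y);              // R_k with y in R_k, found before the kill
        for (Set<String> r : sets.values()) {  // Mod_kill: remove x from every set; KILL = {x}
            r.remove(x);
        }
        if (target != null) {                  // Mod_gen: R_k = R_k u KILL
            sets.get(target).add(x);
        } else {                               // assumption: new reference-set {x, y}
            sets.put("Rnew" + (++fresh), new LinkedHashSet<>(List.of(x, y)));
        }
    }

    public static void main(String[] args) {
        AssignmentRule as = new AssignmentRule();
        as.add("R1", "a", "b");
        as.add("R2", "R1.e", "c", "d");
        as.add("R3", "f.h", "g");
        as.assign("a.e", "f.h");               // the statement a.e = f.h
        System.out.println(as.sets);           // {R1=[a, b], R2=[c, d], R3=[f.h, g, R1.e]}
    }
}
```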
To show how the above rule applies to alias analysis, we analyze an assignment statement a.e = f.h in the program of Figure 10. Initially, the reference-sets $R_1$, $R_2$, $R_3$ and the alias set in(n) for the statement are:

$R_1 = \{a, b\}$, $R_2 = \{R_1.e, c, d\}$, $R_3 = \{f.h, g\}$, $in(n) = \{R_1, R_2, R_3\}$

The LHS is a qualified expression related to both $R_1$ and $R_2$. Since $R_1 = \{a, b\}$ and $R_2 = \{R_1.e, c, d\}$, $R_2$ becomes $\{c, d\}$, and $Mod_{kill}(in(n))$, in(n), and $KILL(in(n))$ are:

$Mod_{kill}(in(n)) = \{R_2\}$, $in(n) = \{R_1, R_3\}$, $KILL(in(n)) = \{R_1.e\}$

Since $R_3$ includes the RHS, $Mod_{gen}(in(n))$ and in(n) are computed as follows:

$Mod_{gen}(in(n)) = \{R_3 \mid R_3 = R_3 \cup \{R_1.e\} = \{R_1.e, f.h, g\}\} = \{R_3\}$, $in(n) = \{R_1\}$

Finally, out(n) is the union of $Mod_{kill}(in(n))$, $Mod_{gen}(in(n))$, and in(n):

$out(n) = Mod_{kill}(in(n)) \cup Mod_{gen}(in(n)) \cup in(n) = \{R_2, R_3, R_1\}$, where $R_1 = \{a, b\}$, $R_2 = \{c, d\}$, $R_3 = \{R_1.e, f.h, g\}$

The rule for the return statement node type is presented below, using the reference-set $R_r$ of a return variable r. In the rule, LOCAL stands for the set of local variables defined in a method M, such as local variables and formal parameters.

$$\frac{\begin{array}{l} in(n) = out(n_{pred}), \quad n_{pred}: \text{predecessor node of } n \\ M: \text{callee}, \quad LOCAL(M) = \{v \mid v \text{ is a local variable of } M\} \\ \forall i\ R_i \in in(n), \text{ for } r: \text{return reference} \rightarrow [Mod_{kill}(in(n)) = \{R_i \mid \text{kill } x \in R_i \text{ for } x \in LOCAL(M)\}] \\ \quad \wedge\ [R_r = \{R_r \mid \text{kill } x \in R_r \text{ for } x \in LOCAL(M) \text{ when } r \in R_r\}] \end{array}}{out(n) = Mod_{kill}(in(n))} \; [\text{Return Node}]$$

The next rule is for the exit node type. There may be direct edges from return statement nodes to an exit node, so an exit node may have several predecessor nodes.
$$\frac{\begin{array}{l} in(n) = \bigcup_{p \in n_{pred}} out(p), \quad n_{pred}: \text{predecessor nodes of } n \\ M: \text{callee}, \quad LOCAL(M) = \{v \mid v \text{ is a local variable of } M\} \\ \forall i\ R_i \in in(n) \rightarrow [Mod_{kill}(in(n)) = \{R_i \mid \text{kill } x \in R_i \text{ for } x \in LOCAL(M)\}] \wedge [in(n) = in(n) - Mod_{kill}(in(n))] \end{array}}{out(n) = Mod_{kill}(in(n)) \cup in(n)} \; [\text{Exit Node}]$$

3.4.2. Rules for Inter-procedural Analysis

Inter-procedural propagation rules must be defined for a call statement node and an entry node. The data flow of an alias set at a call statement means that the alias information of the statement is propagated to the called method and affects the alias information there. The affected information is passed back to the call statement of the calling method after the alias set of the called method has been computed. The alias set from the called method modifies the alias set of the call statement when the returned alias set includes non-local variables and actual parameters.

To simplify the computation of a call statement, we virtually divide a call node into a precall node and a postcall node. A precall node collects the alias set from the predecessor node of the current call node and computes its own alias set out(n) from the collected set. This alias set is propagated to the entry node of the called method. During the propagation, the reference-sets for references that are inaccessible from the called method are killed. This killed set is an input of the postcall node and is not modified, so it does not need to be propagated to the called method. The out(n) of the precall node is not propagated to the postcall node, because the called method might modify the set.
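The actual-parameter part of the precall rule, which is defined formally below, can be sketched in Java. For each actual $a_i$ bound to a formal $f_i$, $R_{pass}(a_i) = \{a_i, f_i\}$ is sent toward the callee, and $R(a_i) = R(a_i) - R_{pass}(a_i)$ is kept for the postcall node. Non-local variables and $E_c.f$ fields are omitted. `PrecallSplit.split` is a hypothetical name. Keying each passed set by the reference-set it came from is an assumption of this sketch. The input is the alias set $A_s$ of Figure 15 before the call a.update(c).

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch of the actual-parameter part of the precall rule:
// R_pass(a_i) = {a_i, f_i} goes to the callee's entry node and
// R(a_i) = R(a_i) - R_pass(a_i) stays behind for the postcall node.
public class PrecallSplit {
    static Map<String, Set<String>> split(Map<String, Set<String>> in,
                                          Map<String, String> actualToFormal) {
        Map<String, Set<String>> pass = new TreeMap<>();
        for (Map.Entry<String, String> p : actualToFormal.entrySet()) {
            String actual = p.getKey();
            String formal = p.getValue();
            for (Map.Entry<String, Set<String>> r : in.entrySet()) {
                if (r.getValue().contains(actual)) {          // r is a set R(a_i)
                    Set<String> rPass = new TreeSet<>(List.of(actual, formal));
                    pass.computeIfAbsent(r.getKey(), k -> new TreeSet<>()).addAll(rPass);
                    r.getValue().removeAll(rPass);            // R(a_i) = R(a_i) - R_pass(a_i)
                }
            }
        }
        return pass;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> in = new TreeMap<>();         // A_s of Figure 15
        in.put("R2", new TreeSet<>(List.of("a.f", "b", "c", "R3.f")));
        in.put("R3", new TreeSet<>(List.of("R2.f", "c")));
        Map<String, Set<String>> pass = split(in, Map.of("c", "i")); // a.update(c), formal i
        System.out.println(pass);  // {R2=[c, i], R3=[c, i]}
        System.out.println(in);    // {R2=[R3.f, a.f, b], R3=[R2.f]}
    }
}
```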
In previous approaches [PR95, CSH95, CR97], the set affected by the called method is used when computing an alias set of the call node without considering the modification by the called method and the alias set computed is propagated to the subsequent node. However, if we do not kill the alias relations affected by the called method for the subsequent analysis, it might build nonexistent call relations. Thus, previous approaches compute redundant alias relations by overestimating the alias relations and cause the subsequent analysis to become inefficient. A postcall node collects the modified kill set of the precall node and exit nodes alias set of all possible called methods. By selecting subsets among the collected alias sets of references accessible from the calling method, we can compute the result alias set. The result set of the postcall node becomes the out alias set of the call node. The following rule computes an out set of a precall node. in(n) = out(npred), npred • ' predecessor node ofn RHS = ECMC , RHS = Mc Vi, aj = the ith actual parameter of the callee Mc, y i.f i = the ith formal parameter of the callee Mc, Vi, R(a{ ) e in(n) -> [Rpass(ai) = faitfi}] a [R(a{) = R(ai) - Rpass(ai)], RHS = Mc Vi, R(aj) e in(n), v is a non local variable in the callee Mc , VfVv R(v.f) e in(n) - » [R(v) = R(v) - {v}] a [R(v.f) = R(v.f) — (v.fj] A f Rpass(v) = M i A i R pass(v-f) = M i l a[PASS(Mc) = U {Rposfai), Rpasfv), Rpass(v.f)}], RHS = ECMC Vi, R(aj) 6 in(n), VfVat , Rfa^f) e in(n), V f R(Ecf) ein(n) — > [R(Ecf) = R(Ecf) - {Ecf } ] a [Rpass(Ecf) = fEcf } ] a [R(atf) = Rfa^f) - { a j } ] A [Rpass(a rf> = ( a i f } ] A [ P A S S (E C M C ) = U {R p o sfa j), R p a sfa i-f), R pass(E c f ) f ] , n: the first node o f an exception block B — > in(B) = in(n) 57 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 
--------------------------------------------------------------------------------
$$
[\text{Precall Node}] \quad out(n) = in(n)
$$

PASS(M_c) and PASS(E_c.M_c) represent the sets of actual parameters, formal parameters, and non-local variables in a called method M_c and E_c.M_c, respectively, where E_c is a reference variable. R_pass(a_i) is the set of reference variables accessible by the called method M_c when passed from the caller. R_pass(v) is the set of non-local variables accessible by the called method M_c. To simplify the rule, a qualified expression used as an actual parameter in the caller is not considered.

PRECALL(M_c) is the set of precall nodes of the call statements that invoke the called method M_c; it can be computed from the incoming edges of M_c in the CG. An entry node merges the alias sets from these precall nodes and then propagates the merged set to its successor node.

$$
\begin{array}{l}
PRECALL(M_c): \text{the precall nodes of the callee } M_c \\
in(n) = \bigcup_{p \in PRECALL(M_c)} PASS(p)
\end{array}
$$
--------------------------------------------------------------------------------
$$
[\text{Entry Node}] \quad out(n) = in(n)
$$

The rule of the postcall node is defined as follows.

$$
\begin{array}{l}
in(n) = \bigcup_{p \in n_{precall}} out(p), \quad n_{precall}: \text{a precall node of } n \\
RHS = E_c.M_c \rightarrow FIELD(E_c) = \{ f \mid f \text{ is a field name in an object referred to by } E_c \}, \\
RHS = M_c \rightarrow FIELD(E_c) = \emptyset, \\
RHS = \text{new } M_c \rightarrow FIELD(E_c) = \emptyset \wedge A(r), \\
EXIT(M_c) = \{ e \mid e \text{ is an exit alias set from a possible callee method } M_c \}, \\
LHS = \emptyset,\ \forall R_{passb} \in EXIT(M_c) \rightarrow \\
\qquad [R_{passb} = R_{passb} - \{ v \mid v \text{ is a local variable in the callee } M_c \}] \wedge [EXIT(M_c) = \bigcup R_{passb}] \text{ for all } M_c,
\end{array}
$$
$$
\begin{array}{l}
LHS \neq \emptyset,\ \forall R_{passb} \in out(\text{return node}) \rightarrow \\
\qquad [R_{passb} = R_{passb} - \{ v \mid v \text{ is a local variable in the callee } M_c \}] \wedge [EXIT(M_c) = \bigcup R_{passb}] \text{ for all } M_c, \\
\forall i\ R_i \in EXIT(M_c),\ \forall j\ R_j \in in(n) \rightarrow [R_i \mid R_i = R_i \cup R_j \text{ when } i = j] \wedge [EXIT(M_c) = EXIT(M_c) - R_i] \wedge [in(n) = in(n) - R_j], \\
exit(RHS) = \bigcup_{e \in EXIT(M_c)} out(e) \cup \bigcup_{p \in n_{precall}} out(p) \cup \bigcup_{\text{for all } i} R_i, \\
LHS = \emptyset \rightarrow out = exit(RHS), \\
LHS = x,\ \forall i, j\ R_i, R_j \in exit(RHS),\ R(RHS) \in exit(RHS) \rightarrow \\
\qquad [Mod_{kill}(exit(RHS)) = \{ R_i \mid \text{kill } x \in R_i \} \cup \{ R_i \mid \text{kill } R_j.f \in R_i \text{ when } q \in R_j \text{ and } x = q.f \}] \\
\qquad \wedge [exit(RHS) = exit(RHS) - Mod_{kill}(exit(RHS))] \wedge [KILL(exit(RHS)) = \{ x, R_j.f \}], \\
R(RHS) \in exit(RHS) \rightarrow [Mod_{gen}(exit(RHS)) = \{ R(RHS) \mid R(RHS) = R(RHS) \cup KILL(exit(RHS)) \}] \\
\qquad \wedge [exit(RHS) = exit(RHS) - Mod_{gen}(exit(RHS))] \\
\qquad \wedge [out = Mod_{kill}(exit(RHS)) \cup Mod_{gen}(exit(RHS)) \cup exit(RHS)]
\end{array}
$$
--------------------------------------------------------------------------------
$$
[\text{Postcall Node}] \quad out(n) = out
$$

As explained before, exit(RHS) is the set of exit nodes of all possible called methods. We can compute exit(RHS) in a CG by integrating all out(precall node) sets with the outgoing edges from the callers and their exit nodes.

Figure 15 is our example for computing an alias set with the inter-procedural analysis rules. If Figure 15 (b) shows the status after executing statement s in Figure 15 (a), the alias set A_s of statement s is:

$$
A_s = \{R_2, R_3\} \text{ where } R_2 = \{a.f, b, c, R_3.f\} \text{ and } R_3 = \{R_2.f, c\}
$$

Even though the if statement creates mutually recursive references through c (b.f = c and c.f = b), our reference-set representation can present its status with only R2 and R3.

    B c;
    A a = new A();   // object 1 is created
    B b = new B();   // object 2 is created
    a.f = b;
    c = b;
    if (true) {
        c = new B(); // object 3 is created
        b.f = c;
        c.f = b;
    }                // statement s
    a.update(c);     // statement t

(a) A main example code
(b) Object relations after the statements
(c) A function example code:

    void update(Obj i) {
        b = i.f;     // statement u
    }

Figure 15. Example of an inter-procedural analysis

After executing the call statement t in Figure 15 (a), the result alias set of its precall node can be computed by the following sequence of rule applications:

$$
in(t) = \{R_2, R_3\},\quad RHS = a.update(c),\quad a_1 = c,\quad f_1 = i,\quad R_{pass}(a_1) = \{c, i\},
$$
$$
\begin{array}{l}
R(a_1) = R_2 = R_2 - \{c\} = \{a.f, b, R_3.f\} \text{ or } R(a_1) = R_3 = R_3 - \{c\} = \{R_2.f\}, \\
R(a.f) = R_2 = R_2 - \{a.f\} = \{b, R_3.f\} \text{ and } R_{pass}(a.f) = \{a.f\}, \\
PASS(a.update) = \{R_{pass}(a_1), R_{pass}(a.f)\}, \\
out(t) = \{R_2, R_3\}
\end{array}
$$

    public class ThrowTest {
        public static void main(String args[]) {
            int i;
            try {
                i = Integer.parseInt(args[0]);
            }
            catch (ArrayIndexOutOfBoundsException e) {
                System.out.println("needs an argument!");
                return;
            }
            catch (NumberFormatException e) {
                System.out.println("needs an integer argument!");
                return;
            }
            a(i);
        }

        public static void a(int i) {
            try {
                b(i);
            }
            catch (MyException e) {
                if (e instanceof MySubException)
                    System.out.println("MySubException!");
                else
                    System.out.println("MyException!");
                System.out.println(e.getMessage());
            }
        }

        public static void b(int i) throws MyException {
            int result;
            try {
                System.out.println("i=" + i);
                result = c(i);
                System.out.println("c(i)=" + result);
            }
            catch (MyOtherException e) {
                System.out.println("MyOtherException: " + e.getMessage());
            }
            finally {
                System.out.println("\n" + "fin" + "\n");
            }
        }

        public static int c(int i) throws MyException, MyOtherException {
            switch (i) {
                case 0:  throw new MyException("too low input");
                case 1:  throw new MySubException("still too low input");
                case 99: throw new MyOtherException("too high input");
                default: return i*i;
            }
        }
    }

    class MyException extends Exception {
        public MyException() { super(); }
        public MyException(String s) { super(s); }
    }

    class MyOtherException extends Exception {
        public MyOtherException() { super(); }
        public MyOtherException(String s) { super(s); }
    }

    class MySubException extends MyException {
        public MySubException() { super(); }
        public MySubException(String s) { super(s); }
    }

Figure 16. An exception handling example code [Flan97]

The PASS(a.update) of the precall node propagates to the entry node of the callee update(). The result alias set of the exit node can be computed as follows:

$$
R_{pass}(a_1) = \{c, i\},
$$
$$
\begin{array}{l}
R_{pass}(a.f) = \{a.f\}, \\
R_{pass}(a_1) \cup R_{pass}(a.f) = \{c, i, a.f\} = R_{pass}(R_2) \text{ for } R_2, \\
R_{pass}(a_1) = \{c, i\} = R_{pass}(R_3) \text{ for } R_3, \\
in(u) = \{R_{pass}(R_2), R_{pass}(R_3)\},\quad R(b) = \{b, i.f, c.f\}, \\
out(u) = update_{exit} = \{R(b), R_{pass}(R_2), R_{pass}(R_3)\}
\end{array}
$$

The result set of the postcall node at statement t is computed from the exit alias set of update() and the propagation rule of the postcall node as follows:

$$
\begin{array}{l}
in(t) = out(t_{precall}) = \{R_2, R_3\} \text{ where } R_2 = \{b, c.f\},\ R_3 = \{c.f\}, \\
FIELD(a) = \{a.f\}, \\
EXIT(update) = update_{exit} = \{R(b), R_{pass}(R_2), R_{pass}(R_3)\} \text{ where } R_{pass}(R_2) = \{c, i, a.f\} \text{ and } R_{pass}(R_3) = \{c, i\}, \\
R(b) = R_{passb} = \{b, c.f\},\quad R_{passb}(R_2) = \{c, a.f\},\quad R_{passb}(R_3) = \{c\} \text{ for the callee}, \\
EXIT(update) = \{R(b), R_{passb}(R_2), R_{passb}(R_3)\}, \\
R_2 = R_2 \cup R_{passb} \cup R_{passb}(R_2) = \{b, R_3.f, c.f, c, a.f\}, \\
R_3 = R_3 \cup R_{passb} \cup R_{passb}(R_3) = \{R_2.f, b, c.f, c\}
\end{array}
$$

Thus, out(t) = {R2, R3}.

3.5. Extended Propagation Rules for Exceptions

We can maintain the safety of alias analysis in Java with structures such as the CFG, the CG, and the type table. Java provides an exception handling mechanism with the try/catch/finally construct. A try block handles its exceptions and abnormal exits with zero or more catch blocks. The catch clauses catch and handle the specified exceptions. The finally block is executed whether or not an exception is caught. A programmer's own exceptions are raised by the throw statement. Figure 16 shows example exception classes written by Flanagan [Flan97].

Figure 17. CFGs of example classes in Figure 16 (the graphs contain flow-construct nodes; the case 0, case 1, case 99, and default blocks of the switch statement; try, catch, and finally blocks; and merging nodes)
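Where a try block and its catch blocks join, the alias sets of all incoming paths must be combined. A minimal sketch is shown below, assuming that reference-sets keep the same names (R1, R2, R4) on every path so that the union of alias sets can be taken set by set; MergeSketch and merge are hypothetical names, not part of the thesis. The inputs are the out(B1) and out(B2) sets of the try/catch example worked through in Section 3.5.1.

```java
import java.util.*;

final class MergeSketch {

    /** in(n) = union of out(p) over predecessors p; same-named reference-sets are united. */
    @SafeVarargs
    static Map<String, Set<String>> merge(Map<String, Set<String>>... predecessorOuts) {
        Map<String, Set<String>> in = new TreeMap<>();
        for (Map<String, Set<String>> out : predecessorOuts) {
            out.forEach((name, refs) ->
                in.computeIfAbsent(name, k -> new TreeSet<>()).addAll(refs));
        }
        return in;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> outB1 = Map.of(   // end of try block B1
            "R1", Set.of("a", "b"),
            "R2", Set.of("R1.e", "c", "d"),
            "R4", Set.of("f.h", "g"));
        Map<String, Set<String>> outB2 = Map.of(   // end of catch block B2
            "R1", Set.of("a", "b"),
            "R2", Set.of("c", "d"),
            "R4", Set.of("R1.e", "f.h", "g"));
        System.out.println(merge(outB1, outB2));
        // {R1=[a, b], R2=[R1.e, c, d], R4=[R1.e, f.h, g]}
    }
}
```

The union is what makes the merged sets may-alias sets: R1.e survives in R2 because it holds on the try path, even though the catch path killed it.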
For the computation of aliases, CFGs are used for intra-procedural alias analysis. Our CFG is a directed graph defined for each method as ⟨N_CFG, E_CFG, n_entry, n_exit, B_CFG⟩, where N_CFG is a set of nodes consisting of n_entry, n_exit, and each statement of the method; E_CFG is the set of directed edges that represent the alias set information between a predecessor and a successor statement; n_entry and n_exit represent the entry and exit nodes of the method; and B_CFG is a set of blocks consisting of the nodes of try, catch, and finally blocks. We assume that each potential exception statement (PES) in a try block has its corresponding catch block in the exception construct. An exception edge connects the block that contains the exception statement to a catch block, and the catch block is connected to the exit node or to a finally block.

Figure 18. CG of example classes in Figure 16 (nodes: main(): ThrowTest, a(): ThrowTest, b(): ThrowTest, c(): ThrowTest, new MyException(), new MySubException(), new MyOtherException())

We also need to consider the PESs of a CFG. Runtime exceptions are caused by wrong array indexes, string indexes, and class casts, by qualified expressions on a null pointer, and by division by zero. The n_exit of a CFG includes the out alias set of the last statement and the out alias sets of the PESs. Figure 17 shows the CFGs of the example classes in Figure 16. A CG and a type table are built as in Section 3.3.1. In a flow-sensitive alias analysis, the information is propagated to the next statement and computed in the CFG of a method as in Section 3.4. For an exception block B that consists of the nodes n_1, n_2, ..., n_m, we can relate the block and its nodes by the following equations:
$$
in(B) = in(n_1), \qquad out(B) = out(n_m)
$$

Also, if a node n contains an array access, a qualified expression, a division, or a class cast expression, we consider it a PES node. We describe the propagation rules in the following sections according to CFG node types.

3.5.1. Rules for Intra-procedural Analysis

An intra-procedural analysis rule consists of premises and conclusions separated by a horizontal line. The premises are a set of equations that define the input alias set, information about the node, and intermediate sets. When all premises hold, the equations in the conclusion are solved for out(n). First, we define the rule for the flow construct node type, which has several outgoing edges carrying the same out information:

$$
in(n) = out(n_{pred}), \quad n_{pred}: \text{predecessor node or predecessor block of } n
$$
--------------------------------------------------------------------------------
$$
[\text{Flow Construct Node}] \quad out(n) = in(n)
$$

The rule for the merging node type is as follows:

$$
in(n) = \bigcup_{p \in n_{pred}} out(p), \quad n_{pred}: \text{predecessor nodes or predecessor blocks of } n
$$
--------------------------------------------------------------------------------
$$
[\text{Merging Node}] \quad out(n) = in(n)
$$

In the rule, n_pred is the predecessor set of node n. Given n_pred, out(n) of node n is the union of the out sets of all predecessor nodes.

The next rule concerns the node type of an assignment statement.
$$
\begin{array}{l}
in(n) = out(n_{pred}), \quad n_{pred}: \text{predecessor node of } n \\
x = LHS,\ y = RHS,\ \forall i, j\ R_i, R_j \in in(n) \rightarrow \\
\qquad [Mod_{kill}(in(n)) = \{ R_i \mid \text{kill } x \in R_i \} \cup \{ R_i \mid \text{kill } R_j.f \in R_i \text{ when } q \in R_j \text{ and } x = q.f \}] \\
\qquad \wedge [in(n) = in(n) - Mod_{kill}(in(n))] \wedge [KILL(in(n)) = \{ x, R_j.f \}], \\
\forall k\ R_k \in in(n) \rightarrow [Mod_{gen}(in(n)) = \{ R_k \mid R_k = R_k \cup KILL(in(n)) \text{ when } y \in R_k \}] \wedge [in(n) = in(n) - Mod_{gen}(in(n))], \\
n: \text{the first node of an exception block } B \rightarrow in(B) = in(n), \\
n \text{ is a potential exception statement} \rightarrow PES = PES \cup \{n\}
\end{array}
$$
--------------------------------------------------------------------------------
$$
[\text{Assignment Node}] \quad out(n) = Mod_{kill}(in(n)) \cup Mod_{gen}(in(n)) \cup in(n)
$$

In this rule, LHS and RHS stand for the left- and right-hand sides of an assignment statement, respectively. KILL(in(n)) is the set of references killed by Mod_kill(in(n)). Also, out(B) = out(n) if n is the last node of block B; this equality between a block and a node applies to all of the rules in this thesis. If node n is a PES (an array access, a qualified expression, a division, or a class cast expression), n becomes an element of the set PES in all of the rules.

For example, when there is an assignment statement a.e = f.h in the catch block B2 of Figure 19 (b), the alias sets out(B1) and in(B2) are expressed with the must-alias reference-sets R1, R2, R4 of the try block B1 of Figure 19 (a), (b) as follows:

$$
R_1 = \{a, b\},\quad R_2 = \{R_1.e, c, d\},\quad R_4 = \{f.h, g\}, \qquad in(B_2) = out(B_1) = \{R_1, R_2, R_4\}
$$
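The kill and gen steps of the assignment rule can be sketched in Java for the statement a.e = f.h above. This is a hypothetical sketch, not the thesis implementation: it assumes that a qualified LHS q.f is recorded in KILL in the canonical form R_j.f for every R_j that contains q (falling back to x itself when no such set exists), which reproduces KILL(in(B2)) = {R1.e} in this example. AssignSketch and assign are invented names.

```java
import java.util.*;

final class AssignSketch {

    /** Applies the assignment rule for "lhs = rhs" to in(n) and returns out(n). */
    static Map<String, Set<String>> assign(Map<String, Set<String>> in, String lhs, String rhs) {
        Map<String, Set<String>> out = new TreeMap<>();
        in.forEach((name, refs) -> out.put(name, new TreeSet<>(refs)));

        // KILL: for x = q.f, each R_j containing q contributes R_j.f (assumed canonical form);
        // for a plain variable x, or when q is in no set, KILL = {x}.
        Set<String> kill = new TreeSet<>();
        int dot = lhs.lastIndexOf('.');
        if (dot > 0) {
            String q = lhs.substring(0, dot);
            String field = lhs.substring(dot);            // e.g. ".e"
            out.forEach((name, refs) -> {
                if (refs.contains(q)) kill.add(name + field);
            });
        }
        if (kill.isEmpty()) kill.add(lhs);

        // Mod_kill: remove x and every R_j.f from all reference-sets.
        for (Set<String> refs : out.values()) {
            refs.remove(lhs);
            refs.removeAll(kill);
        }
        // Mod_gen: every reference-set containing the RHS gains the killed references.
        for (Set<String> refs : out.values()) {
            if (refs.contains(rhs)) refs.addAll(kill);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> inB2 = Map.of(
            "R1", Set.of("a", "b"),
            "R2", Set.of("R1.e", "c", "d"),
            "R4", Set.of("f.h", "g"));
        System.out.println(assign(inB2, "a.e", "f.h"));
        // {R1=[a, b], R2=[c, d], R4=[R1.e, f.h, g]}
    }
}
```

The printed result matches out(B2) in the computation that follows: R2 loses R1.e (kill) and R4 gains it (gen).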
Because the LHS is a qualified expression related to both R1 and R2, Mod_kill(in(B2)), in(B2), and KILL(in(B2)) are computed as follows:

$$
\begin{array}{l}
R_1 = \{a, b\} \text{ and } R_2 = \{R_1.e, c, d\}, \text{ then } R_2 = \{c, d\} \\
Mod_{kill}(in(B_2)) = \{R_2\} \\
in(B_2) = \{R_1, R_4\} \\
KILL(in(B_2)) = \{R_1.e\}
\end{array}
$$

Since R4 includes the RHS, Mod_gen(in(B2)) and in(B2) are computed as follows:

$$
\begin{array}{l}
Mod_{gen}(in(B_2)) = \{ R_4 \mid R_4 = R_4 \cup \{R_1.e\} = \{R_1.e, f.h, g\} \} = \{R_4\} \\
in(B_2) = \{R_1\}
\end{array}
$$

Thus, out(B2) is the union of Mod_kill(in(B2)), Mod_gen(in(B2)), and in(B2):

$$
out(B_2) = Mod_{kill}(in(B_2)) \cup Mod_{gen}(in(B_2)) \cup in(B_2) = \{R_1, R_2, R_4\} \text{ where } R_1 = \{a, b\},\ R_2 = \{c, d\},\ R_4 = \{R_1.e, f.h, g\}
$$

Finally, in(n) for the merging node n is the union of out(B1) and out(B2):

$$
in(n) = out(B_1) \cup out(B_2) = \{R_1, R_2, R_4\} \text{ where } R_1 = \{a, b\},\ R_2 = \{c, d, R_1.e\},\ R_4 = \{f.h, g, R_1.e\}
$$

The reference-sets of in(n) may contain may-alias elements: R1 consists only of must-alias elements; R2 may contain the aliased element R1.e from block B1; and R4 may contain the aliased element R1.e from block B2.

The rule for the return statement node type is presented with the reference-set of a return variable r. LOCAL stands for the set of local variables defined in a method M, such as local variables and formal parameters.

$$
\begin{array}{l}
in(n) = out(n_{pred}), \quad n_{pred}: \text{predecessor node of } n \\
M: \text{callee}, \quad LOCAL(M) = \{ v \mid v \text{ is a local variable of } M \} \\
\forall i\ R_i \in in(n) \text{ for } r: \text{return reference} \rightarrow [Mod_{kill}(in(n)) = \{ R_i \mid \text{kill } x \in R_i \text{ for } x \in LOCAL(M) \}] \\
\qquad \wedge [R_r = \{ R_r \mid \text{kill } x \in R_r \text{ for } x \in LOCAL(M) \text{ when } r \in R_r \}], \\
n \text{ is a potential exception statement} \rightarrow PES = PES \cup \{n\}
\end{array}
$$
--------------------------------------------------------------------------------
$$
[\text{Return Node}] \quad out(n) = Mod_{kill}(in(n))
$$

The next is the rule for the exit node type.
$$
\begin{array}{l}
in(n) = \bigcup_{p \in n_{pred}} out(p), \quad n_{pred}: \text{predecessor nodes of } n \\
PES = \bigcup_{p \in PES} out(p), \quad PES: \text{potential exception statements} \\
M: \text{callee}, \quad LOCAL(M) = \{ v \mid v \text{ is a local variable of } M \} \\
\forall i\ R_i \in in(n) \rightarrow [Mod_{kill}(in(n)) = \{ R_i \mid \text{kill } x \in R_i \text{ for } x \in LOCAL(M) \}] \wedge [in(n) = in(n) - Mod_{kill}(in(n))]
\end{array}
$$
--------------------------------------------------------------------------------
$$
[\text{Exit Node}] \quad out(n) = Mod_{kill}(in(n)) \cup in(n) \cup PES
$$

3.5.2. Rules for Inter-procedural Analysis

We virtually divide a call node into a precall node and a postcall node to simplify the computation of a call statement. A precall node collects the alias set from the predecessor node of the current call node and computes its own alias set out(n) from the collected set. This alias set is propagated to the entry node of the called method and killed in the calling method. This reduces the inefficiency of previous approaches [PR94, PR95, CR97], which compute redundant alias relations that are later modified in the called method. A postcall node collects the modified kill set of the precall node and the exit-node alias sets of all possible called methods.
By selecting the references accessible from the calling method, we compute the result alias set of the postcall node, which is the out alias set of the call node. The out set of the precall node is computed as follows:

$$
\begin{array}{l}
in(n) = out(n_{pred}), \quad n_{pred}: \text{predecessor node of } n \\
RHS = E_c.M_c \text{ or } RHS = M_c: \\
\quad \forall i,\ a_i = \text{the } i\text{th actual parameter of the callee } M_c, \quad \forall i,\ f_i = \text{the } i\text{th formal parameter of the callee } M_c, \\
\quad \forall i\ R(a_i) \in in(n) \rightarrow [R_{pass}(a_i) = \{a_i, f_i\}] \wedge [R(a_i) = R(a_i) - R_{pass}(a_i)], \\
RHS = M_c: \\
\quad \forall i\ R(a_i) \in in(n),\ v \text{ is a non-local variable in the callee } M_c,\ \forall f\, \forall v\ R(v.f) \in in(n) \rightarrow \\
\qquad [R(v) = R(v) - \{v\}] \wedge [R(v.f) = R(v.f) - \{v.f\}] \wedge [R_{pass}(v) = \{v\}] \wedge [R_{pass}(v.f) = \{v.f\}] \\
\qquad \wedge [PASS(M_c) = \bigcup \{R_{pass}(a_i), R_{pass}(v), R_{pass}(v.f)\}], \\
RHS = E_c.M_c: \\
\quad \forall i\ R(a_i) \in in(n),\ \forall f\, \forall a_i\ R(a_i.f) \in in(n),\ \forall f\ R(E_c.f) \in in(n) \rightarrow \\
\qquad [R(E_c.f) = R(E_c.f) - \{E_c.f\}] \wedge [R_{pass}(E_c.f) = \{E_c.f\}] \wedge [R(a_i.f) = R(a_i.f) - \{a_i.f\}] \wedge [R_{pass}(a_i.f) = \{a_i.f\}] \\
\qquad \wedge [PASS(M_c) = \bigcup \{R_{pass}(a_i), R_{pass}(a_i.f), R_{pass}(E_c.f)\}], \\
n: \text{the first node of an exception block } B \rightarrow in(B) = in(n)
\end{array}
$$
--------------------------------------------------------------------------------
$$
[\text{Precall Node}] \quad out(n) = in(n)
$$

PASS(M_c) represents the set of actual parameters, formal parameters, and non-local variables in a called method M_c. R_pass(a_i) is the set of actual parameters accessible by the called method M_c when passed from the caller. R_pass(v) is the set of non-local variables accessible by the called method M_c.

PRECALL(M_c) is the set of precall nodes of the call statements that invoke this called method. An entry node merges the alias sets from the precall nodes and then propagates the merged set to its successor node.

$$
\begin{array}{l}
PRECALL(M_c): \text{the precall nodes of the callee } M_c \\
in(n) = \bigcup_{p \in PRECALL(M_c)} PASS(p), \\
n: \text{the first node of an exception block } B \rightarrow in(B) = in(n)
\end{array}
$$
--------------------------------------------------------------------------------
$$
[\text{Entry Node}] \quad out(n) = in(n)
$$

The rule of the postcall node is defined as follows.
$$
\begin{array}{l}
in(n) = \bigcup_{p \in n_{precall}} out(p), \quad n_{precall}: \text{a precall node of } n \\
RHS = E_c.M_c \rightarrow FIELD(E_c) = \{ f \mid f \text{ is a field name in an object referred to by } E_c \}, \\
RHS = M_c \rightarrow FIELD(E_c) = \emptyset, \\
RHS = \text{new } M_c \rightarrow FIELD(E_c) = \emptyset \wedge A(r), \\
EXIT(M_c) = \{ e \mid e \text{ is an exit alias set from a possible callee method } M_c \}, \\
LHS = \emptyset,\ \forall R_{passb} \in EXIT(M_c) \rightarrow \\
\qquad [R_{passb} = R_{passb} - \{ v \mid v \text{ is a local variable in the callee } M_c \}] \wedge [EXIT(M_c) = \bigcup R_{passb}] \text{ for all } M_c, \\
LHS \neq \emptyset,\ \forall R_{passb} \in out(\text{return node}) \rightarrow \\
\qquad [R_{passb} = R_{passb} - \{ v \mid v \text{ is a local variable in the callee } M_c \}] \wedge [EXIT(M_c) = \bigcup R_{passb}] \text{ for all } M_c, \\
\forall i\ R_i \in EXIT(M_c),\ \forall j\ R_j \in in(n) \rightarrow [R_i \mid R_i = R_i \cup R_j \text{ when } i = j] \wedge [EXIT(M_c) = EXIT(M_c) - R_i] \wedge [in(n) = in(n) - R_j], \\
exit(RHS) = \bigcup_{e \in EXIT(M_c)} out(e) \cup \bigcup_{p \in n_{precall}} out(p) \cup \bigcup_{\text{for all } i} R_i, \\
LHS = \emptyset \rightarrow out = exit(RHS), \\
LHS = x,\ \forall i, j\ R_i, R_j \in exit(RHS),\ R(RHS) \in exit(RHS) \rightarrow \\
\qquad [Mod_{kill}(exit(RHS)) = \{ R_i \mid \text{kill } x \in R_i \} \cup \{ R_i \mid \text{kill } R_j.f \in R_i \text{ when } q \in R_j \text{ and } x = q.f \}] \\
\qquad \wedge [exit(RHS) = exit(RHS) - Mod_{kill}(exit(RHS))] \wedge [KILL(exit(RHS)) = \{ x, R_j.f \}], \\
R(RHS) \in exit(RHS) \rightarrow [Mod_{gen}(exit(RHS)) = \{ R(RHS) \mid R(RHS) = R(RHS) \cup KILL(exit(RHS)) \}] \\
\qquad \wedge [exit(RHS) = exit(RHS) - Mod_{gen}(exit(RHS))] \\
\qquad \wedge [out = Mod_{kill}(exit(RHS)) \cup Mod_{gen}(exit(RHS)) \cup exit(RHS)], \\
n \text{ is a potential exception statement} \rightarrow PES = PES \cup \{n\}
\end{array}
$$
--------------------------------------------------------------------------------
$$
[\text{Postcall Node}] \quad out(n) = out
$$

exit(RHS) is the set of exit nodes of all possible called methods. We can compute exit(RHS) in a CG by integrating all out(precall node) sets with the outgoing edges from the callers and their exit nodes.

(a) Object relations at block B1
(b) A function called at block B2:

    void update(Obj i) {
        b = i.f;     // statement u
    }

(c) CFG of the example: try block B1; catch block B2 containing statement t, a.update(c);

Figure 19. Example of an inter-procedural analysis
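The merging step of the postcall rule, in which each caller reference-set R_i absorbs the matching callee exit set after the callee's local variables are removed, can be sketched as follows. This is a simplified, hypothetical illustration rather than the thesis implementation: it assumes callee exit sets arrive already keyed by the caller set they were passed from (R_pass(R2) for R2), it treats any expression whose root is a callee local (i or i.f) as local, and, following the worked update() example, it also unites the non-local exit set R(b) into each matched set. The kill/gen handling for a non-empty LHS is omitted, and PostcallSketch, postcall, and withoutLocals are invented names.

```java
import java.util.*;

final class PostcallSketch {

    /**
     * Merges callee exit sets back into the caller's reference-sets.
     * calleeExit maps a caller set name (e.g. "R2") to R_pass(R2) at the callee exit;
     * sharedExit holds exit sets not tied to one caller set (e.g. R(b)).
     */
    static Map<String, Set<String>> postcall(Map<String, Set<String>> in,
                                             Map<String, Set<String>> calleeExit,
                                             List<Set<String>> sharedExit,
                                             Set<String> calleeLocals) {
        Map<String, Set<String>> out = new TreeMap<>();
        in.forEach((name, refs) -> out.put(name, new TreeSet<>(refs)));
        calleeExit.forEach((name, refs) -> {
            // R_i = R_i union R_passb union R_passb(R_i), with callee locals removed first.
            Set<String> target = out.computeIfAbsent(name, k -> new TreeSet<>());
            target.addAll(withoutLocals(refs, calleeLocals));
            for (Set<String> shared : sharedExit) {
                target.addAll(withoutLocals(shared, calleeLocals));
            }
        });
        return out;
    }

    /** Drops v and v.f for every local variable v of the callee. */
    static Set<String> withoutLocals(Set<String> refs, Set<String> locals) {
        Set<String> kept = new TreeSet<>();
        for (String r : refs) {
            int dot = r.indexOf('.');
            String root = dot < 0 ? r : r.substring(0, dot);
            if (!locals.contains(root)) kept.add(r);
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> in = Map.of("R2", Set.of("b", "R3.f"), "R3", Set.of("R2.f"));
        Map<String, Set<String>> exit = Map.of(
            "R2", Set.of("c", "i", "a.f"),   // R_pass(R2) at update()'s exit
            "R3", Set.of("c", "i"));         // R_pass(R3)
        List<Set<String>> shared = List.of(Set.of("b", "i.f", "c.f"));  // R(b)
        System.out.println(postcall(in, exit, shared, Set.of("i")));
        // {R2=[R3.f, a.f, b, c, c.f], R3=[R2.f, b, c, c.f]}
    }
}
```

With these inputs the sketch reproduces the final R2 and R3 of the update() example.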
Figure 19 is the example for computing an alias set with the inter-procedural analysis rules. If we assume that Figure 19 (a) represents the state at the try block B1, the out alias set of block B1 is:

$$
out(B_1) = \{R_2, R_3\} \text{ where } R_2 = \{a.f, b, c, R_3.f\} \text{ and } R_3 = \{R_2.f, c\}
$$

After executing the call statement t in block B2 of Figure 19 (c), the alias set of its precall node becomes:

$$
\begin{array}{l}
in(t) = in(B_2) = \{R_2, R_3\},\quad RHS = a.update(c),\quad a_1 = c,\quad f_1 = i,\quad R_{pass}(a_1) = \{c, i\}, \\
R(a_1) = R_2 = R_2 - \{c\} = \{a.f, b, R_3.f\} \text{ or } R(a_1) = R_3 = R_3 - \{c\} = \{R_2.f\}, \\
R(a.f) = R_2 = R_2 - \{a.f\} = \{b, R_3.f\} \text{ and } R_{pass}(a.f) = \{a.f\}, \\
PASS(a.update) = \{R_{pass}(a_1), R_{pass}(a.f)\}, \\
out(t_{precall}) = \{R_2, R_3\}
\end{array}
$$

The PASS(a.update) of the precall node propagates to the entry node of the callee update(). The result alias set of the exit node can be computed as follows:

$$
\begin{array}{l}
R_{pass}(a_1) = \{c, i\},\quad R_{pass}(a.f) = \{a.f\}, \\
R_{pass}(a_1) \cup R_{pass}(a.f) = \{c, i, a.f\} = R_{pass}(R_2) \text{ for } R_2, \\
R_{pass}(a_1) = \{c, i\} = R_{pass}(R_3) \text{ for } R_3, \\
in(u) = \{R_{pass}(R_2), R_{pass}(R_3)\},\quad R(b) = \{b, i.f, c.f\}, \\
out(u) = update_{exit} = \{R(b), R_{pass}(R_2), R_{pass}(R_3)\}
\end{array}
$$

The result set of the postcall node at statement t is computed from the exit alias set of update() and the propagation rule of the postcall node as follows:

$$
\begin{array}{l}
in(t_{postcall}) = out(t_{precall}) = \{R_2, R_3\} \text{ where } R_2 = \{b, c.f\},\ R_3 = \{c.f\}, \\
FIELD(a) = \{a.f\}, \\
EXIT(update) = update_{exit} = \{R(b), R_{pass}(R_2), R_{pass}(R_3)\} \text{ where } R_{pass}(R_2) = \{c, i, a.f\} \text{ and } R_{pass}(R_3) = \{c, i\}, \\
R(b) = R_{passb} = \{b, c.f\},\quad R_{passb}(R_2) = \{c, a.f\},\quad R_{passb}(R_3) = \{c\} \text{ for the callee}, \\
EXIT(update) = \{R(b), R_{passb}(R_2), R_{passb}(R_3)\}, \\
R_2 = R_2 \cup R_{passb} \cup R_{passb}(R_2) = \{b, R_3.f, c.f, c, a.f\}, \\
R_3 = R_3 \cup R_{passb} \cup R_{passb}(R_3) = \{R_2.f, b, c.f, c\}
\end{array}
$$
Thus, out(B2) = out(t) = out(t_postcall) = {R2, R3}. Finally,

$$
in(n) = out(B_1) \cup out(B_2) = \{R_2, R_3\} \text{ where } R_2 = \{a.f, b, c, R_3.f\} \text{ and } R_3 = \{R_2.f, b, c\}
$$

3.6. Alias Analysis Algorithm

Our alias analysis algorithm in Figure 20 builds an initial CG with the main method in line (2). The algorithm visits all nodes of the CG in the loop between lines (3) and (36) until the alias data and the set of nodes reach a fixed point. It traverses the nodes of the CG alternately in topological and reverse topological order (lines 4, 5), which may shorten the time needed to reach the fixed point [CBC93, CS95, BCC97]. The set TYPES represents the possible class types of a callee and is used to build a safe CG. TYPES_table(r) is the set of dynamic types of a reference variable r in the type table. As the algorithm proceeds, the methods resolved from the possible class types of each reference make the CG grow, as shown in lines (19) to (24).

(1)  Algorithm AliasAnalysis
(2)    construct an initial CG with the main method;
(3)    repeat {
(4)      for each method T.M ∈ N_CG,
(5)          alternating between topological and reverse topological order {
(6)        for each node n ∈ N_CFG(T.M) in structural order {
(7)          if n is a call statement node {
(8)            if (RHS = E_c.M_c) {
(9)              compute the set of inferred types from the reference-set for E_c;
(10)             compute the set TYPES resolved
(11)               from the inferred types and the class hierarchy;
(12)           } else if (RHS = M_c) {
(13)             TYPES = {T};
(14)           } else if (RHS = new M_c) {
(15)             TYPES = {M_c};
(16)           }
(17)           if LHS exists
(18)             TYPES_table(LHS) = TYPES_table(RHS);
(19)           for each type t ∈ TYPES {
(20)             if t.M_c is not in CG
(21)               create a CG node for M_c;
(22)             if no edge from T.M to t.M_c with a label n
(23)               connect an edge from T.M to t.M_c with a label n;
(24)           }
(25)           compute out(n_precall) for a precall node n_precall;
(26)           compute out(n_postcall) for a postcall node n_postcall;
(27)         } else {
(28)           if n is an assignment statement node
(29)             TYPES_table(LHS) = TYPES_table(RHS);
(30)           else
(31)             TYPES_table(LHS) = TYPES_table(LHS) + TYPES_table(RHS);
(32)           compute out(n) using the data-flow equations and propagation rules;
(33)         }
(34)       }
(35)     }
(36)   } until the CG and the alias set of every CFG node converge

Figure 20. Alias Analysis Algorithm

Each node is visited in structural order in line (6): while visiting nodes from the entry node to the exit node, for an if flow construct node, each branch is traversed and then its merging node is visited; for exception blocks, each block is traversed and then their merging node is visited, as shown in lines (30, 31). With the structural order, we not only maintain the safety of the alias computation, including exception constructs, but also improve efficiency for Java compared with the previous work [CSH95], without losing the accuracy of the resulting set. In lines (25, 26, 32), out(n) is computed with the corresponding propagation rule from the previous sections, where n is a precall, postcall, assignment, or merging statement node, etc.

For example, we can apply the algorithm to the example code of Figure 21 as follows:

    class Foo {
        static A x, y, z;
        public static void main(String args[]) {
            bar();          // statement 1
        }
        static void bar() {
            z = new A();    // statement 1
            y = z;          // statement 2
        }
    }

(a) The example code   (b) CFG   (c) CG
Figure 21. Example Code for the Algorithm of Figure 20

Initially, a parser builds a CFG for each method; class A has only a default constructor.

Line (2): the method main becomes the initial node of the CG in Figure 21 (c). N_CG = {Foo.main}
Line (3): the repeat loop starts.
Line (4, 5): the method main in the CG is visited in topological order.
Line (6): the node statement 1 ∈ N_CFG(Foo.main) is visited in structural order.
Line (7): statement 1 is a call statement.
Line (12): RHS is the method bar.
Line (13): TYPES = {Foo}
Line (17): the call statement has no LHS.
Line (19, 20): for the type Foo, Foo.bar is not in the CG.
Line (21): a node Foo.bar is created in the CG.
Line (22, 23): an edge from node Foo.main to node Foo.bar is connected in the CG because it does not exist yet.
Line (25): out(n_precall) = {}
Line (26): out(n_postcall) = {}
Line (36): the CG has changed with the node Foo.bar; the alias sets are main_exit = {}.
Line (4, 5): the method Foo.bar in the CG is visited in reverse topological order.
Line (6): the node statement 1 ∈ N_CFG(Foo.bar) is visited in structural order.
Line (7): statement 1 is a call statement.
Line (14): RHS is the constructor A.
Line (15): TYPES = {A}
Line (17, 18): TYPES_table(z) = TYPES_table(RHS) = {A}
Line (19, 20): for the type A, "new A" is not in the CG.
Line (21): a node "new A" is created in the CG.
Line (22, 23): an edge from node Foo.bar to node "new A" is connected in the CG.
Line (25, 26): out(statement1_precall) = {}, out(statement1_postcall) = {}
Line (6): the node statement 2 ∈ N_CFG(Foo.bar) is visited in structural order.
Line (27, 28): statement 2 is an assignment statement.
Line (29): TYPES_table(y) = TYPES_table(z) = {A}
Line (32): bar_exit = out(statement 2) = {R1} where R1 = {y, z}
Line (4): the method Foo.main is visited; its processing is the same as in the previous visit, except that out(statement1_postcall) = bar_exit = {R1} where R1 = {y, z}.
Line (36): the CG has changed with the node "new A"; the alias sets are main_exit = bar_exit = {R1} and bar_exit = {R1}.
Line (4): the method Foo.main is visited; its processing is the same as in the previous visit.
Line (4): the method Foo.bar is visited; its processing is the same as in the previous visit.
Line (4): the constructor A is visited as a default constructor.
Line (36): the CG and the alias sets main_exit = {R1} and bar_exit = {R1} have not changed, so the repeat loop stops.

3.7. Complexity of the Algorithm

For the outermost loop (3), let R_n be the number of reference-sets and A_r the maximum number of aliased reference variables in a reference-set. R_n × A_r is then the maximum number of refer-to relations between references and objects in each node of a CFG except its exit node. We can estimate the worst-case time complexity of the loop as O(R_n × A_r × N_pes × E_cg), where N_pes is the number of potential exception statements in a CFG, R_n × A_r × N_pes is the maximum number of refer-to relations between references and objects in the exit node of a CFG, and E_cg is the number of edges in the CG, since each relation from an entry node to an exit node is traversed once per iteration. For the second outer loop (4), the time complexity is O(N_cg), where N_cg is the final number of nodes in the CG. For the innermost loop (6), the time complexity is O(N_cfg), where N_cfg is the maximum number of nodes in a CFG.

The dominant part of the execution in the inner loop is the call statement nodes (7), so the worst-case time complexity depends on the number of call statements. Computing a set of inferred types takes O(R_m) time, where R_m is the number of reference variables in the program code; the type table contains all possible types of a reference variable. The possible-method resolution takes O(T_i × H) time, where T_i is the maximum number of subclasses of a superclass and H is the maximum number of levels in the class hierarchy. Resolving overridden methods and updating the CG takes O(T_i × (H + N_cg + C_c)) time, where C_c is the maximum number of call statements in a calling method that invoke the same called method. The worst-case time complexity of the precall and postcall nodes is O(R_p × R), where R_p is the maximum number of reference-sets propagated and R is the maximum number of reference variables in R_p at a call statement. Therefore, the worst-case time complexity of the main algorithm is

$$
O(R_n \times A_r \times N_{pes} \times E_{cg} \times N_{cg} \times N_{cfg} \times (R_p \times R \times R_m + T_i \times (H + N_{cg} + C_c)))
$$

The worst-case space complexity is O(R_n × A_r × N_pes × N_cg × N_cfg) for the alias sets and O(R_m) for the type table. The worst-case space complexity of the outgoing edges of a call statement is O(T_i × H), so the worst-case space complexity of the CG is O(N_cg × C_s × T_i × H), where C_s is the maximum number of call statements in a method.

Applied to Java, the existing alias relations developed for C++ generate (O × A_o + O) × N_pes aliased elements in an exit node, where O is the number of objects in the program and A_o is the maximum number of aliased elements for an object. For the outermost loop of the same iterative algorithm as in Figure 20, we can estimate the worst-case time complexity as O((O × A_o + O) × N_pes × E_cg). The number of columns of the type table is the number of object names, O. Computing a set of inferred types takes O((O × A_o + O) + O) time: O(O × A_o + O) to search the aliased elements and O(O) to search the type table for all possible types of an object name. The worst-case time complexity of a call statement node is O((O × A_o + O) × N_pes), since the caller propagates an alias set to both the callee and the next node.
Therefore, the worst-case time complexity of the existing works [PR95, CSH95, CR97] is

$$
O((O \times A_o + O) \times N_{pes} \times E_{cg} \times N_{cg} \times N_{cfg} \times ((O \times A_o + O) \times ((O \times A_o + O) + O) + T_i \times (H + N_{cg} + C_c)))
$$

Figure 22. CG and CFG of Figure 7: (a) control flow graph, with the nodes thread1 = new PrintThread("1"); thread2 = new PrintThread("2"); thread3 = new PrintThread("3"); thread4 = new PrintThread("4"); thread1.start(); thread2.start(); thread3.start(); thread4.start(); and end of code; (b) call graph, with PrintThread() nodes.

In practice, R_n is much smaller than O even when A_r equals A_o, so our O(R_n × A_r) is smaller than O(O × A_o + O). For type inference, our O(R_m) is smaller than O((O × A_o + O) + O). For a call statement, our O(R_p × R) is larger than O(O × A_o + O), but it reduces the redundant aliased elements of the caller. As a result, the worst-case time complexity of our main algorithm is lower than that of the existing works [PR95, CSH95, CR97].

3.8. Regarding Multithreading Issue

This section describes whether our algorithm can be applied to other Java issues, such as multithreading, while computing a safe (but not precise) alias set. Radu [RR99] proposed a flow-sensitive and context-sensitive pointer analysis algorithm for the Cilk multithreaded programming language. It focuses on the key difficulty of computing the aliases of variables that interfere across threads. Our algorithm, however, is flow-sensitive and context-insensitive, so it is sufficient to collect a safe alias set without considering the interfering variables for each execution order of threads.
Radu's approach builds a context-sensitive CG to collect the alias set in Cilk. Multithreading in Cilk, in particular the 'par' construct, is expressed through fork and join operations: all threads are created and joined at the same statement. Radu computes a safe alias set by considering all possible interference on shared variables and then collects a more precise alias set at the join point. In Java, however, threads are not created and joined at a single statement unless extra code is added. Suppose that thread 1 executes the assignment 'a = b;' on a shared variable a and thread 2 executes 'a = c;'. If thread 1 executes later, we have the aliased element (a, b); if thread 2 executes later, we have the aliased element (a, c). Unfortunately, we cannot compute a precise alias set for these two threads statically, because we cannot predict which thread executes first and because the two threads are not joined at the same statement as in Cilk.

Although alias analysis for multithreaded programs needs further study, we note the following issues. Since our analysis is flow-sensitive and context-insensitive and computes a safe alias set, it might not be necessary to analyze interference among reference variables in order to compute a precise alias set in a multithreaded program, as the flow-sensitive and context-sensitive algorithm [RR99] does. Moreover, even if a precise alias set is required from a flow-sensitive and context-sensitive analysis, the sequence of threads is hard to predict in Java, because Java multithreading is platform dependent and because a programmer can influence scheduling through thread priorities, so interfering variables among threads cannot be computed precisely.
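The two-thread example above can be made concrete with a brute-force sketch: treat every permutation of the assignments to a shared variable as one possible execution order, take the aliased element left by the last assignment in each order, and union the results into a safe alias set. The `Assign` record and the method names are hypothetical, the sketch assumes that all assignments write the same shared variable, and it visits $s!$ orders for $s$ assignments, so it illustrates the idea rather than offering a practical algorithm. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: safe alias set for writes to one shared reference
 *  variable, obtained by enumerating every execution order of the writes. */
public final class InterleavingSketch {

    /** A reference assignment "target = source;" executed by some thread. */
    record Assign(String target, String source) { }

    /** Union, over all execution orders, of the aliased element produced by
     *  the last assignment in that order (s! orders for s assignments). */
    static Set<List<String>> safeAliasSet(List<Assign> writes) {
        Set<List<String>> result = new LinkedHashSet<>();
        if (writes.isEmpty()) {
            return result;
        }
        permute(new ArrayList<>(writes), 0, result);
        return result;
    }

    private static void permute(List<Assign> order, int k, Set<List<String>> out) {
        if (k == order.size()) {
            Assign last = order.get(order.size() - 1);   // the last write wins in this order
            out.add(List.of(last.target(), last.source()));
            return;
        }
        for (int i = k; i < order.size(); i++) {
            Collections.swap(order, k, i);
            permute(order, k + 1, out);
            Collections.swap(order, k, i);               // restore before the next choice
        }
    }

    public static void main(String[] args) {
        // Thread 1 executes "a = b;" and thread 2 executes "a = c;".
        List<Assign> writes = List.of(new Assign("a", "b"), new Assign("a", "c"));
        System.out.println(safeAliasSet(writes));
    }
}
```

Neither order can be ruled out statically, so the safe result keeps both aliased elements.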
Even though further study is necessary, the ideas in this thesis can be applied to multithreaded applications by extending the CFG and CG with interferences between references. First, we can compute the safe alias set {(a, b), (a, c)} for the two threads above by considering all possible interferences. Its time complexity is $O(s(v)!)$, where $s(v)$ is the number of statements involving a shared variable $v$ across threads. Radu's approach has the same time complexity when computing all possible interference on shared variables, because the execution order of threads is not predictable; in addition, it computes a precise alias set at the join point, which is not applicable to Java. Second, since the execution order of threads is not predictable, we can compute a safe alias set for all possible interfering variables in Java with a context-insensitive CG, although this is not efficient. Figure 22 shows how our algorithm may be extended to build the CG and CFG of a multithreaded example by treating threads as functions. In the CFG, each start call statement of a thread becomes a node, and an end of code node is added to account for the unpredictable termination of threads. In the CG, each duplicated start function is a node used to compute the inter-procedural alias set. Using these graphs, we can compute a safe alias set.

Chapter 4 Methodology

This chapter presents the methodology used to build our alias analysis system. The system mainly consists of a parser for Java syntax, a syntax tree builder, and the alias analysis algorithm. The benchmark codes used to compare the existing alias representation with our reference-set representation are also described.

4.1. JavaCC

Java Compiler Compiler (JavaCC) is a parser generator developed by Sun Microsystems. It reads a grammar specification written in the JavaCC format and converts it into a Java program.
JavaCC runs on any Java platform to which it is downloaded. By default, JavaCC generates an LL(1) parser, which performs well; to resolve a choice conflict at a particular point in the grammar, it uses LL(k) lookahead at that point only, without affecting the LL(1) property of the rest of the grammar. It accepts extended BNF such as (A)* and (A)+, which is easy to read and removes the need for left recursion. It also reports the location of errors, so programmers get clear error reports [JavaCC98]. We use JavaCC to implement the parser of our system because (1) top-down parsers are much easier to debug than bottom-up parsers; (2) the lexical and grammar specifications are in one file, so the grammar is easy to read and maintain; (3) JavaCC is platform independent and runs on any Java platform with JDK-1.0.2 or later; and (4) Sun Microsystems provides an example JavaCC grammar covering the lexical and syntactic structure of JDK-1.0.2, so we do not need to write a Java parser ourselves. Of these features, the example parser for JDK-1.0.2 is the most valuable, since it spares us the extra cost of developing a Java grammar.

4.2. JTB

JTB was built at Purdue University, West Lafayette, Indiana, USA. JTB is a syntax tree builder to be used with the JavaCC parser generator introduced in Section 4.1 [JTB00]. It takes a plain JavaCC grammar file as input and automatically generates a JavaCC grammar with the annotations needed to build the syntax tree during parsing. It also generates a set of syntax tree classes based on the productions in the grammar. It is based on the Visitor design pattern, whose default methods simply visit the children of the current node. JTB requires a Java 1.1 or higher virtual machine and is compatible with JavaCC 0.6.x or higher. Our alias analysis system is integrated with the pretty printer example of JTB for the Java 1.1 grammar.
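The Visitor style described for JTB, in which the default visit methods only traverse the children of the current node and a subclass overrides just the nodes it cares about, can be sketched in a few lines of Java. The node classes (`Block`, `Assignment`), `DefaultTraversal`, and `AssignmentCollector` are hypothetical and much simpler than the classes JTB generates; the collector only stands in for the kind of pass that records information, such as entries for a CST, while walking the tree.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a visitor whose default methods visit children. */
public final class VisitorSketch {

    /** Minimal syntax-tree node: every node accepts a visitor. */
    interface Node { void accept(Visitor v); }

    interface Visitor {
        void visit(Block n);
        void visit(Assignment n);
    }

    /** A sequence of child statements. */
    static final class Block implements Node {
        final List<Node> children = new ArrayList<>();
        Block add(Node child) { children.add(child); return this; }
        public void accept(Visitor v) { v.visit(this); }
    }

    /** A reference assignment "lhs = rhs;". */
    static final class Assignment implements Node {
        final String lhs, rhs;
        Assignment(String lhs, String rhs) { this.lhs = lhs; this.rhs = rhs; }
        public void accept(Visitor v) { v.visit(this); }
    }

    /** Default behaviour: visit the children of the current node, nothing else. */
    static class DefaultTraversal implements Visitor {
        public void visit(Block n) { for (Node c : n.children) c.accept(this); }
        public void visit(Assignment n) { /* leaf: no children to visit */ }
    }

    /** Overrides only the assignment case; traversal is inherited. */
    static final class AssignmentCollector extends DefaultTraversal {
        final List<String> seen = new ArrayList<>();
        @Override public void visit(Assignment n) { seen.add(n.lhs + "=" + n.rhs); }
    }

    public static void main(String[] args) {
        Block body = new Block()
                .add(new Assignment("a", "b"))
                .add(new Block().add(new Assignment("c", "a")));
        AssignmentCollector collector = new AssignmentCollector();
        body.accept(collector);
        System.out.println(collector.seen);
    }
}
```

Because the nested `Block` is reached through the inherited default method, the collector sees both assignments without writing any traversal code itself.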
The pretty printer example uses the Java1.1.jj grammar file, which can be used with JavaCC 0.7.1. The classes that JavaCC generates from the example take a Java file as input and output a neatly indented version of it. Our system modifies the syntax tree package of the pretty printer example by adding a function that builds the CST. Thus, during parsing, the pretty printer builds the syntax tree and a CST for each method of a class.

4.3. Benchmarks

This section presents the benchmarks. The alias analysis algorithms for the existing object-pair representation [CS95, PR96, CR97] and for our reference-set representation are compared for efficiency, safety of the alias set and type inference, and precision. The following benchmarks were selected to meet these criteria.

4.3.1. Dynamic CG

This benchmark was originally written in C++ by Carini [BCC94] and adapted to Java by us. It contains conditional statements and overridden methods that allocate different objects to the same reference variable on each branch. Thus, it can be used to measure the efficiency and safety of type inference, and of the calling graph, in alias analysis algorithms for a reference variable that has several dynamic types.

4.3.2. Binary Tree

This benchmark is provided by the ProActive group, INRIA, France. A binary tree is a recursive data structure: a tree is composed of a root node, and each node has two potential child nodes. The benchmark contains many conditional statements and recursive calls that dynamically generate potential aliases. The conditional statements can be used to measure the precision and safety of alias analysis algorithms, and the recursive calls can be used to measure their efficiency. The ProActive group, led by Denis Caromel, developed ProActive PDC, a Java library for parallel, distributed, and concurrent computing and metacomputing [ProActive].
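The situation the Dynamic CG benchmark (Section 4.3.1) targets, where the branches of a conditional allocate objects of different classes to the same reference variable and a later call dispatches to an overridden method, can be illustrated with the hypothetical fragment below. It is not the benchmark's source, and the class names are invented. Because either branch may execute, a safe analysis has to keep both `Circle` and `Square` as possible dynamic types of `s`, and both `area()` overrides as possible callees at `s.area()`.

```java
/** Hypothetical fragment in the style of the Dynamic CG benchmark. */
public final class DynamicCgSketch {

    static abstract class Shape { abstract String area(); }
    static final class Circle extends Shape { String area() { return "Circle.area"; } }
    static final class Square extends Shape { String area() { return "Square.area"; } }

    /** The same reference variable receives a different object on each
     *  branch, so the call s.area() has two possible targets statically. */
    static String run(boolean flag) {
        Shape s;
        if (flag) {
            s = new Circle();
        } else {
            s = new Square();
        }
        return s.area();   // possible callees: Circle.area, Square.area
    }

    public static void main(String[] args) {
        System.out.println(run(true) + " " + run(false));
    }
}
```

At run time each call reaches exactly one override; only the static analysis has to account for both.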
4.3.3. Ray Tracer

This benchmark is one of the Java Grande benchmarks and measures the performance of a 3D ray tracer. The rendered scene contains 64 spheres at a resolution of N x N pixels. Its authors are Florian Doyon and Wilfried Klauser, INRIA, and it was adapted by Mark Bull, EPCC. This benchmark can be used to measure the efficiency of alias analysis algorithms on a traditional code. EPCC leads the benchmarking initiative to develop a suite of benchmarks that compare different Java execution environments against each other and against native code implementations, and it works with Sun Microsystems to adapt the Java Grande codes. NPAC and Sun lead the Java Grande community, which promotes the use of Java for so-called "Grande" applications. Grande applications include not only traditional computational science and engineering codes but also large-scale database applications and business and financial models. The community aims to propose modifications to the Java language if it finds better ways to execute Grande codes [JavaGrande].

4.3.4. Exception Block

This benchmark was built by Flan [Flan97] and adapted by us. It contains try/catch and try/catch/finally constructs for Java exceptions. It is useful for checking whether our extended propagation rule for exceptions works correctly and for measuring the safety of type inference and of the alias set in exception blocks.

Table 1: Characteristics of Benchmark

| Benchmark | Number of classes | Number of lines | Number of methods/constructors | Number of overridden methods |
| --- | --- | --- | --- | --- |
| Dynamic CG | 5 | 54 | 7 | 4 |
| Binary Tree | 5 | 154 | 8/2 | 1 |
| Ray Tracer | 12 | 1,213 | 53/13 | 4 |
| Exception Block | 4 | 67 | 4/6 | 0 |
| Recursive Call | 5 | 160 | 10/2 | 1 |

4.3.5. Recursive Call

This benchmark was built by modifying the Binary Tree benchmark of Section 4.3.2.
It includes more calling statements to extend the tree, so it shows whether our approach gains a performance improvement for dynamic calls.

4.4. Framework of Alias Detector

Figure 23 shows the framework for detecting aliases in application code written in Java. It mainly consists of three parts: a parser, a syntax tree builder, and the alias analysis. Sun Microsystems provides a basic JavaCC parser for the JDK-1.0.2 grammar, which JavaCC converts into Java programs, and Purdue University built JTB, a Java syntax tree builder. Our system adds semantic actions to this parser and syntax tree builder, such as type and scope checking and CST construction. Based on this, while the alias analysis algorithm executes, the CFG and CG are built, dynamic type information is stored in the type table, and the alias set is detected using the alias computing rules.

Figure 23. Alias detection for Java codes (components: benchmark codes in JDK-1.0.2; JavaCC and JTB libraries; Parser with TypeTable Builder; Syntax Tree Builder; Alias Analysis Algorithm with CFG Builder, CG Builder, and Type Inference; Alias Detector; Alias Set).

As shown in Figure 23, the parser reads the example input classes and stores attribute information about them by creating a CST for each class. Type and scope checking are also performed during parsing. The syntax tree builder builds a syntax tree for each input and modifies its CST. Our alias analysis algorithm, shown in Figure 20, uses the information in the CSTs constructed during parsing to build the CFG and CG, converges all hidden alias sets of each class, and then collects the final alias set. As a result, our alias detector detects the alias set of the given Java codes.
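As a minimal sketch of the type table the framework consults, assume the table simply maps each reference name to the set of classes that may be allocated to it, and that a copy `t = s` adds the possible types of `s` to those of `t`. Types are only ever added, which keeps the result safe but makes this sketch flow-insensitive; the thesis's own table layout and update rules may differ, and `TypeTableSketch`, `recordAllocation`, `recordCopy`, and `possibleTypes` are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, flow-insensitive type table: reference name -> possible types. */
public final class TypeTableSketch {

    private final Map<String, Set<String>> table = new HashMap<>();

    /** "ref = new Type(...)": Type becomes a possible dynamic type of ref. */
    void recordAllocation(String ref, String type) {
        table.computeIfAbsent(ref, k -> new LinkedHashSet<>()).add(type);
    }

    /** "target = source": target may now hold anything source may hold.
     *  Types are only added, never removed, so the table stays safe. */
    void recordCopy(String target, String source) {
        Set<String> fromSource = table.getOrDefault(source, Collections.emptySet());
        table.computeIfAbsent(target, k -> new LinkedHashSet<>()).addAll(fromSource);
    }

    /** Read-only view of the possible dynamic types of ref (empty if unknown). */
    Set<String> possibleTypes(String ref) {
        return Collections.unmodifiableSet(table.getOrDefault(ref, Collections.emptySet()));
    }

    public static void main(String[] args) {
        TypeTableSketch types = new TypeTableSketch();
        types.recordAllocation("s", "Circle");   // then-branch: s = new Circle();
        types.recordAllocation("s", "Square");   // else-branch: s = new Square();
        types.recordCopy("t", "s");              // t = s;
        System.out.println(types.possibleTypes("t"));
    }
}
```

A call such as `t.area()` could then be resolved against every class in `possibleTypes("t")`.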
Chapter 5 Performance Evaluation

5.1. Simulation Environment

To execute the existing object-pair [CS95, PR96, CR97] and our Reference-Set representations on the benchmarks in Section 4.3, three hosts at USC were selected: Ceng, Asadal, and Kottos. Table 2 lists the host type, architecture, operating system, and Java virtual machine of each host. The application codes are compiled by the IBM JDK-1.1.7A Just-In-Time compiler. Every benchmark code is executed 10 times on each host, and the average results are reported.

Kottos is an IBM SP2 distributed-memory system with IBM Power processors. Its operating system is IBM AIX version 4.2, which includes the IBM Just-In-Time (JIT) compiler, and its Java virtual machine is IBM JDK-1.1.1. Ceng is a Sun SPARC system running SunOS 5.6, with the Java runtime environment JDK-1.2.1_02 and the Sun Just-In-Time compiler. Asadal is a single-processor Windows NT system with an x86 Celeron 500 processor; it runs the Symantec Just-In-Time (JIT) compiler and the JDK 1.2.2 Java virtual machine.

Table 2: Characteristics of hosts

| | Kottos | Ceng | Asadal |
| --- | --- | --- | --- |
| Host Type | RS6000 | Sun4 | Windows 2000 |
| Architecture | Power | Sparc | x86 |
| OS | AIX4 | SunOS5.6 | Windows NT |
| Java VM | IBM VM JDK-1.1.1 | Solaris VM JDK-1.2.1_02 | Windows VM JDK-1.2.2 |

Figure 24. Execution Time of Benchmark (RS 6000): object-pair (OP) and Reference-Set (RS) for DCG, RT, and EB.

5.2. Simulation Results

This section presents the simulation results of the benchmark codes for the existing object-pair (OP) and our Reference-Set (RS) representations on three USC hosts: Kottos, Ceng, and Asadal. The execution time of each code is also compared across the hosts.
Benchmark codes are abbreviated as DCG for Dynamic CG, RT for Ray Tracer, EB for Exception Block, BT for Binary Tree, and RC for Recursive Call.

Figure 25. Execution Time of Benchmark: Binary Tree (RS 6000), object pair and reference set.

Figure 26. Execution Time of benchmark (Sun4)

To measure the execution time of our Reference-Set representation, we used the alias detection system in Figure 23. For the object-pair representation, we used the basic framework of the same alias detection system, but its type table and call statement rule follow the structure of Carini's [CSH95]. Apart from that structure, the efficiencies of the object-pair and Reference-Set representations are measured in the same environment, so that the precision and safety of alias detection are compared fairly.

Figure 27. Execution Time of benchmark: Binary Tree (Sun4), object pair and reference set.

5.2.1. Kottos

Figure 24 shows the execution time of three benchmark codes: DCG, RT, and EB. The execution time of the Reference-Set representation is slightly lower than or about equal to that of object-pair for DCG and EB; for Ray Tracer, however, the object-pair representation is slightly faster. Figure 25 presents the execution time of each representation at depths 0, 1, 2, 3, and 6 of BT, which contains recursive calls. As the depth grows, the execution time of the object-pair representation increases exponentially, and at depth 6 it cannot be measured because the USC computing service does not allow a job to run for more than 15 minutes. The execution time of Reference-Set increases only slightly and remains much lower than that of the object-pair representation.

Figure 28. Execution Time of benchmark (Windows 2000)

Figure 29. Execution Time of benchmark: Binary Tree (Windows 2000), execution time at depths 0, 1, 2, 3, and 6 for object pair and reference set.

5.2.2. Ceng

Figure 26 shows the execution time of three benchmark codes, DCG, RT, and EB, on the Ceng host. As on Kottos, the execution time of the Reference-Set representation is slightly lower than that of object-pair for DCG. For Exception Block, however, the object-pair representation is slightly faster, whereas for Ray Tracer the Reference-Set representation is slightly faster. In Figure 27, as on Kottos, the execution time of the object-pair representation increases exponentially as the depth grows, although up to depth 2 it increases only slightly. The execution time of Reference-Set increases only slightly and remains much lower than that of the object-pair representation.

Figure 30. Execution Time of Dynamic CG for Architectures

Table 3: The Comparison of Dynamic CG in Figure 30

| | Kottos | Ceng | Asadal |
| --- | --- | --- | --- |
| Object-Pair | 1.02 | 2.23 | 1.42 |
| Reference-Set | 1 | 2.19 | 1.38 |

Figure 31. Execution Time of Ray Tracer for Architectures

Table 4: The Comparison of Ray Tracer in Figure 31

| | Kottos | Ceng | Asadal |
| --- | --- | --- | --- |
| Object-Pair | 0.95 | 1.65 | 1.45 |
| Reference-Set | 1 | 1.64 | 1.45 |

5.2.3. Asadal

Figure 28 presents the execution time of three benchmark codes, DCG, RT, and EB, on the host Asadal.
As on Kottos and Ceng, the execution time of the Reference-Set representation is slightly lower than or equal to that of object-pair for DCG and RT. For Exception Block, however, the object-pair representation is slightly faster, as on Ceng. In Figure 29, as on Kottos and Ceng, the execution time of the object-pair representation increases exponentially as the depth grows, although up to depth 3 it increases only slightly. The execution time of Reference-Set increases only slightly and remains much lower than that of the object-pair representation.

Figure 32. Execution Time of Exception Block for Architectures

Table 5: The Comparison of Exception Block in Figure 32

| | Kottos | Ceng | Asadal |
| --- | --- | --- | --- |
| Object-Pair | 1 | 1.94 | 1.45 |
| Reference-Set | 1 | 2.02 | 1.48 |

5.2.4. Comparison on Architectures

This section compares the execution time of each benchmark across the USC hosts Kottos, Ceng, and Asadal. Table 3 presents the relative execution times of Dynamic CG; all data are normalized to the Reference-Set representation on Kottos. The Kottos JVM is the fastest and the Ceng JVM the slowest for Dynamic CG. On all hosts, Reference-Set is 1.8 to 2.8% faster than object-pair, because our Type Table searches the possible types of methods more efficiently than the existing structure of Carini [CSH95].

Figure 33. Execution Time of Binary Tree at Depth 0 for Architectures

Table 6: The Comparison of Binary Tree: Depth 0 in Figure 33

| | Kottos | Ceng | Asadal |
| --- | --- | --- | --- |
| Object-Pair | 1.14 | 1.68 | 1.78 |
| Reference-Set | 1 | 1.72 | 1.59 |

Table 4 presents the execution times of Ray Tracer. Again, the Kottos JVM is the fastest and the Ceng JVM the slowest. On all hosts, the execution times of Reference-Set and object-pair are almost the same. This implies that for codes such as Ray Tracer, which contain few aliased references among objects and are designed to measure JVM performance, the choice of alias representation makes little difference to the execution time of alias analysis.

Figure 34. Execution Time of Binary Tree at Depth 1 for Architectures

Table 7: The Comparison of Binary Tree: Depth 1 in Figure 34

| | Kottos | Ceng | Asadal |
| --- | --- | --- | --- |
| Object-Pair | 1.44 | 1.57 | 1.61 |
| Reference-Set | 1 | 1.57 | 1.53 |

Table 5 presents the execution times of Exception Block. On all hosts, object-pair is as fast as or up to 4% faster than Reference-Set. This suggests that our algorithm may take longer to compute aliases when a benchmark code contains few references with multiple possible dynamic types.

Table 6 presents the execution times of Binary Tree at depth 0. Reference-Set is 10% and 12% faster than object-pair on Asadal and Kottos, respectively, but object-pair is 3% faster than Reference-Set on Ceng. Table 7 presents the execution times of Binary Tree at depth 1: Reference-Set is as fast as or 5 to 31% faster than object-pair on Ceng, Asadal, and Kottos. Table 8 presents the execution times of Binary Tree at depth 2: Reference-Set is 68%, 5.3%, and 14.6% faster than object-pair on Kottos, Ceng, and Asadal, respectively.

Figure 35. Execution Time of Binary Tree at Depth 2 for Architectures

Table 8: The Comparison of Binary Tree: Depth 2 in Figure 35

| | Kottos | Ceng | Asadal |
| --- | --- | --- | --- |
| Object-Pair | 3.13 | 1.51 | 1.51 |
| Reference-Set | 1 | 1.43 | 1.29 |

Figure 36. Execution Time of Binary Tree at Depth 3 for Architectures

Table 9: The Comparison of Binary Tree: Depth 3 in Figure 36

| | Kottos | Ceng | Asadal |
| --- | --- | --- | --- |
| Object-Pair | 17.9 | 2.10 | 1.62 |
| Reference-Set | 1 | 1.41 | 1.47 |

Table 9 presents the execution times of Binary Tree at depth 3: Reference-Set is 17.9, 1.5, and 1.1 times faster than object-pair on Kottos, Ceng, and Asadal, respectively. Table 10 presents the execution times of Binary Tree at depth 6: Reference-Set is 328 and 55 times faster than object-pair on Ceng and Asadal, respectively. On Kottos, object-pair could not be measured because its execution time exceeds 15 minutes. Tables 6 to 10 show that our alias analysis algorithm with the Reference-Set representation outperforms the existing object-pair representation on benchmark codes such as Binary Tree, which accumulate aliased objects inside many conditional and recursive call statements.

Figure 37. Execution Time of Binary Tree at Depth 6 for Architectures

Table 10: The Comparison of Binary Tree: Depth 6 in Figure 37

| | Kottos | Ceng | Asadal |
| --- | --- | --- | --- |
| Object-Pair | NA | 357.63 | 67.87 |
| Reference-Set | 1 | 1.09 | 1.23 |

Figures 33 to 37 show that the reference-set representation has markedly better response times on dynamic codes such as Binary Tree, particularly as the depth of recursive calls grows. To confirm that the reference-set representation is more efficient than the object-pair representation for dynamic statements such as recursive calls and conditional statements, Figures 38 to 42 present additional results for a second dynamic code, Recursive Call, at depths 0, 1, 2, 3, and 6.

Figure 38. Execution Time of Rec Call Depth 0 for Architectures

Table 11: The Comparison of Rec Call: Depth 0 in Figure 38

| | Kottos | Ceng | Asadal |
| --- | --- | --- | --- |
| Object-Pair | 1.21 | 1.84 | 1.79 |
| Reference-Set | 1 | 1.8 | 1.28 |

Figure 39. Execution Time of Rec Call Depth 1 for Architectures

Table 12: The Comparison of Rec Call: Depth 1 in Figure 39

| | Kottos | Ceng | Asadal |
| --- | --- | --- | --- |
| Object-Pair | 2.03 | 1.69 | 1.56 |
| Reference-Set | 1 | 1.68 | 1.16 |

Figure 40. Execution Time of Rec Call Depth 2 for Architectures

Table 13: The Comparison of Rec Call: Depth 2 in Figure 40

| | Kottos | Ceng | Asadal |
| --- | --- | --- | --- |
| Object-Pair | 6.29 | 1.98 | 1.56 |
| Reference-Set | 1 | 1.55 | 1.14 |
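The percentages and factors quoted in Section 5.2.4 can be recomputed from the normalized tables. Assuming a percentage is (OP - RS) / OP x 100 and a factor is OP / RS, where OP and RS are the object-pair and Reference-Set entries for one host, the small helper below reproduces the Ceng and Asadal figures for Table 8 (5.3% and 14.6%) and Table 10 (328 and 55 times). The class and method names are our own, and the inputs are the rounded table entries.

```java
/** Hypothetical helper that recomputes speedups from the normalized tables. */
public final class SpeedupSketch {

    /** Percentage by which Reference-Set (rs) is faster than object-pair (op). */
    static double percentFaster(double op, double rs) {
        return (op - rs) / op * 100.0;
    }

    /** How many times faster Reference-Set is than object-pair. */
    static double timesFaster(double op, double rs) {
        return op / rs;
    }

    public static void main(String[] args) {
        // Table 8 (Binary Tree, depth 2), normalized execution times.
        System.out.printf("Ceng   depth 2: %.1f%%%n", percentFaster(1.51, 1.43));
        System.out.printf("Asadal depth 2: %.1f%%%n", percentFaster(1.51, 1.29));
        // Table 10 (Binary Tree, depth 6), normalized execution times.
        System.out.printf("Ceng   depth 6: %.0fx%n", timesFaster(357.63, 1.09));
        System.out.printf("Asadal depth 6: %.0fx%n", timesFaster(67.87, 1.23));
    }
}
```

Since the table entries are rounded to two decimals, recomputed values can differ from the quoted ones by about a percentage point.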
Figure 41. Execution Time of Rec Call Depth 3 for Architectures

Table 14: The Comparison of Rec Call: Depth 3 in Figure 41

| | Kottos | Ceng | Asadal |
| --- | --- | --- | --- |
| Object-Pair | 37.48 | 4.92 | 2.7 |
| Reference-Set | 1 | 1.93 | 1.06 |

Table 11 presents the execution times of Recursive Call at depth 0: Reference-Set is 2 to 40% faster than object-pair, depending on the host. Table 12 presents the execution times at depth 1: Reference-Set is 2.03, 1, and 1.34 times faster than object-pair on Kottos, Ceng, and Asadal, respectively.

Figure 42. Execution Time of Rec Call Depth 6 for Architectures

Table 15: The Comparison of Rec Call: Depth 6 in Figure 42

| | Kottos | Ceng | Asadal |
| --- | --- | --- | --- |
| Object-Pair | NA | 1292.92 | 140.53 |
| Reference-Set | 1 | 1.87 | 0.94 |

Table 13 presents the execution times at depth 2: Reference-Set is 6.29, 1.28, and 1.37 times faster than object-pair on Kottos, Ceng, and Asadal, respectively. Table 14 presents the execution times at depth 3: Reference-Set is 37.48, 2.55, and 2.55 times faster than object-pair on Kottos, Ceng, and Asadal, respectively. Table 15 presents the execution times at depth 6: Reference-Set is 691 and 150 times faster than object-pair on Ceng and Asadal, respectively; on Kottos, object-pair could not be measured, as in Table 10. Tables 11 to 15 show results similar to those for Binary Tree.
Our alias analysis algorithm with the Reference-Set representation thus outperforms the existing object-pair representation, in particular on benchmark codes such as Binary Tree and Recursive Call, which accumulate aliased objects inside many conditional and recursive call statements. In addition, published benchmark results [CaffeineMark, JavaGrande, SciMark] show that, among AIX, SPARC, and NT, the Windows NT JVM has the best score and the AIX JVM the worst. Our results agree with theirs, particularly for the object-pair representation on Binary Tree and Recursive Call at larger depths such as 2, 3, and 6, where the codes iterate much longer.

Chapter 6 Conclusion

6.1. Summary

Java and its parallelization attract wide interest because Java is platform independent and well suited to high-performance computing on the Internet. For high-performance computing in Java, object parallelism offers a better solution than fine-grained parallelism, since communication latency is a critical factor on the Internet. Detecting aliased references statically in a Java environment makes it possible to exploit instruction-level parallelism and to avoid side effects, context switch overhead, and communication overhead in distributed and cluster computing. Our alias analysis among objects in Java can be applied to the parallelization of Java by detecting possible side effects. Studies have shown that integrating alias analysis with type inference makes alias analysis more precise, especially for virtual functions in C++, because C++ is an object-oriented language that makes heavy use of inheritance.
Unfortunately, conventional C++ type information is not sufficient for more precise alias detection in object-oriented languages, because existing object-pair alias relations do not apply well to Java, because Java and C++ differ in dereferencing, and because Java has inheritance-related properties such as shadowed variables as well as overridden methods. We have presented the reference-set alias representation for Java, since existing alias representations were built for pointer-based languages such as C/C++ and are not efficient for Java, which accesses objects only through references. We have also presented CFGs of exception constructs with PESs. We then proposed a flow-sensitive alias analysis algorithm for Java by adapting existing alias analyses for C/C++ [CS95, BCC97]. Based on the reference-set alias representation and its associated type table and data propagation rules, the algorithm is more precise and efficient than previous works [BCC97, CBC93, CR97, CS95, EGH94, PR94, PR95, WWG00]. In addition, the algorithm is a safe alias analysis that covers exception statements. By proposing additional type information and combining our adapted alias analysis algorithm with it, we can detect shadowed variables, which cannot be detected by conventional means, as well as overridden methods. The algorithm detects more precise alias sets for both shadowed variables and overridden methods. It also treats constructors as procedures in order to analyze shadowed variables, so the calling graph contains constructors, and the alias set of each constructor is computed using our proposed equation. Its efficiency is not negatively affected even though the precision is improved by adding extra type information.
To the best of our knowledge, our work is the first implementation of alias analysis with type inference for Java. The type information and rules for the reference-set representation presented in this thesis are also applicable to C++. In the complexity analysis, we have shown that our reference-set alias representation is more precise and efficient for type inference and data propagation rules, which are the main parts of the alias analysis algorithm. By using a structural traversal of the CFG, the algorithm achieves additional efficiency over previous work. A possible solution for multithreading has also been proposed. Finally, we built our alias analysis in Java with the JavaCC parser and the JTB syntax tree builder and executed benchmark codes. The first experimental result, on Dynamic CG, shows that our dynamic type determination is as safe as that of the object-pair representation. The second result, on Ray Tracer, shows that our alias analysis does not improve efficiency for regular application codes that contain few aliases. The third result, on Exception Block, shows that our analysis succeeds in analyzing Java exceptions, but also that if a code does not have references with many dynamic types and aliases, our analysis might be less efficient than the object-pair representation. The final result, on Binary Tree and Recursive Call, shows that if a code has references with many aliases and much object creation, our analysis should be much more efficient than the object-pair representation.

6.2. Future Work

We have implemented a parser for JDK-1.0.2, its syntax tree, and the semantic analysis of an alias analysis algorithm for Java codes.
Several Java codes are used as benchmarks to show that our algorithm improves precision and efficiency compared with existing algorithms and alias representations. Java codes with exception constructs are also analyzed for aliases. However, several open issues remain to which this work can be extended next. First, others, including us, can extend the JDK 1.0.2 support in this thesis to JDK 1.1.x, JDK 1.2.x, and JDK 1.3. This requires building or updating the parser, syntax tree, and alias analysis algorithm based on this thesis, and then running benchmark codes to collect alias data. Second, the solution to multithreading issues presented in this thesis needs to be improved, since it may have performance problems. Although multithreading is highly dynamic and hard to analyze, there may be ways to overcome this. Third, we have tested the benchmark codes on IBM Power, Sun SPARC, and Intel Celeron architectures; they can also be tested on other architectures. Finally, our reference-set representation and type table are not limited to Java. They can be applied to other object-oriented languages, and the results compared with existing work.
Bibliography
[ADM98] O. Agesen, D. Detlefs, and J. E. B. Moss, Garbage Collection and Local Variable Type-Precision and Liveness in Java Virtual Machines, ACM SIGPLAN Conference on Programming Language Design and Implementation, 1998.
[AISS97] A. D. Alexandrov, M. Ibel, K. E. Schauser, and C. J. Scheiman, SuperWeb: Research Issues in Java-Based Global Computing, Concurrency: Practice and Experience, June 1997.
[APS95] O. Agesen, J. Palsberg, and M. I. Schwartzbach, Type Inference of Parametric Polymorphism, In Proceedings ECOOP'95, Aarhus, Aug. 1995.
[ASU86] A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley Publishing Company, 1986.
[Bann79] J. P. Banning, An Efficient Way to Find the Side Effects of Procedure Calls and the Aliases of Variables, Proc. Sixth POPL, Jan. 1979.
[BCC94] M. Burke, P. R. Carini, and J. Choi, Efficient Flow-Insensitive Alias Analysis in the Presence of Pointers, IBM Research Report RC 19546, 1994.
[BCC97] M. Burke, P. R. Carini, and J. Choi, Interprocedural Pointer Alias Analysis, IBM Research Report RC 21055, 1997.
[BS96] D. F. Bacon and P. F. Sweeney, Fast Static Analysis of C++ Virtual Function Calls, OOPSLA'96 Conference Proceedings: Object-Oriented Programming Systems, Languages, and Applications, ACM SIGPLAN Notices 31(10), San Jose, California, October 1996.
[CaffeineMark] CaffeineMark 3.0, Pendragon Software, http://www.pendragon-software.com/pendragon/cm3/results.html
[Cann89] D. C. Cann, Compilation Techniques for High Performance Applicative Computation, Technical Report CS-89-108, Colorado State University, 1989.
[CBC93] J. Choi, M. Burke, and P. Carini, Efficient Flow-Sensitive Interprocedural Computation of Pointer-Induced Aliases and Side Effects, The Twentieth Annual ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, Jan. 1993.
[CBR96] D. Caromel, F. Belloncle, and Y. Roudier, The C++// Language, Chapter 7, Parallel Programming Using C++, Massachusetts Institute of Technology, 1996.
[CCINSW97] B. O. Christiansen, P. Cappello, M. F. Ionescu, M. O. Neary, K. E. Schauser, and D. Wu, Javelin: Internet-Based Parallel Computing Using Java, ACM 1997 Workshop on Java for Science and Engineering Computation, PPoPP, Las Vegas, June 21, 1997.
[CDG96] C. Chambers, J. Dean, and D. Grove, Whole-Program Optimization of Object-Oriented Languages, Dept. of Computer Science, Univ. of Washington, Technical Report UW-CSE-96-06-02.
[CGHS99] J. Choi, D. Grove, M. Hind, and V. Sarkar, Efficient and Precise Modeling of Exceptions for the Analysis of Java Programs, Proceedings of the ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, September 6, 1999.
[CSH95] Paul R. Carini, Michael Hind, and Harini Srinivasan, Flow-Sensitive Type Analysis for C++, IBM Research Report RC 20267, Nov. 1995.
[CK88] K. D. Cooper and K. Kennedy, Interprocedural Side-Effect Analysis in Linear Time, Proc. SIGPLAN '88 Conference on Programming Language Design and Implementation, SIGPLAN Notices 23(7), July 1988.
[CK89] K. D. Cooper and K. Kennedy, Fast Interprocedural Alias Analysis, Conference Record of the Sixteenth Annual ACM Symposium on Principles of Programming Languages, pp. 49-59, Jan. 1989.
[CKV98] D. Caromel, W. Klauser, and J. Vayssiere, Towards Seamless Computing and Metacomputing in Java, pp. 141-150, ACM Workshop "Java for High-Performance Network Computing", February 28-March 1, 1998, Stanford University, Palo Alto, California.
[CR97] R. Chatterjee and B. Ryder, Scalable, Flow-Sensitive Type Inference for Statically Typed Object-Oriented Languages, Technical Report DCR-TR-326, Rutgers University, Aug. 1997.
[CV98] D. Caromel and J. Vayssiere, A Java Framework for Seamless Sequential, Multithreaded, and Distributed Programming, pp. 141-150, ACM Workshop "Java for High-Performance Network Computing", February 28-March 1, 1998.
[DD97] Deitel & Deitel, Java How to Program, Prentice-Hall International, Inc., 1997.
[DDGLC96] J. Dean, G. DeFouw, D. Grove, V. Litvinov, and C. Chambers, Vortex: An Optimizing Compiler for Object-Oriented Languages, OOPSLA'96 Conference Proceedings: Object-Oriented Programming Systems, Languages, and Applications, 1996.
[DGC97] G. DeFouw, D. Grove, and C. Chambers, Fast Interprocedural Analysis, Dept. of Computer Science and Engineering, Univ. of Washington, Technical Report 97-07-02, July 1997.
[DMM98] A. Diwan, K. S. McKinley, and J. E. B. Moss, Type-Based Alias Analysis, Proc. SIGPLAN '98 Conference on Programming Language Design and Implementation, 1998.
[EGH94] M. Emami, R. Ghiya, and L. J. Hendren, Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers, Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation, June 20-24, 1994.
[Flan97] David Flanagan, Java in a Nutshell, 2nd Edition, O'Reilly, May 1997.
[GDDC97] D. Grove, G. DeFouw, J. Dean, and C. Chambers, Call Graph Construction in Object-Oriented Languages, OOPSLA'97 Conference Proceedings: Object-Oriented Programming Systems, Languages, and Applications, 1997.
[HA96] U. Hölzle and O. Agesen, Dynamic vs. Static Optimization Techniques for Object-Oriented Languages, Theory and Practice of Object Systems, 1(3), 1996.
[JavaCC98] Sun Microsystems, Java Compiler Compiler: The Parser Generator, http://www.suntest.com/JavaCC/, Version 0.8pre2, April 21, 1998.
[JTB00] Purdue University, West Lafayette, Indiana, USA, Java Tree Builder, http://www.cs.purdue.edu/jtb/index.html, May 15, 2000.
[JavaGrande] Java Grande Forum, http://www.javagrande.org/
[Kenn97] Ken Kennedy, Advanced Compilation for Vector and Parallel Processors, Lecture Notes for COMP 515, Computer Science, Rice University.
[LR92] W. Landi and B. G. Ryder, A Safe Approximate Algorithm for Interprocedural Pointer Aliasing, Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pp. 235-248, June 1992.
[LRZ92] W. Landi, B. G. Ryder, and S. Zhang, Interprocedural Modification Side Effect Analysis With Pointer Aliasing, Technical Report LCSR-TR-195, Rutgers University, Nov. 1992.
[Myers81] E. W. Myers, A Precise Inter-procedural Data Flow Algorithm, 8th Annual ACM Symposium on Principles of Programming Languages, 1981.
[Much97] Steven S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers, June 1997.
[PC94] J. Plevyak and A. A. Chien, Precise Concrete Type Inference for Object-Oriented Languages, OOPSLA'94 Conference Proceedings: Object-Oriented Programming Systems, Languages, and Applications, 1994.
[PC95] J. Plevyak and A. A. Chien, Type Directed Cloning for Object-Oriented Programs, Workshop on Languages and Compilers for Parallel Computing, Columbus, Ohio, August 1995.
[PR95] Hemant D. Pande and Barbara G. Ryder, Static Type Determination and Aliasing for C++, Technical Report LCSR-TR-250-A, Rutgers University, Oct. 1995.
[PS91] Jens Palsberg and Michael I. Schwartzbach, Object-Oriented Type Inference, OOPSLA'91, Object-Oriented Programming Systems, Languages, and Applications, pp. 146-161, Phoenix, AZ, Oct. 1991.
[ProActive] INRIA, Sophia Antipolis, France, http://www.inria.fr/oasis/ProActive
[Rosen79] Barry K. Rosen, Data Flow Analysis for Procedural Languages, JACM 26(2):322-344, April 1979.
[RR99] R. Rugina and M. Rinard, Pointer Analysis for Multithreaded Programs, Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation, Atlanta, Georgia, May 1999.
[SciMark] SciMark 2.0, National Institute of Standards and Technology, http://math.nist.gov/cgi-bin/ScimarkSummary
[WACGW01] Jongwook Woo, Isabelle Attali, Denis Caromel, Jean-Luc Gaudiot, and Andrew L. Wendelborn, Alias Analysis on Type Inference for Class Hierarchy in Java, The 24th Australian Computer Science Conference, ACSC 2001, Jan. 29-Feb. 2, 2001.
[WL95] R. P. Wilson and M. S. Lam, Efficient Context-Sensitive Pointer Analysis for C Programs, Proceedings of the ACM SIGPLAN '95 Conference on Programming Language Design and Implementation, June 18-21, 1995.
[WWACGW01] Jongwook Woo, Jehak Woo, Isabelle Attali, Denis Caromel, Jean-Luc Gaudiot, and Andrew L. Wendelborn, Alias Analysis for Java with Reference-Set Representation, accepted at the International Conference on Parallel and Distributed Systems, June 26-29, 2001.
[WWGOO] Jehak Woo, Jongwook Woo, and Jean-Luc Gaudiot, Flow-Sensitive Alias Analysis with Referred-set Representation for Java, The Fourth International Conference/Exhibition on High Performance Computing in the Asia-Pacific Region, HPC-Asia 2000, May 14-17, 2000.
Conceptually similar
Architectural support for network-based computing
Adaptive dynamic thread scheduling for simultaneous multithreaded architectures with a detector thread
Efficient PIM (Processor-In-Memory) architectures for data-intensive applications
Deadlock recovery-based router architectures for high performance networks
Architectural support for efficient utilization of interconnection network resources
Automatic code partitioning for distributed-memory multiprocessors (DMMs)
Consolidated logic and layout synthesis for interconnect-centric VLSI design
I-structure software caches: Exploiting global data locality in non-blocking multithreaded architectures
A framework for coarse grain parallel execution of functional programs
Induced hierarchical verification of asynchronous circuits using a partial order technique
Automatic array partitioning and distributed-array compilation for efficient communication
Functional testing of constrained and unconstrained memory using march tests
Decoupled memory access architectures with speculative pre-execution
Clustering techniques for coarse-grained, antifuse-based FPGAs
Encoding techniques for energy-efficient and reliable communication in VLSI circuits
A unified mapping framework for heterogeneous computing systems and computational grids
Full vectorial finite element analysis of photonic crystal devices: Application to low-loss modulator
Fault simulation and multiple scan chain design methodology for systems-on-chips (SOC)
Content-based video analysis, indexing and representation using multimodal information
Improving memory hierarchy performance using data reorganization
Asset Metadata
Creator: Woo, Jongwook (author)
Core Title: Alias analysis for Java with reference-set representation in high-performance computing
School: Graduate School
Degree: Doctor of Philosophy
Degree Program: Computer Engineering
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: Computer Science, engineering, electronics and electrical, OAI-PMH Harvest
Language: English
Contributor: Digitized by ProQuest (provenance)
Advisor: Gaudiot, Jean-Luc (committee chair), Gupta, Sandeep (committee member), Ierardi, Douglas (committee member)
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c16-167972
Unique identifier: UC11330364
Identifier: 3054828.pdf (filename), usctheses-c16-167972 (legacy record id)
Legacy Identifier: 3054828.pdf
Dmrecord: 167972
Document Type: Dissertation
Rights: Woo, Jongwook
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA
Tags: engineering, electronics and electrical