ENERGY OPTIMIZATION OF MOBILE APPLICATIONS

by

Ding Li

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

December 2016

Copyright 2016 Ding Li

Acknowledgements

Pursuing a Ph.D. is a hard journey, and I would not have been able to finish it without the support of many people. Here, I would like to thank all of them, and a few of them specially.

First and foremost, I would like to thank my advisor, Prof. William G. J. Halfond, for his excellent guidance. I appreciate the opportunity of being his student and feel extremely lucky to have had him as my advisor. I have learned countless things from him, including but not limited to making presentations, paper writing, organizing ideas, computer science techniques, and English. His kindness and patience have accompanied me for five years. My memories of my Ph.D. life will always include the view of Los Angeles at four o'clock and Prof. Halfond's midnight guidance to improve my papers before the deadlines. I will also remember Prof. Halfond's outstanding character, his support and encouragement during my Ph.D., and his help in my job hunting.

I would like to thank my committee members, Prof. Nenad Medvidovic, Prof. Sandeep Gupta, and Prof. Murali Annavaram, for all their support and help. Their feedback and suggestions helped me finish my dissertation. I would like to thank Prof. Ramesh Govindan for all his help during my Ph.D. I want to thank Prof. Yao Guo, who advised me during my undergraduate years at Peking University and supported my entry into the Ph.D. program. I also want to thank my labmates, Sonal Mahajan, Jiaping Gui, Mian Wan, Abdulmajeed Alameer, and Yingjun Lyu, who gave me many happy memories in the lab. I want to thank all of the co-authors of my papers not mentioned above.
They include Shuai Hao, Yuchen Jin, Cagri Sahin, and Angelica Huyen Tran. I also want to thank Yurong Jiang, who helped me by proofreading this dissertation.

I would like to thank my wife, Xiao Li, who is always my strongest supporter. I want to thank my parents, Xiao Cao and Weimin Li, for their continued love and support. Lastly, I would like to thank my roommate Shaodi Wang and my other best friends from Peking University for their support: Bowen Li, Jingchao Chen, Hanwen Yang, Mo Yu, Geng Li, Xiao Ma, Zechong Xiong, Mo Li, Yizhi Li, Xinhui Tian, Guolong Qu, Danfeng Yang, Han Zuo, Zhao Gao, and Chao Chen.

Table of Contents

Acknowledgements
List Of Tables
List Of Figures
Abstract
Chapter 1: Introduction
  1.1 The Dissertation Statement
  1.2 Major Problems
  1.3 Contributions
    1.3.1 Where is Energy Consumed
    1.3.2 What to Optimize
    1.3.3 How to Optimize
      1.3.3.1 Display Energy Optimization
      1.3.3.2 HTTP Energy Optimization
    1.3.4 String Analysis
Chapter 2: Energy Measurement
  2.1 Challenges of Energy Measurement
    2.1.1 Hardware Limitations
    2.1.2 Runtime System Challenges
    2.1.3 Software-level Challenges
  2.2 Approach of Energy Measurement
    2.2.1 Runtime Measurement Phase
    2.2.2 Offline Analysis Phase
      2.2.2.1 Adjustment to Path Energy Samples
      2.2.2.2 Calculating Source Line Energy Values
      2.2.2.3 Visualizing the Energy Consumption
  2.3 Evaluation of Energy Measurement
    2.3.1 Subject Applications
    2.3.2 Implementation
    2.3.3 RQ1: Analysis Overhead
    2.3.4 RQ2: Measurement Accuracy
      2.3.4.1 Accuracy of the API Energy Measurements
      2.3.4.2 Accuracy of Bytecode Energy Distribution
      2.3.4.3 Accuracy of Outlier Detection
  2.4 Threats to Validity
Chapter 3: Empirical Study
  3.1 Research Questions of the Empirical Study
  3.2 General Experiment Protocol of the Empirical Study
  3.3 Evaluation Result of the Empirical Study
    3.3.1 RQ 1: How much energy is consumed by individual applications?
    3.3.2 RQ 2: How much energy is consumed by the idle state of an application?
    3.3.3 RQ 3: Which code consumes more energy: system APIs or developer-written code?
    3.3.4 RQ 4: How much energy is consumed by the different components of a smartphone?
    3.3.5 RQ 5: Which APIs are significant in terms of energy consumption?
      3.3.5.1 How many APIs are significant in energy consumption?
      3.3.5.2 How similar are the top ten most energy consuming APIs across different applications?
      3.3.5.3 Which APIs are the most likely to be in the top ten most energy consuming APIs?
    3.3.6 RQ 6: How much energy is consumed by code in loops?
    3.3.7 RQ 7: How much energy is consumed by the different types of bytecodes?
    3.3.8 RQ 8: Is time equal to energy?
    3.3.9 RQ 9: What granularity of measurement is sufficient?
    3.3.10 RQ 10: Is it necessary to account for idle state energy?
  3.4 Threats to Validity
Chapter 4: Display Energy Optimization
  4.1 Motivating Example for Display Energy Optimization
  4.2 Overview of the Approach for Display Energy Optimization
  4.3 HTML Output Analysis
    4.3.1 The HTML Output Graph
    4.3.2 String Analysis
    4.3.3 The HTML Adjacency Relationship Graph
      4.3.3.1 Getting the HTML Tag Graph
      4.3.3.2 Getting the HTML Adjacency Relationship Graph
  4.4 Color Transformation
    4.4.1 The Definition of Color Conflict Graph
    4.4.2 Building the CCG
    4.4.3 Generating the CTS
  4.5 Output Modification
  4.6 Evaluation for Display Energy Optimization
    4.6.1 Subject Applications
    4.6.2 Implementation
    4.6.3 RQ1: Time Cost
    4.6.4 RQ2: Energy Saving
    4.6.5 RQ3: Runtime Overhead
    4.6.6 RQ4: User Acceptance
  4.7 Threats to Validity
Chapter 5: HTTP Energy Optimization
  5.1 Background
  5.2 Overview of the Approach for HTTP Energy Optimization
  5.3 SHRS Detection
    5.3.1 Definition of a Sequential HTTP Requests Session (SHRS)
    5.3.2 Intra-procedural Analysis
    5.3.3 Inter-procedural Analysis
  5.4 Bundling Analysis
    5.4.1 String Analysis
    5.4.2 Rewriting HTTP API Invocations
    5.4.3 Generating the Tester
    5.4.4 Generating the Operator
  5.5 Runtime Optimization
    5.5.1 The Agent HTTP APIs
    5.5.2 The Proxy
    5.5.3 Maintaining Server Side States
    5.5.4 Handling Exceptions
  5.6 Evaluation for HTTP Energy Optimization
    5.6.1 Implementation
    5.6.2 Subject Apps
    5.6.3 RQ 1: Energy Saving
      5.6.3.1 Protocol
      5.6.3.2 Result
      5.6.3.3 Discussion
    5.6.4 RQ 2: Manual Effort
    5.6.5 RQ 3: Analysis Time
    5.6.6 RQ 4: Runtime Overhead
    5.6.7 RQ 5: Pervasiveness of SHRSs
  5.7 Threats to Validity
Chapter 6: String Analysis
  6.1 Motivation for String Analysis
    6.1.1 Loops
    6.1.2 Context-Sensitivity
    6.1.3 Flexible Semantics
    6.1.4 Scalability
  6.2 Approach of String Analysis
    6.2.1 The Intermediate Representation
    6.2.2 Interpretation of the Intermediate Representation
    6.2.3 Getting the Intermediate Representation
      6.2.3.1 Intra-procedural Analysis
      6.2.3.2 General Region Processing
      6.2.3.3 Processing Loop Regions
      6.2.3.4 Inter-procedural Analysis
    6.2.4 Illustrative Example of the Analysis
  6.3 Evaluation for String Analysis
    6.3.1 Implementation
    6.3.2 Experiment Protocol
    6.3.3 RQ 1: Accuracy on Various Types of Data Flow
    6.3.4 RQ 2: Accuracy for Basic String Operations
    6.3.5 RQ 3: Accuracy on Realistic Apps
    6.3.6 RQ 4: Analysis Runtime
  6.4 Threats to Validity
Chapter 7: Related Work
  7.1 Energy Measurement
  7.2 Empirical Study
  7.3 Display Energy Optimization
  7.4 HTTP Energy Optimization
  7.5 Other Energy Optimization Techniques
  7.6 Other Web App Transformation Techniques
  7.7 String Analysis
Chapter 8: Conclusion
Chapter 9: Future Work
References

List Of Tables

2.1 Subject apps of energy measurement
2.2 Time and accuracy of vLens
4.1 Subject apps for display energy optimization
4.2 Result of user study for display energy optimization
5.1 Subject apps for HTTP energy optimization
5.2 Analysis time of HTTP energy optimization
6.1 Subject apps for string analysis
6.2 Time cost of Violist vs. JSA

List Of Figures

2.1 Overview of the vLens approach
2.2 Using longer execution windows for calculating the energy of invocations with a short execution time
2.3 API invocations with tail energy
2.4 Concurrent threads during an API invocation
2.5 Visualization of the energy measurements
2.6 Comparison of API energy cost
3.1 The structure of research questions of the first aspect
3.2 Categorization of the subject applications in the empirical study
3.3 Average application energy consumption by category
3.4 Breakdown of non-idle energy
3.5 Component level energy usage
3.6 Frequency distribution of the top ten most energy consuming APIs
3.7 Distribution of loop energy consumption
3.8 Distribution of bytecode energy
4.1 An example web app
4.2 The generated code of Figure 4.1
4.3 Architecture of Nyx
4.4 Example HTML Output Graph for Figure 4.1
4.5 Example HARG for Figure 4.1
4.6 Example HTG for Figure 4.1
4.7 Example CCG for Figure 4.1
4.8 Overhead of Nyx
4.9 Before/after screenshots
4.10 Acceptance rate of the transformed web application
5.1 Post dominator tree for Program 3
5.2 Summary of main in Program 3
5.3 Runtime workflow of Bouquet
5.4 Energy savings of Bouquet at the whole-app level
5.5 Energy savings of Bouquet at the SHRS level
5.6 The runtime overhead introduced by Bouquet
6.1 Region tree of string analysis for Program 4
6.2 Precision of the three string analyses on varied data flows
6.3 Precision of the three string analyses on market apps

Abstract

Energy is a critical resource for mobile devices. Many techniques have been proposed to optimize the energy consumption of mobile devices at the hardware and system levels. However, optimizations at the hardware and system levels alone are insufficient: poorly designed applications can still waste energy even with fully optimized hardware and system support. In my dissertation work, I propose multiple techniques to help developers create energy-efficient apps. In particular, my dissertation addresses three problems in creating energy-efficient apps.

The first problem is "where is energy consumed." Modern mobile apps are complex and may contain more than 500,000 lines of code [90]. Thus, it is important to know which parts of the code consume more energy.
To address this problem, I developed a source-line-level energy measurement technique that reports the energy consumption of mobile apps at a very fine granularity. My technique achieved 91% accuracy during measurement.

The second problem is "what to optimize." Modern mobile apps may use many different libraries and invoke thousands of APIs [1], so it is also important to know which libraries and APIs consume more energy. To address this problem, I conducted an empirical study of how 405 Android market apps consume energy, evaluating ten research questions that motivated my subsequent energy optimization techniques.

The third problem is "how to optimize." After learning where energy is consumed and what to optimize, it is important to design effective techniques to optimize the energy consumption of mobile apps. To address this problem, I developed two automated techniques: the first automatically optimizes display energy for mobile web apps, and the second optimizes HTTP energy for Android apps. My display energy optimization technique reduced energy consumption by 25%, and my HTTP energy optimization technique achieved 15% energy savings.

In summary, my techniques and empirical evaluation show that program analysis techniques can help developers understand how energy is consumed in mobile apps and can also help optimize that consumption.

Chapter 1: Introduction

The popularity of mobile apps continues to increase. This popularity is driven, in part, by the innovative ways in which app developers combine sensors and data to provide users with useful and novel functionality. However, a problem for developers and users alike is that these apps require a large amount of energy, while the mobile devices on which they run are constrained by limited battery power.
This has led to a tension between adding new features that attract users but consume more energy, and minimizing energy costs at the risk of reducing functionality. The balance has proven hard to attain, and complaints related to energy consumption are common in app marketplace reviews [13].

Research advances in battery, hardware, and operating system design have, to some extent, improved the battery life of mobile devices [90]. In spite of these optimizations, however, a poorly coded app can still be inefficient and perform numerous unnecessary and costly operations. Unfortunately, many apps are not energy efficient [13]. In online technical reviews and app market comments, users often complain that some apps drain their batteries quickly [102]; these include popular apps, such as Facebook and LinkedIn [12]. According to prior research, thousands of apps in the Google Play Market have energy inefficiencies [120], and higher energy consumption in mobile apps may be correlated with lower ratings [63]. Thus, improving the energy efficiency of current mobile apps is an important and urgent issue for mobile app developers.

The goal of my dissertation is to develop techniques to improve the energy efficiency of mobile applications. Several existing techniques pursue this goal [101, 121, 20], but they have limitations for two reasons. First, developers do not have systematic knowledge about how energy is consumed in mobile apps [99]. They often have to rely on their own knowledge and experience to decide which parts of an app should be optimized [20], but that experience may be incomplete or mistaken [88]. Second, many existing techniques require developers to manually follow certain energy-efficient programming practices or use certain libraries; for example, developers must remember to release sensors [101].
However, developers may make mistakes or overlook these energy-efficient programming practices in such manual processes [121]. In my dissertation, I used program analysis techniques to address the limitations of existing techniques. I first developed a technique to measure the energy consumption of mobile apps and conducted studies of how energy is consumed in market apps. In my empirical study, I identified several facts and insights that point out the major energy-consuming parts of mobile apps. Then, based on these discoveries, I used static program analysis techniques to develop two automated energy optimization techniques. My techniques can substantially reduce the amount of manual effort in energy optimization.

1.1 The Dissertation Statement

The thesis statement of my dissertation is:

Program analysis techniques can help developers to understand how energy is consumed in mobile apps and can also be used to automatically optimize the energy consumption of mobile apps.

This statement contains two parts. The first part is that program analysis techniques can help mobile app developers understand the energy consumption of mobile apps. To evaluate this part, I show that program analysis techniques can be used to identify the energy consumption of each software library and to detect potential misuses of those libraries that cause energy inefficiencies. The second part is that program analysis techniques can automatically optimize the energy consumption of mobile apps. To evaluate this part, I developed two different automated energy optimization techniques for mobile apps.

1.2 Major Problems

The first major problem addressed in my dissertation is "where is energy consumed." For developers who want to optimize the energy consumption of their applications, it is fundamental to know which parts of their code contain energy inefficiencies. Existing energy measurement techniques cannot fully address this problem.
These techniques either lack fine granularity [121, 158] or are not fast enough to run market apps [114]. None of them can provide source-line-level energy measurement for mobile apps with a reasonable runtime overhead.

The second major problem addressed in my dissertation is "what to optimize." Modern mobile applications may contain many different software components, such as libraries and API calls, and different developers may follow different programming practices when building mobile apps. Thus, it is critical to understand which software components or programming practices may cause energy issues. To address this problem, researchers have conducted several studies of the energy consumption of system APIs and programming practices [36, 24, 57]. These studies are useful and effective. However, they do not have a large enough set of subject applications, and their measurement methods are not fine-grained, so they can cover only subsets of software components and programming practices.

The third major problem addressed by my dissertation is "how to optimize." Currently, there are several energy-efficiency tips that developers can follow to manually optimize mobile app energy [10]. However, relying on developers to apply such tips has limitations for several reasons. First, it is error-prone: developers can always make mistakes, and these mistakes will nullify the efforts of energy optimization or, even worse, cause new energy problems [101]. Second, the manual process requires developers to have knowledge of energy-efficient programming practices and software components, yet in reality many developers do not have the knowledge needed to create energy-efficient apps [99]. Third, identifying the conditions for optimization and designing effective optimization mechanisms requires detailed analysis of the program, which may substantially increase the cost of building apps.
These reasons motivate the need for automated techniques to optimize the energy consumption of mobile apps.

1.3 Contributions

The major contributions of my dissertation are as follows:

1. Where is energy consumed: a measurement technique that provides source-line-level energy information for mobile apps, described in Chapter 2.

2. What to optimize: an empirical study that reports the energy consumption of hundreds of Android market apps at the source line level, described in Chapter 3.

3. How to optimize:

   (a) Display energy optimization: a technique, Nyx, that automatically transforms the colors of mobile web apps into energy-efficient colors while maintaining the usability and attractiveness of the original web apps, described in Chapter 4.

   (b) HTTP energy optimization: a technique, Bouquet, that automatically bundles small HTTP requests to reduce the energy consumed by making HTTP requests, described in Chapter 5.

4. String analysis: a string analysis framework, Violist, which statically models string values in Java and Android applications and underpins my display and HTTP energy optimization techniques, described in Chapter 6.

1.3.1 Where is Energy Consumed

The first contribution of my dissertation is an energy measurement technique, vLens [91], that helps answer the question of "where is energy consumed." The technique combines hardware-based energy measurements with efficient path profiling to correlate energy measurements with the application's execution. It then analyzes each executed path to handle high-energy events, such as garbage collection and thread switching. The adjusted measurements are used to perform a regression analysis that maps energy to individual source lines. I also conducted an empirical evaluation of vLens and showed that it can measure the energy of mobile apps with 90% accuracy.
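The regression step can be illustrated with a small sketch. This is not vLens's actual implementation; it only shows the general idea under an assumed linear model: if each energy sample covers one executed path, and the per-line execution counts of each path are known from profiling, per-line energy costs can be estimated with ordinary least squares. All counts and energy values below are invented for illustration.

```python
# Illustrative sketch (not vLens itself): attribute path-level energy
# samples to source lines via ordinary least squares on the normal
# equations, solved with Gaussian elimination.

def estimate_line_energy(counts, energies):
    """counts[i][j] = times line j executed on sampled path i;
    energies[i] = measured energy (mJ) of path i.
    Solves (A^T A) x = A^T b for the per-line costs x."""
    n = len(counts[0])
    ata = [[sum(r[i] * r[j] for r in counts) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * e for r, e in zip(counts, energies)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        atb[col], atb[piv] = atb[piv], atb[col]
        for row in range(col + 1, n):
            f = ata[row][col] / ata[col][col]
            for j in range(col, n):
                ata[row][j] -= f * ata[col][j]
            atb[row] -= f * atb[col]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(ata[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (atb[row] - s) / ata[row][row]
    return x

# Two source lines with assumed true costs 2.0 mJ and 0.5 mJ per execution.
samples = [[3, 1], [1, 4], [2, 2]]
measured = [6.5, 4.0, 5.0]  # e.g., 3*2.0 + 1*0.5 = 6.5
per_line = estimate_line_energy(samples, measured)
print([round(v, 3) for v in per_line])  # → [2.0, 0.5]
```

With noise-free synthetic samples the estimator recovers the assumed per-line costs exactly; with real hardware samples the same formulation yields a least-squares fit.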
vLens also avoids heavy runtime overhead: in the evaluation, its runtime overhead was below 4%. Compared with existing measurement techniques [48, 119, 163], vLens improves the measurement granularity from the process level to the source line level. With vLens, it is possible to measure the energy consumption of a specific part of a mobile app and point out potential energy hotspots at the code level. In fact, vLens was the first technique that could provide source-line-level measurements for Android applications.

1.3.2 What to Optimize

The second contribution of my dissertation is an empirical study of how market mobile apps consume energy [90]. This study answers the question of "what to optimize." In this study, I measured the energy consumption of 405 Android market apps at the source line level. Compared with other similar empirical studies [36, 24, 57], my study had a much finer measurement granularity and an order of magnitude more subject applications.

In my empirical study, I discovered two key insights that inspired my optimization techniques:

Insight 1: Mobile apps consume more energy while waiting for user input. Most mobile apps are heavily interactive and may pause to wait for user input during execution. In my empirical study, mobile apps consumed on average 62% of their total energy while waiting. Thus, optimizing only the executed code may not be sufficient. One effective way to reduce the energy consumed while waiting for user input is to optimize display energy: while an app waits for input, it still needs to display the UI to end users, so reducing the overall display energy reduces the total energy consumed while waiting.

Insight 2: Making HTTP requests is one of the most energy-consuming operations in mobile apps.
HTTP requests are widely used in mobile apps to access data from the cloud. In my empirical study, mobile apps spent 37% of their non-idle state energy making HTTP requests.

1.3.3 How to Optimize

The third contribution of this dissertation includes two techniques to automatically optimize the energy consumption of mobile apps. These two techniques address two aspects of the problem of "how to optimize." The basic assumption behind these two techniques is that developers may make mistakes or lack adequate expertise; thus, the amount of manual work should be reduced as much as possible. The two techniques are based on the two insights I discovered in my empirical study. My first technique optimizes the display energy of mobile web apps. It is motivated by the first insight: mobile apps consume more energy while waiting for user input, and optimizing the display energy can reduce the energy consumed while waiting. The second technique optimizes the HTTP energy in mobile apps. It is motivated by the second insight: making HTTP requests can consume a lot of energy in mobile apps.

1.3.3.1 Display Energy Optimization

The first energy optimization technique is Nyx [94], which automatically optimizes the display energy consumption of mobile web apps. The basic idea of Nyx is to replace the large, light colored background areas of web applications with dark colors (preferably black) to reduce the energy consumed by OLED screens. The key challenges for this transformation are analyzing the code of mobile web apps to model their output and keeping the aesthetics and usability of mobile web apps during the color transformation. To address these challenges, I first developed a new data structure, the HTML Output Graph, with a new algorithm to model all of the HTML pages that can be generated. Then, I proposed a new data structure, the Color Conflict Graph, to model the relationships of colors in mobile web apps.
Finally, I designed a search-based algorithm to transform colors in mobile web apps to energy efficient colors while preserving usability and aesthetics. I also evaluated the energy savings achieved by using my technique. In my evaluation, my technique achieved a 40% savings in total display power while maintaining the usability and attractiveness of the mobile web apps during the transformation. I also conducted a user study about the user acceptance of the color transformation. In my study, 97% of users said that they were willing to use the energy efficient web apps when the battery was critically low, and 60% of them said that they preferred the transformed versions in daily usage.

1.3.3.2 HTTP Energy Optimization

The second energy optimization technique is Bouquet [92], which automatically optimizes the HTTP energy consumption of Android apps. This is achieved by automatically detecting and bundling multiple HTTP requests. In Bouquet, I first identified a pattern of HTTP requests, the Sequential HTTP Request Session (SHRS), that can be optimized with static program analysis techniques. Then I designed a set of algorithms to automatically model the URLs and parameters of HTTP requests. Finally, I designed a proxy based technique to automatically bundle HTTP requests in SHRSs. Bouquet averaged a 38% energy reduction for the targeted requests and a 15% energy reduction at the application level. Besides energy savings, I also found that Bouquet could reduce the network response time by 30%. In summary, these savings show that my approach can help developers to improve the energy efficiency and performance of their apps.

1.3.4 String Analysis

The fourth contribution of my dissertation is a new string analysis framework, Violist, for Java and Android applications. String analysis is a fundamental technique for both my display energy optimization and HTTP energy optimization techniques.
It is used to solve the HTML output of mobile web apps in the display energy optimization technique and to model the URLs in the HTTP energy optimization technique. It is also widely used in other software engineering scenarios, such as reflection analysis and malware detection. Compared with existing techniques [7, 110], Violist introduces a novel intermediate representation of string values. The intermediate representation isolates the process of program analysis from string value modeling. By doing this isolation, Violist is able to avoid early stage approximations that may introduce inaccuracy. It also allows users to interpret the intermediate representation based on different use cases. Furthermore, by using the intermediate representation, Violist can eliminate many intermediate steps during string value modeling and improve the speed of analysis. In my evaluation, Violist was faster and more accurate than widely used state-of-the-art techniques.

Chapter 2 Energy Measurement

Energy measurement is fundamental for energy optimization. Before developers and researchers can propose efficient energy optimization techniques, they have to understand which parts of their code may consume more energy. To assist developers, researchers have developed a range of techniques to provide energy consumption information. Several techniques use runtime monitoring to track key operating system parameters and provide estimates to developers [48, 119, 163]. However, the level of granularity of these techniques is either at the component or method level, which is helpful, but does not provide information at a low enough level of granularity to guide developer changes. For example, method level information cannot help developers distinguish between two paths within a method that have different energy consumption. Developers have often used CPU time as a proxy for energy.
However, this is not an accurate approximation because mobile devices scale their voltage dynamically and interact with multiple hardware components that have varied energy consumption patterns, such as GPS, WiFi, and cameras [69]. Techniques for estimating energy consumption (as opposed to measuring it) could also be used, but have drawbacks: cycle-accurate simulators [114] often run several thousand times slower than actual hardware, and program analysis based estimation techniques [69, 130] require carefully fine-tuned software environment profiles. Current research has not been able to provide developers with techniques that can use energy measurements to provide source line level energy consumption information. As discussed in more detail in Section 2.1, there are numerous practical and conceptual challenges in providing this information. The most straightforward technique, using a power meter to take direct measurements of an app while it is running, is not practical. Current power meters are unable to measure and record fast enough to isolate individual source lines. Even if this problem could be resolved, such measurements would not be accurate unless they also dealt with issues, such as thread switching and garbage collection, that can make it difficult to determine the implementation structures that should be attributed with the measured energy cost. As I discuss in Section 2.1, these types of events can consume a significant amount of energy that could distort the energy attributed to the application. My work in energy measurement is a new approach that provides developers with source line level energy information. To address the numerous inherent challenges in achieving this result, I have devised an approach that combines hardware-based energy measurements with program analysis and statistical modeling techniques.
At a high level, the basic intuition is as follows: while measuring the energy consumption of a smartphone, the approach uses efficient path profiling to identify which parts of the application are executing and matches these paths with the measured energy for these paths. Then, the approach statically analyzes the paths to identify and adjust for high-energy events, such as thread switching, before applying robust regression analysis to calculate each source line's energy consumption. Finally, the approach presents developers with a graphical representation of the energy consumption by overlaying the calculated energy with the source code of the application. I also performed an empirical evaluation of my approach to measure its accuracy and the time needed to perform the analysis. My approach was able to accurately calculate energy. For a set of API invocations, I found that the calculated energy values were within 10% of the ground truth measurements, and the statistical models matched the measured data very closely, with a high average R² of 0.93 and a low accumulated error rate. The approach was also able to detect the influence of high-energy events, such as thread switching and garbage collection, with 100% accuracy. The approach was also fast: it could calculate source line level energy measurements for each of my subject applications in less than three minutes. Overall, the results of the evaluation were positive and indicate that my approach is an effective and practical way to provide developers with source line level energy information. The remainder of this chapter is organized as follows: in Section 2.1 I discuss several of the significant challenges that shaped the design of my approach. I present the approach itself in Section 2.2. The evaluation of the approach is described in Section 2.3.

2.1 Challenges of Energy Measurement

In this section I discuss several challenges to measuring and calculating source line energy consumption.
I break the challenges into three broad categories: hardware, runtime system, and software level, and explain how they preclude a straightforward solution for calculating source level energy consumption. To illustrate these challenges, I make use of the code shown as Programs 1 and 2, which are excerpts from the open-source Google Authenticator project that handle synchronizing a system clock. In the example, method getNetworkTime (Program 1, line 1) retrieves the time by sending an HTTP request to a timeserver (line 7) and then parsing the response packet (lines 16, 20, 23, 26). Method runBackgroundSync

1  public long getNetworkTime() throws IOException {
2    HttpHead request = new HttpHead(URL);
3    Log.i(LOG_TAG, "Sending request to " + request.getURI());
4    HttpResponse httpResponse;
5    try {
6      httpResponse = mHttpClient.execute(request);
7    } catch (ClientProtocolException e) {
8      throw new IOException(String.valueOf(e));
9    } catch (IOException e) {
10     throw new IOException("Failed due " + "to connectivity issues: " + e);
11   }
12   try {
13     Header dateHeader = httpResponse.getLastHeader("Date");
14     Log.i(LOG_TAG, "Received response with " + "Date header: " + dateHeader);
15     if (dateHeader == null) {
16       throw new IOException("No Date header");
17     }
18     String dateHeaderValue = dateHeader.getValue();
19     try {
20       Date networkDate =
21         DateUtils.parseDate(dateHeaderValue);
22       return networkDate.getTime();
23     } catch (DateParseException e) {
24       throw new IOException("Invalid Date header format");
25     }
26   } finally {
27     ......
28   }
29 }

Program 1: Google Authenticator Method 1

(Program 2, line 2) calls getNetworkTime (line 6) and then starts a background thread to handle errors (line 10).

2.1.1 Hardware Limitations

The primary hardware obstacle to directly measuring the energy consumption of source lines is the difference between the speed at which instructions execute and the speed at which hardware devices can perform energy measurements. On modern processors, individual instructions execute at a rate of several million per second. At best, power meters can sample electrical power draw at several tens of kHz [79], which means that each sample will include the power consumption of hundreds, perhaps thousands, of instructions. For example, it is possible for Program 2 to execute completely in the time that transpires between two consecutive power samples by a relatively fast power meter. There are reasons to believe that this order of magnitude disparity will likely persist, since the bottleneck in high-frequency power sampling is the storage system, which cannot save power samples at the same frequency as the power meter can generate them [76, 77, 109], and there is no evidence to show that this gap will be closed in the near future. Therefore, a challenge for my approach is to reconcile power measurement samples and instruction execution, even though there is a several orders of magnitude difference in their frequency.

1  private void
2  runBackgroundSync(Executor callbackExecutor) {
3    long networkTimeMillis;
4    try {
5      networkTimeMillis = mNetworkTimeProvider.getNetworkTime();
6    } catch (IOException e) {
7      Log.w(LOG_TAG, "Failed to obtain network " + "time due to connectivity issues");
8      callbackExecutor.execute(new Runnable() {
9        @Override
10       public void run() {
11         finish(Result.ERROR_CONNECTIVITY_ISSUE);
12       }
13     });
14     return;
15   }

Program 2: Google Authenticator Method 2

2.1.2 Runtime System Challenges

The runtime system of an Android smartphone includes both the Android operating system and the Dalvik Virtual Machine, the platform on which marketplace apps run. The runtime system implements several types of behaviors that affect the energy consumption of an app: thread switching, garbage collection, and tail energy. However, the details of the duration, frequency, and timing of these events are, by design, hidden from the app. This makes it difficult to correctly attribute energy at the source level. Although it would be straightforward to modify the runtime systems to track these events, this would introduce considerable overhead and reduce the portability of the approach, as it would be necessary to provide custom runtime systems for each smartphone platform. Therefore, one challenge for my approach is to account for these events with information that is available at the app layer. Thread switching causes several problems for calculating source line level energy. The first problem is that any energy measurements for a given time period may also include energy for threads that are not related to the application (i.e., operating system processes). For example, at any point in the execution of the thread in runBackgroundSync or getNetworkTime, the OS could swap in a thread from another app or the OS. Any power samples during this time would likely include instructions from both the original and swapped-in thread. The second problem is that thread switching itself introduces an energy overhead. For example, line 10 of Program 2 will incur a significant energy overhead for the context switch that occurs when a new thread is started. Periodically, the Android operating system will perform garbage collection during the execution of an application.
Typically, applications do not have control over garbage collection; it is managed by the operating system and can occur at any time and within any method. Although the Dalvik virtual machine logs garbage collection, it timestamps the events via a millisecond based clock, which does not provide enough precision to determine in which path they occur (paths are tracked with nanosecond timestamps). As noted in Section 2.2.2.2, garbage collection incurs a significant energy cost and has the potential to significantly distort energy measurements for an application. For example, consider if garbage collection were to occur at line 20 of Program 1, which is compiled as a simple jump on equality (ifeq), or line 6 of Program 2, which is a relatively expensive method invocation. The extremely high cost of the garbage collection would dwarf the cost of both instructions, making them appear to have a nearly identical high energy cost. Therefore, an approach must be able to identify and properly account for garbage collection while calculating the source line level energy consumption of an application. Smartphones exhibit tail energy usage, where certain hardware components are optimistically kept active by the operating system, even during idle periods, to enable subsequent invocations to amortize startup energy costs. Tail energy manifests itself in two different ways. The first way is when an invocation to an API accesses a hardware component and completes its invocation. Even when no other invocations access the component, the component will remain active for a time period T_tail and consume energy E_tail. An example of this kind of invocation is at line 7 of Program 1. The call to the network will cause the radio to remain on for a period of time, even after the request is finished, and consume energy during this time.
The second way that tail energy manifests itself is when two invocations access the component in sequence and the time interval between them is less than T_tail. Here, only a portion of E_tail will be consumed. The E_tail for certain devices can be quite high, and a naive approach to measurement might attribute the tail energy cost to subsequently executed instructions. Referring back to the example invocation, a naive approach could mistakenly attribute part of the invocation's tail energy to its successors, lines 16, 18, and 20 (depending on the length of its T_tail), instead of to line 7. To be accurate, an approach must recognize when tail energy occurs and attribute it to the source invocations that started the component and kept it in a non-idle state.

2.1.3 Software-level Challenges

Source level calculations require precise information about an app's execution. In particular, for each time period represented by a power sample, it is necessary to know which instructions were executed, their frequency, and their ordering: essentially, path information. The primary challenge is that obtaining this type of information at the software level is generally expensive and intrusive. For example, inserting instrumentation into an app to isolate and measure individual instructions can lead to high overhead. Even optimizations, such as only instrumenting each basic block, would still produce a high amount of instrumentation. Other approaches, such as instruction counting, do not allow developers to know which instances of the instructions were executed. It is also necessary to have precise information about API invocations. During execution, smartphone applications generally invoke library functions and APIs to access hardware components, such as GPS, network, and WiFi. For example, at line 7 of Program 1, getNetworkTime() sends an HTTP request
This invocation consumes a relatively large amount of energy due to its use of the network. In contrast, a call to dataheader.getValue at line 24 of Program 1, incurs a relatively small constant cost, since it is simply returning the value of an HTTP header. This wide range of invocation behavior causes problems for techniques that naively apply statistical sampling or averaging-based approaches because, unlike normal instructions, an invocation’s energy cost can vary based on its target and the data provided for its arguments. Furthermore, it is not feasible to precompute the cost of API calls, since that requires sampling every point in each API calls input space to determine its energy consumption. App Instrumenter Path Adjuster Analyzer Annotator Runtime Measurement Phase Offline Analysis Phase Application (AUA) Use cases Energy Report Power Measurement Platform Visualization {<paths>} {<power>} Insufficient data? AUA´ Figure 2.1: Overview of the vLens approach 2.2 Approach of Energy Measurement The goal of my approach is to provide source line level energy information for smartphone applications. An overview of my approach is shown in Figure 2.1. From a high-level, my approach has two phases, Runtime Measurement and Offline Analysis. The inputs to the Runtime Measurement phase are the application under analysis (AUA) and a set of use cases for which the tester wants to obtain energy measurements. The App Instrumenter uses an efficient path profiling technique to guide the insertion of probes into the AUA that will capture timestamps and path traversal information. The tester executes the instrumented AUA on the Power Measurement Platform (PMP). This causes the instrumentation to record path information while the PMP records power samples. The path and power samples are the inputs for the Offline Analysis phase. The Path Adjuster performs a static analysis of the paths and makes modifications to the power samples to account for the high-energy events. 
Then the Analyzer performs the regression analysis in order to calculate each source line's energy consumption. If the Analyzer finds there are not enough data points to solve the regression analysis, then the process returns to the Runtime Measurement phase so the developer can further execute the application. If this is not possible, then the Analyzer performs approximations that I discuss in Section 2.2.2.2. Finally, the Annotator uses the calculations to create graphical overlays of the measurements on the source code for display in an integrated development environment. I explain the approach in more detail in the remainder of this section.

2.2.1 Runtime Measurement Phase

During the Runtime Measurement phase, information about the paths executed in the AUA and power measurements are generated by the approach. To do this, the App Instrumenter inserts probes into the AUA to record the information, and then the instrumented AUA is executed by the developer while power samples are collected by the PMP. The output of the Runtime Measurement phase is a set of timestamped paths executed by the AUA and power measurements. Instrumentation of the AUA: The instrumentation collects information about the paths executed by the developer, namely, which paths are traversed, their frequency, and timestamps of the path traversals and invocations of certain APIs. To record the path information, the App Instrumenter adapts a technique for efficient path profiling proposed by Ball and Larus [30]. The approach first builds a control-flow graph (CFG) of each method in the AUA. Then each edge in the CFG is assigned a label so that each unique path in the CFG has a unique path ID. The approach then calculates a maximal spanning tree over the CFG and uses this to guide the minimal placement of instrumentation that will increment a path ID counter.
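The edge-numbering idea is easiest to see on a tiny example. The sketch below is my own illustration of the core Ball-Larus assignment, not vLens code: processing vertices in reverse topological order, each outgoing edge receives an increment such that summing the increments along any entry-to-exit path yields a unique path ID in [0, NumPaths).

```java
import java.util.*;

// A minimal sketch of Ball-Larus path numbering on a small, loop-free CFG.
// Vertex names and data structures are illustrative, not from vLens.
public class BallLarusSketch {
    // adjacency list: vertex -> ordered list of successors
    static Map<String, List<String>> cfg = new LinkedHashMap<>();
    static Map<String, Integer> numPaths = new HashMap<>();
    static Map<String, Integer> edgeVal = new HashMap<>(); // "v->w" -> increment

    static void number(List<String> reverseTopo) {
        for (String v : reverseTopo) {
            List<String> succs = cfg.getOrDefault(v, List.of());
            if (succs.isEmpty()) { numPaths.put(v, 1); continue; } // exit vertex
            int total = 0;
            for (String w : succs) {
                edgeVal.put(v + "->" + w, total); // counter increment on this edge
                total += numPaths.get(w);
            }
            numPaths.put(v, total);
        }
    }

    // Sum the edge increments along a path to obtain its unique ID.
    static int pathId(List<String> path) {
        int id = 0;
        for (int i = 0; i + 1 < path.size(); i++)
            id += edgeVal.get(path.get(i) + "->" + path.get(i + 1));
        return id;
    }

    public static void main(String[] args) {
        // Diamond CFG: entry A branches to B or C, both rejoin at exit D.
        cfg.put("A", List.of("B", "C"));
        cfg.put("B", List.of("D"));
        cfg.put("C", List.of("D"));
        number(List.of("D", "B", "C", "A")); // reverse topological order
        System.out.println(numPaths.get("A"));              // 2 distinct paths
        System.out.println(pathId(List.of("A", "B", "D"))); // 0
        System.out.println(pathId(List.of("A", "C", "D"))); // 1
    }
}
```

At runtime only the edges with nonzero increments need instrumentation, which is what keeps the Ball-Larus scheme cheap enough for on-device profiling.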
By design, the approach can use a single counter to identify the path traversed in the method and another counter on all back edges of the CFG to track loop traversals. I extended the Ball-Larus approach to handle nested method calls, concurrency, and exceptions. The App Instrumenter also inserts a probe at the method entry to initialize the method's path counter and record a timestamp of when the path traversal began and the current thread ID. At each exit point of the method, another probe records the value of the path ID counter, the loop traversal counters, and another timestamp. After execution is finished, this information allows the approach to generate a set of path tuples of the form ⟨thread_id, path_id, enter_time, exit_time⟩, where thread_id identifies the current thread, path_id is the traversed path ID, and enter_time and exit_time are the timestamps that indicate when a path starts and ends. The App Instrumenter also inserts probes to obtain timestamps before and after the invocation of certain APIs. As I explain in Section 2.2.2.1, this information is used to allocate tail energy and isolate the energy cost of certain APIs that cannot be modeled using linear regression (i.e., methods that have non-constant energy consumption). For each of these invocations, the instrumentation generates an invocation tuple of the form ⟨method_id, enter_time, exit_time⟩, where method_id identifies the invoking method, and enter_time and exit_time are timestamps before invoking the method and after the method has returned. Execution of the Instrumented AUA: To generate the path and invocation tuples, the instrumented AUA is executed on the PMP. The PMP is based on the LEAP node [122]. The LEAP is an x86 platform based on an ATOM N550 processor that runs Android 3.2. Each component in the LEAP (e.g., WiFi, GPS, memory, and CPU) is connected to an analog-to-digital converter (DAQ) that samples current draw at 10 kHz.
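The probes described above and the tuples they emit can be pictured with the following simplified sketch. All names are my own choosing, and vLens actually rewrites Dalvik bytecode rather than calling helper methods; the sketch only shows the shape of the recorded data.

```java
import java.util.*;

// Sketch of the entry/edge/exit probes and the path tuples they emit.
// Illustrative only: the real instrumentation is injected into bytecode.
public class ProbeSketch {
    record PathTuple(long threadId, int pathId, long enterTime, long exitTime) {}
    record InvocationTuple(String methodId, long enterTime, long exitTime) {}

    static final List<PathTuple> pathLog = new ArrayList<>();
    static int pathCounter;   // single Ball-Larus path counter for the method
    static long enterTime;

    // Probe at method entry: reset the path counter and record the start time.
    static void onMethodEnter() {
        pathCounter = 0;
        enterTime = System.nanoTime();
    }

    // Probe on an instrumented CFG edge: add the edge's precomputed increment.
    static void onEdge(int increment) { pathCounter += increment; }

    // Probe at each method exit: emit the completed path tuple.
    static void onMethodExit() {
        pathLog.add(new PathTuple(Thread.currentThread().getId(),
                                  pathCounter, enterTime, System.nanoTime()));
    }

    public static void main(String[] args) {
        onMethodEnter();
        onEdge(1);   // pretend the branch whose edge value is 1 was taken
        onMethodExit();
        System.out.println(pathLog.get(pathLog.size() - 1).pathId()); // 1
    }
}
```

The nanosecond timestamps in each tuple are what later let the Offline Analysis phase line paths up against the 10 kHz power samples.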
The LEAP also provides Android applications with the ability to trigger a synchronization signal. This allows the approach to synchronize the samples with the paths' timestamps and avoid inaccuracy due to clock skew. Each of the use cases is executed by the developer on the AUA while it is running on the LEAP. All of the measurements are recorded in hardware external to the Android smartphone components, so the measurement process does not introduce any interference or execution overhead.

2.2.2 Offline Analysis Phase

In the Offline Analysis phase, my approach analyzes the tuples and power samples generated in the Runtime Measurement phase to produce a mapping of energy to source lines. There are three parts to this phase. In the first part, the Path Adjuster statically examines each traversed path in the CFG and adjusts the corresponding energy measurements to account for special API invocations, tail energy, and interleaving threads. The adjusted energy measurements and paths are the inputs to the second part, the Analyzer, which uses robust regression techniques to calculate the cost of each source line and identify paths along which garbage collection and thread context switches occurred. The Analyzer also determines whether the testing process created enough data points to perform the linear regression and either reduces the grouping of variables to be solved in the regression or directs the tester to repeat the test cases to provide more data points. Finally, in the third part, the Annotator creates a graphical representation of the energy measurements and overlays this with the source code. Note that in the rest of this chapter, I describe my approach in terms of bytecode instructions, but it is straightforward to convert the bytecode-level information to the source level using compiler provided debugging information.
2.2.2.1 Adjustment to Path Energy Samples

Before beginning the analysis, the Path Adjuster first reconstructs the paths traversed during execution of the AUA. The instructions executed in a path can be identified using the path ID and the CFG of the method containing the path, as described by Ball and Larus [30]. Once the paths have been reconstructed, the Path Adjuster calculates the energy total for the path by summing the measurements reported during the path's time of execution. The Path Adjuster can identify the corresponding path and power samples due to the synchronized timestamps. Then, the Path Adjuster performs a static analysis of each path in order to adjust certain API invocations due to non-constant and too-short API invocations, tail energy, and thread interleaving. The Path Adjuster generates a set of paths with the adjusted invocations removed from the paths and their corresponding energy removed from the power measurements. API Invocations: Certain API invocations have a non-constant energy cost associated with their execution. Therefore, it is not possible to calculate their energy cost using the robust linear regression techniques described in Section 2.2.2.2. To address this problem, my approach uses the invocation tuples to identify the time periods when these invocations are executing and calculates the invocations' energy cost by summing the power measurements taken during that time.

Figure 2.2: Using longer execution windows for calculating the energy of invocations with a short execution time.

In most cases, the execution time of an invocation is long enough that the approach is able to get accurate energy measurements (i.e., the execution time is longer than several sampling periods). However, in some cases the execution time of the invocation is too brief (e.g., the execution time is shorter than a sampling period). For these invocations, the approach identifies an execution time period that includes the too-short execution.
Then it calculates the ratio of the original execution time versus the larger execution time and multiplies that against the energy total for the larger execution time. To illustrate, consider the example shown in Figure 2.2. In this figure, the horizontal bars indicate the energy sampling interval. For example, the LEAP will sample the power at t3 and then again at t4. If an invocation I executes from time t1 until time t2, then there are no power samples to be summed in order to find the energy consumed by I (E_I). When this occurs, the Path Adjuster finds the next largest execution window for which it has sufficient energy samples and uses this window to calculate E_I. In the case of the example, the Path Adjuster calculates E_I as shown in Equation 2.1, where E_a,b denotes the energy consumption measured by the PMP during interval [a, b].

E_I = ((t2 - t1) / (t4 - t3)) × E_t3,t4 (2.1)

Note that this only approximates E_I. I have found that for functions whose execution time is so short, this is a reasonable approximation. These functions consume a very small portion of the overall energy expended at runtime, on average about 6% of the total API energy cost. For the general case, where the execution time is of sufficient length, my evaluation shows that I am able to accurately measure most functions to within 9% of their measured ground truth cost.

Figure 2.3: API invocations with tail energy

Tail Energy: As explained in Section 2.1, tail energy occurs when the operating system keeps certain hardware components active, even during idle periods, to enable subsequent invocations to amortize startup energy costs. The result of this behavior is that the energy measurements for the time period following the component access will be higher. My approach adjusts the path energy total so that the tail energy is attributed to the invocations that interact with the hardware component.
I assume the availability of tail energy models, which are generally provided by either component manufacturers or power researchers [121]. The model specifies the energy consumption of the component after an invocation (E_tail) and for how long the device driver maintains this state (T_tail). To calculate the adjustment, the Path Adjuster examines the reconstructed paths to identify the sequence A_D of invocations to methods that cause tail energy for each device D of the smartphone. For each such invocation a_i in A_D, the adjuster compares the timestamp of a_i against the timestamp of a_i+1. If the difference is greater than T_tail, then all of E_tail is attributed to a_i. If the difference is less than T_tail, then only a fraction of E_tail is expended before a_i+1 occurs and should be attributed to a_i. The fraction is calculated as shown in Equation 2.2, where T_S returns the starting timestamp of an invocation and T_E returns the ending timestamp of an invocation. Note that the invocation timestamps are known due to the invocation tuples collected during the Runtime Measurement phase.

((T_S(a_i+1) - T_E(a_i)) / T_tail) × E_tail (2.2)

To illustrate these two scenarios, consider the two invocations shown in Figure 2.3. Both of these, API 1 and API 2, access the same device in sequence. Their tail energy consumptions are shown as curved lines extending after the end of the invocations. To calculate the tail energy associated with API 1, note that t2, the start of the invocation to API 2, occurs before the T_tail time has transpired. Therefore the tail energy assigned to API 1 is calculated as ((t2 - t1) / (t3 - t1)) × E_tail. When the invocation to API 2 returns at t4, there is no other API accessing the same external device, so the approach assigns all of E_tail to API 2.

Figure 2.4: Concurrent threads during an API invocation.
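Both adjustments reduce to simple arithmetic over the invocation timestamps. The sketch below implements Equation 2.1's window scaling and Equation 2.2's fractional tail-energy attribution; the method names and the example timestamps and energy values are illustrative, not vLens code.

```java
// Sketch of the Path Adjuster's two timestamp-based energy adjustments.
public class EnergyAdjust {
    // Eq. 2.1: approximate a too-short invocation's energy by scaling a
    // larger window's measured energy by the time ratio (t2-t1)/(t4-t3).
    static double shortInvocationEnergy(double t1, double t2,
                                        double t3, double t4, double eWindow) {
        return (t2 - t1) / (t4 - t3) * eWindow;
    }

    // Eq. 2.2: tail energy attributed to invocation a_i. If the next
    // invocation starts before T_tail elapses, only a fraction is charged;
    // otherwise a_i receives the full E_tail.
    static double tailEnergy(double endOfAi, double startOfNext,
                             double tTail, double eTail) {
        double gap = startOfNext - endOfAi;
        return gap >= tTail ? eTail : gap / tTail * eTail;
    }

    public static void main(String[] args) {
        // An invocation running 2 ms inside a 10 ms sampling window that
        // measured 50 mJ is charged 2/10 of the window's energy.
        System.out.println(shortInvocationEnergy(0, 2, 0, 10, 50.0)); // 10.0
        // API 2 starts 1 s after API 1 ends, T_tail = 4 s:
        // API 1 is charged 1/4 of E_tail.
        System.out.println(tailEnergy(0.0, 1.0, 4.0, 100.0));         // 25.0
        // No follow-up invocation within T_tail: full tail energy attributed.
        System.out.println(tailEnergy(0.0, 5.0, 4.0, 100.0));         // 100.0
    }
}
```

The adjusted amounts are then subtracted from the surrounding path's energy total so that the regression in Section 2.2.2.2 sees only the linear-cost instructions.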
Thread Context-Switching During Invocations: A potential problem that arises with the way the approach attributes energy to API invocations is that the thread containing the invocation may be switched out for another thread. For example, this can happen because the invocation must wait for a shared resource or another thread has higher priority. The problem is that energy consumption by the other threads will then also be measured in the corresponding power samples. As I discussed earlier, this situation can be detected by modifying the operating system to accurately track thread scheduling. However, this would make the approach less portable, so I have devised a software-level technique to address this problem. The Path Adjuster determines the number of threads that were executing while the invocation was executing. This is done by examining the starting and ending timestamps of each path tuple to see which ones were active between the invocation's timestamps. Then, the Path Adjuster evenly allocates the energy among the concurrent threads, assigning each the energy of 1/N, where N is the number of concurrent threads. To illustrate, consider the example shown in Figure 2.4. T1 is the original thread and contains an API invocation at time t1. Threads T2 and T3 run while T1 is performing the invocation. T1 and T2 are concurrent in the time interval [t2, t3] and all three are concurrent in [t3, t4]. Therefore, the energy of T1 will be E_{t1,t2} + (1/2)E_{t2,t3} + (1/3)E_{t3,t4}. This is similar to the way prior approaches have handled concurrent thread energy [121]; however, I extend it as described in Section 2.2.2.2.

2.2.2.2 Calculating Source Line Energy Values

The second part of the Offline Analysis phase calculates the energy consumption of each source line. The input to the Analyzer is the set of adjusted paths and energy samples produced by the Path Adjuster.
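The per-interval 1/N split from the Figure 2.4 example can be sketched as below; the names and data shapes are illustrative assumptions, and the real Path Adjuster works on path tuples rather than named threads:

```python
def split_energy(slices, thread_windows, target):
    """Split each measured energy slice evenly among the threads
    concurrent in it, and return the share attributed to `target`.

    slices: list of ((start, end), energy) power-sample intervals.
    thread_windows: {thread_name: (start, end)} active windows."""
    total = 0.0
    for (start, end), energy in slices:
        active = [t for t, (s, e) in thread_windows.items()
                  if s < end and e > start]  # threads overlapping slice
        if target in active:
            total += energy / len(active)
    return total

# Reproducing Figure 2.4: T1 runs alone in the first slice, with T2
# in the second, and with both T2 and T3 in the third.
windows = {"T1": (0, 6), "T2": (2, 6), "T3": (4, 6)}
slices = [((0, 2), 2.0), ((2, 4), 2.0), ((4, 6), 3.0)]
print(split_energy(slices, windows, "T1"))  # 2.0 + 2.0/2 + 3.0/3 = 4.0
```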
At this point, adjustments for all of the API invocations have been made, but the paths could still be influenced by the occurrence of garbage collection and thread switching during times when there is no API invocation. As I discussed in Section 2.1, the energy costs associated with events such as garbage collection and thread switching can skew the regression analysis, but it is difficult to identify when they occur. My insight is that the characteristics of these events allow us to define them as statistical outliers, and therefore the Analyzer can employ Robust Linear Regression techniques to calculate each source line's energy consumption while mitigating the influence of these high-energy events. In the rest of this section, I first explain the regression technique and then discuss the insight that allows us to define garbage collection and thread switching as statistical outliers.

Robust Linear Regression Analysis: The Analyzer uses linear regression analysis to calculate each instruction's energy consumption. I expect linear regression to work well in this situation because prior work has found that the cost of bytecodes is constant given a particular hardware environment [69, 129] and the Path Adjuster has removed the invocations with non-linear costs. To perform the analysis, the Analyzer sets up the equations E = Xm, where E is the vector of adjusted power measurements and X represents the path traversals, with each row being a frequency vector of the bytecodes present in the measured path. (Note that the bytecodes associated with the inserted instrumentation are included in this matrix. Since the approach knows which paths were instrumented, the associated results are simply removed before visualizing or reporting the final source line level calculations.) Then the Analyzer solves for the coefficients m to determine the energy consumed by the instructions in the path segment. To solve the equations, the Analyzer employs Robust Linear Regression (RLR) analysis.
In particular, the Analyzer uses RLR based on M-estimation [74], which performs iterative regression analysis. It begins by solving a normal linear regression on the set of data points. Then, at each iteration, it calculates the residuals and gives each data point a weight based on the standard deviation of the residuals. The regression analysis is repeated on the weighted data to generate a new model, and the process repeats until the standard deviation of the residuals does not change between iterations. For the power samples, RLR is preferable over the well-known ordinary least squares approach because it is more robust in the presence of outliers. In this case, my data sample has outliers, which are the paths whose energy measurements are influenced by garbage collection and thread switching. More specifically, at each iteration, given the linear function y = Xθ + u, the Analyzer solves Equation 2.3 and updates the standard deviation of the residuals. For the weighting function ψ(y), I use the well-known Tukey's Bisquare function [145], which is shown in Equation 2.4. The σ is the standard deviation of the residuals in the last iteration. The value of k is constant and is set according to different use cases. In my experiments, I found that the average energy cost of garbage collection and thread switching is about 10 to 70 times the average standard deviation of the residuals in the first iteration; therefore, I select 40, the median of this range, as k.

Σ_i ψ(y_i − Σ_k x_{ik}θ_k) x_{ij} = 0    (2.3)

ψ_k(x) = x(1 − (x / (kσ))²)²   if −kσ < x < kσ
ψ_k(x) = 0                     otherwise    (2.4)

The solution represents the energy cost of each instruction in the path. Taken together with the measured energy cost of the invocations, the Analyzer now has the energy cost for the entire path. The values for all of the paths are provided as input to the Annotator. Two special cases are discussed below. The first special case is when it is not possible to solve for a path's instruction energy (i.e., m).
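Before turning to the special cases, the iterative reweighting just described can be sketched in a few lines. This is a minimal illustration of M-estimation with Tukey's Bisquare expressed as iteratively reweighted least squares; the data, the default k, and the fixed iteration count are assumptions for the sketch (the Analyzer uses R's robust regression and the k of 40 discussed above):

```python
import numpy as np

def tukey_weights(residuals, k=4.685):
    """IRLS weights from Tukey's Bisquare (Equation 2.4):
    w(r) = (1 - (r/(k*sigma))^2)^2 inside the cutoff, 0 outside.
    Assumes the residuals have nonzero spread."""
    sigma = np.std(residuals)
    u = residuals / (k * sigma)
    return np.where(np.abs(u) < 1, (1 - u ** 2) ** 2, 0.0)

def robust_fit(X, y, k=4.685, iters=25):
    """Start from ordinary least squares, then repeatedly reweight
    the data points by their residuals and re-solve."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(iters):
        w = np.sqrt(tukey_weights(y - X @ theta, k))
        theta, *_ = np.linalg.lstsq(w[:, None] * X, w * y, rcond=None)
    return theta

# 30 clean "paths" (bytecode-frequency rows) with true per-bytecode
# costs of 2 and 5, plus one path inflated by a simulated GC event.
X = np.array([[i % 5 + 1, (2 * i) % 7 + 1] for i in range(30)]
             + [[3, 3]], dtype=float)
y = X @ np.array([2.0, 5.0])
y[-1] += 500.0  # high-energy outlier path
print(np.round(robust_fit(X, y), 2))
```

The outlier path ends up with weight zero, so the recovered coefficients match the true per-bytecode costs; a plain least-squares fit on the same data would be pulled toward the inflated path.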
This happens when the number of unique bytecodes in a path is higher than the number of independent data points. Solving a linear system of equations requires that there be at least as many independent data points as unknown variables. In this situation, the Analyzer recognizes that there are not enough data points and can take two actions, which are repeated until the equation is solvable. The first is that the tester is notified that the application should be executed more to generate additional data points. The additional executions do not need to exactly reproduce the initial executions, but should represent similar use cases to ensure a significant amount of path overlap. Since this is not always possible, the second possible action for the Analyzer is to group counts of similar bytecodes, for example, all variations of the iconst instruction. This gives fewer unknown variables for the system of equations. Note that in my experience, even moderately sized marketplace apps were sufficiently complex that neither of these actions was required in my evaluation.

The second special case is paths identified as outliers, which contain thread switching or garbage collection. The detection of these outliers is discussed more below. Currently, the Analyzer excludes these paths, which comprise about 1% of the total path count. Although it is desirable to include these paths, since excessive high-energy events could be a symptom of energy-inefficient coding, there are two obstacles for which there is not a good solution. First, the path must be adjusted to remove the energy cost of the high-energy event. However, because the measured energy of these events is so high compared to the path energy, it is not clear how to accurately estimate and separate the event energy from the path energy. Second, although the energy associated with these events is significant and of interest to developers, it is not clear which instructions should be attributed with the cost.
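The solvability check and the bytecode-grouping fallback for the first special case can be sketched as follows. The prefix-based merge rule is a hypothetical illustration; the dissertation does not specify exactly how variants such as iconst_0 through iconst_5 are grouped:

```python
import numpy as np

def solvable(X):
    """The system is solvable when the path matrix has full column
    rank, i.e., at least as many independent rows as unknowns."""
    return np.linalg.matrix_rank(X) >= X.shape[1]

def group_columns(X, names, prefix_len=6):
    """Merge counts of similar bytecodes (e.g., iconst_0..iconst_5)
    into one column keyed by a shared name prefix."""
    groups = {}
    for j, name in enumerate(names):
        groups.setdefault(name[:prefix_len], []).append(j)
    keys = sorted(groups)
    merged = np.column_stack([X[:, groups[k]].sum(axis=1) for k in keys])
    return merged, keys

names = ["iconst_0", "iconst_1", "iconst_2", "iadd"]
X = np.array([[1, 0, 2, 1], [0, 1, 1, 2], [2, 1, 0, 1]], dtype=float)
print(solvable(X))             # 3 paths, 4 unknowns -> False
Xg, keys = group_columns(X, names)
print(keys, solvable(Xg))      # grouping leaves 2 unknowns -> True
```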
Currently, this number is tracked and reported as a separate total.

High-energy Events as Statistical Outliers: The energy overhead of switching to external threads and of garbage collection is quite high. My experiments show that these events cost from 20,000 to 150,000 times as much as a normal instruction. Yet they occur rarely during execution. My insight is that the high energy cost and relative infrequency of these high-energy events allow them to be detected as statistical outliers. In small-scale experiments, I found that the energy cost of a path ranges from 0.007 mJ to 0.631 mJ; garbage collection from 20 mJ to 81 mJ; and thread switching overhead from 6.44 mJ to 11.3 mJ. This means that the energy cost of paths with garbage collection and thread switching will be significantly larger than that of normal paths: 10 to over 10,000 times larger. This difference in the range of values allows us to identify the events by defining them as outliers based on their energy consumption (i.e., by setting an appropriate value of k in Tukey's Bisquare function). Note that my approach detects outliers for a specific path and does not simply cluster all of the energy measurements. Per-path outlier detection is necessary since it is possible for a path to be long enough that it consumes more energy than garbage collection or thread switching. In the evaluation, I validate this approach to detecting garbage collection and thread switching by showing that my approach is able to detect all known occurrences of these types of events.

2.2.2.3 Visualizing the Energy Consumption

The Annotator presents a graphical representation of the source line level information. Developers can use this visualization to more readily understand the distribution of energy consumption across the different parts of their application. The Annotator is an Eclipse plugin that overlays power information onto an application's source code. A screenshot is shown in Figure 2.5.
The visualization uses a SeeSoft-like [53] graphical representation in which different colors indicate the amount of energy consumed by source lines. The color for each source code line is obtained by ranking each source line according to the sum of its associated bytecode energy costs. The ranking is then mapped to a color within the spectrum. In my example, blue shows low energy consumption and red indicates a high level of energy consumption. In-between values are different shades of red and blue (purple).

Figure 2.5: Visualization of the energy measurements.

2.3 Evaluation of Energy Measurement

I evaluated two aspects of my approach, analysis time and accuracy. For the evaluation, I implemented my approach as a prototype tool, vLens, and designed several experiments to evaluate these two aspects. I considered two broad research questions:

RQ1: What amount of analysis time is incurred by vLens?
RQ2: How accurately does vLens calculate the energy consumption of the application?

2.3.1 Subject Applications

In the evaluation, I used a set of five applications from the Google Play Market. Table 2.1 shows the number of classes (C), methods (M), and bytecodes (BC) for each application. All of the applications are written for the Dalvik Virtual Machine, do not use any native libraries, and can be translated to and from Java Virtual Machine bytecodes by the dex2jar tool. These applications represent real-world marketplace applications and implement a diverse range of functionality.

Table 2.1: Subject apps of energy measurement

App               | C   | M     | BC      | Description
BBC Reader        | 590 | 4,923 | 293,910 | RSS reader for BBC news
Bubble Blaster II | 932 | 6,060 | 398,437 | Game to blast bubbles
Classic Alchemy   | 751 | 4,434 | 467,099 | Educational game
Skyfire           | 684 | 3,976 | 274,196 | Web-browser
Textgram          | 632 | 5,315 | 244,940 | Text editor

For all of the experiments, I ran the applications using canonical usage scenarios for each application.
For example, I played a game several times (Bubble Blaster II and Classic Alchemy), created and edited a text document (Textgram), read a news article via the reader (BBC Reader), and opened a web page (Skyfire).

2.3.2 Implementation

The vLens prototype is written in Java and works for Android applications written to run on the Dalvik Virtual Machine. I chose to implement for Android because its open nature made it easier to understand the inner workings of the OS. However, my approach is applicable to other platforms, such as Windows Phone and iOS, since it relies on energy measurements provided by external hardware, statistical techniques, and instrumentation, which are available on many platforms. There are four main modules in the implementation: the PMP, App Instrumenter, Analyzer, and Annotator. For the PMP, I utilized the LEAP power measurement device [122] described in Section 2.2.1. The App Instrumenter uses BCEL [28] to build intra-procedural control flow graphs for the efficient path profiling and to insert the required instrumentation. I use dex2jar [3] to convert Dalvik bytecodes to Java bytecodes; then, after instrumentation, I compile the classes back to Dalvik with the dx tool provided by the Android SDK. The Analyzer uses R [8] for the robust linear regression functions and Java code to perform the path adjustments. Finally, the Annotator is based on an Eclipse visualization plugin I built for prior work in energy estimation [69].

2.3.3 RQ1: Analysis Overhead

For the first research question, I consider three aspects of the approach's analysis time. These are: (1) the time to instrument each application (T_I), (2) the time to perform the offline analysis (T_A), and (3) the runtime overhead introduced by the instrumentation (T_R).
Experiments to measure the first two were performed on a desktop platform containing an Intel i3 @ 2.1 GHz with 2 GB of RAM, running Ubuntu 12.04. Overhead was measured on the LEAP platform. To determine T_I, I measured the time to instrument each application with vLens. After running the application and collecting the path and invocation tuples, I determined T_A by measuring the time to analyze the tuples, perform path adjustments, and calculate each source line's energy using the regression analysis. The results of these measurements are shown, in seconds, in Table 2.2.

Table 2.2: Time and accuracy of vLens

App               | T_I (s) | T_A (s) | T_R (%) | R²   | AEE (%)
BBC Reader        | 353     | 158     | 0.51    | 0.94 | 6.5
Bubble Blaster II | 460     | 145     | 3.24    | 0.90 | 8.6
Classic Alchemy   | 873     | 128     | 8.77    | 0.93 | 3.4
Skyfire           | 277     | 97      | 1.12    | 0.99 | 4.8
Textgram          | 298     | 63      | 6.33    | 0.92 | 6.3

The time to instrument ranged from five to twelve minutes, and the time to analyze ranged from one to just under three minutes. Most of the instrumentation time was due to the computational cost of building and analyzing the control flow graphs for each of the methods. The majority of the offline analysis cost was due to the IO overhead of reading all of the path tuples into memory so they could be converted to the path matrices. Further optimizations, such as caching path information during the offline analysis, could reduce this cost further. However, since vLens is intended to be an experimental prototype, I did not implement these improvements. To determine the runtime overhead of the instrumentation (T_R), I could not take the straightforward approach of comparing an instrumented version of each application against the uninstrumented version. The reason is that a significant amount of application time is actually spent idle, waiting for user input or data. The normal user variation in entering this data masks the variation due to instrumentation overhead.
Therefore, I calculated the non-idle execution time of the instrumented application and then determined the percentage of that time that was caused by the instrumentation. A key insight to measuring non-idle time is that the Android operating system is event driven: it idles waiting for user input after a method exits, and then, when it receives input, it comes out of the idle state and executes the event handler method. Therefore, I could calculate the non-idle time of the application by summing up all of the time during which an application path was being traversed. For example, if path p_1 executed from time 1 to 3 and path p_2 executed from time 2 to 4, then the non-idle time is 3 units. Note that this properly counts a time unit once regardless of how many threads are executing. Next, I profiled the instrumentation code that was inserted into the application and determined its execution time T_Inst. By examining the path tuples, I also know the execution frequency n of the instrumentation. The resulting calculation for T_R is shown in Equation 2.5. Here the denominator is the special time summation described above.

T_R = (n × T_Inst) / |⋃_i T_i|    (2.5)

The results of this calculation are shown in Table 2.2 as T_R. The overhead costs range from 0.51% to 8.77% with an average just under 4%. Overall, this is a low runtime overhead, representing about 0.15 seconds. Anecdotally, this amount of overhead did not cause a noticeable delay to the testers during execution.

2.3.4 RQ2: Measurement Accuracy

For the second research question, I considered the accuracy of the measurements calculated by vLens. The primary challenge in this evaluation is that there is no source line ground truth against which I can compare for accuracy. As discussed in Section 2.1, power samplers cannot measure at a frequency high enough to capture individual source lines. Therefore, I showed the accuracy of the approach in several other ways.
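The non-idle time union in the denominator of Equation 2.5 can be sketched as a standard interval merge; the names are illustrative:

```python
def non_idle_time(paths):
    """Union of path execution intervals (the denominator of
    Equation 2.5): overlapping time is counted once, no matter how
    many threads were running."""
    merged = []
    for start, end in sorted(paths):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend on overlap
        else:
            merged.append([start, end])
    return sum(end - start for start, end in merged)

def runtime_overhead(n, t_inst, paths):
    """T_R = (n * T_Inst) / non-idle time (Equation 2.5)."""
    return n * t_inst / non_idle_time(paths)

# The example from the text: p_1 runs over [1, 3] and p_2 over
# [2, 4], so the non-idle time is 3 units, not 4.
print(non_idle_time([(1, 3), (2, 4)]))
```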
First, I examined the accuracy of the energy attributed to the APIs. Second, I used statistical tests to show that the regression analysis results could describe the overall energy consumption relationship accurately and that each path also accurately accounted for its energy. Lastly, I showed that my technique for identifying high-energy events via statistical outlier analysis correctly identified the paths that contain these events.

Figure 2.6: Comparison of API energy cost

2.3.4.1 Accuracy of the API Energy Measurements

To measure the accuracy of the API energy measurements, I compared their measured cost against a profiled cost. To obtain the measured cost, I ran the apps using vLens and extracted the cost of each invocation that was executed during the test runs. Altogether, there were invocations to over 3,722 unique APIs. From this group I focused on twenty-four APIs whose invocations together comprised more than 70% of the total invocation-related energy consumed by the five applications. For these twenty-four, I recorded the arguments and execution context of the invocations and then profiled their energy cost. The profiling was performed on the LEAP platform by executing each invocation 100 times and then measuring the energy consumed during the execution. I repeated this profiling five times and calculated the mean and standard deviation of the experiment. The profiled cost was then compared against the cost calculated by vLens. The results of this comparison are shown in Figure 2.6. Each of the APIs is listed along the X-axis and the two costs are shown on the Y-axis; the values on the Y-axis are normalized to the profiled cost. One standard deviation off of the profiling mean is shown with the additional horizontal lines.
Figure 2.6 shows that for nineteen of the twenty-four APIs, the vLens measured cost was within an average of 9% of the profiled cost, and the gap was within one standard deviation in almost all cases. For the remaining five, the measured cost was off significantly. I investigated these APIs to determine what caused this high error rate. The first four APIs are synchronized, which means that they invoke monitorenter and monitorexit. This made the execution time and energy consumption of their invocations vary widely, as the acquisition and release of the synchronization lock was non-deterministic. The fifth API accessed an HTTP response code, and I found that the code was cached after the first call to the method, which meant significantly fewer computations had to be performed in subsequent invocations. Because the profiling results for these five were skewed by these types of behaviors, I consider the nineteen to be a more accurate reflection of the accuracy achieved by the vLens calculations.

2.3.4.2 Accuracy of Bytecode Energy Distribution

I evaluated the accuracy of the bytecode regression model in two ways. First, I determined the accuracy of the regression model at an application level by comparing the amount of application energy calculated using the regression model against the amount of application energy actually measured during the experiments. Second, I looked at the accuracy of the regression model at a path level by determining the multiple correlation coefficient of the calculated path values. The results of these experiments are reported in Table 2.2. Note that I could not perform this experiment at the bytecode level because I did not have a way to establish the bytecodes' ground truth measurements. To determine the accuracy of the regression model at the application level, I calculated the Accumulated Estimating Error (AEE).
This value represents the normalized difference between the amount of energy that the regression model would calculate for the application and the amount of actually measured energy. Intuitively, the AEE can be thought of as the amount of total energy not accurately accounted for by the regression model; a lower ratio is a stronger result. The AEE is shown in Equation 2.6. Here, ŷ_i represents the energy calculated for the ith path based on the regression model and y_i is the measured energy of the ith path produced by the Path Adjuster (i.e., with the invocation energy removed).

AEE = |Σ ŷ_i − Σ y_i| / Σ y_i    (2.6)

To determine the accuracy of the regression model at a path level, I calculated the multiple correlation coefficient (R²). The R² is a well-known statistical measure that shows how well a variable can be predicted based on a linear function of multiple variables. In my case the value to be predicted is the energy of the ith path (ŷ_i) calculated with the regression model, and the multiple variables are the individual bytecodes solved for during the regression. The value of R² is obtained by calculating the solution to Equation 2.7. As with the AEE in Equation 2.6, y_i is the adjusted measured value of the ith path, ŷ_i is the path's value calculated using the regression model, and ȳ is the mean of the measured paths. The R² value is measured on the interval [0, 1], with a value close to 1 representing a strong fit of the data to the model and 0 representing a weak fit.

R² = 1 − Σ(y_i − ŷ_i)² / Σ(y_i − ȳ)²    (2.7)

For the AEE, the numbers range from 3.4% to 8.6% with an average of just under 6%. This means that, on average, 6% of the actual measured energy is not accounted for by the model. In general, since linear regression is an approximation, a certain amount of variance in the energy totals is to be expected.
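Equations 2.6 and 2.7 are straightforward to compute; the toy path energies below are made up for illustration, whereas the real values come from the Path Adjuster:

```python
def aee(measured, predicted):
    """Accumulated Estimating Error (Equation 2.6): normalized gap
    between total predicted and total measured path energy."""
    return abs(sum(predicted) - sum(measured)) / sum(measured)

def r_squared(measured, predicted):
    """Multiple correlation coefficient R^2 (Equation 2.7)."""
    mean = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean) ** 2 for y in measured)
    return 1 - ss_res / ss_tot

y = [10.0, 20.0, 30.0, 40.0]        # measured path energies (mJ)
y_hat = [11.0, 19.0, 31.0, 42.0]    # model-calculated energies
print(aee(y, y_hat))                # |103 - 100| / 100 = 0.03
print(round(r_squared(y, y_hat), 3))
```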
In this context, I believe that 6% is a low enough error that the overall results of the analysis would still be informative and help to guide the developer in making changes to the code. The R² values range from 0.90 to 0.99 with an average of 0.936. The high average R² value shows that my model is able to fit the measured data very closely. Overall, these numbers show that my regression model exhibits high accuracy with respect to its calculations at both the application and path level.

2.3.4.3 Accuracy of Outlier Detection

I evaluated the accuracy of the RLR techniques for detecting outliers caused by garbage collection and thread switching events. To do this, I seeded a variable number (20, 50, 100, and 200) of artificial events along paths of the subject applications by inserting calls to garbage collection (System.gc()) and thread switching instructions (Thread.start() and Thread.join()). The target thread contained only a single instruction so that its actual execution time would be low and the most prominent cost would be the context-switch overhead. I then applied the RLR techniques and checked to see whether the paths containing the seeded events were determined to be outliers. For each of the applications, the RLR techniques were able to detect all of the seeded high-energy events. From this result, I conclude that the use of statistical outlier detection to identify paths affected by garbage collection and thread switching overhead is both an effective and practical approach.

2.4 Threats to Validity

In this section I discuss the threats to the validity of my empirical evaluation and explain how I addressed these issues.

External Validity: The subjects in this study are real-world marketplace apps downloaded from the Google Play Market. They represent different app domains and, in terms of size, are representative of many apps in the marketplace.
Internal Validity: In general, to help ensure internal validity, the accuracy and timing experiments were repeated multiple times and averaged. To ensure that the outlier detection was a result of my technique, I tracked the seeding of the garbage collection and thread switching, so I could easily determine in which paths they occurred. Furthermore, I compared actual and seeded garbage collection events to verify that they had similar energy characteristics. For threads, I used a minimally sized thread (one instruction) to ensure that the thread's energy was not inflated and thereby made easier to detect.

Construct Validity: For accuracy, I was unable to use the most straightforward comparison, bytecode ground truth, due to the difficulty of measuring the ground truth for such short events. Instead, I used invocation ground truth and well-known statistical tests to show the degree to which my models fit the measured data. Overall, it seems likely that if the calculated values for invocations were close to ground truth and the R² values were consistently high across all apps, then the calculated source line level energy would also be accurate. Nonetheless, this is a validity threat that my experimental design could not completely address.

Chapter 3
Empirical Study

Understanding how energy is consumed in mobile apps is important for energy optimization. vLens enables me to perform a very fine-grained measurement study on mobile apps. However, it does not directly provide insights about the energy consumption of market apps. To better understand how energy is consumed in market apps, I performed an extensive study of the energy-related behavior of smartphone applications. In this study, I measured and analyzed the energy consumption of over 400 real-world marketplace applications downloaded from the Google Play market.
In this study, I leveraged vLens to provide measurements at different levels of granularity, from the whole-application level down to the source line and instruction level. To the best of my knowledge, my work was the first to combine such a large sample size with detailed source line level energy information. My study revealed several interesting observations that provide actionable guidance for software engineers and motivate areas for future research in the software engineering community. First, I found that most applications spend more than 60% of their energy in an idle state, which is the state where no code of the current application is running, so optimizing only the code of applications is not sufficient to reduce the overall app energy consumption. Second, I found that the network is the most energy-consuming component in Android applications. In particular, making an HTTP request is the most energy-consuming network operation. Third, I found that the energy consumption of applications is dominated by that of system APIs. Thus, developers should focus on optimizing their use of system APIs. Fourth, I found that, despite the large number of APIs used in applications, only a few are significant in terms of energy consumption. Hence, developers can narrow the range of APIs that need to be optimized to a small set. Fifth, I found that instructions in loops typically consume over 40% of the total non-idle energy. This indicates that developers should be more careful when creating loops. Lastly, I found that data manipulation operations are the most energy consuming among all types of bytecodes. Taken together, these results provide developers with useful information about the energy consumption of Android applications and may help them design better schemes for energy optimization.

3.1 Research Questions of the Empirical Study

I focus on two aspects of the energy consumption in Android applications.
The first aspect is how energy is consumed. This aspect includes several research questions, such as what kinds of applications are the most energy consuming, how much energy is actually consumed by the applications, which are the most energy-consuming components, etc. I answer these research questions in a top-down order: I first answer the research questions at the application level, then move to the component level, and finally to the source line level for APIs and instructions. The structure of the research questions of the first aspect is shown in Figure 3.1. The second aspect is how to measure the energy consumption of applications. Measurement is the basis for energy optimization, and inaccurate measurements can undermine the validity of results. Thus, I analyze the impact of three typical practices in energy studies on the accuracy of energy consumption results.

Figure 3.1: The structure of research questions of the first aspect

For the first aspect, the most high-level research question at the application level is RQ 1: How much energy is consumed by individual applications? This research question gives a high-level picture of how energy is consumed by different applications and what kinds of applications are the most energy consuming. This information provides developers with a general overview of the energy consumption of a broad sample of applications. Next, I break down the energy consumption of each application. The second research question is RQ 2: How much energy is consumed by the idle state of an application? Android applications are event based, multithreaded, and sensor intensive. Applications may be suspended at times to wait for user input, sensor data, or thread synchronization. However, the system will still consume energy during this idle time. The amount of energy consumed in an idle versus non-idle state is important, as most code-oriented optimization will not affect the energy consumed when an app is idle.
The last research question at the application level is RQ 3: Which code consumes more energy: system APIs or developer-written code? In this research question, I break down the non-idle energy even further. Android applications are composed of system APIs and developer-created code, so the ratio of the energy spent in APIs to that spent in developer-created code will influence the energy optimization strategy. If the system APIs dominate the non-idle energy, the energy optimization scheme should focus on the usage of system APIs. Otherwise, developers should focus on the optimization of their own code. At the component level, my research question is RQ 4: How much energy is consumed by the different components of a smartphone? Current Android smartphones have multiple software and hardware components that provide different functionalities. These components include UI, network, IO, Sqlite, location, camera, media, and sensors. Applications can use a subset of these software and hardware components to accomplish their tasks. By knowing how energy is distributed among the different components, developers can be better informed in designing their apps and optimizing energy usage. At the source line level, the first question I answer is RQ 5: Which APIs are significant in terms of energy consumption? Some APIs may consume a lot of energy and dominate the total energy consumption, while others may have trivial energy consumption. To address this question, I answered three subquestions. The first is How many APIs are significant in energy consumption? This subquestion provides information about the number of energy-significant APIs. The second is How similar are the top ten most energy consuming APIs across different applications? This subquestion asks whether there are common patterns in the energy-significant APIs across different applications. If so, developers can design specific optimization schemes for these patterns.
The third one is Which APIs are the most likely to be in the top ten most energy consuming APIs?. This subquestion asks which APIs are likely to be energy significant. Based on this study, developers are better informed when they consider which API usages should be optimized. Besides the energy distribution in APIs, the second research question at the source line level is RQ 6: How much energy is consumed by code in loops? Loops are common code structures to do repeating tasks. It is useful to know whether they also play an important role in energy consumption. Specifically, I want to know whether there are often energy consuming APIs in loops. If so, these loops may be good targets for optimization. The third research question at the source line level is RQ 7: How much energy is consumed by the different types of bytecodes? Besides invocations of system APIs, the developer-written code is another major part of an Android application. However, it is not clear what aspects of this code are the 30 most energy consuming. With better knowledge of the energy consumption of the different underlying instructions, developers could better plan their design and create more energy efficient applications. In the second aspect, I studied three typical practices in energy measurement studies. These three methods are commonly used but have limitations that could introduce inaccuracy into the measurement. I evaluate how much inaccuracy they can introduce and help developers decide what method of measurement they need. My first research question on energy measurement is RQ 8: Is time equal to energy? Traditionally, developers use the execution time of methods to optimize the performance of applications. However, it may not be sufficient for energy optimizations and may be misleading. Knowing how accurately time can approximate energy would allow developers to determine what kind of information they need to use to carry out the most effective optimizations. 
The second research question on energy measurement is RQ 9: What granularity of measurement is sufficient? Today, developers have many energy measurement tools to help diagnose energy issues. However, many of them are only on the millisecond level. The processors of modern smartphones are generally in the GHz range, which means millions of operations can be executed in one millisecond. Simply using the millisecond level measurement may miss a lot of details and introduce inaccuracy into the energy diagnosis. The information about how much inaccuracy millisecond level measurements could cause can help developers make the appropriate decision for energy measurement and optimization. My last research question on energy measurement is RQ 10: Is it necessary to account for idle state energy? In many recent measurement studies, developers do not consider the energy consumption during idle states, where no code of the current application is running. This method can inaccurately inflate the results of measurements because it includes energy that is not consumed by the code of an application. If the idle state of applications consumes a significant amount of energy, the measurement results could be misleading. Knowing whether the idle state energy is a problem can also help developers measure and optimize the energy of their applications correctly. 3.2 General Experiment Protocol of the Empirical Study In this section, I describe the protocol for obtaining the data used to address the research questions posed in Section 3.1. These questions require two key capabilities: obtain source line level energy information of mobile applications and automate the interaction with the subject applications. To obtain source line level information, I leveraged the capabilities provided in my prior work, vLens in Chapter 2. In this study, I modified the power measurement platform (PMP) of vLens. The PMP was originally based on an x86 based Android platform [105]. 
In order to run non-x86 applications, I built 31 a new PMP based on the Monsoon power meter [17]. The Monsoon power meter samples the energy consumption of the smartphone at a frequency of 5KHz and synchronizes every energy sample with the standard Unix time. In order to get the nanosecond level energy information, I instrumented with both millisecond and nanosecond level timestamps. I used the millisecond level timetamps, which are synchronized to the Unix time, to match executed code blocks to their corresponding energy measurements. If a code block was executed over a long enough time duration, I summed up the energy samples between the start and end millisecond level timestamps of the code block and used that value as its energy consumption. If the code block was executed in a small enough period of time that could not be accurately captured by millisecond level timestamps, I used the nanosecond level timestamps (System.nanoTime) to measure its time span and calculated its energy consumption with function PDt, where P is the power of the smartphone when the target code block is executed andDt is its execution time measured by the nanosecond level time stamps. For the purpose of this study, the Monsoon based PMP not only provides the same capabilities as the LEAP based PMP, but also allows vLens to work with any modern Android platform, and by extension, a broader set of mobile applications. In this study, I used the Samsung Galaxy SII smartphone with a rooted Android 4.3 operating system. To facilitate the automated interaction with the applications, I leveraged a workload generation tool, Monkey [2], from the Android SDK. Monkey is analogous to a traditional web crawler, but interacts with the GUI of Android applications, and generates a pseudo random stream of user events and inputs, such as touch and scrolling, for the target application. Monkey runs on a controller desktop and pushes UI events to the smartphone through the network. 
For each of the applications, Monkey generated five UI events per second and 500 events in total. Due to the subject pool size of my study, an important feature of this tool was that it could be completely automated to interact with the application. Also the tool uses pseudo random numbers to generate each user input, thus its workload was repeatable in every experiment for a single application with the same random number seed. To build a pool of subject applications, I started by collecting 9,644 free applications from the Google Play market using the Google Play Crawler [4]. These applications came from over 23 different cate- gories, but did not include games because they typically require non-deterministic and complex operations that could not be generated automatically. The vLens tool only supports instrumenting bytecode for path profiling, so I also excluded any applications that required native libraries. I then randomly selected 412 applications for my experiment. Finally, I removed the applications for which Monkey had generated state- ment coverage of less than 50% or that crashed during the automated interaction. This left us with 405 applications. The statistics and categories of these 405 applications are shown in Figure 3.2. In this figure, 32 I list the eight categories with the highest number of applications and all other categories are reported as “others.” Each category in Figure 3.2 has at least 28 applications and none of them have more than 74 applications. Figure 3.2: Categorization of the subject applications in the empirical study 3.3 Evaluation Result of the Empirical Study 3.3.1 RQ 1: How much energy is consumed by individual applications? To address this research question, I measured the total energy consumed by each of my subject applications during its execution. 
The total energy of an application was calculated as the sum of all energy consumed between the earliest timestamp and the latest timestamp associated with the application’s execution. I analyzed the results of all the applications as a group and then by the categories shown in Figure 3.2. On average, each application consumed 57,977 mJ of energy during its execution. 81% of the ap- plications fall into the range of 10,000 mJ to 100,000 mJ. The standard deviation of all applications was 62,416 mJ, and the difference between the highest and the lowest energy consumption was several orders of magnitude. The energy consumption by categories is shown in Figure 3.3: lifestyle & productivity (LP), entertainment (En), travel & transportation (TT), music & media (MM), health & medical (HM), sports and news (SN), photography (Ph), tools (To), and other (Other). These results show that energy consumption patterns are not homogeneous. The energy of different applications in the same category can vary by several orders of magnitude. Further, the category of an 33 LP En TT MM HM SN Ph To Others 0 50000 100000 150000 200000 Energy consumption (mJ) Figure 3.3: Average application energy consumption by category. 34 application does not have a strong correlation with its energy consumption. The average energy of ap- plications in different categories are at most 30% different. Compared with the difference in the energy consumption of applications within each group, which can be as large as several hundreds times, this amount of difference is not significant. 3.3.2 RQ 2: How much energy is consumed by the idle state of an application? To answer this research question, I divided the general energy usage of applications into three categories, PureIdle, APIIdle, NonIdle, and compared their energy consumption. PureIdle is the energy that is con- sumed with no code of the application running. 
For example, the running application may be suspended to wait for user input or asynchronous sensor data. During this time, there is no code of the application being executed but the system still consumes energy. APIIdle is the energy consumed by the sleeping APIs, such as java.lang.Thread.sleep and java.lang.Object.wait. These APIs set the running application to the idle state, but unlike the PureIdle, these APIs are a part of the application code. Finally, NonIdle is the amount of energy that is actually consumed by the application code. This amount of energy includes all energy consumed by the non-sleeping APIs and all the user code of the running application. I calculated each of these three categories as follows: First, I calculated APIIdle as the sum of the energy for all java.lang.Thread.sleep and java.lang.Object.wait series of APIs. The energy of an API is the sum of energy samples between its starting and ending timestamps. Second, I calculated the NonIdle energy as the sum of the energy of all execution paths minus the APIIdle energy. The energy of a path is the sum of all energy samples between the entry and exit timestamps of the path. This includes the energy of sleeping APIs called along the path so I subtracted their energy from the summed value to get NonIdle. Lastly, I calculated PureIdle as the total energy minus APIIdle and NonIdle, where the total energy for an application is the same as in Section 3.3.1. PureIdle represents all the energy that has been consumed while the application is running but not caused by any code of the application. On average the PureIdle, APIIdle, and NonIdle consume 36.6%, 25.0%, and 38.4% respectively. Their standard deviations are 37.8%, 30.0%, and 30.8% respectively. These numbers indicate that, for many applications, the code does not play a dominant role in energy consumption. In my study, in half of the applications, code consumed less than 31.1% of the total application energy. 
Most of the energy was spent as idle energy, either as PureIdle or APIIdle. Thus, simply optimizing the energy consumption of user code is insufficient; developers also need to reduce the energy consumed during idle states. One possible way to save energy is to design an energy efficient color scheme for mobile applications. Many current smartphones, such as the Samsung Galaxy series, use OLED screens as a display device. For these OLED based smartphones, a well designed energy efficient color scheme can reduce the energy consumption of mobile applications in the idle state. According to previous research [51], a well designed 35 color scheme could save up to 72% of the OLED screen power compared to energy inefficient color schemes. Based on the result of this research question, I proposed a technique that can automatically optimize the display energy for mobile web apps. The technique will be discussed in Chapter 4. 3.3.3 RQ 3: Which code consumes more energy: system APIs or developer-written code? Figure 3.4: Breakdown of non-idle energy. To answer this question, I broke down the NonIdle energy in Section 3.3.2 into three categories: API, Bytecodes, and Outliers. API is the energy of invoking any API in the Android SDK, such as an- droid.bluetooth.BluetoothA2dp.finalize(). Bytecodes represents the energy consumption of normal user code, such as branch instructions and arithmetic instructions. Outliers is the energy introduced by sys- tem events, such as garbage collection and process switching. During the execution of an application, the Android system may interrupt the current application to schedule a garbage collection or another process. Events like garbage collection and external process switching are controlled by the system and the ap- plication has no knowledge as to when they will happen. If the garbage collection and external process switching interrupt an execution path, their energy will be included in the measured energy of the execution path. 
According to my former research [91], the cost of a garbage collection or external process switching is 10 to 10,000 times larger than a normal path. Thus, if a path is interrupted by a garbage collection or external process switching, its energy consumption will be abnormally large. 36 In my measurement, I calculated the API energy as the sum of the energy of all API calls except for calls to the sleeping APIs, which were identified in Section 3.3.2. Similar to Section 3.3.2, the energy of each API is the sum of all energy samples between its starting and ending time stamps. I calculated the energy consumption of each bytecode with the robust regression techniques introduced in vLens [91]. The Bytecode energy is the total number of bytecodes multiplied with the per-bytecode cost identified via the robust regression. The energy of Outliers is the NonIdle energy minus the API energy and the Bytecode energy. The results of this analysis are shown in Figure 3.4. As shown in the figure, 75% of applications consumed more than 82.2% of their energy via API invocations, and 91.4% of applications consumed more than 60% of their total energy via API invocations. These results indicate that the system APIs dominate non-idle application energy for most applications. The user code does not consume a lot of energy. Thus, developers should focus on optimizing their usage of APIs to reduce the energy of their applications. 3.3.4 RQ 4: How much energy is consumed by the different components of a smartphone? Figure 3.5: Component level energy usage. To address this question, I report the ratio of the energy consumed by eight components of a smartphone to the total non-idle energy consumption of the test applications. These eight components are UI, network (Net), I/O operations (IO), sqlite queries (Sqlite), camera related operations (Camera), location information 37 (Location), sensor accessing (Sensors), and multimedia (Media). 
They represent the major functionalities of Android applications. For example, an Android application may read some data from the sensors and then display it on the screen through the UI or share it through the network. Sqlite, Camera, Location, and Media are also commonly used in Android applications. Since the hardware and software components of Android smartphones have to be accessed through APIs, I use the energy consumption of APIs that access a component as the energy consumption of the component. In my measurement, each of the eight components includes one or more packages of Android APIs, for example, the Net includes packages such as android.net.* and java.net.*. The result of this study is depicted in Figure 3.5, where the x-axis is the average percentage of the component level energy to the total non-idle energy. The average value is calculated only with the appli- cations that use a certain component. If one application does not use a certain component, it was excluded from the statistics of the component. For example, there were 330 applications using the network, so I calculated the average energy of Net only for these 330 applications. The data indicates that network is the most energy consuming and frequently used component in my measurement. Thus, the usage of network operations should be the optimization priority for reducing an app’s energy consumption. Other components have less energy consumption because they are not used as frequently as the network. Another observation is that the standard deviations are significant compared with the average value, which means the energy consumption of each component varies widely from application to application. Thus, despite the low average energy consumption of some components, they can still be significant in certain applications. Breakdown of Net Since Net is the most energy consuming category of APIs in most applications, I went one step further to break down the energy of Net. 
There are three mechanisms in Android to use the network, HTTP requests, Socket connections, and Webkit to display web pages. Thus, I divided the energy of Net into three categories that represent each one of the methods to access network. The average ratio of HTTP, Socket, and Webkit are 80%, 0.27%, 10%, respectively. Furthermore, 75% of the application spent more than 89% of their network energy in HTTP. This result indicates that making HTTP requests is the most energy consuming network operation. Thus, when considering the network, operations on HTTP requests should be a primary focus. 38 3.3.5 RQ 5: Which APIs are significant in terms of energy consumption? I answer this research question with three subquestions: (1) How many APIs are significant in energy consumption? (2) How similar are the top ten most energy consuming APIs across different applications? and (3) Which APIs are the most likely to be in the top ten most energy consuming APIs? 3.3.5.1 How many APIs are significant in energy consumption? To answer this subquestion, I calculated the average ratio of the energy consumed by each API to the total non-idle energy of all applications (Ratio) and the ratio of the top ten most energy consuming APIs to the total non-idle energy of each application (Top10). Ratio is a metric for each API that I use to evaluate how many APIs are significant across all applications. However, the ratio of an API can be distorted by the frequency of the API used in different applications. For example, an API may be the most energy consuming API when it is invoked, but if it is only used in a few applications, its Ratio may be very low. To give a complete view of this research question, for each application, I also calculated its Top10 to evaluate how many APIs are significant in the application. Top10 is the metric for each individual application and will not be distorted by the invocation frequency of APIs in different applications. 
For an API API i , its Ratio is given by the equation: Ratio i = N å j=1 RA i j N , where RA i j is the ratio of the energy of API i to the total API energy of the jth application. N is the number of applications, which is 405 in my experiment. For an application App i , its Top10 value is given by the equation: Top10 i = 10 å j=1 EA i j E i , where EA 1 to EA 10 are the energy consumption of the top 10 most energy consuming APIs in App i and E i is the non-idle energy of App i . There are 7,784 unique APIs used in my 405 applications. On average, each application invokes 292 different APIs. In my measurement, 98.4% unique APIs have a Ratio value below 0.1%. There are 11 APIs that have a Ratio larger than 1% and three APIs that have a Ratio larger than 4%. For the Top10 value of each application, the average is 76.4%, which means that the non-idle energy for market apps are concentrated in a few APIs. For an app, more than three quarters of its non-idle energy is consumed in ten APIs. This data indicates that only a few APIs are significant in the energy consumption. Across all applica- tions, the Ratio shows that, on average, only 2% of APIs consume more than 0.1% of the non-idle energy. For each application, the Top10 value shows that the top ten most energy consuming APIs out of the av- erage 292 APIs per each app consume more than 3/4 of the app’s total non-idle energy. Thus, developers can focus on a small set of APIs when they optimize the energy consumption of their applications. 39 3.3.5.2 How similar are the top ten most energy consuming APIs across different applications? To answer this subquestion, I measured the overlapping of the top ten most energy consuming APIs in each pair of applications. I calculated the size of the intersection of the top ten most energy consuming APIs for each pair of applications. Since I have 405 applications, there are 405(405 1)=2= 81;810 pairs of applications. 
On average, the size of the intersection of the top ten most energy consuming APIs is 1.093 and the median is 1. This number indicates that different applications consume energy in a different manner. There is not a general pattern among the top ten most energy consuming APIs across all applications. So, there is not a universal approach to optimizing the energy consumption for all applications. Developers have to design specific energy efficient schemes for each applications. 3.3.5.3 Which APIs are the most likely to be in the top ten most energy consuming APIs? Figure 3.6: Frequency distribution of the top ten most energy consuming APIs. To answer this subquestion, I plotted the frequency distribution of each API that falls into the set of the top ten most energy consuming APIs. To be more specific, I calculated, for each API, how many times it falls into the top ten most energy consuming APIs of the test applications. The result is shown in Figure 3.6, where the x-axis is the API ID and the y-axis is the frequency of a certain API. There are five APIs that are in the top 10 energy consuming APIs in more than 100 applications. Among these five APIs, 40 four are related to making HTTP requests or viewing web contents and one is used to synchronize shared files between threads. These results indicate that most APIs do not frequently appear in the set of the top ten most energy consuming APIs. Only a few APIs, such as the APIs that make HTTP requests and share files between threads, are most likely to be the most energy consuming in different applications. 3.3.6 RQ 6: How much energy is consumed by code in loops? Figure 3.7: Distribution of loop energy consumption. To address this research question, I studied three different types of loops: loops with HTTP requests (LoopHTTP), loops with other APIs (LoopNormalAPI), and loops with no APIs (LoopNoAPI). 
I studied LoopNormalAPI and LoopHTTP separately from other loops because APIs often dominate the energy consumption of an application and making HTTP requests often dominates most of the API energy. The result is plotted in Figure 3.7. From these results, I find that loops are significant in terms of consuming energy in applications. On average, instructions in loops consume 41.1% of the total non-idle energy. Among all loops, those with HTTP related APIs are the most energy consuming. This result is consistent with the conclusion of Section 3.3.4 — making HTTP requests is the most energy expensive operation. Another observation is that the standard deviation of each type of loop is larger than the average. This indicates that the energy consumption of loops varies significantly across different applications. This is because the functionalities of different applications are different and they may use loops in very different ways as well. 41 3.3.7 RQ 7: How much energy is consumed by the different types of bytecodes? Figure 3.8: Distribution of bytecode energy. To answer this question, I plot the average ratio of the energy consumption of each bytecode to the total non-idle energy consumption of applications. The average ratio of a bytecode B i is given by the for- mula: å EB i j =E j N , where EB i j is the energy consumption of the bytecode B i in the jth application, E j is the total non-idle energy of the jth application, and N is the number of applications. I measured the energy consumption of Java bytecodes that were generated from Dalvik bytecodes of Android executable files by reverse engineering. These two sets of bytecodes are different in the format of operation and the way they manipulate data. Although Java bytecodes and Dalvik bytecodes are not equal to each other, the types of their operations are similar. For example, both of them have instructions for arithmetic calculations, branching, and invocation. 
Therefore, instead of reporting the energy consumption of each bytecode, I cat- egorized bytecodes into five categories which represent the main operations in Java and Dalvik bytecodes. Since Java bytecodes are generated from Dalvik bytecodes, the functionalities of corresponding bytecodes are not changed. Thus, the energy consumption of a category in Java bytecodes will reflect the energy con- sumption of the category in Dalvik bytecodes. These five categories are Arithmetic, Branch, Invocation, Data Manipulation and Others. Arithmetic is the instructions for arithmetic calculations, such as add and multiply. Branch is control flow instructions, such as conditional jump. Invocation is instructions that call a method, such as invokevirtual in Java. The category Data Manipulation in Java represents the bytecodes to manipulate the stack, such as iload. In Dalvik, it represents the bytecodes of manipulating the registers, such as move. The category Others represents all other bytecodes that are not in the above four categories. 42 Examples of Others include monitor-enter, instanceof, length, etc. The result of this analysis is shown in Figure 3.8. From the results, I find that the energy consumption of data manipulating bytecodes is 1.7-5.4 times larger than other categories. This is because data manipulating operations are used more frequently than other operations. Before the execution of any arithmetic or invocation instructions, the arguments have to be properly set by the data manipulating instructions. For example, the operands have to be loaded into correct registers before arithmetic operations. 3.3.8 RQ 8: Is time equal to energy? To address this question, I measured two metrics. First, I calculated the size of the intersection of the top ten most energy consuming APIs and the top ten most time consuming APIs for each app. 
Second, I mapped the ranking of the top ten most time consuming APIs in descending order to that of the top ten most energy consuming APIs in descending order. To be more specific, for each application, I calculated 10 å i=0 O i , where O i = 1 if the i-th most energy consuming API is also the i-th most time consuming API, otherwise O i = 0. I chose this metric because if time could accurately reflect the energy consumption, the ranking of API calls with respect to time should be the same as that of APIs with respect to energy. In my study, the average size of the intersection of the top ten most energy consuming APIs and the top ten most time consuming APIs is 9.1 with a standard deviation of 0.8. The average number of APIs that are ranked the same is 4.6 with a standard deviation of 2.0. These results show that although the top ten most time consuming APIs are roughly the same as the top ten most energy consuming APIs, the ranking of each API may be significantly different. In practice, optimizing the energy consumption of an API is not trivial; an inaccurate ranking may lead developers to incorrectly prioritize their work and increase the amount of effort needed to achieve energy optimization. 3.3.9 RQ 9: What granularity of measurement is sufficient? I answer this question by measuring the difference between my nanosecond and millisecond level mea- surement. To get the millisecond value, I first replaced the nanosecond level fine grained time stamps of vLens with the normal millisecond level time stamps from System.currentTimeMillis(). Then I repeated the steps in Section 3.2 with the rough time stamps to get the millisecond level results. I compared the energy measurement on the millisecond level to that on the nanosecond level. The error ratio is given by the equationj millinano nano j, where milli and nano are the results from the millisecond and nanosecond level measurement, respectively. 
43 In my measurement, the mean error rate of the non-idle energy is 64.1% and the largest error rate is over 2500%! Such a number indicates that using only millisecond level is not sufficient and is likely to give misleading results. The use of nanosecond level time stamp can improve the accuracy of measurement. I argue that nanosecond level measurement is sufficient because it could capture all API calls and methods. The clock frequencies of current processors are normally at the GHz level, so each instruction is executed at the nanosecond level. Since methods and APIs in general have several instructions, the nanosecond level time stamps are able to capture all methods and API calls. 3.3.10 RQ 10: Is it necessary to account for idle state energy? To answer this research question, I reported the measurement error of total energy caused by neglecting the idle state energy of an application. In traditional measurements, developers assume that energy consumed during the execution of an application belongs completely to the code of the application. However, this assumption is not valid. As I discussed in Section 3.3.2, the system keeps consuming energy while the running application is in the idle state. Thus, assigning the energy consumed during the idle state to the code of the application may be misleading. In my measurement, I calculated the energy consumption of an application without subtracting the idle state energy (IdleKept), which sums up all the energy samples between the first time stamp and the last time stamp during the execution of the application. The energy consumption during the non-idle state of an application (IdleSubtract) is obtained by summing up the energy consumption of all the execution paths of the application. In the result, the average difference between IdleKept and IdleSubtract is 36% and the largest error is over 99%. 
This result indicates that assuming the energy consumed during the execution of an application is the total energy can introduce inaccurate measurements and misleading results. Developers may incorrectly optimize the energy consumption of their applications. For example, developers may put a lot of effort into optimizing the energy consumption of the code of an application, even though most of the energy is consumed during the idle state of the application. In this case, the efforts will be wasted since the idle state energy consumption of the application cannot be reduced by optimizing the application code. 3.4 Threats to Validity External Validity: To ensure that my applications are representative of real world applications, I randomly selected 405 applications from the Google Play market. These applications were from 23 categories and had various functionalities. Their size ranged from 1.6KB to 18 MB and their number of instructions ranged from 1,507 to 1,866,692. 44 To ensure the workload generated by Monkey was representative of real use cases of the applications, I first removed all games since I could not generate deterministic workloads for them. I then removed all applications with statement coverage less than 50%. Therefore, the workloads I used in my experiments covered most of the statements of my test applications. Finally, I removed applications from the subject pool that crashed during execution. Monkey did not generate any text for user input, such as login infor- mation. However, if the user input was critical to an application, it was not likely to reach the statement coverage of 50% and would be removed. Internal Validity: In Section 3.3.3, the energy consumption of Bytecodes and Outliers were estimated with the method from my previous work [91]. The estimation error for my technique was 19.2%. Since the Bytecode energy was only 3.2% of the total non-idle energy, a 19.2% estimation error in Bytecode energy would not influence my final conclusion. 
For the accuracy of detecting Outliers, I previously showed [91] that my method could accurately detect process switching and garbage collection. In Section 4, the energy of bytecodes was also estimated through robust regression. Thus, it had the same estimation error as I reported in Section 3.3.3. Since the energy consumption of each category differed by at least 50%, here again, a 19.2% estimation error would not change my conclusion. In my study, I measured the energy consumption once for each application, which may introduce bias to the result of each individual application due to the randomness of the workload and the environment, such as network conditions. However, since I made my conclusions based on the whole set of my test applications, this bias was mitigated by the large number of applications in my experiment. Lastly, in my study, Monkey was running on the controller desktop and pushed UI events to the smartphone through the network. So Monkey did not introduce any extra energy consumption except the network energy consumption to transmit UI events. I checked this increment of network energy on a subset of my test applications and found that it was not noticeable compared to the total energy consumption.

Construct Validity: In Section 3.3.4, the average energy consumption of a component may be distorted by the popularity of the component in the applications. One component may consume a lot of energy when it is used, but may have low average energy consumption due to low usage rates across all applications. To avoid this distortion, I reported the effective average energy for each component, which excludes the applications that did not use a component from the statistics of the component. In Section 3.3.5, the average energy distribution of an API across all applications may be distorted by the frequency of the API being called in all applications.
Some APIs may have more average energy consumption at the application level than others because they were used in more applications. To avoid this bias, for each application, I also reported how much energy was consumed by the top ten most energy consuming APIs in the application. In Section 4, my robust regression model estimated the energy consumption of JVM bytecode. Although the JVM bytecode was generated from the DVM bytecode by reverse engineering, it was not the same as the DVM bytecode. To address the difference between the JVM bytecodes and the DVM bytecodes, I reported the energy of categories of bytecodes instead of the energy of each specific bytecode. Since there was a mapping of functionalities from the JVM bytecodes to the DVM bytecodes, the categories were consistent for both of them.

Chapter 4

Display Energy Optimization

In my empirical study, I found that the idle state consumes the majority of the app energy. In the idle state, the apps are waiting for user input and there is no user code running. Thus, for the energy consumption of the idle state, optimizing the user code is not effective. However, since the display is one of the dominant energy consuming components in a smartphone [40], optimizing the display energy is a useful method to reduce the idle state energy consumption of mobile apps. OLED screens [115] are increasingly popular in different smartphones, such as the Samsung Galaxy, Sony Xperia, and LG Optimus series. These screens are more energy efficient than previous generation displays, but also have very different energy consumption patterns. In particular, darker colors, such as black, require less energy to display than lighter colors, such as white [49]. Unfortunately, many popular and widely used web applications use light-colored backgrounds. This means that, for many web applications, there is a significant opportunity to improve the battery life of smartphones by improving the color usage of a web application's pages.
Researchers and engineers have long recognized the need to reduce a smartphone's display energy. A well-known and widely used smartphone technique is to dim the display to conserve energy [52], for example, when the smartphone is idle. This technique is useful, but there is room for additional improvement by exploiting the OLED screen's unique energy-color relationships. One simple approach that has been suggested is to invert colors, switching light colors to dark and vice versa [51]. The primary problem with this approach is that it distorts the color relationships of the user interface because color difference is not an invertible relationship. Another approach is to create an alternate color scheme for mobile web applications. Chameleon proposes a browser extension that retrieves and applies a more energy efficient color scheme when displaying a web application [51]. The drawback of this approach is that it requires a customized browser, additional servers on the network to handle the color scheme, and the color scheme itself must be manually generated. Given the state of the art, a technique that can automatically transform a web application to make its web pages more energy efficient is desirable. However, there are several significant challenges to providing such a solution. The first of these is to identify colors generated by a web application. Most modern web applications combine dynamically generated pages and cascading style sheets in a way that makes it complicated to determine which colors will be used in which parts of a web page. Second, it is important to model the color relationships in the web page. Here, it is necessary to know what kind of visual relationships the colors have with each other, i.e., whether they are contained or adjacent.
Third, given this information, it is challenging to find a new color scheme that maintains, as much as possible, the color differences and aesthetics of the original web page, while also being more energy efficient. I propose a new technique for automatically transforming the color scheme of a web application. My technique works for all kinds of web apps, but it can be more effective for mobile web apps since these apps are more commonly used in mobile devices with OLED screens. The approach rewrites the server side code and templates of a web application so that the resulting web application generates pages that are more energy efficient when displayed on a smartphone. The rewritten web application can then be made available to OLED smartphone users via automatic redirection or a user-clickable link. My approach employs program analysis to model the possible pages that can be generated by the web application. Using this information, it models the potential visual relationships among the colors of the pages’ elements and defines a set of constraints for the new color scheme. My approach then defines a minimization problem whose solution represents a new color scheme in which the color differences are similar to those in the original web application. Finally, I define an efficient simulated annealing based algorithm to solve the minimization problem and produce a new color scheme that is both energy efficient and visually appealing. I have implemented my approach in a prototype tool, Nyx, and performed an extensive empirical evaluation on a set of seven web applications. The results of my evaluation show that my approach is successful at automatically rewriting web applications to generate more energy efficient web pages that will be acceptable to end users. In particular, my approach achieved an average 40% reduction in the display’s power consumption. 
Via a user study, I found that users rated the attractiveness and readability of the transformed pages as lower than, but still close to, the original. Importantly, over 60% of users indicated that the transformed version would be acceptable for general use given the energy savings, and over 97% said it would be acceptable for use if the battery power was critically low. Overall, I consider these results to be a strong indication that my approach can provide efficient and visually-acceptable transformations for web applications.

Figure 4.1: An example web app.

Figure 4.2: The generated code of Figure 4.1.

4.1 Motivating Example for Display Energy Optimization

In this section, I introduce a motivating example to illustrate the challenges my approach must address. My simple example is shown in Figure 4.1 and its output is shown in Figure 4.2. For explanatory purposes, I inline the CSS properties used by the code in Figure 4.1. In my approach, I have three main challenges to address to automatically transform the colors of a web application. The first challenge is to extract color information from the implementation of a web application. The color information includes two types of information, the colors generated and the structural relationship they have with other colors. For example, in Figure 4.1, I need to know that (1) the <body> tag (lines 3 and 4) has white as its background color and the <td> tag has red as its background color, and (2) the red color area is surrounded by the white color area. This information is obtained by analyzing the code and identifying the strings that define the page's HTML structure and colors. In general, this requires us to model the output of a web application and then build more detailed models of the relationships among its HTML elements. I discuss how to extract color and structural information in Section 4.3. The second challenge is to model the relationship between colors that have a structural relationship.
In general, a transformation must maintain this relationship to improve the readability and aesthetics of a new color scheme. For modeling this relationship, I use color distance, which is a function that accepts two colors and returns a numeric value to indicate the degree of difference between the two colors [135, 133]. Colors have a larger color distance if they are more different. This modeling is complicated by the fact that there are generally multiple colors used in a web page. For example, in Figure 4.2, there are six colors: white, black, green, yellow, red, and blue. All of these colors have different relationships: white surrounds red and green, green and red are next to each other, etc. Furthermore, not all color relationships in a web application are equally important. For example, in Figure 4.2, the relationship between white and black (the background color and text color in line 1) is more important than the difference between black and yellow (the text colors in lines 1 and 5). I address this modeling problem in Section 4.4.1 and Section 4.4.2. The third challenge is that, given a model for the relationships between colors, I must find the best color scheme that saves display energy and, at the same time, maintains the attractiveness and readability of the web application. From prior research studies I know that the energy consumption of OLED screens is related to the RGB value of each pixel [49]. Energy consumption of a pixel ranges from black, the least energy consuming color, to white, the most energy consuming color. Therefore, I would want to change the background color of the <body> tag in Figure 4.2 to black to save energy. However, to maintain readability I also need to change the background colors for the table cells (lines 3, 4, and 6) and the text colors (lines 1, 3, 5, and 7). Otherwise, the content of the web page may become unreadable and the appearance of the web application may be degraded.
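The two ingredients just described, a color-distance function and the RGB-dependent OLED pixel energy, can be sketched concretely. The following is an illustrative sketch, not code from the dissertation: it uses plain Euclidean RGB distance as one possible color-distance function, and a linear per-channel power model in the spirit of prior OLED models [49]; the channel weights are made-up placeholders, as real coefficients are device-specific.

```python
import math

def color_distance(c1, c2):
    """Euclidean distance between two RGB colors (one of several
    possible color-distance functions)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

# Hypothetical per-channel power weights; real values are device-specific
# and would be obtained by measurement.
W_R, W_G, W_B = 0.2, 0.4, 0.6

def pixel_power(color):
    """Linear OLED power estimate: black is cheapest, white most costly."""
    r, g, b = color
    return W_R * r + W_G * g + W_B * b

white, black = (255, 255, 255), (0, 0, 0)
print(pixel_power(black) < pixel_power(white))   # black consumes less
```

Under this model a black background costs nothing to display while a white one is maximal, which is exactly why the approach pushes large background areas toward black.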
A brute force method to find a new color scheme that satisfies these constraints is inefficient because of the large color space; there are 256^3 colors in total to choose among. I address this problem in Section 4.4.3.

Figure 4.3: Architecture of Nyx

4.2 Overview of the Approach for Display Energy Optimization

The goal of my approach is to reduce the energy consumption of the HTML pages displayed by a mobile web application. To do this, I automatically rewrite an application so that its generated HTML pages use more energy efficient color schemes and layouts. My approach can be described as having three phases. An overview of these phases is shown in Figure 4.3. The first phase is HTML Output Analysis (Section 4.3). In this phase, the approach builds a model, the HTML Output Graph (HOG), of the HTML pages that can be generated by the application. Then, using the HOG, the approach builds the HTML Adjacency Relationship Graph (HARG), which captures the visual relationships, such as adjacency or containment, between pairs of HTML elements. The second phase, Color Transformation (Section 4.4), builds a Color Conflict Graph (CCG) that describes the relationships between the colors of HTML elements that have a visual relationship. Using the CCG, the approach generates the Color Transformation Scheme (CTS), a new energy efficient color scheme for the application. This is done by calculating a new color scheme that maintains the color distances represented in the CCG, but whose primary colors will consume less energy during display. The third and final phase is Output Modification (Section 4.5). The result of this phase is that the approach rewrites the application so that the generated HTML pages use the colors contained in the CTS.

4.3 HTML Output Analysis

In the first phase, the approach builds models of the application that describe the structural relationships among the HTML elements of the application's web pages.
I call this model the HTML Adjacency Relationship Graph (HARG); it shows which HTML elements can be adjacent to each other or contained by one another. To generate the HARG, the approach first builds another model, the HOG, which describes the HTML pages that can be generated by the application, and the HTML Tag Graph (HTG), which describes the potential HTML tags in the application. I explain the three models in more detail below.

4.3.1 The HTML Output Graph

The HTML Output Graph (HOG) represents the potential HTML output of a web application. Intuitively, the HOG is a projection of the web application's control flow graph (CFG) where all of the nodes are instructions that generate HTML output. An HOG is represented as a tuple ⟨V, E, v_0, v_f⟩. V is the node set, where v ∈ V if the node is in the application's CFG and prints to the application's output stream. In the Java Enterprise Edition (JEE) framework, an example of such nodes would be invocations of JspWriter.println. E ⊆ V × V is the edge set, where an edge (v_i, v_j) ∈ E if there is a path from v_i to v_j in the CFG of the web application with no other node v_k ∈ V along that path. v_0 ∈ V and v_f ∈ V are, respectively, the entry and exit nodes of the HOG. The approach builds an HOG for each method of the web application by analyzing the method's CFG. To do this, my approach removes all of the nodes in the CFG except for the printing instructions. Then it connects all of the printing instructions with the paths between them in the CFG. The HOG for the entire application can be obtained by treating each node in the HOG that represents an invocation as a transition to the entry node of the target method's HOG. For example, the HOG of Figure 4.1 is shown in Figure 4.4.

Figure 4.4: Example HTML Output Graph for Figure 4.1

The HOG is suitable for modeling the pages produced by dynamic web applications that directly generate HTML using mechanisms such as JSP, Servlets, or Struts.
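The CFG-to-HOG projection described above can be sketched as follows. This is an illustrative reconstruction, not the dissertation's implementation: the dict-based graph representation, node names, and search strategy are my own assumptions for the sketch.

```python
# Sketch: project a CFG onto its output-generating nodes to form an HOG.
# `cfg` maps each node to its successor list; `printing` is the set of
# nodes that write to the output stream (hypothetical representation).

def build_hog(cfg, printing):
    """Return HOG edges: (u, v) if v is reachable from u in the CFG
    along a path containing no other printing node."""
    edges = set()
    for u in printing:
        # Search from u, stopping at the first printing node on each path.
        stack, seen = list(cfg.get(u, [])), set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            if n in printing:
                edges.add((u, n))      # path ends at a printing node
            else:
                stack.extend(cfg.get(n, []))
    return edges

# Toy CFG: p1 -> a -> {p2, p3}, p2 -> b -> p3; only p1, p2, p3 print.
cfg = {"p1": ["a"], "a": ["p2", "p3"], "p2": ["b"], "b": ["p3"], "p3": []}
print(sorted(build_hog(cfg, {"p1", "p2", "p3"})))
# [('p1', 'p2'), ('p1', 'p3'), ('p2', 'p3')]
```

Note how the non-printing nodes a and b disappear, leaving only paths between printing instructions, which is exactly the projection the HOG definition requires.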
However, many modern web applications also contain HTML template files that are filled in by application logic to generate the final HTML content. This is very common in web application frameworks that implement the Model-View-Controller (MVC) pattern, such as Apache Velocity and WebMacro. For these types of applications, the approach builds the HOG directly from the template files. To do this, the approach expands all macros in the template files and identifies the entire HTML frame. Then, the approach defines each line in the HTML frame as a node in the HOG and defines edges based on the order of the lines in the template file. The process for constructing an HOG for template based applications can vary based on the framework, but in general, the process requires a mixture of the string analysis based approach and the template parsing discussed here.

4.3.2 String Analysis

After the HOG is built, my approach needs to identify the HTML strings produced by each output generating node v ∈ V. To do this, I use the string analysis technique, Violist, that I developed in Chapter 6. In Nyx, I used Violist to map each v to a finite state automaton (FSA). Note that the value of v may not be a string constant; it may be generated by complex string expressions involving user input or loop operations. My approach assumes that strings defined external to the application, such as user input, will not influence the color or the structure of the HTML tags.

4.3.3 The HTML Adjacency Relationship Graph

The HTML Adjacency Relationship Graph (HARG) models the visual relationship between pairs of HTML elements in the HOG. The type of information present in the HARG is similar to the Document Object Model (DOM), in that it shows parent/child and sibling relationships. However, since the HARG is built from the HOG, it also contains relationships that could be derived from loop generated HTML elements.
The HARG is defined as a tuple ⟨V, E⟩, where each v ∈ V is a node in the graph that corresponds to an HTML element that can be generated by the application. E ⊆ V × V is the edge set, where (v_i, v_j) ∈ E means that v_i is a parent HTML element of v_j. To illustrate, the HARG for Figure 4.1 is shown in Figure 4.5. To build the HARG, my approach takes a two-step approach. First, it builds an intermediate graph, the HTG, to clean and organize the HTML tags in the HOG. My approach has this step because the printing instructions in the HOG may not contain exactly one HTML tag. For example, line 3 in Figure 4.1 prints one half of an HTML tag while line 5 prints two tags. Having the HTG ensures that each node in the HTG contains exactly one HTML tag. Second, based on the HTG, my approach builds the HARG.

4.3.3.1 Getting the HTML Tag Graph

The HTG is defined as a directed graph. At a high level, it is similar to the HOG, but guarantees that each node contains one and only one HTML tag. For example, the HTG of Figure 4.1 is shown in Figure 4.6.

Figure 4.5: Example HARG for Figure 4.1.

Figure 4.6: Example HTG for Figure 4.1

Note that one node in the HOG can be split into several nodes in the HTG and multiple nodes in the HOG can be merged into one node in the HTG. The formal definition of the HTG is as follows: a node in the HTG is an FSA which contains one and only one HTML tag. For two nodes A and B in the HTG, assume their corresponding HTML tags are A' and B'. There is an edge in the HTG between A and B if there is a path from A' to B' in the CFG and there are no other HTML tags on this path. To build the HTG, my approach uses a five-step approach. First, my approach converts the HOG to a single large FSA, F. To do so, my approach first adds all the nodes of the HOG, which are small FSAs, to F.
Then, for the automata of any two HOG nodes, FSA_1 and FSA_2, my approach connects the start state of FSA_2 to the accept state of FSA_1 if there is an edge from FSA_1 to FSA_2 in the HOG. Second, my approach finds the boundaries of each HTML tag in F. In this step, my approach looks for the characters '<' and '>' in F: '<' indicates the beginning of an HTML tag and '>' indicates the end of an HTML tag. In the third step, my approach matches each '<' symbol to its next '>'. To do so, my approach uses the definition-use chain calculation algorithm on F [26]. My approach takes each '<' as a definition and each '>' as a use. Then the definition-use chain calculation algorithm will match each '<' to its next '>' in F. In the fourth step, my approach analyzes the paths between the matched '<' and '>' in F. Each of these paths is a candidate HTML tag. Then, for each of the candidates, my approach checks whether it is a legal HTML tag. If so, the FSA of the tag is added as a node in the HTG. Finally, my approach calculates the edges between nodes in the HTG. To do this, my approach also uses the definition-use chain calculation algorithm. In this step, my approach treats each '>' as a definition and each '<' as a use. This algorithm finds the '<' that follows each '>'. Then, for two nodes in the HTG, N_1 and N_2, if the '>' of N_1 is followed by the '<' of N_2, my approach adds an edge between N_1 and N_2 in the HTG.

4.3.3.2 Getting the HTML Adjacency Relationship Graph

To build the HARG, my approach parses the HTG to figure out the parent-child relationships between the HTML tags. The traversal begins by traversing all of the edges in the FSA associated with the root node of the HTG, then following all of the outgoing edges of the root node, and repeating this process until all nodes in the HTG have been traversed.
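The '<'/'>' pairing that delimits tags in Section 4.3.3.1 can be illustrated over plain strings. This sketch is an assumption-laden simplification: it operates on a concrete string rather than the FSA F, and skips the legality check and the definition-use chain machinery, but the pairing and the successor-edge rule are the same idea.

```python
# Sketch (illustrative): pair each '<' with the next '>' to extract
# candidate tags, then link each tag to the tag that follows it -- the
# string analogue of the def-use matching over F described above.

def extract_tags(html):
    """Return the list of candidate tags, in order of appearance."""
    tags, start = [], None
    for i, ch in enumerate(html):
        if ch == '<':
            start = i                       # '<' opens a candidate tag
        elif ch == '>' and start is not None:
            tags.append(html[start:i + 1])  # '>' closes it
            start = None
    return tags

def tag_edges(tags):
    """Edge (N1, N2) if N2's '<' directly follows N1's '>'."""
    return list(zip(tags, tags[1:]))

tags = extract_tags("<body>hello<table><tr></tr></table></body>")
print(tags)
# ['<body>', '<table>', '<tr>', '</tr>', '</table>', '</body>']
```

Text between tags ("hello") is ignored, matching the fact that the HTG keeps only the HTML tags themselves.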
During the traversal, my approach checks whether the tag in each node of the HTG is an opening tag, such as "<body>", a self-closing tag, such as "<img/>", or a closing tag, such as "</div>". If the tag is an opening tag or a self-closing tag, my approach creates a corresponding node in the HARG for the parsed HTML tag. To connect each node in the HARG, my approach adds an edge (v_i, v_j) in the HARG if and only if all of the following four conditions hold: (1) v_i is an opening HTML tag; (2) there is a path P in the HTG from v_i to v_j; (3) the closing tag of v_i is not in P; and (4) along path P, if there is a node v_k that meets conditions (1)-(3), then v_k is equal to v_i. These conditions enforce that v_j will be a child of v_i, contained within v_i's opening and closing tags, and that v_i is the most immediate predecessor that satisfies these conditions. The greatest challenge in my approach is to handle loops in the FSA. Loops can generate infinite strings. When the parsing encounters a loop in the FSA, the approach simulates its unraveling n times. This unraveling may be unsafe because it is possible that the (n+1)-th traversal introduces new strings that are not included in the previous n unravelings. However, I have found that for the purpose of identifying the color attributes assigned to each tag, n has a reasonably small bound. In the analysis, I employ the following heuristic: n is assigned the maximum of either the integer value 6 or one more than the largest iteration of repeating substrings in the CSS file. For example, in the string NyxNyxNyx, Nyx is a repeating substring. I use the value 6 since this is the maximum iteration of repeating substrings of a hexadecimal string that can be defined as the value of a color in an HTML attribute. Case in point, a color is defined by a six digit hexadecimal number (e.g., #000000) and each iteration of a loop could provide one character of this string.
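The four edge conditions above can be sketched with a stack of currently open tags. This is an illustrative, string-level analogue of conditions (1)-(4), not the dissertation's HTG traversal: the top of the stack is the most immediate open ancestor whose closing tag has not yet appeared, which is exactly the v_i the conditions select.

```python
# Sketch (illustrative): derive HARG-style parent-child edges from a
# linear tag sequence with a stack of open tags.

def parent_child_edges(tags):
    edges, stack = [], []
    for t in tags:
        if t.startswith('</'):             # closing tag: pop its opener
            if stack:
                stack.pop()
        else:
            if stack:                      # most immediate open ancestor
                edges.append((stack[-1], t))
            if not t.endswith('/>'):       # self-closing tags stay leaves
                stack.append(t)
    return edges

seq = ['<body>', '<table>', '<tr>', '<td>', '</td>', '</tr>', '</table>', '</body>']
print(parent_child_edges(seq))
# [('<body>', '<table>'), ('<table>', '<tr>'), ('<tr>', '<td>')]
```

A self-closing tag such as <img/> gets an incoming edge from its enclosing tag but is never pushed, so it can have no children, matching condition (1).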
Since my goal is to capture potential color information, this gives the approach a reasonable upper bound on the loop unraveling. In practice, I found that 6 was always sufficient and there was no incompleteness in the HARG due to unraveling the loops in this way. The techniques for obtaining the HOG can lead to an over-approximation of an application's possible HTML pages. In turn, this can lead to the identification of spurious visual relationships that correspond to infeasible paths. This does not cause a problem for the approach, as this merely introduces additional color relationship constraints that must be accounted for while generating the Color Transformation Scheme in Section 4.4.3.

4.4 Color Transformation

In the second phase, the approach calculates the new energy efficient color scheme for the application, the Color Transformation Scheme (CTS). There are two requirements for the CTS; it must: (1) use energy efficient colors as the basis for the new color scheme, and (2) maintain the color relationships between neighboring HTML elements. The first requirement serves the general goal of the approach and the second ensures that the color-transformed pages are readable and, ideally, as visually appealing as the original pages. To address the first requirement, the CTS should replace large, light-colored background areas with dark colors (preferably, black), as mentioned in Section 4.1. To address the second requirement, the approach must transform the other colors of the HTML elements so that their color relationship with the new dark-colored background is similar to their color relationship with the previous light-colored background. My approach produces a CTS that meets both requirements. To do this, my approach first builds a Color Conflict Graph (CCG), which describes the color relationships between pairs of HTML elements that have a visual relationship. Then, my approach changes the background color of the root node of the CCG to black.
Generally, the root node of the CCG corresponds to the <body> tag, but can differ for certain layouts. Then the approach calculates a new recoloring of the CCG so that the color distances between adjacent nodes in the recolored graph are the same as the color distances in the original graph. This mapping of old colors to the transformed colors is the output of the second phase. My approach operates on three different types of CCGs: the Background Color Conflict Graph (BCCG), which models the relationship between the background colors of neighboring HTML elements; the Text Color Conflict Graph (TCCG), which models the relationship between text colors and their corresponding background HTML element colors; and the Image Color Conflict Graph (ICCG), which models the relationship between an image and its enclosing HTML tag. In the remainder of this section, I first give a formal description of the CCG and its three variants, then introduce how I derive them from the HARG, and finally discuss the calculations that generate the CTS.

4.4.1 The Definition of Color Conflict Graph

The CCG and its three subtypes, the BCCG, TCCG, and ICCG, show the color relationships between the HTML elements of a page generated by the application. Formally, the CCG is represented as a weighted complete undirected graph ⟨V, v_0, W⟩. The set V represents the graph's nodes, where each node represents a color in the HTML page. v_0 ∈ V is the root node of the graph, which is the color that will be transformed to black. Typically, v_0 is the background color of the <body> tag, but users can specify a different root node for unusual HTML layouts. W is a weighting function W : V × V → ℤ. Since the CCG is a complete graph, there is an integer edge weight defined for every pair of nodes in V. The weighting function is used to give priority to certain types of visual relationships. The BCCG, TCCG, and ICCG vary in the weights attached to certain definitions.
The BCCG models the relationship among all background colors in the web page. There are three types of relationships modeled in the BCCG: (1) parent and child nodes, (2) sibling nodes, and (3) all other nodes. The weights for these edges are assigned as the constants a, b, c, respectively, where a > b > c > 0. I rank the parent-child relationship as the most important. The reason for this is that a parent element's color generally surrounds its children nodes in the rendering of the HTML page, which means that the color distance for these elements must be maintained to visually distinguish between the elements. I rank the sibling relationship as next most important because, generally, siblings are rendered close to each other on a page and therefore their color difference is more important to maintain than that among the remaining elements. Finally, I attach the weight c to the remaining relationships because maintaining an element's color difference relationship with the other elements on the page helps to preserve the overall aesthetics of the original color scheme, but is not as important as the other two relationships.

Figure 4.7: Example CCG for Figure 4.1.

An example BCCG with edge weights for Figure 4.1 is shown in Figure 4.7. In my example, white is the background color of the <body> tag at line 3. The <table> and <tr> tags at line 5 inherit this color. Red and green are the background colors of the <td> tags at lines 11 and 14, which are children of the <tr> tag at line 5. Thus, the weight between white and red, or white and green, is a, while the weight is b between red and green. The TCCG models the relationship between text colors and the background color of each HTML element, for example, the text color and the background color of the body tag at line 1 in Figure 4.4. In the TCCG, only the edges between text colors and their background color have a weight of a; otherwise, the weight is 0. My approach does this because all the text colors are enclosed by their background colors.
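The BCCG edge weighting described at the start of this section can be sketched as follows. This is an illustrative sketch only: the concrete values chosen for a, b, c are placeholders satisfying a > b > c > 0, and the relation sets are assumed inputs, since the dissertation derives them from the HARG.

```python
# Sketch (illustrative): assign BCCG edge weights from parent-child and
# sibling relations. The constants below are placeholders for a > b > c > 0.
A, B, C = 3, 2, 1

def bccg_weights(colors, parent_child, siblings):
    """Return a weight for every unordered pair of distinct colors."""
    w = {}
    for i, ci in enumerate(colors):
        for cj in colors[i + 1:]:
            pair = frozenset((ci, cj))
            if pair in parent_child:
                w[pair] = A          # parent surrounds child: most important
            elif pair in siblings:
                w[pair] = B          # siblings rendered near each other
            else:
                w[pair] = C          # all other pairs: aesthetic constraint
    return w

# The example from the text: white surrounds red and green; red and green
# are siblings.
colors = ["white", "red", "green"]
pc = {frozenset(("white", "red")), frozenset(("white", "green"))}
sib = {frozenset(("red", "green"))}
w = bccg_weights(colors, pc, sib)
print(w[frozenset(("white", "red"))], w[frozenset(("red", "green"))])  # 3 2
```

Because the CCG is complete, every pair receives some weight; the three tiers only change how strongly each pair's color distance is preserved later.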
A text color does not have any connection to colors other than its background color. The ICCG models the relationship between colors in an image and the background color of the HTML element that surrounds the image. Therefore, only edges in the ICCG that connect an image's colors to the background color of its enclosing HTML elements are given a non-zero weight of a, with the remaining edges being assigned a weight of 0. In some cases, it is not desirable to transform every image in the web application. For example, it may be preferable to not alter a photo in a news article since the original appearance relates to the integrity of the story. Developers can specify a list or pattern of image file names that should not be transformed. When the approach finds an image tag that matches one of the excluded image names, it does not construct an ICCG or CTS for the image.

4.4.2 Building the CCG

The CCG is built using the information contained in the HARG. The general intuition of this transformation is that the approach identifies color definitions in the HARG and propagates the definitions along the graph to elements that may inherit the color. The propagated information is then used to identify colors that have a relationship with each other. The first step is to identify the color definitions (CDs) generated by each node in the HARG. A CD is generated when an HTML element contains an attribute that defines either the text/background color of the element, or the HTML element is an image tag. For example, background colors of some elements can be defined by the bgcolor attribute and the color of text or links can be defined by the text or link attribute. For pages that use CSS, the approach identifies the set of CDs that an HTML element generates based on its ID, class name, or type, which can be determined using a standard CSS parser. An image tag generates a CD for every color used in the image.
The second step is to propagate all of the background CDs to the other nodes in the HARG. This is done to determine which background colors will be adjacent to each other and which image and text colors will appear over a particular background color. The approach propagates the color information using standard iterative data flow analysis [26]. The Gen set of a node is comprised of the CDs generated at that node. The Kill set kills all CDs that flow into the node if the node generates a CD. For example, for a node v_i in the HARG, the approach propagates all of its background CDs to a child v_j if v_j does not generate any background CDs. Standard equations are used for the In and Out sets. Note that CDs originating from images and text are ignored during the propagation. The final step is to derive the CCG from the colors propagated over the edges of the HARG. Nodes are created slightly differently for each CCG variant. In the BCCG there are nodes for each unique background CD generated in the HARG. Nodes in the TCCG include those in the BCCG plus nodes for the CDs originating from text colors, and the ICCG includes the nodes in the BCCG plus nodes for the CDs originating from image colors. Since the CCG is a complete graph, there is an edge defined between each pair of its nodes. The edges of the CCG are assigned weights based on the different types of visual relationships discussed in Section 4.4.1. In general, the weighting is done by iterating over each node v in the HARG and identifying v's set of corresponding nodes in the CCG, N_c. For each edge in the set {(n, n_i) | n ∈ N_c ∧ n_i ∈ N}, where N is the CCG node set, the appropriate weight is assigned based on the relationship it represents and the variant of the CCG. The construction of the CCG does not take into account the effects of embedded JavaScript. This would result in an incomplete model of the color relationships if JavaScript were used at runtime to modify the colors of a web page.
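The Gen/Kill intuition of the propagation step can be sketched on a HARG-like tree. This sketch is a simplification I am assuming for illustration: it propagates a single background CD top-down over a tree, whereas the dissertation runs full iterative data-flow analysis over the HARG; the key shared behavior is that a node with its own background CD kills the inherited one.

```python
# Sketch (illustrative): propagate background color definitions down a
# HARG-like tree. `tree` maps a node to its children; `gen` maps a node
# to its own background CD (or omits it if the node generates none).

def propagate(tree, gen, root, inherited=None, out=None):
    if out is None:
        out = {}
    color = gen.get(root) or inherited     # own CD kills the incoming one
    out[root] = color
    for child in tree.get(root, []):
        propagate(tree, gen, child, color, out)
    return out

# body(white) > table > tr > {td1(red), td2(green)}: the table and tr
# inherit white; the td cells keep their own colors.
tree = {"body": ["table"], "table": ["tr"], "tr": ["td1", "td2"]}
gen = {"body": "white", "td1": "red", "td2": "green"}
print(propagate(tree, gen, "body"))
# {'body': 'white', 'table': 'white', 'tr': 'white', 'td1': 'red', 'td2': 'green'}
```

The output directly yields the adjacency facts the CCG needs: white is the background over which red and green appear, and red and green meet as siblings.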
To determine if this would impact my approach, I conducted a small-scale study on the use of JavaScript to modify colors in a web page at runtime. In this study, I examined the top 50 web sites, as ranked by http://mostpopularwebsites.net/, to determine how JavaScript affected the colors of a website. In all 50 sites, I found that JavaScript was not used to modify the colors. I believe that this result generalizes beyond the top 50 websites and indicates that accounting for JavaScript in the construction of the CCG is not necessary.

4.4.3 Generating the CTS

To generate the CTS, the approach analyzes each variant of the CCG and computes a recoloring that is more energy efficient but maintains, as closely as possible, the color differences between nodes in the graphs. In this section I explain the analysis of the BCCG in detail and then briefly describe the analogous process for the ICCG and TCCG.

To transform the background colors, the approach converts the color of the root node of the CCG to black and then transforms the other background colors to maintain their color distances. To state this more formally, let S = {C_0, C_1, C_2, C_3, ..., C_k} be the set of colors of each node in the CCG, where C_0 is the background color of the root node (v_0) and each C_i, i > 0, is the color of one of the remaining k nodes. The approach creates a new color scheme S′ in which C′_0 = black, and then finds color mappings for C′_1, C′_2, ..., C′_k that minimize the overall difference between the color distances of S and S′. The function to be minimized is

    H = Σ_{i=0}^{k} Σ_{j=0}^{k} w_ij |Dist(C_i, C_j) − Dist(C′_i, C′_j)|

where w_ij is the weight of the edge between colors i and j in the BCCG. Basically, this minimization function is closer to zero the more closely each of the color distances in S′ matches the corresponding color distance in S.
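For concreteness, the objective H can be computed directly from a candidate scheme. The sketch below uses Euclidean distance in RGB space as Dist; the actual distance metric used by the approach is an assumption here.

```java
// Sketch of the minimization objective H. Colors are int[3] RGB triples;
// Dist is assumed (for illustration) to be Euclidean distance in RGB space.
public class ColorCost {
    static double dist(int[] a, int[] b) {
        double s = 0;
        for (int c = 0; c < 3; c++) {
            double d = a[c] - b[c];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    // H = sum_{i,j} w[i][j] * |Dist(C_i, C_j) - Dist(C'_i, C'_j)|
    public static double h(double[][] w, int[][] s, int[][] sPrime) {
        double total = 0;
        for (int i = 0; i < s.length; i++)
            for (int j = 0; j < s.length; j++)
                total += w[i][j] * Math.abs(dist(s[i], s[j]) - dist(sPrime[i], sPrime[j]));
        return total;
    }
}
```

Note that a scheme that inverts every color (white-to-black, black-to-white) preserves all pairwise distances and therefore scores H = 0, which is exactly the behavior the transformation exploits.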
My approach maps this minimization problem to the Energy Minimization Problem (EMP), which is a well-known pixel-recoloring problem in the computer vision field [148]. (Here, the term "energy" refers to a general cost, not battery energy.) The EMP minimizes the cost of a set of pixels and their labels. Given a set of pixels P = {P_0, P_1, ..., P_k}, a set of labels L = {L_0, L_1, ..., L_n}, and a cost function Cost(P, L, f), where f is a mapping from L to P, the EMP finds a transformation f that minimizes the cost function Cost. In my mapping, all nodes in the CCG, the whole color space, and the H function are the pixel set, label set, and cost function, respectively. My problem is then to find a mapping from the whole color space to all nodes in the CCG that maps the root node to black and minimizes the H function.

This minimization problem is NP-hard, but an approximation can be solved for in a reasonable time using a Simulated Annealing Algorithm (SAA) [148]. For my approach, a close-to-optimal solution for S′ satisfies my two requirements for the CTS and, as I show in Section 4.6, allows the approach to compute the CTS in a reasonable amount of time. An SAA is a technique for finding a good approximate solution in a very large search space; it works by probabilistically exploring states until a good enough solution is found or the computation budget is fully consumed [86]. SAAs are a well-known technique utilized in search-based software engineering and are considered a good fit for problems where identifying an approximate solution in a large search space is sufficient [107, 70].

My approach's adapted SAA is shown in Algorithm 1. The input of Algorithm 1 is the original color scheme S and the CCG. The output is the transformed color scheme, S′, with the background color of the root node transformed to black. The SAA begins with an initial color scheme Current (line 1), which is generated by a greedy algorithm GreedyInit.
The purpose of GreedyInit is to identify a reasonable starting point for the SAA. The basic approach of GreedyInit is to first flip the color of the root node, which is C_0 in S, to black. The algorithm then traverses each of the remaining nodes in the CCG in order of decreasing edge weight. For each node, GreedyInit assigns a color that minimizes the cost function H over all nodes that have been visited.

My search-based algorithm needs a time budget in case the algorithm does not converge on the optimal solution. A counter T that represents the allocated time or computational budget is initialized with an integer at line 3. The SAA iterates until T reaches 0 (lines 4-15). In each iteration, the approach identifies a possible new color scheme Next. This is done by calling Random Successor, which generates a new color scheme by modifying each color in the current scheme, except for C_0, by a random amount (line 6). If the new color scheme minimizes the cost function H more than the current one, then the new scheme Next replaces the current scheme (lines 10-12). If the new scheme is not an improvement, then the current scheme may still be changed with some small probability (line 13) to prevent the algorithm from getting stuck at a local optimum. The probability is based on the size of the counter T and the most recent difference in H. Finally, after the counter expires, the current best solution is returned. This represents S′ for the CCG.

The approach for computing the CTS for the TCCG and ICCG is very similar to the process described above. A key difference is that the background color transformations identified in S′ are substituted into the corresponding colors in the TCCG and ICCG. These transformed colors are treated as fixed, and the remaining colors (the colors of the text and the colors within an image) are transformed using the above process.
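The annealing loop described above can be sketched as a small, self-contained program. Here the color-scheme state is replaced by a one-dimensional stand-in cost, and the step size and cooling schedule are illustrative assumptions, not the values used by Nyx.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

// Sketch of the simulated-annealing skeleton from Algorithm 1, specialized to
// a one-dimensional cost function. The +/-0.5 step and the linear countdown of
// T are illustrative assumptions.
public class Annealer {
    public static double minimize(DoubleUnaryOperator cost, double start, int budget, long seed) {
        Random rnd = new Random(seed);
        double current = start;
        double best = start;
        for (int t = budget; t > 0; t--) {                   // counter T, decreased each iteration
            double next = current + (rnd.nextDouble() - 0.5); // Random Successor
            if (cost.applyAsDouble(current) < cost.applyAsDouble(best)) {
                best = current;                               // track best-so-far state
            }
            if (cost.applyAsDouble(next) < cost.applyAsDouble(current)) {
                current = next;                               // accept improvement
            } else {
                // occasionally accept a worse state with P = e^{(H(Current) - H(Next)) / T}
                double p = Math.exp((cost.applyAsDouble(current) - cost.applyAsDouble(next)) / t);
                if (rnd.nextDouble() < p) current = next;
            }
        }
        return best;
    }
}
```

Early on, T is large, so P is close to 1 and the search moves almost freely; as T shrinks, worse moves are accepted less often and the search becomes greedy.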
The CTS for the TCCG is a transformation of all of the text elements in which each new text color maintains the color difference with the transformed background color of its enclosing HTML element. The CTS for the ICCG is a recoloring of each image so that each color in the image maintains its color distance with the transformed background color of its enclosing HTML element. Note that every color in an image is transformed with respect to maintaining the color relationship with the enclosing HTML element, not the other colors in the image. In cases where the image contains color gradients or shadows, this generally results in a less attractive transformation. In fact, my evaluation showed that the attractiveness of applications with more transformed images was generally rated lower than the original. In future work, I plan to investigate more advanced image processing techniques that could improve the aesthetics of recolored images.

Algorithm 1 Simulated Annealing Algorithm
Input: S, CCG
Output: S′
 1: Current ← GreedyInit(S, CCG)
 2: BEST ← Current
 3: Initialize(T)
 4: while T > 0 do
 5:   Decrease(T)
 6:   Next ← Random Successor(Current)
 7:   if H(Current) < H(BEST) then
 8:     BEST ← Current
 9:   end if
10:   if H(Next) < H(Current) then
11:     Current ← Next
12:   else
13:     Current ← Next with P = e^{(H(Current) − H(Next))/T}
14:   end if
15: end while
16: S′ ← BEST

4.5 Output Modification

In the third phase, the approach rewrites the web application so that it generates web pages based on the CTS. My approach has two different mechanisms to modify the web app. For CSS files and HTML templates, my approach simply uses regular expressions to replace all color strings with their corresponding colors. In practice, I have found that more sophisticated approaches, such as using CSS parsers to identify the style properties to modify, are unnecessary. For colors that are defined by dynamically generated HTML, the approach inserts instrumentation to perform the rewrite at runtime.
The instrumentation replaces the APIs that print HTML to clients (e.g., JspWriter.println) with calls to customized printing functions. These printing functions scan the output as it is generated and replace printed colors with their corresponding colors in the CTS.

4.6 Evaluation for Display Energy Optimization

I performed an empirical evaluation of three aspects of my approach: time cost, energy savings, and user acceptance of the appearance of the transformed web pages. I implemented my approach in a prototype tool, Nyx, and used it to address four research questions:

RQ1: How much time does Nyx take to generate the CTS?
RQ2: How much energy is saved by the transformed web pages?
RQ3: What is the runtime overhead introduced into the modified web applications by Nyx?
RQ4: To what degree do users accept the appearance of the transformed web pages?

4.6.1 Subject Applications

I use seven open-source Java-based web applications to evaluate my approach, including applications that have been used in related work, to ensure a broad representation of implementation styles. These applications are implemented using different web application frameworks, include colors defined by HTML and CSS, and employ varying amounts of JavaScript in their user interfaces. Details of each of these apps are shown in Table 4.1.

Table 4.1: Subject apps for display energy optimization
(Time cost columns are in seconds; energy saving columns are percentages.)

Name         Framework          SLOC     Load    Analyze  Transform  Rewrite  Loading Energy  Display Power
Bookstore    JSP                24,305   46.2    9.64     27.5       1.8      26.7            47.2
Portal       JSP                21,393   45.1    8.34     53.8       1.7      24.7            44.2
JavaLibrary  JSP&Servlet        73,468   45.8    21.7     29.9       2.9      26.1            35.8
ClassRoom    JSP                5,127    18.1    5.97     0.385      0.1      35.8            51.6
Roller       JSP&Struts         154,065  0.018   1.23     102        0.2      10.4            18.0
Scarab       Velocity&Turbine   145,435  0.016   1.84     27.1       0.2      27.1            47.8
jForum       Velocity           31,841   0.014   1.94     154        0.1      26.7            47.8
For each subject, the column labeled "Framework" shows the underlying web framework with which the application was implemented. Frameworks included in my study are JSP, a very popular web application framework for Java-based web applications; Servlet, which describes applications that directly use the Java Enterprise Edition (JEE) framework with no intermediate framework; Struts, a very widely used library and framework for web applications; and Velocity and Turbine, two popular template-based frameworks for developing web applications. The column labeled "SLOC" shows the number of source lines of Java code for each web application. For applications that are written in JSP, I converted the JSP code into Java using the Tomcat Jasper compiler and counted the resulting SLOC.

My subject applications also represent varying levels of CSS and JavaScript usage. JavaLibrary, Portal, and Bookstore define their styles in HTML directly, while ClassRoom, Roller, Scarab, and jForum use CSS to define their style. ClassRoom, Portal, and Bookstore do not make heavy use of JavaScript, while JavaLibrary, Roller, Scarab, and jForum do. Three of the applications, Roller, Scarab, and jForum, use the Model-View-Controller (MVC) architectural style. All applications, except Bookstore and Portal, are publicly available from their project web pages. Bookstore and Portal have been widely used in related work [66, 65] and are available via the SQL Injection Application Testbed [15].

4.6.2 Implementation

I implemented my approach in a prototype tool, Nyx. To generate the HOG, I leveraged Soot [19] to build the underlying call graphs, control flow graphs, and the Jasmin representation for Java classes. For representing the FSAs of each string in the HOG and HARG, I used the BRICS automaton library [112].
As mentioned earlier, to build the required string analyses, I implemented the concatenation, replacement, and widening operations from Yu and colleagues' method [160] and combined them with the BRICS automaton library. I also built an automaton parser for the BRICS library to extract the tag name, CSS ID, class name, and color information of HTML tags. I used the SAC CSS parser [9] to identify colors in CSS files. For the Output Modification phase, I used BCEL [28] to modify Java classes and a Perl script to modify colors in the CSS files. My implementation handles HTML 4 and CSS 2, but it is straightforward to extend my tool to support HTML 5 and CSS 3.

4.6.3 RQ1: Time Cost

To address the first research question, I ran Nyx on all of the subject applications and measured the execution time. The results are shown in Table 4.1. I separated the runtime into four different parts. The first is the time spent loading all of the Java classes, parsing templates, and building call graphs. This time is shown under the column labeled "Load." The second is the time spent building the HOG, HARG, and CCG. This time is shown in the column labeled "Analyze." The third is the time spent calculating the CTS, which is shown under the column labeled "Transform." Finally, the rewriting time is shown in the column labeled "Rewrite." All results were obtained on a Dell XPS 8100 desktop running Linux Mint 14 with an Intel Core i7 @ 3GHz processor and 8GB of memory. Each timing result reported is the average runtime of 10 executions.

As the results show, overall it takes less than three minutes to analyze and transform each subject application. Most of the time cost is incurred in either the Load or Transform phases. For the apps with a high loading time, most of this time was spent by Soot in building the call graphs of the application.
Roller, jForum, and Scarab have very small Load times because my approach can build the HOG directly by parsing the template files instead of analyzing Java classes. Roller, jForum, and Scarab also have a very small analysis cost, since the string analysis for templates is much simpler than for Java classes. The length of the Transform time was highly dependent on the structure of the web pages generated by the application. For more complex pages with many colors, it took longer to generate a new color scheme.

4.6.4 RQ2: Energy Saving

To evaluate the energy savings of my approach, I deployed the original and transformed subject web applications on a Tomcat web server. I then accessed both versions of each web application using a Samsung Galaxy II smartphone and measured the energy/power consumption of the phone using the Monsoon Power Meter [17]. There are two distinct energy/power phases when a mobile phone visits a web application. The first phase is "Loading and Rendering," in which the browser loads the contents of the web page and renders them on the screen. The second phase is "Display," in which the mobile phone has finished loading and simply displays the web page contents on the screen. A key distinction is that the potential time for Display is unbounded; it ends when the user closes the browser or moves to another page. In contrast, the time for Loading is bounded. Therefore, for fairness, I measure the energy consumed during the Loading and Rendering phase and the power draw during the Display phase. This is more fair than simply measuring the energy of both phases, since it is possible to inflate energy savings by allowing the Display phase to continue for an extended period of time. (Recall that energy = power × time.)

To differentiate these phases, I leveraged the energy and power measurements provided by the Monsoon Power Meter. Key to doing this was understanding what happens on a smartphone during these different phases.
In the Loading and Rendering phase, multiple components in the smartphone, such as the CPU, memory, WiFi, 4G radio, and the screen, are busy. The Loading phase has a limited time span: it starts at the point when the browser sends the request to the server and ends when the browser finishes rendering the contents of the web page to the screen. In contrast, during the Display phase, all components of the smartphone except the screen are in the idle state. Therefore, I can determine the start and end times of the Loading and Rendering phase by observing the power state of the mobile phone in the power meter. The start point of the Loading and Rendering phase is the point when the phone switches to the high-power state, and the end point is when the phone switches back to the low-power state. The start point of the Display phase is the end point of the Loading and Rendering phase.

For both phases, I took measurements of the original and transformed web applications 10 times and report the average percentage decrease in the columns of Table 4.1 labeled "Loading" and "Display." On average, there was a 25% decrease in energy consumption during the Loading and Rendering phase and 40% less power consumed during the Display phase for the transformed web applications. Overall, these are strong results and show that my approach can result in significant energy savings for smartphone users.

Of interest to us was the fact that energy decreased during the Loading and Rendering phase. This was puzzling, since the transformation did not change the size of the pages in any meaningful way. In my investigation, I learned that, in order to speed up the display of web pages, the smartphone begins to display parts of the screen, such as the background color, as soon as possible. Therefore, energy was consumed by the screen even during the Loading and Rendering phase. This difference became more significant during the Display phase, when only the screen was actively drawing energy.
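The phase-boundary detection described above can be approximated with a simple threshold rule over the power-meter samples. The sample format and the threshold value below are illustrative assumptions, not the meter's actual output or the values used in the experiments.

```java
// Sketch of splitting a power trace into the Loading-and-Rendering phase and
// the Display phase: the Loading phase ends at the last sample above a
// high-power threshold. The threshold (e.g., 1.0 W) is an assumption here.
public class PhaseSplitter {
    // Returns the index of the first Display-phase sample, i.e., the sample
    // after the device drops back to its low-power (screen-only) state.
    // Returns 0 if the trace never entered the high-power state.
    public static int displayStart(double[] powerWatts, double highPowerThreshold) {
        int lastHigh = -1;
        for (int i = 0; i < powerWatts.length; i++) {
            if (powerWatts[i] > highPowerThreshold) lastHigh = i;
        }
        return lastHigh + 1;
    }
}
```

Everything before the returned index would then be integrated into Loading-and-Rendering energy, and everything after it averaged into Display power.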
Also, I investigated the lower savings incurred by the Roller app. I found that Roller only covers about 60% of the screen. Because of this, my approach can only change colors for the 60% of the screen used by the web application to save energy. The other 40% is left as white, which is the default background color of web browsers. This suggests an easy-to-achieve optimization: changing the default background color of mobile browsers to black.

[Figure 4.8: Overhead of Nyx — bar chart comparing the execution time (in seconds) of the original and transformed versions of each subject application.]

4.6.5 RQ3: Runtime Overhead

The Modifier introduces additional operations into the web application; namely, rewriting the color attributes of HTML strings. Therefore, I am interested in measuring the runtime overhead incurred by this operation. For the experiment, the server was a Core i7 @ 2.8GHz desktop with 8GB of RAM running Linux kernel 3.8 and Tomcat 6. The smartphone was a Samsung Galaxy II running Android 4.0 and connected to the server via wireless. To calculate the overhead, I compared the average time of the Loading and Rendering phase. I used the time for this phase because it represents the time that users must wait before they can see the contents of the web application. I measured this time on the server and client side over ten executions of the experiment for RQ2.

The results of this experiment are shown in Figure 4.8. The results show that, on average, the transformed versions take 2.4% more time than the original. However, as can be seen in the figure, sometimes the transformed application is faster. Even for applications where I only transformed CSS files, I saw similar differences. I investigated this further by checking the results across different executions.
My results indicated that the average loading time for all versions was about 7 seconds, and even the same version of an application would routinely vary by up to 1.2 seconds, with a standard deviation of about 5.6%. From this data I concluded that variations in the wireless signal were likely dominating any variation introduced by my modification overhead. To eliminate interference from the wireless network, I also measured execution time on the server side alone. On average, the server-side increase was 34ms, which represented about a 22% increase. However, the actual distribution was bi-modal, with an average of a 75% increase for apps whose code was modified, as opposed to almost 0% for those with only template changes. This result is fairly intuitive, as modifications to a template-based web application did not require much additional runtime overhead, and in cases where runtime transformations were required, there were relatively few of these operations.

[Figure 4.9: Before/after screenshots]

Table 4.2: Results of the user study for display energy optimization

             Attractiveness             Readability
Name         Original  Transformed      Original  Transformed    Preference(%)
Bookstore    6.5       4.2              7.6       5.9            24
Portal       6.9       5.3              7.5       5.6            18
JavaLibrary  6.7       6.9              7.0       6.4            29
ClassRoom    6.8       6.4              7.2       7.1            59
Roller       7.0       6.5              6.9       5.5            24
Scarab       7.4       5.4              6.9       6.5            18
jForum       7.0       5.4              7.0       5.4            12

[Figure 4.10: Acceptance rate of the transformed web application — per-app percentages of users choosing "General Use," "Battery Low," "Battery Critical," and "Never."]

4.6.6 RQ4: User Acceptance

To address the fourth research question, I conducted an end-user survey in which users were asked to compare and rate the appearance of the original and transformed web applications. The survey group was 20 M.S. and Ph.D. students at the University of Southern California who were enrolled in the third author's graduate-level testing and analysis class.
The students were asked to complete an anonymous online survey on their own time, and no incentives were provided to the students to complete the survey. No background on the research project was given to the students, and no connection of the work to the third author was suggested. The survey presented users with a series of before/after screenshots of the seven subject applications. An example for the Bookstore application is shown in Figure 4.9. For each image, the survey group was asked to rate the attractiveness and readability of each version on a scale of 1 to 10, with 10 being the highest. The users were then asked which version they would prefer to use. Finally, the last question asked, if the black-background version could save them X% energy, at what battery level they would choose to use it. For each app, X was replaced by the energy savings of the Display phase. The available responses were "Always–regardless of battery level," "Most of the time," "Only when the battery level is low," "Only when the battery level is critical," and "Never." The wording of the questions and forms is available via the project web page [73].

I received 17 responses to the survey. The results are shown in Table 4.2 and Figure 4.10. In Table 4.2, the columns Attractiveness and Readability report the related scores for both the original version and the transformed version. The column Preference reports the percentage of users who prefer the transformed web application over the original one. In Figure 4.10, I report when users would choose to use the transformed web application. For space reasons, I merged the options "Always–regardless of battery level" and "Most of the time" into "General Use." The bars show the different time points; the y-axis is the percentage of users who would switch to the transformed version at each time point.

For attractiveness, the original apps received an average score of 6.9 and the transformed apps a score of 5.7, an average decrease of about 17%.
This indicates that the users generally thought the color schemes of the original apps were more visually appealing than those of the transformed versions, but only by a relatively small difference. In fact, for one app, JavaLibrary, users found the transformed version to be more attractive. For readability, the original apps received an average score of 7.1 and the transformed apps a score of 6.1, an average decrease of about 14%. In general, as I examined the per-app results in more detail, I noticed that applications whose screenshots contained a higher number of transformed images, Bookstore, Portal, and jForum, received significantly lower scores. I hypothesize that my rather crude transformation of image colors, which neglects shadows and gradients, impacted this score significantly. Transformed images, in general, were not as clear or readable as their original versions. In future work, I plan to explore improved image processing techniques for transforming image colors.

For user preference, it was clear that users preferred the original version based on visual appearance and usability alone. On average, over 73% of the users preferred the original application. However, when asked to consider the energy savings, there was a dramatic shift in user preference. On average, 67% of the users chose to use the transformed version for general usage if it could save them X% of energy during display. Overall, more than 97% of users chose to switch to the transformed version before the battery became critically low.

Overall, I consider the results for user acceptance to be positive. Although users rated the attractiveness and readability of the transformed apps lower, when made aware of the energy savings, they overwhelmingly preferred to use the transformed application.

4.7 Threats to Validity

External Validity: In my measurements, I manually created a workload for each web app. This workload may not represent the actual workload that end users generate in their daily usage.
To alleviate this threat, I tried to make the workload as thorough as possible. I evaluated the user acceptance with a questionnaire. To avoid bias, I randomly invited people outside of my lab group. The participants in my study had different backgrounds, genders, and color preferences. I believe they are representative of general end users.

Internal Validity: I evaluated the whole-system energy savings by measuring and comparing the energy consumption of the transformed versions of the benchmarks to the original versions. In this process, the power meter can introduce random measurement errors that may affect the evaluation results. To mitigate the effect of random measurement errors, I repeated the measurement for each of my benchmarks ten times and calculated the average energy savings. In my questionnaire, users evaluated the screenshots from mobile devices on their laptops, which have larger screens. Users may have a different perception of the screenshots on actual mobile devices. However, with a larger screen, users may tend to give worse reviews to the transformed versions, since it is easier to notice small problems in the transformed versions that are not noticeable on a small screen.

Chapter 5 HTTP Energy Optimization

According to my empirical study, network communication is one of the primary energy-consuming operations in mobile apps. On average, network communications can consume over 40% of the total non-idle-state energy of an app. Among all kinds of network operations, those related to HTTP are the most energy consuming, representing almost 80% of all network-related energy consumption [90]. Therefore, reducing HTTP-related energy consumption can have a significant impact on the overall energy consumption of an app and improve the overall user experience by increasing the underlying device's battery life.
Traditionally, optimizing network communication energy usage has been seen as a hardware- or OS-level concern. However, over the past few years, a growing body of work has begun to investigate ways to optimize network energy consumption at the application layer. One of the more promising ideas has focused on ways to "bundle" HTTP requests, that is, combining multiple small HTTP requests into a larger HTTP request. This has been very successful in the web browser domain for improving performance [33, 62, 117]. In this domain, bundling has targeted multiple parallel asynchronous HTTP requests, such as AJAX calls, to reduce the average network latency of HTTP requests. However, these automated bundling techniques developed for web browsers cannot be used to reduce network energy consumption for mobile apps, because they only target asynchronous HTTP requests, which are common in AJAX-based web apps but far less common in mobile apps. My recent work has shown that bundling HTTP requests can also decrease energy consumption for mobile apps [89], but it did not provide any way to automatically detect when apps should have their requests bundled, nor did it define a way to automate the bundling process.

In this chapter, I present a comprehensive approach for detecting when certain types of HTTP requests can be bundled and for rewriting the app so that the bundling can happen at runtime. My approach can be roughly broken down into three phases: detection, bundling analysis, and runtime optimization. In the first phase, I employ static-analysis-based techniques to identify HTTP requests that can be safely bundled together. In the second phase, I use string analysis to identify relationships between requests in bundles and configure a set of bundling rules. These rules are then used in the third phase, at runtime, by a proxy server, which detects the start of sequences that can be bundled and uses the bundling rules to carry out the optimization.
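To make the bundling idea concrete, the sketch below merges a sequence of request URLs into a single request aimed at a proxy. The "/bundle" endpoint and "urls" parameter are hypothetical, invented purely for illustration; they are not the actual wire format negotiated with the proxy described later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch of merging a bundleable sequence of HTTP requests into
// one request. Endpoint and parameter names are illustrative assumptions.
public class Bundler {
    public static String bundle(String proxyBase, List<String> requestUrls) {
        StringBuilder sb = new StringBuilder(proxyBase).append("/bundle?urls=");
        for (int i = 0; i < requestUrls.size(); i++) {
            if (i > 0) sb.append(',');
            // percent-encode each original URL so it survives as a query value
            sb.append(URLEncoder.encode(requestUrls.get(i), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }
}
```

Instead of paying per-request header, handshake, and tail-energy costs n times, the client pays them once and the proxy fans the bundle out to the original endpoints.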
I performed an extensive empirical evaluation of my technique, running it against a set of real-world mobile apps. In my experiments, I found that my technique could reduce, on average, 15% of the apps' total energy consumption. My approach was also fast: it analyzed and optimized each mobile app in under ninety seconds and did not impose any runtime overhead on the execution of the apps. Lastly, I ran my detection algorithms on 7,878 marketplace apps and found that 4.6% of the apps with more than one HTTP API invocation could potentially be optimized by my technique. This implies that over 40,000 apps in the Google Play app store could benefit from my technique. Overall, these results are very positive and indicate that my approach can significantly decrease energy consumption for a large number of mobile apps.

5.1 Background

The process of making an HTTP request consumes a large amount of energy due to the underlying operations such a request entails. HTTP is part of a multi-layer network protocol stack, which includes TCP, IP, and various hardware-level protocols. When an HTTP request is sent, it is necessary to encapsulate it in packets of the lower-level protocols. This process often involves calculating checksums, copying data, and referencing data buffers. Many such operations result in high energy consumption for even a single HTTP request.

HTTP also requires a significant overhead of extra data and messages to be sent when it makes a request. In other words, the size of the HTTP payload is not the only data transmission cost. The overhead of an HTTP request comes in three forms. First, each packet must contain a set of headers. Although each one of these is small by itself, in total a typical HTTP packet may contain anywhere from 200B to 2KB worth of headers. Second, the establishment or disconnection of each HTTP connection requires a multi-message handshake protocol at the underlying TCP layer.
Establishing a connection requires a 3-way TCP handshake, and disconnecting requires a 4-way handshake protocol. For both handshake protocols, the packets are typically empty, so no useful HTTP data is sent. Third, each HTTP API request also incurs tail energy, which is independent of the size of the request. Tail energy occurs when the system keeps the network radio in the active state after an HTTP request is finished. This is typically done to reduce the high energy overhead of starting and shutting down the wireless radio.

Although seemingly small, the overhead of an HTTP request can have a significant impact on its energy efficiency. In prior work [88], I found that downloading one byte of data via HTTP consumed the same amount of energy as downloading 1,024 bytes of data, and downloading 10,000 bytes of data only consumed twice the amount of energy as downloading 1,000 bytes of data. Compounding the problem is that many modern apps only need to send small amounts of information to the server per request. For example, a previous study found that 75% of non-video requests were below 10K bytes [75]. These insights motivate my decision to focus on HTTP optimization and, in particular, on reducing the number of unnecessary HTTP connections.

5.2 Overview of the Approach for HTTP Energy Optimization

The goal of my approach is to reduce the number of HTTP requests made by a mobile app. To do this, I developed an approach to detect and bundle HTTP requests that can be made together. More specifically, my approach first detects SHRSs, which are sequences of HTTP requests in which the generation of the first request implies that the following requests will also be made, and then merges these requests into one request. An example of an SHRS is shown in Program 3. Here, h1, h2, and h3 represent an SHRS, since after the generation of the first request, the other two will always be executed.
My approach attempts to detect such situations and rewrite the client-side code to combine the requests, where possible, so that there are fewer HTTP requests overall. My approach can be roughly described as having three phases. In the first phase, SHRS detection, my approach uses static analysis to identify all of the SHRSs in an Application Under Test (AUT). Once these are identified, the second phase, Bundling Analysis, analyzes the SHRSs to generate code that, at runtime, will be executed to facilitate the bundling of the HTTP responses of an SHRS. The third phase, optimization, occurs at runtime. In this phase, a proxy intercepts the HTTP requests and runs the bundling code to return all of the corresponding SHRSs' responses. I now explain each of these phases in more detail.

5.3 SHRS Detection

The first phase of my approach is responsible for detecting the SHRSs in an AUT. The input to the phase is the AUT and the output is the set of identified SHRSs. To perform the detection, I first define an intra-procedural static analysis to identify SHRSs within a method, and then use per-method summaries to perform the analysis inter-procedurally.

5.3.1 Definition of an SHRS

I define an SHRS as a sequence of HTTP or HTTPS API invocations, S = ⟨h1, h2, ..., hn⟩, that satisfies the following conditions:

1  public void main()
2  {
3    URL url0, url6;
4    // initialize the session
5    url0 = new URL("http://init");
6    urlConnection0 = url0.openConnection();
7    Parse(urlConnection0.getInputStream()); // h0
8    print_html(GetCity());
9    // close the session
10   url6 = new URL("http://close");
11   urlConnection6 = url6.openConnection();
12   Parse(urlConnection6.getInputStream()); // h6
13 }
14 public void print_html(String city)
15 {
16   URL url1, url2, url3, url4, url5;
17   URLConnection urlConnection1, urlConnection2, urlConnection3;
18   // query current weather
19   url1 = new URL("http://weather?city=" + city);
20   urlConnection1 = url1.openConnection();
21   Parse(urlConnection1.getInputStream()); // h1
22   // query weather forecast
23   url2 = new URL("http://daily?city=" + city);
24   urlConnection2 = url2.openConnection();
25   Parse(urlConnection2.getInputStream()); // h2
26   // query location info
27   url3 = new URL("http://location?city=" + city);
28   urlConnection3 = url3.openConnection();
29   Parse(urlConnection3.getInputStream()); // h3
30   // query the information about the city
31   if (Cond())
32   {
33     url4 = new URL("http://information?city=" + city);
34     urlConnection4 = url4.openConnection();
35     Parse(urlConnection4.getInputStream()); // h4
36   }
37   // get the rate of the city
38   url5 = new URL("http://rate?city=" + city);
39   urlConnection5 = url5.openConnection();
40   Parse(urlConnection5.getInputStream()); // h5
41 }

Program 3: Example code containing SHRSs.

1. For any hi and hj, if i < j, hi is post dominated by hj in the app's Control-Flow Graph (CFG).

2. For any hi and hj where i < j, if there is another HTTP API invocation h′ on a path from hi to hj in the CFG, then h′ ∈ S.
The first condition guarantees that the execution of the first HTTP API invocation in S implies that the remaining invocations will be executed sequentially. Referring to the example SHRS of h1, h2, and h3 in method print_html of Program 3, this represents the relationship that executing h1 implies h2 will also be executed. The second condition ensures that the execution of HTTP APIs in an SHRS will maintain the original server-side state transitions when the requests are bundled. To illustrate, consider the print_html method in Program 3. Assume that each hn causes the server-side state to be sn. If line 31 is true, then the server-side state transitions will be s1 → s2 → s3 → s4 → s5. If line 31 is false, then the transitions are s1 → s2 → s3 → s5. Without the second condition, an SHRS containing h1, h2, h3, h5 could be identified. If these requests were bundled and line 31 was true, the server side would transition s1 → s2 → s3 → s5 → s4. Here the transition s5 → s4 is incorrect. Note that my definition of an SHRS includes both HTTP and HTTPS request invocations. As I explain in Section 5.5.2, as long as the proxy is configured with the correct cryptographic key, my approach can properly handle HTTPS as well as HTTP traffic.

5.3.2 Intra-procedural Analysis

Figure 5.1: Post dominator tree for Program 3.

I defined an intra-procedural analysis to detect SHRSs within a method. The input of the analysis is a method m of the AUT and the output Tm is the set of SHRSs defined within m. My approach begins by generating the post dominator tree, P, of m. The approach then creates a projection of P, which I call P′, that contains only nodes that make HTTP API invocations and edges that represent these invocations' post dominance relationships. Figure 5.1 shows the projection of the post dominator tree for method print_html in Program 3. The approach then identifies the SHRSs by analyzing P′.
To do this, the approach identifies maximal sequences of consecutive nodes in P′ in which an edge enters at the beginning of the sequence and exits at the end, without the possibility of branching except at the end. These sequences are analogous to basic blocks in control-flow graphs, but are defined over the projection of the post dominator tree. For the purpose of defining these sequences, I consider all calls/invocations to be non-branching and ignore exceptional control flow. The sequences identified by this analysis are returned as Tm, the SHRSs in the AUT. Figure 5.1 shows the SHRSs identified for Program 3, ⟨h1, h2, h3⟩, ⟨h4⟩, and ⟨h5⟩, in dotted boxes. These are denoted as SHRS1, SHRS2, and SHRS3, respectively. Although SHRS2 and SHRS3 are of size one and cannot be optimized based on intra-procedural information, they may be part of an SHRS defined inter-procedurally, so they are still tracked. The sequences identified in this analysis satisfy both of the SHRS conditions. First, since all HTTP API invocations are in a sequence in the post dominator tree, they maintain the first condition. Second, since a sequence of nodes (i.e., a basic block) contains no branches except at the last node, it guarantees that all nodes along the sequence are not interrupted by other HTTP requests on the CFG.

5.3.3 Inter-Procedural Analysis

My approach also analyzes the AUT to identify SHRSs that are defined inter-procedurally. An example of such an SHRS is ⟨h5, h6⟩. To perform the analysis inter-procedurally, I extend my intra-procedural analysis to use and generate per-method summaries. I analyze all methods of the AUT in reverse topological order with respect to the AUT's Call Graph (CG). This ensures that a method mi is summarized before the approach analyzes another method mj that calls mi. Cycles in the CG are handled by merging the individual methods' CFGs and treating them as one method.
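The intra-procedural grouping step of Section 5.3.2 can be sketched in a few lines of Java. This is an illustrative sketch, not Bouquet's implementation: the encoding of P′ as a map from each HTTP node to its immediate post dominator, and the names ShrsChains and chains, are my own assumptions.

```java
import java.util.*;

// Sketch of the intra-procedural grouping step. P' is encoded as a map from
// each HTTP-invocation node to its immediate post dominator among HTTP nodes
// (the root of P' maps to null). Each returned chain corresponds to one SHRS,
// listed in execution order.
public class ShrsChains {

    public static List<List<String>> chains(Map<String, String> parentOf) {
        // Count the children of each node in P'.
        Map<String, Integer> childCount = new HashMap<>();
        for (String parent : parentOf.values()) {
            if (parent != null) {
                childCount.merge(parent, 1, Integer::sum);
            }
        }
        List<List<String>> result = new ArrayList<>();
        for (String n : parentOf.keySet()) {
            // A node with exactly one child is interior to some chain;
            // chains start at leaves and at branching nodes.
            if (childCount.getOrDefault(n, 0) == 1) {
                continue;
            }
            List<String> chain = new ArrayList<>();
            chain.add(n);
            // Extend toward the root while the parent has no other children.
            String p = parentOf.get(n);
            while (p != null && childCount.get(p) == 1) {
                chain.add(p);
                p = parentOf.get(p);
            }
            result.add(chain);
        }
        return result;
    }
}
```

For the projection of print_html in Figure 5.1 (h1 → h2 → h3 → h5 and h4 → h5), this yields the three chains ⟨h1, h2, h3⟩, ⟨h4⟩, and ⟨h5⟩, i.e., SHRS1, SHRS2, and SHRS3.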
The summary for each method m in the AUT is the sub-tree of P′ containing the SHRSs that post dominate the entry of m. The approach uses these SHRSs as the summary because they are guaranteed to be executed when m is invoked. This sub-tree, which I call P″, is identified by analyzing P′, which was computed in the intra-procedural analysis. The approach identifies all SHRSs in P′ on the path from the entry node to the exit node of m, which, by definition, are the SHRSs that post dominate the entry node. Referring to Figure 5.1, this is SHRS1 and SHRS3.

Figure 5.2: Summary of main in Program 3.

To use the summaries, the approach (1) adds a preliminary step to the intra-procedural analysis and (2) modifies the sequence definition. First, before generating P, the approach replaces all invocations to summarized methods with the target methods' summaries. The replacement process for an invocation i first connects the predecessor of i to the entry node of the summary and the exit node of the summary to the successor of i, then removes i from m's CFG. Next, the replacement process reverses all edges within the summary and marks them as having come from the summary. Second, the definition of a sequence is modified as follows: no two nodes may be present in a sequence if they are joined by an edge that came from a summary. For all other purposes the nodes from the summary are treated as instances of an atomic HTTP API invocation. The purpose of these specially marked edges is to prevent SHRSs, such as SHRS3 and SHRS1, from being joined together once they are used in the calling method's context. Figure 5.2 shows the result of the summary substitution on the CFG of the main method. Here the summary edge is shown as a dashed line and the two SHRSs that will be identified by the intra-procedural analysis are shown as dotted boxes.
Once the analysis has finished analyzing the root method in the AUT's CG, it takes the union of the Tm for all methods m in the AUT. Then it expands all summary nodes (e.g., SHRS3) to their constituent HTTP API invocations. For the example in Program 3, the reported set of SHRSs is: {⟨h0, h1, h2, h3⟩, ⟨h5, h6⟩}.

5.4 Bundling Analysis

In the second phase of my approach, the Analyzer examines each SHRS to calculate the needed bundling information and rewrites HTTP API invocations so they will use the response bundles. The inputs to the Analyzer are the AUT and the SHRSs identified in the detection phase. The output is a Bundler for each SHRS in the AUT and AUT′, which is the original AUT transformed to carry out the bundling at runtime. A Bundler is a function that decides which HTTP requests should be bundled for the SHRS and is invoked at runtime by the Proxy. The Bundler has two components, the Tester and the Operator. The Tester is a set of regular expression patterns that match the URL of the first request in an SHRS. The Operator calculates which HTTP requests should have their responses bundled and is called when the Tester matches a request. The Tester and Operator are separated so that the Proxy does not have to call the more heavyweight Operator every time a new request arrives. The Analyzer first rewrites the HTTP API invocations in the AUT and then analyzes the information provided as arguments to the invocations to generate both components of the Bundler. I explain the details of the underlying analyses in the rest of this section.

5.4.1 String Analysis

An HTTP request is composed of a target URL, a set of headers, and parameters. HTTP is a string-based protocol, so if the developer provides these values, then they are provided as strings or via functions that are ultimately mapped to strings. Therefore, to identify the components of the HTTP requests, I developed the string analysis tool, Violist.
The details of Violist will be introduced in Chapter 6. Violist uses a two-phased analysis. First it generates an Intermediate Representation (IR) of the string operations for a given string variable at a point in an AUT, and then it applies custom-built interpreters to the IR to generate a model of the possible string values the variable can have at runtime. The Analyzer needs two different types of string interpretations, which can be easily handled by Violist. I summarize these below, but more details are provided in Chapter 6. The first type of interpreter, which I call the Safe Interpreter (SI), is one that provides a safe approximation of a string variable's possible values. This means that the interpreter generates models that could be described as an over-approximation of the variable's possible values. The SI handles loops by using the Widen operation proposed by Yu and colleagues [160] to generate a safe Finite State Automata (FSA) based model of the possibly infinite string values in a loop. The SI models any substring that cannot be resolved statically, such as user input or files, as any string (i.e., ".*"). The second type of interpreter, which I call the Concrete Interpreter (CI), provides precise approximations that may not necessarily be safe. For example, the CI unravels loops assuming an upper bound n on the loop's iterations. As with the SI, the CI also identifies substrings that cannot be resolved statically, such as user input or files, but represents these using placeholders with unique IDs that correspond to the variables that could not be resolved. These unique IDs facilitate later comparisons of requests at runtime by the Operator (Section 5.4.4).

5.4.2 Rewriting HTTP API Invocations

The first step of the Analyzer is to rewrite the AUT so that all HTTP API invocations can make use of the bundling invocation.
This is done by replacing each HTTP or HTTPS API invocation in the AUT with an Agent HTTP API (AHA), which is a wrapper for the original invocation. The AHA is a static method call that takes the same parameters as its wrapped invocation, along with a Call Site ID (CSID) that uniquely identifies each original invocation, and returns the same type of response object as the wrapped invocation. This design ensures that the optimization has no impact on other network-related functionality. At runtime, the AHA sends requests to the Proxy instead of the original server and manages the unpacking and distribution of the bundled responses. More details on the runtime behavior of the AUT′ are given in Section 5.5.1.

5.4.3 Generating the Tester

To generate the regular expressions for the Tester, the approach analyzes the arguments for the first invocation in each SHRS. Due to the design of the Android network API, string values are not supplied directly to the APIs that make the network requests. Instead, the string values are used to initialize URL, header, and data objects that are then used as arguments for the network request. To address this, the Analyzer uses standard alias analysis techniques [113] to identify the allocation sites for the objects provided as arguments to the network requests. Then, once the allocation or initialization site for the object has been found, the Analyzer uses the SI to solve for the possible string values. For example, at line 21 of Program 3, URLConnection.getInputStream is the API that makes the HTTP request, but it is invoked on the object urlConnection1 rather than on the string. The approach uses the alias analysis to figure out that the URL for the HTTP request at line 21 is defined at line 19, and then performs the string analysis on the variables at line 19. After that, my approach uses the generated regular expression as the Tester.
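To make the Tester concrete, consider the SHRS beginning at h1 in Program 3. A plausible pattern for the SI to produce for url1 is the constant prefix of the URL followed by ".*" for the unresolved variable city. The class and method names below are illustrative assumptions, not Bouquet's actual code.

```java
import java.util.regex.Pattern;

// Hypothetical Tester for the SHRS that begins at h1 in Program 3. The SI
// resolves the constant prefix of url1 and models the unresolved variable
// city as ".*".
public class WeatherTester {

    private static final Pattern FIRST_REQUEST =
            Pattern.compile("http://weather\\?city=.*");

    // The Proxy calls this for each incoming bundle request; a match means
    // the request is the first invocation of this SHRS.
    public static boolean matches(String requestUrl) {
        return FIRST_REQUEST.matcher(requestUrl).matches();
    }
}
```

Because the pattern over-approximates, any concrete value of city is recognized, which is exactly why the safe interpretation is used for the Tester.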
A safe approximation is preferred over a precise one so that the Tester is guaranteed to recognize the incoming request. Since the models produced by the SI are FSA based, it is straightforward to convert them to regular expressions.

5.4.4 Generating the Operator

To generate the bundling code of the Operator, the Analyzer identifies the values of the arguments to the remaining invocations in the SHRS. The identified requests will be sent by the Proxy when it receives the first request in an SHRS. This process is analogous to the analysis done for the Tester, but uses the CI to solve for the string values. The Analyzer uses the CI because only SHRSs for which all requests can be solved precisely are optimizable. An over-approximation is undesirable because it means the Proxy would return additional spurious responses, reducing the energy savings. The Operator can solve requests precisely in two cases, which I explain below. For four of the five apps used in the evaluation, these two cases were sufficient for all of the SHRSs.

Case 1 - Constant HTTP Request: An HTTP request is constant when there is only one possible value for the request information. Note that any of these values may itself be defined by expressions (e.g., concatenation) over multiple constant substrings, as Violist is able to evaluate string operations. In general, string values defined via loops or unknown string data (e.g., files) are not constant. If a request is constant, then the CI can precisely identify the string representation of the request, and this is saved by the Operator.

Case 2 - Decisive Semi-Constant HTTP Request: An HTTP request is semi-constant if the values of the request information, such as the URL and parameters, are simple combinations of constants and unknown variables, such as user input. Once the values of all unknown variables are provided, there is no ambiguity in the request that the HTTP API invocation can make.
For example, the five HTTP APIs in the method print_html of Program 3 are semi-constant since their values are known once the value of the unknown variable city is provided. I say a semi-constant HTTP API is decisive if its unknown variables are the same as the unknown variables of the beginning invocation in its SHRS. For example, h2 and h3 are decisive semi-constant requests because they have the same unknown variable, city, as the beginning invocation, h1, in their SHRS. In my approach, the Analyzer first detects decisive semi-constant HTTP APIs in an SHRS with the following steps. First, the Analyzer finds all non-beginning HTTP requests in an SHRS that are semi-constant. This is done by parsing the IR generated by Violist. For an HTTP API, if the IR shows that the variables that represent its URL, header fields, and parameters involve no branches or loops, the HTTP API is semi-constant. I use this rule to check for semi-constant HTTP APIs because ambiguity in the URL, headers, and parameters of an HTTP API can only come from branches, loops, and unknown variables. Second, for each of the semi-constant HTTP APIs in an SHRS, I compare its unknown variables against the unknown variables in the beginning invocation of its SHRS. If all the unknown variables are also in the beginning invocation, the request is decisive. At runtime, the Operator generates the concrete values of the decisive semi-constant requests by parsing the beginning request of the SHRS. The Operator does this by using regular expressions to retrieve the values of the unknown variables and substituting them into the subsequent requests. I take the SHRS of h1, h2, and h3 in Program 3 as an example. For conciseness, I ignore the headers and parameters of the HTTP APIs since they are all constant in this case. In the first step, my approach checks whether h2 and h3 are semi-constant HTTP APIs since they are not the beginning HTTP API invocations of any SHRS.
The approach analyzes h2 and h3 and finds that their URLs contain the unknown variable, city, but that this part is not defined in branches or loops. Thus, both of them are semi-constant HTTP APIs. In the second step, the approach compares the unknown variable, city, in h2 and h3 against the unknown variables in h1, which is the beginning invocation in the SHRS of h2 and h3, and finds that they are the same. So, the approach determines that h2 and h3 are decisive semi-constant HTTP requests. Finally, the Analyzer generates code for the Operator that uses the regular expression http://weather?city=(.*) to retrieve the runtime value of the variable city and puts this value into the requests of h2 and h3.

All Other Cases: For the requests that are not constant or decisive semi-constant, the Analyzer does not generate bundling code directly. Instead, it records the regular expressions describing the request information along with relevant relationships between the variables, namely, which ones are the same across different requests. The Analyzer generates these patterns using the same technique described in Section 5.4.3. The expressions and relationship information are then provided to the developer, who may manually specify the form of the requests to be made and define bundles of requests. Although Bouquet cannot avoid requiring manual effort for these cases, it can still substantially reduce the amount of manual effort needed for detecting SHRSs.

Operator Customization: My approach allows developers to further "tune up" the generated Operator with domain-specific knowledge. This tune-up is useful for verifying the correctness of the generated bundling rules and for achieving specific trade-offs during the bundling. For example, developers may want to avoid bundling an SHRS if they think the bundling could introduce a noticeable latency to a certain critical HTTP request in the SHRS.
My approach exposes the rules of the generated Operator to developers, and the tune-up can be done by directly adjusting the rules and generated code.

Figure 5.3: Runtime workflow of Bouquet

5.5 Runtime Optimization

The third, and final, phase occurs when the AUT′ is executed. At runtime, the AHAs inserted into the AUT′ redirect all requests to a Proxy, which then uses the Bundlers to determine the requests that should be bundled together. The Proxy sends these requests to the server, bundles the responses, and returns this bundle to the AUT′, where they are unpacked and managed by the AHAs. This runtime workflow is shown in Figure 5.3. In this figure, the dotted lines show the old workflow and the solid lines show the new workflow.

5.5.1 The Agent HTTP APIs

The role of the AHA is to carry out the client-side operations of the optimization process while hiding the details of the optimization from the application. When an AHA in the AUT′ is invoked it carries out two steps. First, the AHA checks if the response for the current request is already cached locally. If so, this means that the response was already retrieved by a previous HTTP request in the same SHRS, and the AHA directly returns the cached response. If not, the AHA generates a bundle request and sends it to the Proxy. A bundle request is an HTTP request that is identical to the original request but also contains two special header fields to identify the originating CSID and the URL of the original request. Second, when the AHA receives the bundled responses, it unpacks the bundle, returns the response for the originating request, and then caches the responses for the remaining requests in the SHRS.

5.5.2 The Proxy

The Proxy carries out the server-side part of the optimization process. At runtime the Proxy receives bundle requests, identifies which responses should be bundled together, and returns these to the client side.
To identify the responses that should be bundled together, the Proxy passes the URL and CSID contained in the bundle request's headers to the Testers defined by the Proxy's Bundlers. If a Tester determines that the request is the first part of an SHRS, then the corresponding Operator is used to identify the request information (i.e., URLs and parameters) of the subsequent invocations in the SHRS. These requests are then sent, in sequence, to the app's server. The Proxy then bundles the responses and returns them to the client, where they are unpacked and managed by the AHAs as described in the previous section. The Proxy is an independent software process that should be installed on the same machine or in the same local network as the server. This is not a requirement for correct functionality, but it ensures a low-latency connection to the original server and avoids any significant slow-down due to network delays. I do not expect that this type of deployment would be difficult since the owners or developers of the AUT generally have control over the deployment of the app's server. Also, to properly handle HTTPS, the Proxy needs to use the same cryptographic signature as the original server. Note that there is no special hardware requirement for running the Proxy. In Bouquet, the Proxy is just a process that can run on any machine. For app developers, the simplest way to deploy the Proxy is to run it on the same machine as their web services. Thus, the Proxy can work independently and there is no extra cost for developers to deploy the Proxy.

5.5.3 Maintaining Server Side States

My approach has to maintain the server-side state transitions during HTTP bundling. One challenge is to ensure that the order of HTTP API invocations in SHRSs maintains the server-side state transitions. This is addressed by my definition of SHRSs, which requires that requests in an SHRS must be guaranteed to execute if the first one executes (i.e., they post dominate the first request).
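The Proxy-side dispatch described in Section 5.5.2 can be sketched for the SHRS ⟨h1, h2, h3⟩ of Program 3. The sketch below is illustrative only: the class name and the decision to key the Bundler purely on the URL are my assumptions, and the real Proxy would also carry the CSID and issue the identified requests to the app's server.

```java
import java.util.*;
import java.util.regex.*;

// Sketch of one Bundler as used by the Proxy for the SHRS <h1, h2, h3> of
// Program 3 (illustrative names, not Bouquet's actual code).
public class WeatherBundler {

    // Tester: recognizes the first request and captures the unknown
    // variable city.
    private static final Pattern TESTER =
            Pattern.compile("http://weather\\?city=(.*)");

    // Operator: substitutes the captured value into the decisive
    // semi-constant requests h2 and h3, preserving their original order.
    public static List<String> subsequentRequests(String firstRequestUrl) {
        Matcher m = TESTER.matcher(firstRequestUrl);
        if (!m.matches()) {
            return Collections.emptyList(); // not the start of this SHRS
        }
        String city = m.group(1);
        return Arrays.asList(
                "http://daily?city=" + city,
                "http://location?city=" + city);
    }
}
```

The Proxy would send these requests in sequence, bundle the three responses, and return them to the AHA on the client.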
Another challenge is to distinguish HTTP API invocations that have the same request information but originate from different call sites. I need to address this second challenge because two HTTP API invocations may have different responses if they have the same request information but different server-side states. My approach addresses the second challenge by associating a unique CSID with each HTTP API invocation, which is passed between the client side and the server side. Whenever my approach needs to query the response for a certain HTTP API invocation (i.e., from the cache) in an SHRS, it always uses the CSID along with the request information.

The server may impose global states for all clients. For example, two independent clients may modify a global counter of total active clients at the same time. In this case, bundling HTTP APIs in SHRSs may change the order of requests across different clients and produce unexpected server-side global states. My approach does not impose any constraints in the case of global server-side states because the order of requests across different clients is generally not guaranteed in most mobile apps. Even if I were to not bundle HTTP requests in an SHRS, the requests could still return unexpected global states. Furthermore, if the order of requests from different clients is important, developers can still choose to not bundle the requests to avoid potential mistakes.

5.5.4 Handling Exceptions

Exceptions may interrupt the execution of SHRSs and introduce two challenges to my approach: downloading responses for unreachable HTTP APIs and introducing unexpected server-side states. I use an example to explain these two challenges. Suppose there is an exception at line 27 of Program 3. In this case, h3 will not be executed in the SHRS. However, if I bundle the SHRS of h1, h2, and h3 at the location of h1, I will have two issues. First, the energy consumed by retrieving the response of h3 is wasted. Second, the expected server-side state transition should be s1 → s2, but it actually becomes s1 → s2 → s3 since h3 is requested together with h1. To address these two challenges, I provide two options for developers. The first option, the greedy option, is to ignore the potential exception flow and bundle the SHRSs despite the possibility of them being interrupted by exceptions. This option is applicable for cases in which exceptions rarely happen and there is no hard restriction on the server-side states. With the greedy option, my approach can still optimize the energy for SHRSs in most cases. The second option, which is the safe option, avoids bundling SHRSs that may be interrupted by exceptions. This option should be used if the server-side states need to be strictly maintained when there is an exception. By using the second option, my approach is safe and avoids any unexpected server-side states and any energy consumed retrieving responses for unreached code, yet it may miss some opportunities to bundle larger SHRSs.

5.6 Evaluation for HTTP Energy Optimization

I evaluated my approach to determine how well it could perform in practice. In my evaluation, I considered: energy savings, required developer effort for cases where the analysis could not be fully automated, analysis time, runtime overhead, and the prevalence of SHRSs in marketplace apps. My research questions are listed below:

RQ 1: How much energy could be saved by using Bouquet?
RQ 2: How much manual effort is needed to augment the Bundler?
RQ 3: How long is the analysis time of Bouquet?
RQ 4: How much runtime overhead is introduced by Bouquet?
RQ 5: What is the prevalence rate of SHRSs in marketplace apps?

5.6.1 Implementation

My approach is implemented as a tool, Bouquet, that automatically detects SHRSs, generates code to bundle HTTP requests, and rewrites the AUTs. Bouquet works for Android apps.
I chose Android because it is a widely used open-source system, but my approach is also applicable to other platforms, such as Windows Phone and iOS, since the mechanism for making HTTP requests is similar on these platforms. To implement the AUT Instrumenter, I used apktool [157] to unpack Android apps, dex2jar [3] to convert the Dalvik bytecode of Android apps to Java bytecode, and the BCEL [28] library to replace the HTTP API invocations with the AHAs. To implement the SHRS Detector and the Analyzer, I leveraged the Soot [87] framework to build analysis data structures, such as the control flow graph and the call graph. The Proxy and the Bundler in Bouquet were implemented as a Node.js server app. I used the Express framework [11] to handle the incoming HTTP requests in the Proxy. In my experiments, I deployed the Proxy and the mock-server on the same machine, which was a DELL XPS 8100 desktop running Linux Mint 14 with an Intel Core i7@3GHz processor and 8GB memory. The machine was connected to a local WIFI router which was linked to the school network of the University of Southern California. The platform on which I ran my subject apps was a Samsung Galaxy S5 smartphone, which was connected to the same WIFI router as the server.

5.6.2 Subject Apps

To measure the energy savings of Bouquet, I selected five apps that contained SHRSs. To ensure the representativeness of my subjects, I selected apps that used the two most common APIs for making HTTP requests in Android, URLConnection and HttpClient, and both the GET and POST methods of HTTP requests. These five apps were selected from the 7,878 Google Play marketplace apps that I analyzed for RQ5. The descriptions of my selected apps are in Table 5.1, where #Bytecode represents the size of an app in terms of the number of Java bytecodes, API represents which APIs are used to make HTTP requests in the app, and Method means which HTTP method is used by the SHRSs in the app.
Table 5.1: Subject apps for HTTP energy optimization

App         #Bytecode  API            Method  Generated  Provided
bobWeather  22,517     URLConnection  GET     88         20
LIRR        4,408      HttpClient     POST    88         0
Tapjoy      84,963     URLConnection  GET     28         0
ALJA        279,114    HttpClient     GET     24         0
PCH         216,842    URLConnection  GET     28         0

5.6.3 RQ 1: Energy Saving

To answer this research question, I used Bouquet to optimize my subject apps. Then I measured and compared the energy consumption of both the optimized and unoptimized versions. To have a full view of the energy savings of my approach, I measured two types of energy savings: the whole-app energy savings and the SHRS energy savings. For the whole-app energy savings, I compared the energy consumption between the unoptimized and the optimized version of each app. For the SHRS energy savings, I compared the energy of the optimized and unoptimized HTTP API invocations in each SHRS.

5.6.3.1 Protocol

I measured the energy savings of Bouquet when running on the five subject apps. My approach required us to deploy the Proxy in the same network domain as the servers of the apps. However, I did not have access to the server side since my subjects were not open-source apps. To solve this problem, I built a mock-server to mimic the behavior of the servers of my subject apps, and redirected the HTTP requests of my subject apps to the mock-server. The mock-server was a Node.js based server that accepted the HTTP requests from my subject apps and replied with previously recorded responses that were collected using a capture-replay technique, Reran [61]. In my experiment, I executed my apps with the minimal workload that could trigger each of the bundled HTTP requests once. I also captured and replayed this workload with Reran to ensure that both versions of the app were run with the same interactions and timing. This enabled us to avoid any variation in the measurements due to unstable or inconsistent interactions across the versions.
I handled the exceptions with the first option described in Section 5.5.4. The energy savings of Bouquet depend on the underlying network bandwidth and delay. To measure the energy savings of Bouquet under realistic network conditions, I used a simulator, NEWT [14], developed by Microsoft, to simulate the network conditions of WIFI (15M bandwidth, 27ms delay), LTE (6.2M bandwidth, 96ms delay), and 3G (2M bandwidth, 147ms delay) networks. The bandwidth and delay of my LTE and 3G networks were the average values of the four major mobile carriers in the US (ATT, T-Mobile, Sprint, and Verizon) [16, 18]. Unlike LTE and 3G networks, which have a uniform quality of network service for all customers, the bandwidth and delay of a WIFI network are highly dependent on the customer's Internet plan. In my evaluation, the bandwidth and delay of WIFI were profiled from the standard plan of Time Warner Cable [23] with a popular network speed tester [21].

The energy consumption was collected with my previous technique, vLens [91], using the Monsoon [17] power meter. I used the vLens technique because it is able to isolate the energy consumption of individual HTTP API invocations from other parts of a subject app's execution and eliminate idle state energy, such as the energy consumed while waiting for user input. Note that eliminating idle state energy was important for an accurate energy measurement for Android apps. The idle energy was not consumed by the app itself but by the operating system. Including idle state energy in the total energy measurement could introduce significant inaccuracies in the measurements [90]. To account for random measurement error, each of my subject apps was executed five times, enough to achieve statistical significance, and the average value was taken as the final result. I also conducted a t-test on the null hypothesis: "the energy consumption of an unoptimized version is not larger than the optimized version," which was rejected.
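The statistical test above can be sketched as follows. This is a minimal illustration of a paired, one-sided t-test over per-run differences; the energy readings below are invented for illustration and are not the values measured in the evaluation.

```java
// Minimal sketch of a paired, one-sided t-test for the null hypothesis
// "unoptimized energy is not larger than optimized energy".
// The sample readings below are hypothetical, not measured values.
public class PairedTTest {

    // t = mean(d) / (s_d / sqrt(n)) for per-run differences d_i = unopt_i - opt_i
    static double tStatistic(double[] unopt, double[] opt) {
        int n = unopt.length;
        double[] d = new double[n];
        double mean = 0;
        for (int i = 0; i < n; i++) { d[i] = unopt[i] - opt[i]; mean += d[i]; }
        mean /= n;
        double var = 0;
        for (double x : d) var += (x - mean) * (x - mean);
        var /= (n - 1);                    // sample variance of the differences
        return mean / Math.sqrt(var / n);
    }

    public static void main(String[] args) {
        double[] unoptimized = {10.2, 10.5, 9.9, 10.4, 10.1}; // joules, 5 runs
        double[] optimized   = { 8.1,  8.4, 8.0,  8.3,  8.2};
        double t = tStatistic(unoptimized, optimized);
        // One-tailed critical value of Student's t for df = 4 at alpha = 0.05
        // is about 2.132; a larger t rejects the null hypothesis.
        System.out.println("t = " + t + ", reject H0: " + (t > 2.132));
    }
}
```

With five paired runs there are df = n - 1 = 4 degrees of freedom, so the null hypothesis is rejected at the 0.05 level whenever the t statistic exceeds roughly 2.132.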
I could use the t-test because the random measurement error followed a normal distribution in independent measurements. Finally, I manually verified each execution to confirm that the optimized version of the AUT behaved the same as the unoptimized version. To verify this, I monitored each workload and confirmed that the two versions of the app performed the same functions and returned the same answers.

5.6.3.2 Result

Energy savings at the whole-app level are shown in Figure 5.4, and the SHRS-level energy savings are shown in Figure 5.5. On average, my approach achieved energy savings of 21%, 15%, and 8% at the whole-app level for the WIFI, LTE, and 3G networks, respectively. For the SHRS-level energy savings, the average values for WIFI, LTE, and 3G were 37%, 43%, and 35%, respectively. The p-values were 0.049 and 0.0005 for the whole-app level energy savings and the SHRS-level energy savings, respectively.

5.6.3.3 Discussion

The results showed that my approach was able to achieve significant energy savings across all three types of networks and at both the whole-app and the SHRS level. The savings at the whole-app level are particularly revealing because they emphasize not only the energy savings that an end user could expect to see but also that these individual HTTP invocations represent a significant source of energy consumption for a typical app. So much so that the significant local reductions in energy consumption (i.e., for an SHRS) translate into large app-level savings as well.

Figure 5.4: Energy savings of Bouquet at the whole-app level.

Figure 5.5: Energy savings of Bouquet at the SHRS level.

One interesting pattern in my results was that the energy savings for WIFI were higher than those for LTE and 3G at the whole-app level. However, the same trend was not observed at the SHRS level.
This was interesting because it indicated that, under my configuration of WIFI, the ratio of the energy consumed by HTTP requests to the total energy of the whole app was higher. I hypothesized that this was because my WIFI configuration had the fastest network bandwidth, which was not as energy efficient as slower network configurations for small HTTP requests. In my experiments, the sizes of my HTTP requests were all between one and three kilobytes. Using a very large bandwidth to retrieve such a small amount of data may not reduce the response time significantly but forces the network to consume much more power. Thus, having a very fast network may in fact increase the ratio of HTTP energy to the total energy. Another pattern in my study was that LTE had higher energy savings than 3G in most cases. This is because a significant portion of the energy savings of Bouquet comes from the reduction in tail energy, and LTE has a larger tail energy consumption than 3G.

5.6.4 RQ 2: Manual Effort

In general, my approach is fully automated and does not require any developer interaction except to confirm the identified optimizations. In most cases, developers using my approach will only need to check the completeness of the rules generated by the Analyzer. In some cases though, as described in Section 5.4.4, an optimization cannot be fully automated and the developers need to complete the generated code with the assistance of the generated comments. As an approximate measure of developer effort, I calculated how many lines and comments were generated by the Analyzer and how many lines needed to be written by the developers. I report these metrics because the Bundlers were the only parts of the process that would require manual intervention by the developer. The number of generated lines and comments reflects the effort needed to learn and understand the Bundlers, and the number of lines needed to be written reflects the required effort to create the Bundlers.
To measure how many lines needed to be written, I manually completed the bundling rules based on my understanding of the test cases and reported the number of lines I had written for each test case. The result is shown in the last two columns of Table 5.1. The "Generated" column reports the lines of code and comments that Bouquet generated. The "Provided" column represents the lines of code that had to be created by me to complete the Bundlers. The language for these two columns was JavaScript running on Node.js. Note that the manual effort is to modify the generated code for the proxy; no modification of the apps is required with my approach.

On average, Bouquet generated 51 lines of code and comments for each test case. The number of lines of written code was zero for four test cases and 20 for bobWeather. This means that for 4 out of 5 of my test cases, developers only needed to understand the generated code, which was less than 88 lines, to ensure the correctness of the generated Bundlers. For bobWeather, I found that the string values that represented the URLs of each HTTP API invocation in its SHRSs had branches. Since my Analyzer cannot predict which branch would be taken at runtime, it could not automatically generate the code to bundle the SHRSs. However, the code I wrote for bobWeather was straightforward. It consisted of several regular expression operations without any loops or branches. Therefore, I believe that the required amount of manual work for developers is reasonable.

5.6.5 RQ 3: Analysis Time

To answer this research question, I evaluated how much time was consumed by Bouquet to detect SHRSs, generate the optimized version of each subject app, and generate the Bundler.

Table 5.2: Analysis time of HTTP energy optimization

App         Loading (s)  Analysis (s)  Convert (s)  Rewrite (s)  Total (s)
bobWeather          1.4           4.7         29.4          2.5       38.0
LIRR                1.1           1.2          6.7          1.0       10.0
Tapjoy              2.2           5.3         20.1          7.5       35.1
ALJA               25.1           4.4         43.3         12.4       85.2
PCH                10.4           7.0         36.9         12.2       66.5
To calculate this time, I measured the execution time of each phase during the measurements for RQ1 and then reported the average execution time. The result of this measurement is shown in Table 5.2. The unit for all numbers is seconds. The "Loading" column is the time consumed by the Soot framework to load the Android apps. The "Analysis" column is the time consumed by Bouquet to detect SHRSs and generate bundling rules. The "Convert" column is the time used to convert Dalvik bytecode to Java bytecode with dex2jar. The "Rewrite" column is the time consumed by Bouquet to replace the original HTTP API invocations of the Android SDK with the AHAs. The "Total" column is the sum of all other columns.

On average, the total analysis time for my five subjects was 47 seconds. On average, 61% of the total time was spent converting Dalvik bytecode to Java bytecode through dex2jar. Nevertheless, all of my test cases could be analyzed and instrumented in less than 1.5 minutes. These results suggest that the analysis time of my approach would not be a barrier to its acceptance by developers.

5.6.6 RQ 4: Runtime Overhead

In my approach, runtime overhead may be introduced by the AHAs. Compared to the original HTTP APIs, the AHAs may take longer to process the responses of HTTP requests, since they need to pack and unpack the bundled responses of the HTTP requests. It is important to know how much runtime overhead this may introduce. To answer this research question, I measured the total response time of all of the HTTP API invocations in the unoptimized app and all of the AHAs in the optimized app. I only measured the total response time of HTTP API invocations or AHAs in SHRSs because AHAs were the only sources of additional runtime overhead. I conducted the experiment with the same protocol as in Section 5.6.3. For each subject app, I reported the ratio of the total HTTP response time of SHRSs in the optimized version to the same metric for the unoptimized version.
The result is shown in Figure 5.6.

Figure 5.6: The runtime overhead introduced by Bouquet.

On average, the HTTP response time of SHRSs in the optimized versions was 61% of the time in the unoptimized versions. The average standard deviation of my measurements was 13%. The p-values were all below 0.01. This result was counter-intuitive since it showed that the AHAs did not introduce any extra runtime overhead but, in fact, reduced the response time for the subject apps. I studied this result and found that even though using AHAs required extra time to process the bundled responses, more time was saved due to HTTP request bundling. In my approach, bundling HTTP requests reduced the amount of data that was transmitted through the network, so it also reduced the time consumed to get the HTTP responses. In general, most of the time spent making HTTP requests was consumed by the network transmission instead of processing the responses. Thus, in my experiment, the time saved in network transmission easily dominated the extra time for response processing.

5.6.7 RQ 5: Pervasiveness of SHRSs

To answer this research question, I carried out an empirical study on a large set of market apps to see how many of them contained SHRSs. I collected 7,878 market apps from the Google Play market, representing 23 different categories, by using the Google Play Crawler [4]. The sizes of my apps varied from several hundred bytes to 50 megabytes. I used the SHRS Detector of Bouquet to parse the 7,878 downloaded apps and reported how many of them had SHRSs. I found that 206 Android apps contained SHRSs. This was 2.6% of the entire app pool that I downloaded. Note that for many of the apps it was not possible to have SHRSs because they had zero or just one HTTP API invocation. Excluding the apps that had fewer than two HTTP invocations, the percentage increased to 4.2%.
As reported in a commercial report [22], the number of apps in the Google Play Store was above 1.6 million as of July 2015. 2.6% of that total represents more than 40,000 apps that could contain SHRSs. Note that I did not manually verify each result, so these results only indicate an upper bound on the number of apps that could be optimized. Overall though, these results show that there are potentially many SHRSs in real-world apps and thus the impact of my tool could be high.

5.7 Threats To Validity

External Validity: The energy savings of Bouquet for a particular app depend on how many SHRSs the app contains and how often the SHRSs are invoked. In my evaluation, I measured the energy savings for five apps with the minimum workload needed to trigger all SHRSs. These apps and workloads may not cover all the possible cases in reality. The energy savings for other apps may be different from the numbers reported in this chapter. To alleviate this threat, I selected realistic apps from the market. These apps had different sizes, from 4,408 to 279,114 bytecodes, and covered general patterns of the usage of HTTP APIs: both common methods of HTTP requests (POST and GET) and the common APIs that can make HTTP requests. In general, I believe that my selected apps represent common usages of HTTP APIs in market apps.

The app-level energy savings of my approach heavily depend on the network conditions. To avoid bias introduced by any one particular network, I used a popular network emulator, NEWT [14], to simulate common network speeds and delays for the WIFI, LTE, and 3G networks. These network configurations were either set to the average values in the US or profiled from the standard network package of one of the common network service providers in the US. Therefore, I believe that the network configurations represent typical network conditions for smartphone users. The energy savings of Bouquet are also related to the size of each HTTP request.
Namely, the larger the HTTP requests are, the smaller the energy savings will be. However, since 68% of HTTP requests are smaller than 0.5KB [97], I expect the energy savings measured in my study to extend to more general cases.

Internal Validity: The accuracy of my approach is guaranteed by the definition of SHRSs, which is based on the post-domination relationship and "basic blocks" on the post-dominator tree. Since the definition of an SHRS is purely based on the static information of the AUT, my approach can accurately detect each SHRS. However, SHRSs are only a subset of the HTTP APIs that could be optimized; other HTTP APIs, such as HTTP APIs across different event handlers, may also be optimized with more sophisticated bundling techniques. However, these HTTP APIs are not defined as SHRSs and are not addressed in my approach. I will work on these other optimizable HTTP APIs in future work.

The energy measurement in my evaluation also depended on the workload for each app. For example, a longer workload leads to a larger energy consumption. To have a fair comparison between the unoptimized and optimized versions of the apps, I used Reran [61] to record and replay identical workloads on both the unoptimized and optimized versions. The energy measurement in my study may be affected by random factors, such as network variation and measurement error. To alleviate the impact of random factors, I repeated my measurement five times and calculated the average value. To ensure the soundness of my conclusion about energy savings, I also calculated the statistical significance.

Construct Validity: In Section 5.6.4, I used the lines of code that developers needed to read and write to approximate the manual workload of developers. However, these metrics may not fully reflect the real workload required for app developers to create the bundling code.
For example, the lines of code cannot reflect the effort of app developers to gain enough understanding and knowledge about the HTTP requests in the AUT before creating the bundling code. However, my evaluation can still show that Bouquet reduces the amount of work needed to bundle HTTP requests, for two reasons. First, four of the five apps were automatically optimized. Second, for the only app that needed manual assistance, the only manual effort required was to specify the connections between different URLs in the SHRSs; Bouquet still saved the work of detecting the SHRSs. Furthermore, as the users of my approach are the developers of the AUT, it is reasonable to assume that they would have the necessary expertise about the AUT. Thus, I believe that the workload reduction of Bouquet will not be affected by this threat to validity.

Chapter 6

String Analysis

String analysis is a fundamental technique for my display energy and HTTP energy optimization techniques. It is used to model the output of mobile web apps in display energy optimization and the potential URLs of HTTP requests in HTTP energy optimization. The effectiveness of the string analysis technique can significantly affect the performance of the optimization techniques.

Accurately analyzing strings is a significant challenge due to several issues: (1) how to analyze string values generated in (possibly nested) loops that may include loop-carried data dependencies, (2) context-sensitivity for strings manipulated and composed inter-procedurally, (3) flexibility to attach different semantics to string operations, and (4) scalability to handle long strings or strings manipulated using a complex series of operations. Existing techniques that use string analysis have mainly side-stepped these challenges by cleverly leveraging aspects of their problem domain to simplify the required string analysis. The result is that these key challenges for the analysis of strings have not been adequately addressed.
Existing work related to string analysis can be broadly described as either performing string analysis in support of another software engineering goal or directly providing a string analysis technique. Approaches that fall into the first category include those for approximating HTML output (e.g., [116, 67]), computing possible SQL queries (e.g., [68, 154, 155, 59]), and identifying messages passed in Distributed Event-Based (DEB) systems [60]. These approaches have often not needed to address key challenges in string analysis due to aspects of their problem domains. For example, to optimize display energy [94], it was sufficient to assume that loops were unraveled once. The result of leveraging these domain-specific insights is that the developed techniques are not generalizable, since assumptions that work for one problem domain may not be appropriate for others. This has implications for the research community, as each group that wants to develop techniques that need string analysis must essentially start from scratch.

More generalizable techniques have also been proposed (e.g., [45, 160]). However, these have trade-offs in how they handle the four challenges, which makes them less broadly applicable. For example, JSA [45] uses a global alias analysis to model inter-procedural manipulation of strings, which, as I show in Section 6.3, leads to scalability problems when analyzing Android applications that include extensive invocations of framework APIs. Yu and colleagues [160] proposed an FSA-based Widen operation, which can partially solve the first challenge but is not able to handle nested loops. Symbolic execution based techniques can more precisely address challenges 1, 2, and 3. However, these techniques may not scale to large programs and may make simplifying assumptions about the strings under analysis.
The goal of the work presented in this chapter is to present a general framework for string analysis that allows researchers to more flexibly choose how they will address each of the four challenges. My key insight into how to do this is to separate the representation and interpretation of string operations. My string analysis framework, Violist, defines an IR that faithfully represents the string operations performed on an application's string variables. Violist's IR is designed to accurately capture complex data-flow dependencies in loops and context-sensitive call-site information. Violist also allows for the straightforward integration of IR interpreters that can implement the user's desired interpretation of the string construction semantics. For example, it is straightforward to write an interpreter that will unravel loops once, n times, or approximate an upper bound for infinite unraveling. Finally, Violist can easily scale up and analyze large programs.

To evaluate the usefulness and effectiveness of my framework, I carried out an extensive empirical evaluation. For the evaluation, I implemented two different IR interpreters and used these to compare the accuracy and scalability of Violist against the popular Java String Analyzer (JSA) [45]. For this evaluation, I used a mixture of publicly available benchmarks, systematically created test cases that mimic different data and control flows, and real-world Java and Android applications. My results showed that Violist is able to generate results that, on average, are more precise than JSA's while maintaining the same level of recall. Furthermore, Violist was able to generate these results for a wider range of applications and to do so faster than JSA.

6.1 Motivation for String Analysis

In this section, I provide a motivating example (Program 4) to illustrate the four challenges mentioned in the previous section.

6.1.1 Loops

Consider a string analysis that targets the variable c at line 19 (c_19).
This variable is redefined in each iteration of the loop at lines 17–21. Although the upper bound on the loop can be trivially identified via inspection, in general it is challenging for a static string analysis to accurately account for the loop's upper bound. Therefore, many techniques (e.g., [45]) simply assume the loop will be executed an infinite number of times or unroll it only once (e.g., [116, 94]). Even techniques that do model loops are generally unable to model the relationship of nested loops, an example of which is shown at lines 27–35. Nested loops are more challenging to analyze because it is necessary to model the relationship of the strings in the inner and outer loops (e.g., d and e). Techniques based on the Widen approach for approximating loops ([160, 44, 27]) would not be able to handle this scenario. Many flow-based techniques would simply model this as one large loop, which is safe but reduces precision.

 1  class Example {
 2    public static String addA(String v)
 3    {
 4      return v + "A";
 5    }
 6    public static String replaceA(String v)
 7    {
 8      return v.replaceAll("A", "B");
 9    }
10    public static void main(String[] args) {
11      String a = "a";
12      String b = addA(a);
13      b = addA("b");
14      System.out.println(b);
15      String c = "A";
16      String f = "";
17      for (int i = 0; i < 3; i++)
18      {
19        c = c + "A";
20        f = f + c;
21      }
22      c = replaceA(c);
23      c = replaceA("AAAA");
24      System.out.println(c);
25      String d = "";
26      String e = "";
27      for (int i = 0; i < 3; i++)
28      {
29        d = d + e;
30        System.out.println(d);
31        for (int j = 0; j < 1; j++)
32        {
33          e = e + "b";
34        }
35      }
36    }
37  }

Program 4: Example program

6.1.2 Context-Sensitivity

Next, consider a string analysis that includes string manipulations that are carried out inter-procedurally.
An example of this is in method main at lines 12 and 13, which call the method addA. Many string analyses handle this sort of invocation without any call-site context sensitivity (e.g., [160, 45]). This means that when analyzing the values at lines 12 and 13, these analyses will assume that the return value can be based on the arguments provided at any call site of addA. In the example, this means that the value of b at lines 12 and 13 would be approximated as {"aA", "bA"} instead of "aA" for line 12 and "bA" for line 13. As with the handling of loops, this approach is safe but loses precision, as extra possible string values would be returned as possible values of the variable b. The reason this occurs is that the analyses use a global data-flow analysis, which is, in turn, based on the call graph of the application. This representation of the inter-procedural control flow results in a lack of context sensitivity. Techniques that use symbolic execution do not face this problem, but they may require SMT solving and need to analyze all possible paths of a program, which can make it difficult to scale these techniques to large programs.

6.1.3 Flexible Semantics

A general limitation of many string-based approaches is that they are tightly tied to one specific method of interpreting string values. For example, JSA [45] approximates loops as having an infinite upper bound, while both D-model [116] and the string analysis underpinning Nyx assume loops are unraveled a fixed number of times. Beyond that, many string analyses are highly customized to make approximations in ways that are appropriate for their problem domains but that limit their general applicability. For researchers and software engineers who are attempting to leverage string analysis, it is generally necessary to develop their own string analysis for their project. This can be a significant barrier to entry and to the success of the project.
6.1.4 Scalability

Lastly, symbolic execution techniques could be used to address the context-sensitivity problem. However, these techniques may not scale easily to large programs. To improve scalability, symbolic execution techniques may assume strings are bounded in length (e.g., no longer than 8 characters), which limits the techniques' general applicability, or model strings as sequences of characters, which cannot adequately represent the semantics of certain string operations, such as replaceAll.

6.2 Approach of String Analysis

My approach provides a general framework for string analysis. This framework allows for the development of customized string analyses that vary in terms of recall and precision in how they handle loops, context sensitivity, and string operation semantics. To build this framework, I designed an approach that separates the representation of the string operations from their interpretation. This separation allows all string analyses to share a common underlying string model, yet attach different semantics to the modeled instructions. My approach can be defined as having two general phases. In the first, the approach builds a model of the string operations in the Program Under Test (PUT), and then in the second, it applies a custom interpreter to the model that implements the desired string operation semantics.

To model the string operations, I define an IR that captures the data-flow dependencies between string variables and string operations. The IR can be computed for any code that can be represented in Static Single Assignment (SSA) form, for which translations exist for most mainstream languages (e.g., Java, Dalvik, and PHP). In addition, the IR also includes operations that allow it to represent strings defined externally to the block of code (e.g., method parameters) and data dependencies caused by loops. I define the details of the IR in more depth in Section 6.2.1.
In the first phase, my approach analyzes the PUT and computes an IR-based summary for each of its methods. Within each method, the approach uses a region-based analysis to identify code enclosed by (possibly nested) loops and then specially models the data dependencies caused by loops. In the second phase, my approach translates the IR-based summaries into a string representation. To do this, my framework allows a user to supply an interpreter of their choice that implements the desired semantics with respect to string operations, loops, and context sensitivity. As part of the evaluation, I implemented two such interpreters, one that models the strings as FSAs (as is done in JSA [45]) and a second that carries out an n-bounded loop unraveling for all loops in the PUT. The interpreters also allow the users to leverage additional analyses, such as alias analysis, that can more precisely identify loop upper bounds, include domain-specific knowledge, or resolve other constraints.

6.2.1 The Intermediate Representation

I define an IR that represents the control- and data-flow relationships among the string variables and string operations in a program. The goal of the IR is to precisely model these relationships while deferring any sort of interpretation or approximation of these relationships until the second phase. My IR specifically targets control- and data-flow relationships that have traditionally complicated the modeling and interpretation of strings: namely, strings defined external to the analysis scope, strings constructed within (possibly nested) loops, and inter-procedural string manipulation. The IR is structured as a tree with the leaf nodes defining either string constants or placeholders for unknown variables. The internal nodes in the IR tree are string expressions that represent the values of string variables in the PUT.
For explanatory purposes in this chapter, I will represent expressions in the IR tree as (op a_1, a_2, ...), where op is the operation of the expression and a_i represents the various arguments to op. I write the definition of a variable in the form v_l, where v is the source-code-based name of the variable and l is the line number of the definition. To illustrate, line 13 of Program 4 is represented as b_13 = (addA "b").

In cases where a variable is defined outside of the scope of the analysis (e.g., method parameters), my approach leaves placeholders in the IR. A placeholder variable is denoted by the subscript notation "X_n", where X is a fixed symbol and n is a number, unique within the analysis scope, identifying the external variable. For example, consider line 4 of Program 4; the IR for this line is (+ v_X4 "A"). These placeholders are preserved until they can be resolved. For example, in this case the IR of line 4 becomes the method summary for addA. Whenever a call site for addA is encountered, say at line 13, the argument can be provided for the placeholder. Placeholders also allow additional analyses to be employed by the interpreters to resolve strings that may originate from files or database queries.

My approach defines several IR operators to model the effects of loops. The first of these is τ, which is used to represent string variables whose values are partly defined via a loop-carried data dependency. The form of this operator is (τ^T_{v:r} (exp)). Here v is the name of the variable defined in the loop; r is the ID of the loop (my approach numbers each loop region, as explained in Section 6.2.3); T represents the number of loop iterations, with T = 0 denoting the initial value of a variable in a loop and T = 1 denoting the value of a variable after one iteration; and exp is the IR expression that contains v in the loop. Note that exp may contain additional τ expressions, which enables nested loops to be easily modeled.
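To make the notation concrete, the IR tree described above can be sketched with a few classes. The class names, the prefix-style rendering, and the resolve helper below are my own illustrative choices under the assumptions stated in the comments; they are not Violist's actual data structures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of the IR tree: leaves are string constants or
// placeholders for external inputs; internal nodes are operations.
abstract class IRNode {}

class Const extends IRNode {
    final String value;
    Const(String value) { this.value = value; }
    public String toString() { return "\"" + value + "\""; }
}

class Placeholder extends IRNode {  // external input, e.g. a method parameter
    final String name; final int id;
    Placeholder(String name, int id) { this.name = name; this.id = id; }
    public String toString() { return name + "_X" + id; }
}

class Op extends IRNode {           // (op a_1 a_2 ...)
    final String op; final List<IRNode> args;
    Op(String op, IRNode... args) { this.op = op; this.args = Arrays.asList(args); }
    public String toString() {
        StringBuilder sb = new StringBuilder("(" + op);
        for (IRNode a : args) sb.append(" ").append(a);
        return sb.append(")").toString();
    }
}

public class IRDemo {
    // Filling a placeholder with a call-site argument yields the
    // context-sensitive value of a method summary.
    static IRNode resolve(IRNode node, int placeholderId, IRNode arg) {
        if (node instanceof Placeholder && ((Placeholder) node).id == placeholderId)
            return arg;
        if (node instanceof Op) {
            Op op = (Op) node;
            List<IRNode> resolved = new ArrayList<>();
            for (IRNode a : op.args) resolved.add(resolve(a, placeholderId, arg));
            return new Op(op.op, resolved.toArray(new IRNode[0]));
        }
        return node;
    }

    public static void main(String[] args) {
        // Summary of addA from Program 4, line 4: (+ v_X4 "A")
        IRNode addASummary = new Op("+", new Placeholder("v", 4), new Const("A"));
        System.out.println(addASummary);                              // (+ v_X4 "A")
        // At the call site on line 13, the argument "b" fills the placeholder.
        System.out.println(resolve(addASummary, 4, new Const("b")));  // (+ "b" "A")
    }
}
```

Because the summary is re-resolved at each call site, the call at line 12 yields (+ "a" "A") and the call at line 13 yields (+ "b" "A"), which is the context sensitivity discussed in Section 6.1.2.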
Within exp, the defined variable v is represented using σ symbols of the form σ^{-n}_{v:r}. To illustrate, consider the variable e_33 in Program 4. Equation 6.1 shows the value of e_33 for the k-th iteration.

(τ^k_{e33:R2} (+ σ^{k-1}_{e33:R2} "b"))   (6.1)

This equation shows that the value of e at line 33 is equal to the concatenation of its value from the previous iteration and the constant string "b". Here, the identifier "R2" refers to the loop region ID, which I will explain in more detail in Section 6.2.3.

The final loop-related operator is φ. The φ expression has the same format as the τ expression, but denotes a dependency between self-referring variables in the same loop instead of a nested-loop dependency. Here, I define a "self-referring variable" as a variable that defines itself, at least in part, via a loop-carried dependency. The φ expressions are always sub-expressions of the τ expressions. When a φ expression is used, it represents that the variable of the τ expression depends on the variable of the φ expression and both of them are from the same loop. To make this clear, consider the example at lines 19 and 20 in Program 4. In this example, both c_19 and f_20 are self-referring variables from the same loop, and c_19 is used to define f_20. The IR of f_20 is shown in Equation 6.2, where the underlined portion refers to the value of c_19, which shows how the value of f_20 depends on the value of c_19. In this expression, a superscript of 0 means the value of the variable in the current iteration and -n means the value n iterations prior.

f_20 = (τ^0_{f20:R3} (+ σ^{-1}_{f20:R3} (+ (φ^{-1}_{c19:R3} (+ σ^{-2}_{c19:R3} "A")) "A")))   (6.2)

I introduce the φ symbol because the dependency between self-referring variables in the same loop is different from the dependency due to nested loops. I compare the value of f_20 in Equation 6.2 to the IR of d_29 in Equation 6.3.
In the IR of d_29, the underlined portion refers to the value of e_33, which is generated in a loop different from that of d_29. On the contrary, in the IR of f_20, the underlined portion refers to c_19, which is a variable from the same loop as f_20.

d_29 = (τ^0_{d_29:R1} (+ σ^{-1}_{d_29:R1} (τ^{-1}_{e_33:R2} (+ σ^{-2}_{e_33:R2} "e"))))    (6.3)

6.2.2 Interpretation of the Intermediate Representation

The interpretation step converts the IR of string variables into a representation of strings. My framework allows users to provide an interpreter that translates the IR as needed for different analysis problems. To do this, a user would need to implement two things: (1) the string model for each string operation (e.g., append and trim); and (2) a Widen and a Converge function for the τ expressions. Here, Widen is used to generalize the τ expression on each iteration to make the string values converge more quickly. For example, one instance of Widen proposed by Yu and colleagues [160] can generalize the string set {"a", "aa", "aaa"} to the regular expression a+. Converge is used to judge when the approach should terminate iterating over the τ expression.

The general process for IR interpretation is as follows. First, my approach uses the target string models to represent all leaf nodes in the IR that are string constants and external inputs. Second, my approach calculates the value of each non-leaf node using a postorder traversal of the IR tree. During this calculation, my approach uses the specified string operation semantics to calculate the value of each IR operation. Third, when the approach encounters a τ expression, it iterates over the expression to calculate the resulting string value. I will describe this process in more detail in the next paragraph. After each iteration, my approach calls the Widen function to generalize the string model created in the current iteration, and then the Converge function to check whether it should stop iterating over the τ expression.
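The Widen/Converge pair can be sketched with a deliberately simple heuristic. The real FSAI uses Yu and colleagues' automaton-based widening; the toy version below (names are mine) only handles the pattern where a loop appends one fixed string per iteration, e.g. {"a", "aa", "aaa"} generalizes to the regular expression a+.

```java
import java.util.List;

// Toy Widen/Converge sketch, not the automaton-based widening used by
// the actual FSAI. It only recognizes iterates that are repetitions of
// a common base string.
public class ToyWiden {
    // Generalize a list of loop iterates to a regular expression, or
    // return null if this toy heuristic does not apply.
    static String widen(List<String> iterates) {
        String base = iterates.get(0);
        for (String s : iterates) {
            if (s.isEmpty() || s.length() % base.length() != 0
                    || !s.replace(base, "").isEmpty()) {
                return null; // give up: not base repeated k times
            }
        }
        return base + "+";
    }

    // Converge: stop iterating once widening yields the same model twice.
    static boolean converged(String previousModel, String currentModel) {
        return previousModel != null && previousModel.equals(currentModel);
    }

    public static void main(String[] args) {
        System.out.println(widen(List.of("a", "aa", "aaa"))); // a+
        System.out.println(converged("a+", widen(List.of("a", "aa", "aaa", "aaaa"))));
    }
}
```

In the framework, an interpreter would call widen after each pass over a τ expression and stop once converged reports a stable model.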
My approach may iterate several times over each τ expression, depending on the Widen and Converge operations. In each iteration while interpreting the τ expression (τ^T_{v:r} (exp)), where exp contains at least one σ^k_{v:r}, my approach first increases the counter T for the τ expression and all σ_{v:r}. After this, my approach sets the value of σ^k_{v:r} to the value of (τ^k_{v:r} (exp)) generated in the k-th iteration. Then, it calculates the value of exp and calls the Widen operation to calculate the result of the current iteration. This value will be recorded as the value of σ in future iterations.

During each iteration, if my approach encounters a nested τ expression, it will iterate over this expression until the value converges, using the process I described in the prior paragraph. If it encounters a φ expression, the approach updates the counters for φ and its σ, and calculates the value with a process similar to that of interpreting the τ expression. The only difference is that my approach does not iterate over the φ operation several times. It calculates the value of φ only once, without calling Widen and Converge. I do this because φ represents another variable in the same loop, so its value can only be calculated once in each iteration. The resulting values of the φ expression will also be recorded as the values of future corresponding σ operators.

6.2.3 Getting the Intermediate Representation

In this section, I describe how my approach generates the IR for a given PUT. As inputs, my approach requires the Call Graph (CG) of the PUT and the SSA representation of each method in the PUT. In general, most mainstream languages (e.g., Java, Dalvik, and PHP) have analyses available that can provide both of these elements. Given these inputs, my approach analyzes each method in reverse topological order with respect to the PUT's CG. For each method, the approach then identifies the Region Tree (RT) and builds the IR for each region in this tree in a bottom-up fashion.
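The method-ordering step can be sketched as a depth-first post-order walk of the call graph, so that a callee's summary exists before any of its callers is analyzed. The graph and method names below are made up for illustration, and the sketch assumes an acyclic CG (the text handles SCCs separately).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of reverse topological ordering over a call graph: callees
// appear before callers, so summaries are available when needed.
public class SummaryOrder {
    // edges: caller -> list of callees (assumed acyclic here)
    static List<String> reverseTopological(Map<String, List<String>> calls) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        for (String m : calls.keySet()) visit(m, calls, done, order);
        return order;
    }

    static void visit(String m, Map<String, List<String>> calls,
                      Set<String> done, List<String> order) {
        if (!done.add(m)) return;                 // already summarized
        for (String callee : calls.getOrDefault(m, List.of()))
            visit(callee, calls, done, order);    // summarize callees first
        order.add(m);
    }

    public static void main(String[] args) {
        Map<String, List<String>> calls = new LinkedHashMap<>();
        calls.put("main", List.of("addA"));       // main calls addA
        calls.put("addA", List.of());
        System.out.println(reverseTopological(calls)); // [addA, main]
    }
}
```

With this ordering, when Algorithm 2 later reaches the call to addA inside main, the summary of addA has already been computed.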
After the approach finishes analyzing a method, the resulting IR serves as its summary and is reused whenever another method calls the summarized method.

6.2.3.1 Intra-procedural Analysis

The approach begins with an intra-procedural analysis to calculate the IR-based summary for each method in the PUT. The first step of this analysis is to identify the bodies of loops and their relationship to other loops. This information is needed in order to use the loop modeling operators defined by the IR. To identify loops, my approach uses a standard analysis to identify regions in the method's CFG [26]. The regions of a method can be represented as an RT in which nested regions are shown as children of their parent regions and the root node of the tree is the method body. Figure 6.1 shows the RT for an excerpt of the code of Program 4. In this figure, R0 represents the method body of main and R1–R3 are loops in the method.

After the approach builds the RT, the next step is to generate the IR for each region. The approach analyzes the RT using a post-order traversal (i.e., starting with the leaf nodes) so that the results of analyzing nested regions can be incorporated into the analysis of the parent regions.

Figure 6.1: Region tree of string analysis for Program 4

In general, there are two types of regions that the approach analyzes. The first type is Method Body Regions (MBRs), which represent the main body of a method (e.g., R0), and the second type is Loop Body Regions (LBRs), which are the regions that represent the bodies of loops (e.g., R1). I now explain how the approach analyzes each of these region types.
Algorithm 2 Getting the intra-procedural intermediate representation
Input: A region R
Output: The IR of string variables in R
1: External ← all external variables
2: for all instructions S in R in topological order do
3:   if S defines a variable v then
4:     Represent S in the form of the IR as v = (op a_1 a_2 ...)
5:     for all a_i ∈ Par(S) do
6:       if a_i has been solved and a_i ∉ External then
7:         Replace a_i with its IR
8:       end if
9:     end for
10:   end if
11: end for

6.2.3.2 General Region Processing

Regardless of their types, all regions are first processed by Algorithm 2. The input to Algorithm 2 is the SSA representation of the code in a region R. The output of Algorithm 2 is a basic IR transformation of R that does not consider the effects of loops.

The first step of Algorithm 2 is to identify variables whose definitions occur outside of the region (line 1). These variables can be identified in a straightforward way, since the code is in SSA form, by checking whether they are defined in the current region. All other variables are considered to be internal variables. The approach leaves placeholders for external variables until their definitions are located in a containing region. An example of an external variable is the use of c on the righthand side of line 19 in R3.

After finding all external variables, the approach then iterates over each instruction in R in topological order with respect to R's CFG. For each instruction i in which the lefthand side defines an internal variable v, the approach converts i into an IR operation of the form (op a_1 a_2 ...) as shown in Section 6.2.1. Then the approach iterates over all arguments of op, which is Par(S). If any argument a_i has previously been defined on the lefthand side of an instruction in the region, then the righthand side of that definition is substituted for a_i. This process is continued until there are no more such previously defined internal variables present in the IR form of i.
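The substitution step of Algorithm 2 can be sketched over textual prefix expressions. The class and method names below are mine, not Violist's: each known definition is kept in prefix form, and any previously defined internal variable appearing on a righthand side is replaced by that variable's own IR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of Algorithm 2's forward substitution over
// prefix-form IR strings. One pass suffices because instructions are
// processed in topological order and loop effects are ignored here.
public class ForwardSubstitution {
    static String substitute(Map<String, String> defs, String expr) {
        for (Map.Entry<String, String> d : defs.entrySet()) {
            expr = expr.replace(d.getKey(), d.getValue());
        }
        return expr;
    }

    public static void main(String[] args) {
        Map<String, String> ir = new LinkedHashMap<>();
        ir.put("c_19", "(+ c_19 \"A\")");             // IR of line 19
        String f20 = substitute(ir, "(+ f_20 c_19)"); // IR of line 20
        System.out.println(f20); // (+ f_20 (+ c_19 "A"))
    }
}
```

This reproduces the example from the text: the use of c_19 inside the definition of f_20 is replaced by the IR of line 19.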
Then the approach repeats this for each of the remaining i ∈ R. Note that since the effects of loops are not considered at this time and the instructions are processed in topological order, the transformation converges on a fixed point after one iteration over the instructions in R.

Example: To illustrate Algorithm 2, first consider line 19 of R3 in Figure 6.1. The IR for this line is (+ c_19 "A"). Therefore, the IR for line 20 in R3 would be (+ f_20 (+ c_19 "A")). Note that due to the SSA transformation, (+ c_15 "A") could also be a definition here. However, I omit these SSA-based definitions from the explanation for simplicity, since my code example is not shown in SSA form.

6.2.3.3 Processing Loop Regions

The next step is to analyze all LBRs and more precisely model the effects of loops in the region. This is performed by the algorithm shown as Algorithm 3. The input to Algorithm 3 is the IR of an LBR that has already been processed by Algorithm 2. The output is an IR representation of the LBR that has been adjusted to model the effects of loops. The IR of an LBR can be incorporated into its parent loop or an MBR.

Algorithm 3 can be thought of as having three general phases. The first of these (lines 1–10) rewrites all string variables within the LBR so that they have counters attached to them. These counters are then used in the second phase to identify dependencies between loop iterations. The second phase (lines 11–39) iterates over the instructions in the LBR and replaces string variables defined in the loop with values updated for each iteration. Loop-carried data dependencies are identified and replaced with the τ and σ operators so that the definitions of the string variables can converge on a fixed point. Finally, in the third phase (lines 40–48), the analysis replaces certain τ operators with φ operators. I now explain these three phases in more detail below.
In the first phase (lines 1–10), my approach assigns and initializes a counter for all region-internal string variables. The analysis iterates over each instruction i ∈ LBR and, if it is of the form v = (op a_1 a_2 ... a_m ... a_n) (i.e., a definition) and v is an internal variable, then the analysis performs the following transformations. The variable v is rewritten to be v^0 and each of the arguments a_k is rewritten as a_k^{-1}. The intuition behind this transformation is that the superscripts show that the variable on the lefthand side is defined by the values of the righthand side variables that come from the previous (i.e., −1) iteration. Note that since Algorithm 2 has already propagated all values forward, all righthand side variables were defined in the previous iteration or, in the base case, external to the LBR.

To illustrate the first phase, I will build on the example from Section 6.2.3.2. Here, for line 20 of Program 4, the IR is f_20 = (+ f_20 (+ c_19 "A")). The transformed version of this IR with counters inserted and initialized would be f_20^0 = (+ f_20^{-1} (+ c_19^{-1} "A")). Note that, after the transformation done by Algorithm 2, the f_20 on the righthand side refers to the value of f from the previous loop iteration (or, in the base case, the value provided at line 16).

In the second phase (lines 11–39), the approach iterates over each instruction, propagating values and updating counters until the IR of the instructions converges on a fixed point. The approach begins this phase by iterating over each instruction that is of the form v^0 = (op a_1 a_2 ... a_n) (i.e., a definition). At lines 15–19, the approach adds all variables on the righthand side that are region-internal variables to a worklist Q. Then, at lines 20–36, the approach iterates over all variables in the worklist, replacing them with updated values and introducing the τ and σ notations so that variables defined in the loop will converge to a fixed point.
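Phase one's counter rewriting can be sketched as a text transformation (the class and method names are mine): the defined variable receives superscript 0 and every internal righthand-side variable receives superscript −1, encoded here with the suffixes "^0" and "^-1".

```java
import java.util.Set;

// Toy illustration of phase one of Algorithm 3: attach iteration
// counters to a definition in prefix-form IR text.
public class CounterInit {
    static String initCounters(String lhs, String rhs, Set<String> internal) {
        for (String v : internal) {
            rhs = rhs.replace(v, v + "^-1"); // righthand side: previous iteration
        }
        return lhs + "^0 = " + rhs;          // lefthand side: current iteration
    }

    public static void main(String[] args) {
        // line 20 of Program 4: f_20 = (+ f_20 (+ c_19 "A"))
        System.out.println(initCounters("f_20", "(+ f_20 (+ c_19 \"A\"))",
                Set.of("f_20", "c_19")));
        // f_20^0 = (+ f_20^-1 (+ c_19^-1 "A"))
    }
}
```

The output matches the worked example in the text for line 20 of Program 4.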
In the loop beginning at line 20, the approach dequeues the first argument in the worklist and determines whether it refers to v (i.e., whether it is a self-reference). If it is, then the approach replaces the self-reference with the σ notation and puts a τ notation around the expression, as described in Section 6.2.1. If the argument is not a self-reference, then the approach replaces it with the most up-to-date IR representation and decreases all counters in the replaced expressions by the counter value of the replaced argument. (This maintains the correspondence between loop iterations.) All arguments present in the replaced argument that are self-references are then also added to the worklist. This process repeats until there are no more arguments to be resolved in the worklist.

To illustrate the second phase, consider again the example from above. For line 19 in R3, my approach will add the τ expression and change c_19 to σ_{c_19:R3}. Therefore, the IR for line 19 is as shown in Equation 6.4.

c_19 = (τ^0_{c_19:R3} (+ σ^{-1}_{c_19:R3} "A"))    (6.4)

Similarly, for line 20 in R3, I replace the reference c_19^{-1} with Equation 6.4 and the IR for line 20 will be as shown in Equation 6.5. Here, the underlined portion is the part that came from Equation 6.4.

f_20^0 = (τ^0_{f_20:R3} (+ σ^{-1}_{f_20:R3} (+ (τ^{-1}_{c_19:R3} (+ σ^{-2}_{c_19:R3} "A")) "A")))    (6.5)

The third phase of the approach (lines 40–48) introduces the φ operator, which denotes the data dependency between variables in the same loop region. This enables the interpreters to distinguish region-internal variables from the same loop from those of nested loops. To do this, my approach iterates over all τ expressions and checks their region IDs. If it finds that there is a τ expression τ_{a:R} embedded in another τ expression τ_{b:R} with the same region ID, the approach changes τ_{a:R} to φ_{a:R}. This is because if two τ expressions have the same region ID, they are generated in the same loop. To illustrate, consider again line 20 in R3.
After the third phase, its IR will be as shown in Equation 6.2. Here, the symbol φ^{-1}_{c_19:R3} is the changed portion.

Algorithm 3 Solving the loop region
Input: A loop region R
Output: The IR of string variables in R
1: for all instructions S in R in topological order with respect to the CFG do
2:   if S defines a variable v then
3:     v → v^0
4:     for all a_i ∈ Par(v) do
5:       if a_i ∉ External then
6:         a_i → a_i^{-1}
7:       end if
8:     end for
9:   end if
10: end for
11: while the IR for all variables has not converged do
12:   for all instructions S in R in topological order do
13:     if S defines a variable v then
14:       Q ← EmptyQueue
15:       for all a_i^k ∈ Par(v) do
16:         if a_i ∉ External then
17:           Add a_i^k to the tail of Q
18:         end if
19:       end for
20:       while Q is not empty do
21:         a_i ← Q.poll()
22:         if a_i equals v then
23:           Replace a_i^k with σ^k_{v:rid}
24:           if there is no τ^0_{v:rid} in the IR of v then
25:             Put τ^0_{v:rid} around the IR of v
26:           end if
27:         else
28:           Replace a_i^k with its IR
29:           Decrease all counters in a_i by k
30:           for all b_i^j ∈ Par(a_i^k) do
31:             if b_i^j refers to v then
32:               Add b_i^j to the tail of Q
33:             end if
34:           end for
35:         end if
36:       end while
37:     end if
38:   end for
39: end while
40: for all variables v in the region do
41:   if v contains a τ expression τ_{v:RID} then
42:     for all τ_{x:rid} contained by τ_{v:RID} do
43:       if rid equals RID then
44:         τ_{x:rid} → φ_{x:rid}
45:       end if
46:     end for
47:   end if
48: end for

6.2.3.4 Inter-procedural Analysis

Once the IR has been calculated for each method's MBR, the analysis of the method is complete. The IR for the MBR is used as the method's summary to enable inter-procedural analysis. When an invocation of the summarized method is encountered during the analysis of Algorithm 2, the IR of the summarized method's return variable is inserted into the current method body and placeholder variables in the IR are replaced with the arguments provided at the invocation call site.
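The summary-instantiation step can be sketched as a placeholder substitution (illustrative names only): the callee's summary keeps a placeholder for each parameter, and the call site substitutes the actual argument's IR for it.

```java
import java.util.Map;

// Illustrative sketch of inter-procedural summary instantiation:
// replace each placeholder in a callee summary with the IR of the
// argument supplied at the call site.
public class SummaryInstantiation {
    static String instantiate(String summary, Map<String, String> actuals) {
        for (Map.Entry<String, String> a : actuals.entrySet()) {
            summary = summary.replace(a.getKey(), a.getValue());
        }
        return summary;
    }

    public static void main(String[] args) {
        String addASummary = "(+ v_X4 \"A\")"; // summary of addA (from line 4)
        // line 13 of Program 4: b_13 = addA("b")
        System.out.println(instantiate(addASummary, Map.of("v_X4", "\"b\"")));
        // (+ "b" "A")
    }
}
```

This mirrors the addA example from Section 6.2.1: the placeholder v_X4 is resolved to the constant "b" supplied at line 13.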
Note that methods are processed in reverse topological order with respect to the PUT's CG so that a method's summary is computed before that of any calling method. In the case that there are Strongly Connected Components (SCCs) in the CG, my approach builds a global CFG for the SCC and then uses the intra-procedural analysis to generate the IR for the connected methods.

6.2.4 Illustrative Example of the Analysis

I walk through my algorithm using Program 4 as an example. My approach first builds the summary for each region in a bottom-up order. The approach first analyzes the leaf regions, R3 and R2. R3 is an LBR that defines two variables, c_19 and f_20. As I described earlier in Section 6.2.3.2, I initially represent their values as c_19 = (+ c_19 "A") and f_20 = (+ f_20 c_19). Then the approach replaces the symbol c_19 in f_20 with its definition so that I have f_20 = (+ f_20 (+ c_19 "A")). After this, the approach marks the iteration from which the value of a variable comes as a superscript of each variable. After this step, I have c_19^0 = (+ c_19^{-1} "A") and f_20^0 = (+ f_20^{-1} (+ c_19^{-1} "A")).

The approach continues iterating over the code in R3. c_19 does not contain any other variables that need to be replaced, but it does contain a self-reference, so the approach adds a τ expression and changes c_19 in the expression to σ. The result after this step is shown in Equation 6.4. For f_20, the approach replaces c_19 with its definition shown in Equation 6.4. Since c_19 is in the expression of f_20 with a −1 superscript, the approach decreases the superscripts of all variables in the expression of c_19 by 1. So, after the replacement, I have:

f_20^0 = (+ f_20^{-1} (+ (τ^{-1}_{c_19:R3} (+ σ^{-2}_{c_19:R3} "A")) "A"))

Then the approach determines that f_20 also has a self-reference (underlined), so the approach changes it to a σ, as is shown in Equation 6.5.
Finally, the approach checks the expression of f_20 and finds that it contains two τ expressions: the first is τ^0_{f_20:R3} and the second is τ^{-1}_{c_19:R3}. However, these two τ expressions do not represent different loops; they are generated in the same region and represent two related variables in the same loop. So the approach replaces τ^{-1}_{c_19:R3} with φ^{-1}_{c_19:R3}. After the replacement, I have the IR of f_20 as shown in Equation 6.2.

For the other leaf region, R2, its body only defines one variable, e_33. The approach represents its IR as shown in Equation 6.1.

After the approach solves all leaf regions, it solves region R1, which is the parent of R2. R1 is also the body of a loop; it directly defines a variable d_29 and indirectly defines the variable e_33 in R2. Similar to R3, in the first step the approach replaces all variables with the known representations. Since the approach knows the IR of e_33 in R2, it directly uses that IR as the expression of e_33 in R1. Thus, in R1, the value of e_33 is the same as in R2, because e_33 is not changed in R1, and d_29 = (+ d_29 (τ^0_{e_33:R2} (+ σ^{-1}_{e_33:R2} "e"))). Then the approach adds superscripts to the internal variable, which is d_29, so I have

d_29^0 = (+ d_29^{-1} (τ^{-1}_{e_33:R2} (+ σ^{-2}_{e_33:R2} "e")))

Finally, the approach adds a τ expression to d_29 and I have

d_29 = (τ^0_{d_29:R1} (+ σ^{-1}_{d_29:R1} (τ^{-1}_{e_33:R2} (+ σ^{-2}_{e_33:R2} "e"))))

The expression of d_29 also contains two τ expressions, but they are from different regions and represent different loops.

Finally, the approach analyzes the region R0, which is an MBR. Since it is not a loop body, the approach only replaces the variables with their corresponding expressions. In R0, the approach finds that c_15, f_16, d_25, and e_26 are constants and therefore uses their values in the expressions of c_19, f_20, d_29, and e_44.
6.3 Evaluation for String Analysis

I evaluated my approach by measuring its performance in terms of accuracy and runtime on a mixture of benchmark test cases and real-world Java and Android applications. To provide a baseline for my measurements, I compared my approach against the popular and widely used JSA [45]. My evaluation addressed the following four research questions:

RQ 1: How accurate is my approach in analyzing strings constructed with various data flows?
RQ 2: How accurate is my approach in handling basic string operations?
RQ 3: How accurate is my approach on real-world applications?
RQ 4: What is the runtime of my approach?

6.3.1 Implementation

The implementation of my approach, Violist, is written in Java. To extract SSA-based representations of an application's code, CGs, and CFGs, I use the Soot [19] analysis framework. I also implemented two string interpreters, described below, that mimic two of the most common approaches to string analysis. For most users of the framework, these two interpreters subsume most string analyses I saw in the literature. My approach is not limited to these two interpreters, though, as it can be easily extended to include new interpreters that implement different string operation semantics. In its current state of implementation, Violist can perform string analysis on apps for which either Java or Dalvik bytecode is available. It can be extended to other languages for which it is possible to represent their code in SSA form and generate CGs and CFGs.

My first interpreter is the String Set Interpreter (SSI). This interpreter is provided a value n that represents the number of times a loop (i.e., a τ) will be unraveled in the IR. Note that this approach is not safe unless n is larger than the maximum loop unraveling or there are no loops in the IR. However, this mimics simple string analyses that do not need loop semantics (e.g., [94]). The output of the SSI interpreter is a set of concrete strings.
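The SSI's bounded loop unraveling can be sketched for the simple case of a loop that appends one fixed string per iteration (names and shape are illustrative, not the actual SSI implementation): apply the loop body n times and collect each iterate as a concrete string.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy sketch of SSI-style loop unraveling: n applications of a
// concatenating loop body, collecting each iterate.
public class SsiUnravel {
    static Set<String> unravel(String initial, String appended, int n) {
        Set<String> out = new LinkedHashSet<>();
        String cur = initial;
        for (int i = 0; i < n; i++) {
            cur = cur + appended; // one iteration of (+ sigma "A")
            out.add(cur);
        }
        return out;
    }

    public static void main(String[] args) {
        // c starts as "A" and the loop body appends "A" each pass
        System.out.println(unravel("A", "A", 3)); // [AA, AAA, AAAA]
    }
}
```

This is the unsafe-by-construction behavior noted above: any string produced by more than n iterations is simply missing from the result set.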
For example, interpreting the IR for c_19, which is shown in Equation 6.4, with n equal to 3 yields the set of concrete strings {"AA", "AAA", "AAAA"}.

My second interpreter is the FSA Interpreter (FSAI). This interpreter approximates the values of strings in the form of FSAs. In general, most of the interpretation semantics for the FSAI are similar to those of the SSI. However, the key difference is that values are not converted to concrete strings, but instead to FSAs, and loops are not unraveled. For loops, the FSAI does not unravel τ expressions using an n-bound; instead, the FSAI uses the Widen operation proposed by Yu and colleagues [160] to generate a safe model for infinite string values in a loop. In my implementation of FSAI, I leveraged the Automaton library used by JSA to model the string outputs. The output of the FSAI is safe due to the approximation process defined by Yu and colleagues' technique.

6.3.2 Experiment Protocol

To evaluate the performance of my approach, I measured Violist's accuracy and runtime and compared the results with the publicly available implementation of JSA. For both analyses, I treated all public methods as possible entry points. To measure accuracy, I computed the precision and recall of the two analyses against the ground truth. Below, I explain how I identified the ground truth for the different types of applications used in my study and how I compared the ground truth against the models generated by the analyses. All of my experiments were conducted on a Dell XPS-8300 desktop with an Intel i7 processor at 3.4 GHz and 8 GB of memory.

Building Ground Truth: In my evaluation, I used three different types of subjects: the JSA benchmarks, hand-crafted test cases, and real-world Android and Java applications. For the first two types, I was able to identify the ground truth by manual analysis of the code, because it was provided as part of the benchmark.
However, it was not possible to identify the ground truth for the Android market apps because of their size and the lack of source code. For these apps, I used a profiling technique to identify a subset of the values that string variables in the apps could assume. To do this, I first randomly selected a set of non-constant string variables in each app that did not include external strings (e.g., user input). I then inserted probes to record the values of these string variables when they were executed. Then I created workloads that traversed all of the discoverable User Interfaces (UIs) and recorded the values of the string variables via the probes. I used these collected values as the ground truth for the string variables.

For these apps, the collected values represent a subset of the real ground truth. The implication of this is that there may be fewer false positives and more false negatives than reported by my results. Thus, by using the profiled subset of the ground truth, I will get a lower bound on the precision and an upper bound on the recall. For both approaches, it is important to note that they provide safe approximations, so I would expect 100% recall. For precision, I do not expect that this calculation of the ground truth would impact one approach more than the other.

Calculation: To calculate precision and recall, I compared the ground truth to the set of strings reported by the two approaches, JSA and Violist. For the JSA and FSAI interpreters, additional steps are needed to compute these values because their string representations may contain cycles, which represent strings of infinite length. For this situation, I followed the Sound and Most Precise (SMP) policy, in which I set an upper bound on the length of strings or maximum loop iterations that maintained 100% recall but had the largest precision.
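The accuracy computation can be sketched as standard set-based precision and recall over the reported strings and the ground truth (a minimal sketch; the class name and the example sets are mine).

```java
import java.util.Set;

// Sketch of the accuracy metrics: precision and recall of the string
// set an analysis reports against the ground-truth set.
public class Accuracy {
    static double precision(Set<String> reported, Set<String> truth) {
        long tp = reported.stream().filter(truth::contains).count();
        return reported.isEmpty() ? 1.0 : (double) tp / reported.size();
    }

    static double recall(Set<String> reported, Set<String> truth) {
        long tp = truth.stream().filter(reported::contains).count();
        return truth.isEmpty() ? 1.0 : (double) tp / truth.size();
    }

    public static void main(String[] args) {
        Set<String> truth = Set.of("AA", "AAA");
        // A safe over-approximation: complete, but imprecise.
        Set<String> reported = Set.of("AA", "AAA", "AAAA");
        System.out.println(recall(reported, truth));    // 1.0
        System.out.println(precision(reported, truth)); // 0.666...
    }
}
```

The example shows why a safe analysis is expected to score 100% recall while its precision reflects how much it over-approximates.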
For the FSA models generated by JSA and FSAI, I generated strings with length smaller than or equal to k, where k is the length of the longest string in the ground truth. For the SSI interpreter, I iterated over the τ expressions j times, where j is the smallest number of iterations needed by any loop to generate all of the strings in the ground truth.

6.3.3 RQ 1: Accuracy on Various Types of Data Flow

One of the goals of my approach is to develop a framework that allows for more accurate modeling and handling of complex data flows, such as inter-procedural manipulation of strings and loops. To evaluate the success of my approach with respect to these goals, I created a test suite that included string value computations based on different types of data flows. In particular, I considered four such types: Branch, Single Loop, Nested Loop, and Inter-procedural. I also defined a fifth type called Circle Loop, which is a single loop containing three variables that depend on each other in a cycle. I combined these five variants to create a mixture of test cases, removing those combinations that did not make sense (i.e., combinations among Single Loop, Nested Loop, and Circle Loop). In total, this gave eight more types of data flow: Branch and Inter-procedural, Single Loop and Branch, Single Loop and Inter-procedural, Nested Loop and Branch, Nested Loop and Inter-procedural, Circle Loop and Branch, Circle Loop and Inter-procedural, and a mix of all. This gave me 13 different types of data flow in total.
The second category is Manipulation (Manip), which includes APIs that manipulate a substring of the original string, such as replaceAll, replaceFirst, and substring. Its representative is String.replaceAll. The third category is Converge, which includes APIs whose result converges after one invocation of the API. Its representative is String.trim. For each of the types of data flows, I created one test case for each of the three categories of string operations and one test case that contains all three categories. This gave me 52 total cases.

Figure 6.2: Precision of the three string analyses on varied data flows

Using the process described in Section 6.3.2, I calculated the precision and recall for each test case for each of the three string analyses. In all cases, the recall for the analyses was 100%, which was expected given my use of the SMP policy. The precision of the three analyses is shown in Figure 6.2. In this figure, the x-axis shows the names of the four categories of string operations and the individual string analyses. The y-axis shows the average precision for each operation over the different data flows.

The average precision of SSI was 70%, FSAI was 64%, and JSA was 7%. From the results, both of Violist's interpreters achieved precision that was at least nine times higher than that of JSA. When I examined the FSAs generated by JSA, this gap was clearly attributable to the increased precision available due to improved loop modeling and context sensitivity. Overall, these results show that my method of handling these types of data flows can result in significant improvements in precision as compared to the popular and widely used JSA.
6.3.4 RQ 2: Accuracy for Basic String Operations

For the second research question, I looked at the accuracy of Violist and JSA in computing models of the basic string operations without the effect of more complex data flows; in other words, the accuracy of computing string values for a standalone invocation of a string operator. To address this question, I leveraged a set of string analysis benchmarks that were made available as part of the JSA distribution [6]. These test cases cover most of the basic string operations and their usages (e.g., simple branches and loops). To have a meaningful comparison, I removed those test cases that contained operations that were not supported by JSA. I identified these as operations that were implemented by simply modeling them as any string (i.e., ".*"). In total, I had 80 viable test cases.

I then ran the three analyses on the test cases and computed their precision and recall. Both JSA and the two interpreters of Violist achieved 100% recall, which again was expected since they are safe analyses and I used the SMP policy. On average, JSA achieved 76% precision and Violist achieved 71% precision. Both the SSI and FSAI had the same level of precision because the data flow in the test cases was simple enough that the Widen operation did not make any difference. Among the 80 test cases, JSA and Violist achieved the same precision in 60 test cases. JSA had higher precision than Violist in 15 cases, while Violist had higher precision in five cases.

To understand the results better, I inspected in more detail the 15 test cases for which JSA had higher precision. It turns out that besides string analysis, JSA also performs basic string constraint solving for string constants. For example, if there is a branch statement that contains a constraint on a string, JSA can augment its results with this information. My approach does not refine the results in this way, so it had lower precision for these test cases. I also inspected the test cases for which Violist had higher precision. I found that Violist had higher precision for these because of its ability to more precisely model inter-procedural data flow.
Overall, I believe that the results of this research question are neutral. For most of the implemented string operations, my approach and JSA were equivalent in terms of precision. For those in which JSA was better, I found that the reason was that JSA also performed basic analysis of string constraints in branches. Although my analysis could not handle these, the architecture of my approach would allow a user to incorporate these semantics into an interpreter, for example, by attaching string constraints to each region of the Region Tree (RT) and then using a string constraint solver to account for their semantics. Additionally, these results discount a threat to the validity of RQ1, namely that the observed increase in precision was due to differences in the implementation of the basic string operations used in the various data flows.

Table 6.1: Subject apps for string analysis

App          | Description                        | Size
Reflection   | Java reflection benchmark in JSA   | 7,156
BookStore    | A JSP web app                      | 24,305
a2z          | An Android conference app          | 74,700
add.me.fast  | A social network app               | 39,330
pipefilter   | An information checking app        | 16,626

6.3.5 RQ 3: Accuracy on Realistic Apps

In the third research question, I looked at the performance of the string analyses on realistic string variables and programs. I selected five realistic apps from different sources. The name, description, and size of the apps are listed in Table 6.1. The unit of size is the number of bytecode instructions. Among my apps, Reflection is one of JSA's benchmark apps [5]; BookStore is from my previous testbed [15]; and a2z, add.me.fast, and pipefilter are three mobile apps randomly chosen from the Android Play Market. I only used one of the JSA benchmarks because I was not able to locate older versions of libraries required for their compilation. In total, over these five apps, I recorded 133 different string variables.
Figure 6.3: Precision of the three string analyses on market apps

I ran all three analyses on the subject apps and computed the precision and recall as described in Section 6.3.2. As with the previous experiment, the recall was 100% for all analyses for all apps. For the three mobile apps, JSA did not terminate before running out of memory, so I was unable to compute its accuracy. I investigated this issue further and believe the reason is that JSA uses a variable-pair-based method to do the global inter-procedural aliasing analysis. This method has an O(n^2) memory complexity, where n is the number of variables across the whole application. In my experiments, Android applications generally have more than 100,000 variables. Thus, I hypothesize that the failure is due to the increased expense of the global alias analysis when running over an app that makes extensive use of a large framework (i.e., the Android API).

Figure 6.3 shows the precision measured in the experiment. Both SSI and FSAI had the same precision results, so they are shown together. For the two apps measured, Violist was either equal to or higher in terms of precision. Across all apps, Violist had an average precision of 86%, while JSA had an average precision of 68% on its two working apps. Overall, these are positive results for Violist. It was able to achieve, on average, a higher level of precision than JSA, thus showing that its results from RQ1 and RQ2 carry over to real-world applications. Furthermore, the results highlight the general applicability of Violist, as it was able to accurately analyze apps written for Android as well.

6.3.6 RQ 4: Analysis Runtime

To answer this question, I compared the execution time of Violist's two interpreters against JSA. For each of the experiments I reported in RQ 1–3, I repeated the experiment ten times and recorded the execution time for each. The results of these timing measurements are shown, in milliseconds, in Table 6.2.
Due to the large number of test cases in RQ 1 and RQ 2, I only reported the average time cost for each of their test cases. In my results, I subtracted out the time consumed by the Soot APIs (for generating call graphs and control flow graphs) from the timing measurements, since this was strictly time consumed by Soot and was the same for all analyses. The results show that for all categories of tests, Violist was significantly faster. The FSAI interpreter averaged a 75% decrease in runtime versus JSA and the SSI interpreter averaged a 94% decrease versus JSA. As with the previous RQ, I hypothesize that the cost of performing the global alias analysis is responsible for JSA's high runtime. Within the results for Violist, the SSI interpreter was clearly faster. The reason for this is that the Widen operation for FSAs is a relatively expensive operation. Overall, these are positive results for my approach and show that my approach can significantly outperform JSA in terms of analysis runtime.

Table 6.2: Time cost of Violist vs. JSA

Case Name        SSI (ms)  FSAI (ms)  JSA (ms)
RQ 1 test cases  13        22         119
RQ 2 test cases  35        306        401
Reflection       101       147        54,762
BookStore        56,607    202,257    7,371,030
a2z              455,772   457,248    N/A
add.me.fast      34        174        N/A
pipefilter       16,913    17,083     N/A

6.4 Threats to Validity

To ensure that I could have a fair comparison to JSA, I selected my benchmarks from three different sources. These included apps from JSA's benchmarks or that had been analyzed by JSA in prior work [68], as well as test cases I had created. Although test cases created by my group could introduce bias, there were no benchmarks available that included the more complex data flows I wanted to test. A general threat to the validity of RQ3 is my mechanism for establishing ground truth. If it were incomplete, I would have an upper bound on recall and a lower bound on precision. As I expect both analyses to have 100% recall, this primarily would affect precision.
A change in precision would affect one of my conclusions from RQ3, but would not change my larger finding that Violist could more easily scale up to larger applications than JSA.

Chapter 7 Related Work

7.1 Energy Measurement

There are two areas of related work for vLens in Chapter 2: energy estimation and power measurement. The first area, energy estimation, assumes that developers do not have access to power measurement hardware and uses software-based techniques to predict how much energy an application will consume at runtime. The second group of techniques, power measurement, makes use of power measurement hardware to obtain power samples and then uses software-based techniques to attribute the power to implementation structures.

The general approach of energy estimation techniques is based on model building. The process is to build a parameter-based model of energy consumption, capture values for the model's parameters from the application, and then calculate the values produced by the model. Models are generally based on different software, hardware, or operating system features. Capturing parameter values can be done with static analysis, software profiling, and runtime monitoring. As compared to my approach, the primary difference is that estimation techniques assume that developers do not have access to power measurement devices or the target hardware platforms, or are unable to use these techniques for various reasons. These scenarios can occur frequently in practice and make estimation techniques a complementary approach to mine.

Hao and colleagues developed a technique, eLens, for predicting the energy consumption of smartphone applications based on an expected workload [69]. The eLens technique is based on per-instruction cost functions, which are provided by a Software Energy Environment Profile (SEEP) and driven by parameters obtained via program analysis.
Developing the SEEP is labor intensive and may not always be feasible, as there are thousands of APIs in the Android SDK, many of which require complex energy models. In contrast, vLens takes live energy measurements and attributes them to source lines using regression and statistical techniques. The vLens technique does not require a SEEP, but does require a PMP, which may be easier to obtain for some developers. Because vLens is based on runtime measurements, it must also deal with the influence of garbage collection, thread switching, tail energy, and sampling intervals.

Seo and colleagues [130, 131, 129] also have proposed several software energy estimation techniques. They model the energy consumption of each Java bytecode and the network utilization, and capture these two types of information by modifying the JVM. Compared to vLens, their estimation requires modifications to the runtime systems and the construction of energy models, and provides information at a component level instead of the source level.

Other approaches estimate software energy using models based on operating system (OS) level features. These features include calls to OS-level APIs [96, 141] or state metrics internal to the operating system [119, 163], such as CPU frequency and network usage. Similar to this work are approaches to estimate the energy consumption in virtual machines [54, 150, 139, 83]. Compared with my approach, this body of work requires significant integration and modification to the runtime systems and is not as portable. Furthermore, the effort of building models can be high.

Another group of approaches for software energy estimation is based on low-level hardware modeling. Works of this type build models based on the assembly instructions [143, 144] or the micro-instructions underlying each assembly instruction [138, 108].
The drawback of these approaches is that they cannot estimate the energy of external devices (e.g., GPS, Wi-Fi, etc.), which also consume a large portion of energy in modern mobile systems. Moreover, Sinha and colleagues [137] have shown that, at this level, the energy consumption of instructions or micro-instructions is roughly the same. Unlike this body of work, vLens can measure the energy consumption of the entire mobile system, including external devices.

The last type of work to estimate software energy is to build simulators. The main drawback of simulators is their speed. Cycle-level simulators [114, 38] can require thousands of instructions to simulate one instruction, which is too slow to use for simulating modern interactive applications. Even functional simulators [111], which are comparatively faster than cycle-accurate simulators, run too slowly to be useful for capturing realistic user interactions.

The most closely related techniques to vLens are based on power measurement. The general approach for these techniques is to build or use a power measurement device, such as the LEAP [122] and Monsoon [17], that can take energy measurements at a certain frequency. These measurements are then combined with software-based techniques to provide useful information to software developers. Sahin and colleagues [137] map energy consumption to different component-level design patterns. Their work explored the correlations between energy consumption and the use of different design patterns. At a higher level, Flinn and colleagues [56, 32] measured the energy consumption of applications and mapped the energy to individual OS processes. This was done, in part, by instrumenting the operating systems.

Sesame [50] measures the energy by reading the battery interface of laptops and uses regression analysis to automatically generate models of an application's energy consumption.
The models are based on operating system level metrics, such as CPU workload and network utilization measurements. The use of such metrics allows Sesame to provide higher-rate energy measurements at a finer level of granularity, but not at the level of source code lines. Tan and colleagues [142] also use regression analysis to map energy to paths in the control flow graph, but are not able to provide any finer level of granularity with their measurements. Compared with these approaches, vLens is able to provide energy measurements at a much finer level of granularity. vLens calculates energy measurements at the source line level, whereas the above-mentioned approaches compute their information at the level of the entire application, architecture-level components, design patterns, or methods.

7.2 Empirical Study

There are several studies related to my empirical study in Chapter 3. Eprof [119] provides insight into the energy usage of certain APIs. Their results show that 65%-75% of an app's energy is consumed by third-party advertisement APIs and is concentrated in I/O. Other energy usage patterns were not discussed. Additionally, the results in Chapter 3 used a significantly larger pool of subject applications, which provides a broader picture of the energy usage patterns of mobile apps. Linares-Vásquez and colleagues [99] conducted an empirical study on analyzing API methods and mining API usage patterns in Android apps. As in my dissertation, they utilized the same Monsoon power monitor [17]. The authors provided practical advice and actionable knowledge for developers. My work differs from Linares-Vásquez and colleagues' in several aspects. First, they only identified energy-greedy API-related usage patterns, whereas my work examined application, package, loop, and bytecode level energy usage. Second, over 400 apps from different domains were tested in my work compared to 55 mobile apps in their work.
The higher number of apps helps to provide a more comprehensive and general picture of the energy usage of mobile apps. Third, due to their use of the Android SDK's Traceview, they could not report idle and non-idle energy as in my work. Fourth, compared to my nanosecond timestamps, they measured energy usage at the millisecond level, which, as I show, can lead to inaccuracies in the reported measurements.

Previous studies of mobile apps have also focused on app usage. Böhmer and colleagues [36] collected information, such as average session, time, and location, from 22,626 apps based on 4,125 users. Froehlich and colleagues [57] designed MyExperience to log objective traces and subjective feedback of real usage data. McMillan and colleagues [106] proposed a method to collect user feedback. They used the collected feedback to improve the user experience. Shepard and colleagues [136] proposed LiveLab to collect the wireless usage of smartphones. They discovered that only a few apps were heavily executed in daily usage. Developers could make use of this app usage information to improve app performance. However, none of the works above studied the energy usage of mobile apps.

Wilke and colleagues [156] studied the significance of energy efficiency issues for mobile apps by analyzing a large set of user comments extracted from the Google Play marketplace. Kwon and colleagues [85] reported the impact of distributed programming abstractions on application energy consumption and presented a set of practical guidelines for the development of distributed applications. Zhang and colleagues [162] compared power models and energy bugs across different smartphone platforms and discovered some useful bug patterns related to energy consumption. Nevertheless, all of these approaches either focus narrowly on specific problems or study energy usage from a different perspective.
7.3 Display Energy Optimization

The closest work to my display energy optimization technique, Nyx, is Chameleon [51]. Their approach modifies the source code of browsers to change the colors of web pages in the rendering buffer. It first manually builds color transformation schemes for each of the top twenty web sites, such as Google, and saves them in a cloud server. When the browser sends a request to one of these web sites, it queries the cloud server and downloads the pre-installed transformed color schemes. Then, it renders the transformed web application with the downloaded transformation scheme. Nyx is different from Chameleon in two aspects. First, Nyx builds the color transformation scheme automatically for web applications. Thus, my approach is more easily applied to a broad range of web applications. Second, my approach modifies the web application directly on the server side. Thus, it does not introduce the client-side cost of obtaining the transformation or applying it in the browser.

Other approaches to save energy for OLED screens have also been proposed. Kamijoh and colleagues' work [81] is one of the first to optimize energy for OLED screens. It optimizes the energy consumption of the OLED screen of the IBM Wristwatch by reducing the number of pixels that are bright. However, this work only considers two colors, the black background color and the foreground color. As such, this approach is not applicable to color displays. Choi and colleagues' method [43] reduces the energy consumption of LCD screens by reducing the screen refresh rate, color depth, and luminance. However, this approach relies on changing the hardware circuitry of LCD screens.

Lyer and colleagues [78] also proposed a method for changing colors on OLED screens to save energy. This method saves energy by darkening the areas that are not focused on by users. The drawback of this approach is that the contents in the darkened area are not readable.
Compared with this approach, Nyx can better maintain the readability of the entire page. Linares-Vásquez and colleagues proposed GEMMA [100], which uses genetic algorithms to transform colors by finding a set of color schemes that are Pareto frontiers. Unlike GEMMA, which requires users to manually find the optimal color scheme among the Pareto frontiers, Nyx can automatically find the optimal color scheme. Furthermore, GEMMA cannot rewrite the code of mobile applications, while Nyx can do so.

7.4 HTTP Energy Optimization

Optimizing the performance of HTTP requests is a classic problem in browser and network design. Many approaches have been proposed to send HTTP requests with fewer TCP connections, such as HTTP pipelining [117, 37, 47], SPDY [33], and multiplexing [62, 140, 127, 149]. Similar to my technique, these approaches optimize the performance of HTTP requests by bundling multiple HTTP requests into one TCP connection. However, these techniques have been proposed to optimize parallel HTTP requests, such as AJAX calls, in web browsers. They cannot be used directly for synchronized sequential HTTP requests, such as SHRSs, in mobile apps because they do not account for the data dependencies in synchronized sequential HTTP requests. Furthermore, unlike these techniques, my approach does not require any adaptation of current protocols or system-level infrastructure.

To optimize HTTP energy consumption in mobile apps, I made two preliminary studies [88, 89], which showed that bundling HTTP requests could possibly save HTTP energy. However, those studies did not investigate how to optimize HTTP request energy automatically with static analysis techniques.

Besides optimizing the energy consumption of HTTP requests, researchers have also proposed techniques to optimize the energy of network communication and the cloud. Hsu and colleagues proposed a technique to reduce the energy consumption of the cloud by using task consolidation [72].
Shah and colleagues proposed a technique to find the most energy-efficient path for transmitting data through an ad hoc network [132]. There are many techniques for energy-efficient routing and scheduling schemes [161, 29, 159, 25]. Although these techniques are useful, they do not directly target HTTP requests.

7.5 Other Energy Optimization Techniques

Besides optimizing the display and HTTP energy, the energy consumption of mobile devices can also be optimized in other ways. One category of approaches for saving mobile energy is detecting the misuse of sensors [121, 102, 31, 103, 64, 55]. My approach EDTSO [93] reduces the energy consumption of test suites with energy-directed test suite minimization. Zhong and colleagues [165] optimized the communication energy of mobile phones by redesigning the communication protocol. Chen and colleagues [41] proposed a method to save energy for Java-based mobile applications by offloading workload to a server. Researchers have also proposed several techniques to optimize the energy consumption of test cases for in-situ testing [95, 82, 39, 104, 98]. Nikzad and colleagues proposed an annotation language and middleware service [118] that can schedule tasks in mobile apps in a more energy-efficient way. Wang and colleagues proposed a framework to optimize energy for processors and memories [153, 152, 151]. Qian and colleagues proposed several techniques to offload mobile workload to the cloud to save energy [125, 123, 124].

7.6 Other Web App Transformation Techniques

Besides energy saving, transformation techniques for web applications are also used to optimize the user experience on mobile devices. Jones and colleagues [80] improve the readability and attractiveness of web applications on mobile devices by manually redesigning the layout of web applications. Bila and colleagues [34] designed a system that enables end-users to adjust the layout of web applications manually.
Chen and colleagues [42] improve the readability of web applications on mobile devices by partitioning web pages into segments. Minimap [126] improves readability by enlarging the contents of the web application.

7.7 String Analysis

There are several techniques related to my string analysis technique in Chapter 6. Nguyen and colleagues use the D-model to solve string values of PHP web apps [116]. The D-model defines a tree-based representation for symbolic, string-based values originating from symbolic execution of PHP expressions. Violist is different from this model in the following aspects. First, their approach addresses the problem of estimating multiple outputs of client-side pages, while my approach targets a more general problem, which is estimating the possible values of a string variable at a given program point. Second, their model only handles string concatenation, while mine supports all of the string operations in the Java API. Third, their model only unravels loops once, while my approach can unravel loops either a fixed number n of times or based on additional code analyses.

Another closely related work is JSA [45], which uses flow graphs to produce FSAs that represent the possible string values of Java string variables. Compared with the flow graphs of JSA, the IR of Violist has two main advantages. The first advantage is that it allows users to specify the iteration bound for loops, while JSA assumes the bound is infinite. The second advantage is that it uses a summary-based technique so that it can provide context-sensitive inter-procedural analysis and avoid expensive global aliasing analysis.

Other approaches also exploit symbolic execution to perform string analysis [134, 128]. Since these techniques use symbolic execution, they may not be readily scalable to complex programs (e.g., those containing thousands of branches).
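The loop handling discussed above, unraveling a loop a bounded number of times and then falling back to an over-approximation, can be sketched in a few lines. This is not Violist's implementation: string values are modeled as plain Python sets rather than FSAs, the any-string result is represented by the literal token ".*", and all names here are hypothetical. The idea is to apply the loop body's string transfer function up to a user-chosen bound and, if the values have not stabilized by then, widen to the any-string over-approximation.

```python
ANY_STRING = ".*"  # stand-in token for the any-string over-approximation

def analyze_loop(init_values, body, bound):
    """Approximate the possible values of a string variable after a loop.

    init_values: set of string values possible on loop entry
    body:        transfer function giving a value's result after one iteration
    bound:       number of iterations to unravel before widening
    """
    reachable = set(init_values)   # values possible after 0..k iterations
    frontier = set(init_values)
    for _ in range(bound):
        frontier = {body(v) for v in frontier}
        if frontier <= reachable:  # fixed point reached: result is precise
            return reachable
        reachable |= frontier
    return {ANY_STRING}            # bound exhausted: widen to any string

# A loop that keeps appending "a" never stabilizes, so with bound 3 the
# analysis widens; a loop that rewrites the variable to a constant does
# stabilize and keeps a precise result (set print order may vary).
print(analyze_loop({""}, lambda s: s + "a", 3))
print(analyze_loop({"x"}, lambda s: "done", 3))
```

An infinite assumed bound, as in JSA's flow graphs, corresponds to always taking the widening branch for growing loops; a user-supplied finite bound lets the analysis keep the precise set when the loop's effect stabilizes early.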
Other research related to symbolic execution based string analysis uses a constraint solver [46, 146, 147, 164, 58, 35] or a decision procedure [71, 84] to decide whether the string constraints are satisfiable. These approaches are generally concerned with string constraint solving as part of a symbolic execution based approach to string analysis, as opposed to my more general focus of a static string analysis framework.

The automaton-based method is also widely used to perform string analysis. Minamide [110] used transducers (i.e., multi-track automata) to analyze PHP string output. Other techniques [160, 44, 27] introduced the widening operator for loops to guarantee the convergence of symbolic reachability analysis. A drawback of these automaton-based approaches is that they overestimate the iteration bound for loops, generally assuming it is infinite. My approach can do this as well, but also offers a flexible way to incorporate user-provided information (e.g., via additional program analyses) to more accurately model loop bounds, which can help increase the precision of the estimated string values.

Chapter 8 Conclusion

The goal of my dissertation was to develop techniques that can help developers create energy-efficient mobile apps. The thesis statement of my dissertation was: Program analysis techniques can help developers to understand how energy is consumed in mobile apps, and can also be used to automatically optimize the energy consumption of mobile apps.

This dissertation had three parts that evaluated the statement. The first and second parts showed that program analysis techniques could help developers understand how energy is consumed in mobile apps. In the first part of my dissertation, I developed a technique to measure the energy consumption of mobile apps at the source line level. It was the first technique that could provide such a fine-grained energy measurement.
As evaluated, my technique can achieve 90% measurement accuracy. In the second part of my dissertation, I used my energy measurement technique to conduct an empirical study on how energy is consumed by hundreds of market apps from the Google Play market. It was one of the first studies of the energy consumption of mobile apps with source line level measurements. This study showed that mobile apps can spend considerable energy while waiting for user input and that HTTP requests are the most energy-consuming APIs in the system.

The third part of my dissertation demonstrated that program analysis techniques can optimize the energy consumption of mobile apps automatically. In this part, I developed two techniques to automatically optimize the energy of mobile apps. The development of these two techniques showed that program analysis techniques can identify and modify energy-inefficient programming practices automatically to optimize energy.

The first technique was Nyx, which automatically optimized the display energy of mobile web apps. This technique reduced the display energy by transforming the colors of mobile web apps to energy-efficient colors. During the transformation, Nyx could also maintain the usability and attractiveness of mobile web apps. In my empirical evaluation, Nyx could reduce the energy consumption of mobile web apps by 25%. Nyx was the first technique that could automatically transform colors of mobile web apps to reduce their energy consumption.

The second optimization technique was Bouquet, which optimized the HTTP energy of Android apps. Bouquet reduced the energy consumption of HTTP requests by bundling multiple small HTTP requests into a large one. In my approach, I identified a pattern of HTTP requests that could be optimized with static program analysis techniques and developed effective algorithms to optimize them.
This technique was the first that could bundle HTTP requests with static program analysis techniques. As evaluated, Bouquet can save 15% energy for mobile apps. The empirical evaluations of these techniques showed that program analysis techniques can either point out energy hotspots in mobile apps or directly achieve a significant energy saving for mobile apps. These results confirmed my thesis statement: the use of program analysis techniques can help app developers to understand energy consumption and can also automatically save energy in mobile apps.

Chapter 9 Future Work

In this dissertation, I showed that program analysis techniques can help developers understand how energy is consumed in mobile apps and can optimize that energy consumption automatically. In the future, the energy consumption of mobile apps will continue to be an important problem for developers. My dissertation lays the foundation for developing techniques that can continue to improve developers' understanding and optimization of the energy consumption of mobile apps.

In the future, multiple directions can be followed based on my techniques. The measurement technique in my dissertation can help developers find which parts of mobile apps may consume more energy. By using my measurement technique, developers can conduct more empirical studies to detect more energy issues in market apps. For example, researchers can study the energy consumption of mobile apps on future devices and systems. The empirical study in this dissertation pointed out several energy-consuming components. In the future, developers can propose other approaches to optimize the energy consumption of these components. For example, researchers can develop techniques to optimize the energy consumption of the file system, SQLite, and sensors in mobile apps.
The display energy optimization and HTTP energy optimization techniques in this dissertation can also inspire additional energy optimization techniques. Developers can build new techniques to improve the two techniques in this dissertation in the future; for example, using search-based techniques to generate more energy-efficient color schemes for OLED screens. Finally, my string analysis technique provides a fundamental program analysis framework which can be used in future energy optimization techniques, such as modeling SQLite queries.

References

[1] Android developer. http://goo.gl/8jerX1.
[2] Android monkey. http://goo.gl/wSlG0b.
[3] Dex2jar. http://code.google.com/p/dex2jar/.
[4] Google Play crawler. http://goo.gl/0yDL5w.
[5] http://www.brics.dk/JSA/dist/jsa-benchmarks.tar.gz.
[6] http://www.brics.dk/JSA/dist/string-test.tar.gz.
[7] Java String Analyser. http://www.brics.dk/JSA/.
[8] The R project for statistical computing. http://www.r-project.org/.
[9] http://cssparser.sourceforge.net/.
[10] http://developer.android.com/training/articles/perf-tips.html.
[11] http://expressjs.com/.
[12] http://mashable.com/2013/09/21/battery-draining-app/.
[13] http://money.cnn.com/2015/06/03/technology/battery-draining-apps/.
[14] https://chocolatey.org/packages/newt.
[15] http://www-bcf.usc.edu/~halfond/testbed.html.
[16] http://www.fiercewireless.com/special-reports/3g4g_wireless_network_latency_how_do_verizon_att_sprint_and_t-mobile_compar.
[17] http://www.msoon.com/LabEquipment/PowerMonitor/.
[18] http://www.pcworld.com/article/253808/3g_and_4g_wireless_speed_showdown_which_networks_are_fastest_.html.
[19] http://www.sable.mcgill.ca/soot/.
[20] http://www.slideshare.net/jerrinsg/android-power-management.
[21] http://www.speedtest.net/.
[22] http://www.statista.com/statistics/276623/number_of_apps_available_in_leading_app_stores/.
[23] http://www.timewarnercable.com/en/plans-packages/internet/internet-service-plans.html.
[24] Emile Aarts and Jan Korst. Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing. John Wiley & Sons, Inc., New York, NY, USA, 1989.
[25] Ashfaq Ahmad, K Latif, N Javaidl, ZA Khan, and Umar Qasim. Density controlled divide-and-rule scheme for energy efficient routing in wireless sensor networks. In Electrical and Computer Engineering (CCECE), 2013 26th Annual IEEE Canadian Conference on, pages 1–4. IEEE, 2013.
[26] Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools (2nd Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2006.
[27] Muath Alkhalaf, Tevfik Bultan, and Jose L. Gallegos. Verifying Client-side Input Validation Functions Using String Analysis. In Proceedings of the 34th International Conference on Software Engineering, ICSE '12, pages 947–957, Piscataway, NJ, USA, 2012. IEEE Press.
[28] Apache. BCEL library. http://bcel.sourceforge.net/.
[29] Md Azharuddin and Prasanta K Jana. A distributed algorithm for energy efficient and fault tolerant routing in wireless sensor networks. Wireless Networks, 21(1):251–267, 2015.
[30] T. Ball and J.R. Larus. Efficient Path Profiling. In MICRO 29, pages 46–57. IEEE Computer Society, 1996.
[31] Abhijeet Banerjee, Lee Kee Chong, Sudipta Chattopadhyay, and Abhik Roychoudhury. Detecting energy bugs and hotspots in mobile apps. In FSE, 2014.
[32] F. Bellosa. The Benefits of Event-Driven Energy Accounting in Power-Sensitive Systems. In the 9th Workshop on ACM SIGOPS European Workshop, pages 37–42. ACM, 2000.
[33] Mike Belshe and Roberto Peon. SPDY protocol. 2012.
[34] Nilton Bila, Troy Ronda, Iqbal Mohomed, Khai N. Truong, and Eyal de Lara. PageTailor: Reusable End-user Customization for the Mobile Web. In MobiSys, 2007.
[35] Nikolaj Bjørner, Nikolai Tillmann, and Andrei Voronkov. Path Feasibility Analysis for String-Manipulating Programs.
In Proceedings of the 15th International Conference on Tools and Algorithms for the Construction and Analysis of Systems: Held As Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009, TACAS '09, pages 307–321. Springer-Verlag, Berlin, Heidelberg, 2009.
[36] Matthias Böhmer, Brent Hecht, Johannes Schöning, Antonio Krüger, and Gernot Bauer. Falling asleep with angry birds, facebook and kindle: A large scale study on mobile application usage. In MobileHCI, 2011.
[37] Niels Bouten, Steven Latré, Jeroen Famaey, Filip De Turck, and Werner Van Leekwijck. Minimizing the impact of delay on live SVC-based HTTP adaptive streaming services. In 2013 IFIP/IEEE International Symposium on Integrated Network Management (IM 2013), pages 1399–1404. IEEE, 2013.
[38] David Brooks, Vivek Tiwari, and Margaret Martonosi. Wattch: A Framework for Architectural-level Power Analysis and Optimizations. SIGARCH, 2000.
[39] Bobby R Bruce, Justyna Petke, and Mark Harman. Reducing energy consumption using genetic improvement. In 17th Annual Conference on Genetic and Evolutionary Computation. ACM, 2015.
[40] Aaron Carroll and Gernot Heiser. An Analysis of Power Consumption in a Smartphone. In Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, USENIXATC'10, pages 21–21, Berkeley, CA, USA, 2010. USENIX Association.
[41] Guangyu Chen, Byung-Tae Kang, Mahmut Kandemir, Narayanan Vijaykrishnan, Mary Jane Irwin, and Rajarathnam Chandramouli. Studying Energy Trade Offs in Offloading Computation/Compilation in Java-Enabled Mobile Devices. IEEE Transactions on Parallel and Distributed Systems, 2004.
[42] Yu Chen, Wei-Ying Ma, and Hong-Jiang Zhang. Detecting Web Page Structure for Adaptive Viewing on Small Form Factor Devices. In WWW, 2003.
[43] Inseok Choi, Hojun Shim, and Naehyuck Chang. Low-power Color TFT LCD Display for Handheld Embedded Systems.
In Proceedings of the 2002 International Symposium on Low Power Electronics and Design, ISLPED ’02, pages 112–117, New York, NY , USA, 2002. ACM. [44] Tae-Hyoung Choi, Oukseh Lee, Hyunha Kim, and Kyung-Goo Doh. A Practical String Analyzer by the Widening Approach. In Naoki Kobayashi, editor, Programming Languages and Systems, volume 4279 of Lecture Notes in Computer Science, pages 374–388. Springer Berlin Heidelberg, 2006. [45] Aske Simon Christensen, Anders Møller, and Michael I. Schwartzbach. Precise Analysis of String Expressions. In Proceedings of the 10th International Conference on Static Analysis, SAS’03, pages 1–18, Berlin, Heidelberg, 2003. Springer-Verlag. [46] Leonardo De Moura and Nikolaj Bjørner. Z3: An Efficient SMT Solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS’08/ETAPS’08, pages 337–340, Berlin, Heidelberg, 2008. Springer-Verlag. [47] Adrien Devresse and Fabrizio Furano. Efficient http based i/o on very large datasets for high per- formance computing with the libdavix library. In Workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware, pages 194–205. Springer, 2014. [48] M. Dong and L. Zhong. Sesame: Self-Constructive System Energy Modeling for Battery-Powered Mobile Systems. In Proc. of MobiSys, pages 335–348, 2011. [49] Mian Dong, Yung-Seok Kevin Choi, and Lin Zhong. Power Modeling of Graphical User Interfaces on OLED Displays. In DAC, 2009. [50] Mian Dong and Lin Zhong. Self-constructive High-rate System Energy Modeling for Battery- powered Mobile Systems. In Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services, MobiSys ’11, pages 335–348, New York, NY , USA, 2011. ACM. [51] Mian Dong and Lin Zhong. Chameleon: A Color-Adaptive Web Browser for Mobile OLED Dis- plays. IEEE Transactions on Mobile Computing, 11(5):724–738, May 2012. [52] Mian Dong and Lin Zhong. 
Power Modeling and Optimization for OLED Displays. IEEE Trans- actions on Mobile Computing, 11(9):1587–1599, September 2012. [53] Stephen G. Eick, Joseph L. Steffen, and Eric E. Sumner, Jr. Seesoft-a tool for visualizing line oriented software statistics. IEEE Trans. Softw. Eng., 1992. [54] K.I. Farkas, J. Flinn, G. Back, D. Grunwald, and J.M. Anderson. Quantifying the Energy Con- sumption of a Pocket Computer and a Java Virtual Machine. ACM SIGMETRICS Performance Evaluation Review. 128 [55] A. Ferrari, D. Gallucci, D. Puccinelli, and S. Giordano. Detecting energy leaks in android app with poem. In Pervasive Computing and Communication Workshops (PerCom Workshops), 2015 IEEE International Conference on, pages 421–426, March 2015. [56] J. Flinn and M. Satyanarayanan. PowerScope: A Tool for Profiling the Energy Usage of Mobile Applications. In HotMobile. [57] Jon Froehlich, Mike Y . Chen, Sunny Consolvo, Beverly Harrison, and James A. Landay. Myexpe- rience: A system for in situ tracing and capturing of user feedback on mobile phones. In MobiSys, 2007. [58] Xiang Fu and Chung-Chih Li. A String Constraint Solver for Detecting Web Application Vulnera- bility. In SEKE, pages 535–542, 2010. [59] Xiang Fu, Xin Lu, B. Peltsverger, Shijun Chen, Kai Qian, and Lixin Tao. A Static Analysis Frame- work For Detecting SQL Injection Vulnerabilities. In Computer Software and Applications Confer- ence, 2007. COMPSAC 2007. 31st Annual International, volume 1, pages 87–96, July 2007. [60] Joshua Garcia, Daniel Popescu, Gholamreza Safi, William G.J. Halfond, and Nenad Medvidovic. Identifying Message Flow in Distributed Event-Based Systems. In Proceedings of the Symposium on the Foundations of Software Engineering (FSE), August 2013. To Appear. [61] L. Gomez, I. Neamtiu, T. Azim, and T. Millstein. Reran: Timing- and touch-sensitive record and replay for android. In Software Engineering (ICSE), 2013 35th International Conference on, pages 72–81, May 2013. [62] Ilya Grigorik. 
Making the web faster with http 2.0. Commun. ACM, 56(12):42–49, December 2013. [63] Jiaping Gui, Stu Mcilroy, Mei Nagappan, and William G. J. Halfond. Truth in advertising: The hidden cost of mobile ads for software developers. In ICSE, 2015. [64] Chaorong Guo, Jian Zhang, Jun Yan, Zhiqiang Zhang, and Yanli Zhang. Characterizing and de- tecting resource leaks in android applications. In Automated Software Engineering (ASE), 2013 IEEE/ACM 28th International Conference on, pages 389–398, Nov 2013. [65] William G. J. Halfond and Alessandro Orso. Command-Form Coverage for Testing Database Ap- plications. In Proceedings of the 21st IEEE/ACM International Conference on Automated Software Engineering, ASE ’06, pages 69–80, Washington, DC, USA, 2006. IEEE Computer Society. [66] William G. J. Halfond and Alessandro Orso. Improving Test Case Generation for Web Applications Using Automated Interface Discovery. In Proceedings of the the 6th Joint Meeting of the Euro- pean Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ESEC-FSE ’07, pages 145–154, New York, NY , USA, 2007. ACM. [67] William G. J. Halfond and Alessandro Orso. Automated identification of parameter mismatches in web applications. In Proceedings of the 16th ACM SIGSOFT International Symposium on Founda- tions of Software Engineering, SIGSOFT ’08/FSE-16, pages 181–191, New York, NY , USA, 2008. ACM. [68] William G.J. Halfond and Alessandro Orso. AMNESIA: Analysis and Monitoring for NEutralizing SQL-Injection Attacks. In Proceedings of the International Conference on Automated Software Engineering, pages 174–183, Long Beach, California, USA, November 2005. [69] Shuai Hao, Ding Li, William G. J. Halfond, and Ramesh Govindan. Estimating Mobile Appli- cation Energy Consumption Using Program Analysis. In Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pages 92–101, Piscataway, NJ, USA, 2013. IEEE Press. 129 [70] Mark Harman. 
The Current State and Future of Search Based Software Engineering. In 2007 Future of Software Engineering, FOSE ’07, pages 342–357, Washington, DC, USA, 2007. IEEE Computer Society. [71] Pieter Hooimeijer and Westley Weimer. A Decision Procedure for Subset Constraints over Regular Languages. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’09, pages 188–198, New York, NY , USA, 2009. ACM. [72] Ching-Hsien Hsu, Kenn D. Slagter, Shih-Chang Chen, and Yeh-Ching Chung. Optimizing energy consumption with task consolidation in clouds. Information Sciences, 258:452 – 462, 2014. [73] http://www-scf.usc.edu/dingli/Nyx. Nyx Project Page. [74] P.J. Huber. Robust statistics. 1981. [75] Sunghwan Ihm and Vivek S. Pai. Towards understanding modern web traffic. In Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference, IMC ’11, pages 295–312, New York, NY , USA, 2011. ACM. [76] Intel. Atom N550 Datasheet. http://www.intel.com/content/dam/www/public/us/en/ documents/datasheets/atom-n400-vol-1-datasheet-.pdf. [77] Intel. X18-M/25-M SATA SSD Datasheet. http://download.intel.com/design/flash/ nand/mainstream/mainstream-sata-ssd-datasheet.pdf. [78] Subu Iyer, Lu Luo, Robert Mayo, and Parthasarathy Ranganathan. Energy-Adaptive Display Sys- tem Designs for Future Mobile Environments. In Proceedings of the 1st International Conference on Mobile Systems, Applications and Services, MobiSys ’03, pages 245–258, New York, NY , USA, 2003. ACM. [79] X. Jiang, P. Dutta, D. Culler, and I. Stoica. Micro power meter for energy monitoring of wireless sensor networks at scale. In Information Processing in Sensor Networks, 2007. IPSN 2007. 6th International Symposium on, pages 186–195. IEEE, 2007. [80] Matt Jones, Gary Marsden, Norliza Mohd-Nasir, Kevin Boone, and George Buchanan. Improving Web Interaction on Small Displays. Comput. Netw., 31(11-16):1129–1137, May 1999. [81] Noboru Kamijoh, Tadanobu Inoue, C. 
Michael Olsen, M. T. Raghunath, and Chandra Narayanaswami. Energy Trade-offs in the IBM Wristwatch Computer. In Proceedings of the 5th IEEE International Symposium on Wearable Computers, ISWC ’01, pages 133–, Washington, DC, USA, 2001. IEEE Computer Society. [82] E.Y .Y . Kan. Energy efficiency in testing and regression testing – a comparison of dvfs techniques. In QSIC. [83] A. Kansal, F. Zhao, J. Liu, N. Kothari, and A.A. Bhattacharya. Virtual Machine Power Metering and Provisioning. In Proceedings of the 1st ACM symposium on Cloud computing, pages 39–50. ACM, 2010. [84] Adam Kiezun, Vijay Ganesh, Philip J. Guo, Pieter Hooimeijer, and Michael D. Ernst. HAMPI: A Solver for String Constraints. In Proceedings of the Eighteenth International Symposium on Software Testing and Analysis, ISSTA ’09, pages 105–116, New York, NY , USA, 2009. ACM. [85] Young-Woo Kwon and Eli Tilevich. The impact of distributed programming abstractions on appli- cation energy consumption. Information and Software Technology, 2013. 130 [86] PeterJ.M. Laarhoven and EmileH.L. Aarts. Simulated Annealing, volume 37 of Mathematics and Its Applications. Springer Netherlands, 1987. [87] Patrick Lam, Eric Bodden, Ondrej Lhot´ ak, and Laurie Hendren. The soot framework for java program analysis: a retrospective. In Cetus Users and Compiler Infastructure Workshop (CETUS 2011), 2011. [88] Ding Li and William G.J. Halfond. An Investigation Into Energy-Saving Programming Practices for Android Smartphone App Development. In Proceedings of the 3rd International Workshop on Green and Sustainable Software (GREENS), 2014. [89] Ding Li and William G.J. Halfond. Optimizing Energy of HTTP Requests in Android Applications. In Proceedings of the Third International Workshop on Software Development Lifecycle for Mobile (DeMobile) – Short Paper, September 2015. To Appear. [90] Ding Li, Shuai Hao, Jiaping Gui, and Halfond William. An Empirical Study of the Energy Con- sumption of Android Applications. 
In 30th International Conference on Software Maintenance and Evolution. IEEE, 2014. [91] Ding Li, Shuai Hao, William G. J. Halfond, and Ramesh Govindan. Calculating Source Line Level Energy Information for Android Applications. In Proceedings of the 2013 International Symposium on Software Testing and Analysis, ISSTA ’13, pages 78–89, New York, NY , USA, 2013. ACM. [92] Ding Li, Yingjun Lyu, Jiaping Gui, and William G. J. Halfond. Automated Energy Optimization of HTTP Requests for Mobile Applications. In Proceedings of the 38th International Conference on Software Engineering, ICSE ’16, pages 249–260, New York, NY , USA, 2016. ACM. [93] Ding Li, C. Sahin, J. Clause, and W.G.J. Halfond. Energy-directed test suite optimization. In GREENS, 2013. [94] Ding Li, Huyen Tran, Angelica, and G. J. Halfond, William. Making Web Applications More Energy Efficient for OLED Smartphones. In ICSE, 2014. [95] Jing Li, Anirudh Badam, Ranveer Chandra, Steven Swanson, Bruce L Worthington, and Qi Zhang. On the energy overhead of mobile storage systems. In FAST, pages 105–118, 2014. [96] Tao Li and Lizy Kurian John. Run-time Modeling and Estimation of Operating System Power Consumption. In SIGMETRICS, 2003. [97] Zhenhua Li, Yafei Dai, Guihai Chen, and Yunhao Liu. Cross-Application Cellular Traffic Optimiza- tion, pages 19–48. Springer Singapore, Singapore, 2016. [98] Mario Linares-V´ asquez, Gabriele Bavota, Carlos Bernal-C´ ardenas, Rocco Oliveto, Massimiliano Di Penta, and Denys Poshyvanyk. Optimizing energy consumption of guis in android apps: A multi-objective approach. [99] Mario Linares-V´ asquez, Gabriele Bavota, Carlos Bernal-C´ ardenas, Rocco Oliveto, Massimiliano Di Penta, and Denys Poshyvanyk. Mining energy-greedy api usage patterns in android apps: an empirical study. In Proceedings of the 11th Working Conference on Mining Software Repositories (MSR), 2014. 
[100] Mario Linares-V´ asquez, Gabriele Bavota, Carlos Eduardo Bernal C´ ardenas, Rocco Oliveto, Mas- similiano Di Penta, and Denys Poshyvanyk. Optimizing energy consumption of guis in android apps: A multi-objective approach. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pages 143–154, New York, NY , USA, 2015. ACM. 131 [101] Yepang Liu, Chang Xu, and SC Cheung. Finding Sensor Related Energy Black Holes in Smart- phone Applications. In Proceedings of the 2013 IEEE International Conference on Pervasive Com- puting and Communications, PerCom ’13, pages 2 –10, Washington, DC, USA, 2013. IEEE Com- puter Society. [102] Yepang Liu, Chang Xu, and S.C. Cheung. Where has my battery gone? finding sensor related energy black holes in smartphone applications. In PerCom, 2013. [103] Yepang Liu, Chang Xu, S.C. Cheung, and Jian Lu. Greendroid: Automated diagnosis of energy inefficiency for smartphone applications. Software Engineering, IEEE Transactions on, 40(9):911– 940, Sept 2014. [104] Irene Manotas, Lori Pollock, and James Clause. Seeds: A software engineer’s energy-optimization decision support framework. In ICSE, 2014. [105] Dustin McIntire, Kei Ho, Bernie Yip, Amarjeet Singh, Winston Wu, and William J. Kaiser. The low power energy aware processing (leap)embedded networked sensor system. In IPSN, 2006. [106] Donald McMillan, Alistair Morrison, Owain Brown, Malcolm Hall, and Matthew Chalmers. Fur- ther into the wild: Running worldwide trials of mobile systems. In Pervasive, 2010. [107] Phil McMinn. Search-based Software Test Data Generation: A Survey: Research Articles. Softw. Test. Verif. Reliab., 14(2):105–156, June 2004. [108] H. Mehta, R.M. Owens, and M.J. Irwin. INSTRUCTION LEVEL POWER PROFILING. In ICASSP, volume 6, pages 3326–3329. IEEE, 1996. [109] Micron. DDR3 SDRAM SODIMM Datasheet. http://download.micron.com/pdf/ datasheets/modules/ddr3/jsf16c256x64h.pdf. [110] Yasuhiko Minamide. 
Static Approximation of Dynamically Generated Web Pages. In Proceedings of the 14th International Conference on World Wide Web, WWW ’05, pages 432–441, New York, NY , USA, 2005. ACM. [111] Radhika Mittal, Aman Kansal, and Ranveer Chandra. Empowering developers to estimate app energy consumption. In Proc. of MobiCom, 2012. [112] Anders Møller. dk.brics.automaton – finite-state automata and regular expressions for Java, 2010. http://www.brics.dk/automaton/. [113] Steven S. Muchnick. Advanced compiler design implementation. Morgan Kaufmann, 1997. [114] Trevor Mudge, Todd Austin, and Dirk Grunwald. The Reference Manual for the Sim-Panalyzer Version 2.0. [115] Klaus M¨ ullen and Ullrich Scherf. Organic light emitting devices. Wiley Online Library, 2006. [116] Hung Viet Nguyen, Hoan Anh Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen. Auto-locating and Fix-propagating for HTML Validation Errors to PHP Server-side Code. In Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering, ASE ’11, pages 13–22, Washington, DC, USA, 2011. IEEE Computer Society. [117] Henrik Frystyk Nielsen, James Gettys, Anselm Baird-Smith, Eric Prud’hommeaux, H˚ akon Wium Lie, and Chris Lilley. Network performance effects of http/1.1, css1, and png. In Proceedings of the ACM SIGCOMM ’97 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, SIGCOMM ’97, pages 155–166, New York, NY , USA, 1997. ACM. 132 [118] Nima Nikzad, Octav Chipara, and William G. Griswold. Ape: An annotation language and middle- ware for energy-efficient mobile application development. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 515–526, New York, NY , USA, 2014. ACM. [119] A. Pathak, Y .C. Hu, M. Zhang, P. Bahl, and Y .M. Wang. Fine-Grained Power Modeling for Smart- phones Using System Call Tracing. In EuroSys, 2011. [120] Abhinav Pathak, Y . Charlie Hu, and Ming Zhang. 
Bootstrapping energy debugging on smartphones: A first look at energy bugs in mobile devices. In Proceedings of the 10th ACM Workshop on Hot Topics in Networks, HotNets-X, pages 5:1–5:6, New York, NY , USA, 2011. ACM. [121] Abhinav Pathak, Abhilash Jindal, Y . Charlie Hu, and Samuel P. Midkiff. What is Keeping My Phone Awake?: Characterizing and Detecting No-sleep Energy Bugs in Smartphone Apps. In Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services, MobiSys ’12, pages 267–280, New York, NY , USA, 2012. ACM. [122] P.A.H. Peterson, D. Singh, W.J. Kaiser, and P.L. Reiher. Investigating energy and security trade-offs in the classroom with the atom leap testbed. In 4th Workshop on Cyber Security Experimentation and Test (CSET), pages 11–11. USENIX Association, 2011. [123] H. Qian and D. Andresen. Jade: An efficient energy-aware computation offloading system with heterogeneous network interface bonding for ad-hoc networked mobile devices. In Software Engi- neering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2014 15th IEEE/ACIS International Conference on, pages 1–8, June 2014. [124] H. Qian and D. Andresen. An energy-saving task scheduler for mobile devices. In Computer and Information Science (ICIS), 2015 IEEE/ACIS 14th International Conference on, pages 423–430, June 2015. [125] Hao Qian and Daniel Andresen. Extending mobile device’s battery life by offloading computation to cloud. In Proceedings of the Second ACM International Conference on Mobile Software Engi- neering and Systems, MOBILESoft ’15, pages 150–151, Piscataway, NJ, USA, 2015. IEEE Press. [126] Virpi Roto, Andrei Popescu, Antti Koivisto, and Elina Vartiainen. Minimap: A Web Page Visual- ization Method for Mobile Phones. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’06, pages 35–44, New York, NY , USA, 2006. ACM. [127] Jose Saldana, Julian Fernandez-Navajas, and Jose Ruiz-Mas. 
Can we multiplex acks without harm- ing the performance of tcp? In 2014 IEEE 11th Consumer Communications and Networking Con- ference (CCNC), pages 503–504. IEEE, 2014. [128] Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, and Dawn Song. A Symbolic Execution Framework for JavaScript. In Proceedings of the 2010 IEEE Symposium on Security and Privacy, SP ’10, pages 513–528, Washington, DC, USA, 2010. IEEE Computer Society. [129] C. Seo, S. Malek, and N. Medvidovic. Estimating the Energy Consumption in Pervasive Java-Based Systems. In Sixth Annual IEEE International Conference on Pervasive Computing and Communi- cations, pages 243–247. IEEE, 2008. [130] Chiyoung Seo, Sam Malek, and Nenad Medvidovic. An energy consumption framework for dis- tributed java-based systems. In Proceedings of the 22nd IEEE/ACM international conference on Automated software engineering (ASE), pages 421–424. ACM, 2007. 133 [131] Chiyoung Seo, Sam Malek, and Nenad Medvidovic. Component-Level Energy Consumption Esti- mation for Distributed Java-Based Software Systems. In MichelR.V . Chaudron, Clemens Szyperski, and Ralf Reussner, editors, Component-Based Software Engineering, volume 5282 of Lecture Notes in Computer Science, pages 97–113. Springer Berlin Heidelberg, 2008. [132] R. C. Shah and J. M. Rabaey. Energy aware routing for low energy ad hoc sensor networks. In Wireless Communications and Networking Conference, 2002. WCNC2002. 2002 IEEE, volume 1, pages 350–355 vol.1, Mar 2002. [133] Renzo Shamey, David Hinks, Manuel Melgosa, Ronnier Luo, Guihua Cui, Rafael Huertas, Lina Cardenas, and Seung Geol Lee. Evaluation of performance of twelve color-difference formulae us- ing two NCSU experimental datasets. In Proceeding of the 2010 Conference on Colour in Graphics, Imaging, and Vision, volume 2010, pages 423–428. Society for Imaging Science and Technology, 2010. [134] D. Shannon, S. Hajra, A. Lee, Daiqian Zhan, and S. Khurshid. 
Abstracting Symbolic Execution with String Analysis. In Testing: Academic and Industrial Conference Practice and Research Tech- niques - MUTATION, 2007. TAICPART-MUTATION 2007, pages 13–22, Sept 2007. [135] Gaurav Sharma, Wencheng Wu, and Edul N Dalal. The CIEDE2000 color-difference formula: Implementation notes, supplementary test data, and mathematical observations. Color Research & Application, 30(1):21–30, 2005. [136] Clayton Shepard, Ahmad Rahmati, Chad Tossell, Lin Zhong, and Phillip Kortum. Livelab: Mea- suring wireless networks and smartphone users in the field. SIGMETRICS Perform. Eval. Rev., 2011. [137] Amit Sinha and Anantha P. Chandrakasan. JouleTrack: A Web Based Tool for Software Energy Profiling. In DAC, 2001. [138] Stefan Steinke, Markus Knauer, Lars Wehmeyer, and Peter Marwedel. An Accurate and Fine Grain Instruction-level Energy Model Supporting Software Optimizations. In Proceedings of the 11th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS). Citeseer, 2001. [139] J. Stoess, C. Lang, and F. Bellosa. Energy Management for Hypervisor-Based Virtual Machines. In 2007 USENIX Annual Technical Conference on Proceedings of the USENIX Annual Technical Conference, page 1. USENIX Association, 2007. [140] Prabakar Sundarrajan, Junxiao He, Shashi Nanjundaswamy, Sergey Verzunov, Charu Venkatraman, and Anil Shetty. Systems and methods for providing client-side accelerated access to remote appli- cations via tcp multiplexing, October 1 2013. US Patent 8,549,149. [141] TK Tan, A. Raghunathan, and NK Jha. Energy Macromodeling of Embedded Operating Systems. ACM Transactions on Embedded Computing Systems (TECS), 4(1):231–254, 2005. [142] TK Tan, A. Raghunathan, G. Lakshminarayana, and N.K. Jha. High-level Software Energy Macro- modeling. In Proc. of Design Automation Conference (DAC), pages 605–610. IEEE, 2001. [143] V . Tiwari, S. Malik, and A. Wolfe. 
Power Analysis of Embedded Software: A First Step Towards Software Power Minimization. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2(4):437–445, 1994. [144] Vivek Tiwari, Sharad Malik, Andrew Wolfe, and Mike Tien-Chien Lee. Instruction level power analysis and optimization of software. Journal of VLSI signal processing systems for signal, image and video technology, 13(2-3):223–238, 1996. 134 [145] JW Turkey and AE Beaton. The fitting of power series, meaning polynomials, illustrated on band- spectroscopic data. Technometrics, 16:189–192, 1974. [146] Margus Veanes, Nikolaj Bj ˜ A¸rner, and Leonardo de Moura. Symbolic Automata Constraint Solv- ing. In ChristianG. Ferm ˜ A 1 4 ller and Andrei V oronkov, editors, Logic for Programming, Artificial Intelligence, and Reasoning, volume 6397 of Lecture Notes in Computer Science, pages 640–654. Springer Berlin Heidelberg, 2010. [147] Margus Veanes, Peli de Halleux, and Nikolai Tillmann. Rex: Symbolic Regular Expression Ex- plorer. In Proceedings of the 2010 Third International Conference on Software Testing, Verification and Validation, ICST ’10, pages 498–507, Washington, DC, USA, 2010. IEEE Computer Society. [148] Olga Veksler. Efficient Graph-based Energy Minimization Methods in Computer Vision. PhD thesis, Ithaca, NY , USA, 1999. AAI9939932. [149] Basant Verma, Kevin Piazza, and Weichin Lo Hsu. Systems and methods for multiplexing network channels, March 31 2015. US Patent 8,996,657. [150] Java [TM] VM. Energy Behavior of Java Applications from the Memory Perspective. 2001. [151] Shaodi Wang, Hochul Lee, Farbod Ebrahimi, P. Khalili Amiri, Kang L. Wang, and Puneet Gupta. Comparative evaluation of spin-transfer-torque and magnetoelectric random access memory. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2016. [152] Shaodi Wang, Greg Leung, Andrew Pan, Chi On Chui, and Puneet Gupta. 
Evaluation of digital circuit-level variability in inversion-mode and junctionless finfet technologies. Electron Devices, IEEE Transactions on, 60(7):2186–2193, 2013. [153] Shaodi Wang, Andrew Pan, Chi On Chui, and Puneet Gupta. Proceed: A pareto optimization-based circuit-level evaluator for emerging devices. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, (99):1, 2015. [154] Gary Wassermann and Zhendong Su. An analysis framework for security in Web applications. In In Proceedings of the FSE Workshop on Specification and Verification of Component-Based Systems (SAVCBS 2004, pages 70–78, 2004. [155] Gary Wassermann and Zhendong Su. Sound and Precise Analysis of Web Applications for Injection Vulnerabilities. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’07, pages 32–41, New York, NY , USA, 2007. ACM. [156] Claas Wilke, Sebastian Richly, Sebastian Gotz, Christian Piechnick, and Uwe Assmann. En- ergy consumption and efficiency in mobile applications: A user feedback study. In GreenCom, iThings/CPSCom, CPSCom, 2013. [157] R Winsniewski. Android–apktool: A tool for reverse engineering android apk files, 2012. [158] Z Yang. Powertutor-a power monitor for android-based mobile platforms. EECS, University of Michigan, retrieved September, 2, 2012. [159] Min Yoon, Yong-Ki Kim, and Jae-Woo Chang. An energy-efficient routing protocol using message success rate in wireless sensor networks. JoC, 4(1):15–22, 2013. [160] Fang Yu, Tevfik Bultan, Marco Cova, and OscarH. Ibarra. Symbolic String Verification: An Automata-Based Approach. In Klaus Havelund, Rupak Majumdar, and Jens Palsberg, editors, Model Checking Software, volume 5156 of Lecture Notes in Computer Science, pages 306–324. Springer Berlin Heidelberg, 2008. 135 [161] Yuanyuan Zeng, Kai Xiang, Deshi Li, and Athanasios V . Vasilakos. Directional routing and schedul- ing for green vehicular delay tolerant networks. 
Wireless Networks, 19(2):161–173, 2013. [162] Jack Zhang, Ayemi Musa, and Wei Le. A comparison of energy bugs for smartphone platforms. In MOBS, pages 25–30. IEEE, 2013. [163] L. Zhang, B. Tiwana, Z. Qian, Z. Wang, R.P. Dick, Z.M. Mao, and L. Yang. Accurate Online Power Estimation and Automatic Battery Behavior Based Power Model Generation for Smartphones. In Proc. of IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, pages 105–114. ACM, 2010. [164] Yunhui Zheng, Xiangyu Zhang, and Vijay Ganesh. Z3-str: A Z3-based String Solver for Web Application Analysis. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pages 114–124, New York, NY , USA, 2013. ACM. [165] Lin Zhong, Mike Sinclair, and Ray Bittner. A Phone-Centered Body Sensor Network Platform: Cost, Energy Efficiency & User Interface. In Proceedings of the International Workshop on Wear- able and Implantable Body Sensor Networks, BSN ’06, pages 179–182, Washington, DC, USA, 2006. IEEE Computer Society. 136
Abstract
Energy is a critical resource for mobile devices. Many techniques have been proposed to optimize the energy consumption of mobile devices at the hardware and system levels. However, optimizations at the hardware and system levels alone are insufficient: poorly designed applications can still waste energy even with fully optimized hardware and system support.

In my dissertation work, I proposed multiple techniques to help developers create energy-efficient apps. In particular, my dissertation addresses three problems in creating energy-efficient apps. The first problem is "where is energy consumed." Modern mobile apps are very complex and may contain more than 500,000 lines of code, so it is important to know which parts of the code consume more energy. To address this problem, I developed a source-line-level energy measurement technique that reports the energy consumption of mobile apps at a very fine granularity. My technique achieved 91% accuracy in its measurements. The second problem is "what to optimize." Modern mobile apps may use many different libraries and invoke thousands of APIs, so it is also important to know which libraries and APIs consume more energy. To address this problem, I conducted an empirical study of how 405 Android market apps consume energy. In this study, I evaluated ten research questions that motivated my subsequent energy optimization techniques. The third problem is "how to optimize." Beyond knowing where energy is consumed and what to optimize, it is important to design effective techniques to optimize the energy consumption of mobile apps. To address this problem, I developed two automated techniques: the first automatically optimizes display energy for mobile web apps, and the second optimizes HTTP energy for Android apps. My display energy optimization technique reduced energy consumption by 25%, and my HTTP energy optimization technique achieved 15% energy savings. In summary, my techniques and empirical evaluations show that program analysis can help developers understand how energy is consumed in mobile apps and can also help optimize the energy consumption of mobile apps.
Asset Metadata
Creator
Li, Ding (author)
Core Title
Energy optimization of mobile applications
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science (Software Engineering)
Publication Date
09/23/2016
Defense Date
07/27/2016
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
energy saving, mobile applications, OAI-PMH Harvest, program analysis
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Halfond, William G. J. (committee chair), Annavaram, Murali (committee member), Gupta, Sandeep (committee member), Medvidovic, Nenad (committee member)
Creator Email
dingli@usc.edu,ld028888@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c40-304884
Unique identifier
UC11279464
Identifier
etd-LiDing-4802.pdf (filename), usctheses-c40-304884 (legacy record id)
Legacy Identifier
etd-LiDing-4802.pdf
Dmrecord
304884
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Li, Ding
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA