Automatic detection and optimization of energy optimizable UIs in Android applications using program analysis
Automatic Detection and Optimization of Energy Optimizable UIs in Android Applications Using Program Analysis
by
Mian Wan

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

December 2021

Copyright 2021 Mian Wan

Dedication

To my parents, Guoqing and Jinghui, for their endless love, support, and encouragement.

Acknowledgements

Pursuing a Ph.D. is the most challenging, rewarding, and wonderful journey of my academic career. Having reached this milestone, I would like to acknowledge several people who were essential to me in the course of this trip.

First, I would like to thank my advisor, Prof. William G. J. Halfond, for his constant support, critiques, suggestions, and encouragement during my entire Ph.D. study. I was deeply inspired by his enthusiasm for research and his adventurous spirit, so I dared to try new ideas without worrying about failure. His valuable suggestions and insightful criticism helped me improve the quality of my dissertation research. His diligent working style set a good example for my future success. His way of mentoring students ensured that I received good training in research, communication, and writing skills, thus helping me to become an independent researcher.

My time at USC, especially in our lab, would not have been as memorable without my labmates. It is a kind of fate for us to be in the same lab under the supervision of the same advisor. Thanks to them, I was able to learn a lot about different cultures, traditions, and food. I also received a lot of help from them.
Therefore, I would like to thank Ding Li (now at Peking University) for helping me adjust to the new environment at USC, Sonal Mahajan (now at Fujitsu Laboratories of America) for consoling me when I was stuck in research, Jiaping Gui (now at Stellar Cyber) for sharing his tools and scripts with me, Abdulmajeed Alameer (now at King Saud University) for providing useful feedback on my presentation slides, Yingjun Lyu (now at Amazon) for being a good companion when attending the ICSME conference, Negarsadat Abolhassani for being a hard-working co-author and giving me great feedback on my research ideas, Paul Chiou for helping me a lot in conducting the user study, and Ali Alotaibi for sharing Saudi Arabian coffee and snacks.

Last but not least, I would like to thank my parents for being supportive and helping me to adjust my mentality throughout my academic career.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Major Challenges
  1.2 Insights and Hypothesis
    1.2.1 Insight 1: Display energy optimization potential can be quantified
    1.2.2 Insight 2: Combination of both types of program analyses can be more effective to gather UI information
    1.2.3 Hypothesis
  1.3 Contributions
  1.4 Overview of Publications
Chapter 2: Detecting Display Energy Hotspots
  2.1 Approach
    2.1.1 Step 1: Gather UI Layout Information
    2.1.2 Step 2: Workload Execution and Screenshot Capture
    2.1.3 Step 3: Generate Energy-Efficient Alternative UIs
    2.1.4 Step 4: Display Energy Prediction
    2.1.5 Step 5: Prioritizing the User Interfaces
    2.1.6 Discussion of Usage Scenarios
  2.2 The Display Energy Profile
  2.3 Evaluation
    2.3.1 Subject Applications
    2.3.2 Implementation
    2.3.3 RQ1: Accuracy
    2.3.4 RQ2: Generalizability
    2.3.5 RQ3: Ad Impact
    2.3.6 RQ4: Analysis Time
    2.3.7 RQ5: Potential Impact
    2.3.8 Threats to Validity
  2.4 Conclusion
Chapter 3: UI Implementation Study
  3.1 Background
  3.2 Research Questions
  3.3 Data Collection
    3.3.1 Select Subject Applications
    3.3.2 Identify Developer Code
    3.3.3 Create UI Related API List
  3.4 Experiments, Results, and Discussions
    3.4.1 RQ1: How Well Does Dynamic Analysis Work for UI Identification?
    3.4.2 RQ2: How do Developers Use APIs to Define UIs in Android Apps?
    3.4.3 RQ3: Do Developers Use Fragments Frequently in Android Apps?
    3.4.4 RQ4: How Many Views Are Defined in Fragments, and Activities Respectively?
    3.4.5 RQ5: How do Developers Use Views to Customize the Android UIs?
    3.4.6 RQ6: How do Developers Set the UI Style Properties of Views?
  3.5 Threats to Validity
  3.6 Conclusion
Chapter 4: Repairing Display Energy Hotspots
  4.1 Background and Motivating Example
  4.2 Overview of the Repair Approach
  4.3 Hybrid UI Analysis
    4.3.1 Fragment and Color Setting Operation Modeling
    4.3.2 Color Value Analysis
      4.3.2.1 Value Analysis for Color Setting API Calls
      4.3.2.2 Value Analysis for XML Based Color Settings
    4.3.3 Extracting UI Information From Dynamic Analysis
    4.3.4 Merging Both Types of Analysis Results
  4.4 Adaptive Color Transformation
    4.4.1 Building the Color Conflict Graph
    4.4.2 Fitness Functions
    4.4.3 Generating a New Color Scheme
  4.5 Automated App Rewrite
  4.6 Evaluation
    4.6.1 Implementation
    4.6.2 Subject Apps
    4.6.3 RQ1: Benefits of Hybrid Analysis
    4.6.4 RQ2: Energy Saving
    4.6.5 RQ3: User Acceptance
    4.6.6 RQ4: Analysis Time
    4.6.7 Threats to Validity
  4.7 Conclusion
Chapter 5: Related Work
  5.1 Reducing Display Energy by Recoloring
  5.2 Reducing Display Energy by Darkening
  5.3 Detecting Energy Bugs in Mobile Apps
  5.4 Power Modeling Techniques for Mobile Devices
  5.5 Modeling Mobile GUIs
Chapter 6: Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work
    6.2.1 UI Color Design Evolution
    6.2.2 Software Testing
    6.2.3 Program Analysis
References

List of Tables

2.1 Subject application information for the detection technique
2.2 Average Spearman's correlation coefficient of rankings between devices
2.3 Average common screenshots in top 5 and top 10 between devices
2.4 The differences between rankings with and without excluding ads
2.5 Analysis time of the detection approach
2.6 The ten apps with the largest display energy hotspots
3.1 Distribution of average API calls building UI per activity over time
3.2 Distribution of average code usage for modifying UI per app over time
4.1 Three common ways of attaching fragments to views in Android apps
4.2 Different approaches' feature comparison
4.3 Subject apps for the repair approach
4.4 The repair technique's analysis time for subject apps

List of Figures

2.1 Overview of the detection approach
2.2 The error estimation rate of the power model
2.3 The average estimated power savings of the subject apps
2.4 The number of apps with display energy hotspots
2.5 Transformed and original screenshots of the most energy-inefficient app
3.1 Distribution over time of the number of apps that use fragments
3.2 The average number of views in fragments and activities
3.3 Distribution of the usage of different view types over time
3.4 Different component distribution in parameter expressions for style arguments
4.1 Overview of the repair approach
4.2 Partial constraint graph for the example code
4.3 Nodes and edges for two ways of attaching fragments not used in the example code
4.4 Partial constraint graph annotated with view tree relationships
4.5 Expression tree input and output for the Apply() function
4.6 View number results for four approaches
4.7 View source results for four approaches
4.8 Color modification results for AIRES and D+M approaches
4.9 Power savings for six color scheme solutions
4.10 Three versions of screenshots of a UI in App 9
4.11 General preference results for six color scheme solutions
4.12 Preference results considering the battery level for six color scheme solutions

Abstract

Mobile apps and smartphones play an essential role in our daily life, and the energy consumption of an app has become an important concern for its developers. Given the fact that an app's display energy consumption can be optimized at the software level, many techniques have been proposed to help optimize the apps' display energy on OLED screens. However, there are no automated techniques for detecting and repairing energy optimizable User Interfaces (UIs) in Android apps. Instead, for detection, the developers can only manually examine each UI's colors and determine which UIs are optimizable based on their intuition. As for repairing, the developers need to manually analyze the app to modify the color settings to recolor the UIs.
In this dissertation, I aim to overcome the above challenges and limitations by using program analysis based techniques to automate the process of detecting and repairing energy optimizable UIs in mobile apps. To achieve this goal, my dissertation is divided into three main components. First, I developed dLens, the first program analysis based technique using dynamic analysis, power modeling, and color transformation to automate the detection of energy optimizable UIs. This technique could estimate the power precisely and rank the optimizable UIs accurately based on their optimization potential. Second, I conducted an empirical study to explore new trends in the coding practices of app developers and whether these trends can cause problems for existing program analysis techniques. This study provided empirical evidence on several important coding practice trends to guide the design of future UI analysis techniques. Finally, I devised AIRES, the first hybrid program analysis based technique using static and dynamic analyses, a search-based technique, and an app rewriting technique to automate the repair of energy optimizable UIs. AIRES could reduce the display energy of mobile UIs with significant savings and wide user acceptance. In addition to the above contributions, I discuss the lessons learned when constructing these techniques and point out future work that can be inspired by my dissertation.

Chapter 1

Introduction

Nowadays mobile apps and smartphones play an essential role in our daily life. In 2019, consumers downloaded 204 billion apps worldwide, a 45% increase over 2016 [1]. Smartphones and mobile apps have become so popular, in part, because they combine sensors and data access to provide many useful services and a rich user experience.
However, the usability of mobile devices is inherently limited by their battery power, and the use of popular features, such as the camera and network, can quickly deplete a device's limited battery power. Therefore, energy consumption has become an important concern. For the most part, major reductions in energy consumption have come about through a focus on developing better batteries, more efficient hardware, and better operating system level resource management. However, software engineers have become increasingly aware of the way an app's implementation can impact its energy consumption [2-11]. This realization has motivated the development of software-level techniques that can identify energy bugs, provide more insights into the energy related behaviors of an application, and facilitate generating corresponding repairs.

An important observation is that the display component of a smartphone consumes a significant portion of the device's total battery power and its energy consumption can be optimized at the software level. Previous studies in mobile app energy consumption have shown that a mobile device's display consumes one of the largest shares of energy [3, 12]. This problem has only grown as smartphone display sizes have increased from an average of 2.9 inches in 2007 [13] to over 5.5 inches in 2018 [14]. Traditionally, optimizing display power has been seen as outside of the control of software developers. This is true for liquid-crystal display (LCD) screens, for which energy consumption is based on the display's brightness and is controlled by either the end user or by the OS performing opportunistic dimming of the display. However, many modern smartphones, such as the Samsung Galaxy S20, are powered by a new generation of screen technology, the organic light-emitting diode (OLED). For this type of screen, brightness is still important [12]; however, the colors that are displayed also become important.
Due to the underlying technology, this type of screen consumes less energy when displaying darker colors (e.g., black) than lighter ones (e.g., white). The use of these screens means that there are enormous energy savings to be realized at the software level by optimizing the colors of the UIs displayed by the smartphone. In fact, prior studies have shown that savings of over 40% can be achieved by this method [5-8].

To take advantage of this observation, many techniques have been proposed to help optimize the apps' display energy on OLED screens. Several techniques [5-8, 15, 16] recolor the rendered web pages in a web app using an energy-efficient color scheme to save display energy. Another group of existing techniques [9-11, 17], dedicated to mobile apps, can automatically provide an energy-saving color scheme to the developers. However, all the techniques above have some limitations. First, they do not tell developers which UIs should be optimized, and let the developers make the decision instead. Unfortunately, developers do not have an effective technique to know which UIs of their apps can be optimized to save energy. Therefore, they must rely on their intuition to detect the optimizable UIs, and this process can be error-prone. Second, these techniques do not automate the entire repair process. Even though the techniques automatically generate an energy-saving color scheme for an app, the developers still need to manually analyze the app to modify the color settings. Third, the optimization techniques targeting mobile apps obtain the colors from dynamically captured screenshots, hence they cannot guarantee the completeness of the obtained color information. Incomplete color information may impair the energy savings and user experience of the transformed version of mobile Graphical User Interfaces (GUIs).
To overcome the limitations of current techniques, my dissertation work focuses on two goals: (1) automate the detection of energy optimizable UIs, and (2) automate the repair process of energy optimizable UIs.

1.1 Major Challenges

These two goals pose challenges for developers. For the detection of optimizable UIs, there does not exist a criterion to reliably discriminate whether a mobile UI is optimizable. For the repair of optimizable UIs, it is challenging to track how colors are defined in an app and guarantee the extracted colors are complete. Moreover, the new color scheme should be carefully computed to maintain the user experience of the new UIs.

Detecting energy optimizable UIs is a difficult task. Developers lack a concrete criterion to judge whether a UI can be optimized. Developers can isolate the display energy of a UI by measuring it via a power meter or computing it through a power model. However, this only tells the developers whether the UI is energy-consuming; it does not tell them whether the UI can be optimized to save energy. Thus, the developers can only manually examine each UI's colors and determine which UIs are optimizable based on their intuition. This can make the detection process error-prone, hence a developer may miss an optimizable UI in an app.

Repairing energy optimizable UIs also poses several challenges. The first challenge is that it is hard to fully automate the whole repair (i.e., transforming) process. A prerequisite of doing this is to track how each rendered color is defined in the app. Tracking the colors defined in the developer code requires understanding the semantics of different UI related APIs, and taking the complex control flows and data flows into account. Tracking the XML-defined colors requires understanding the complex rendering rules for mobile apps. The second challenge is to ensure the completeness of the UI information. The Android SDK provides a flexible set of mechanisms for defining UIs.
Widgets and their colors can be defined via XML based resource files, by invoking certain APIs within the developer written code, or even combining both mechanisms. Therefore, it is required to account for different mechanisms of defining UIs and to extract the information completely. The third challenge is that a new color scheme must be constructed to save energy without compromising the aesthetics of the UI. For example, a possible repair could simply use many dark colors as both background and text colors. However, such a repair would lead to a decrease in the readability of the UI.

1.2 Insights and Hypothesis

In this section, I present the key insights guiding my research and the hypothesis that this dissertation tests in order to achieve the goal of detecting and repairing energy optimizable UIs.

1.2.1 Insight 1: Display energy optimization potential can be quantified

To address the criterion challenge discussed above, my first insight is that the display energy optimization potential of a UI can be quantified. A mobile UI can be represented as a screenshot. The optimization potential of a UI can be gauged by computing the energy consumption difference between the screenshot and a color transformed version. The difference extracted from the power comparison of two versions of a UI screenshot can serve as the quantification of how much energy the UI can save after recoloring. This measurement of energy optimization potential can act as the indicator of which UI needs optimization. To realize this new criterion, my approach needs to combine dynamic analysis, power modeling, and color transformation techniques. The screenshots of UIs can be captured from a dynamic analysis. The power consumption of a UI can be estimated by calculating the power sum of each pixel in a screenshot according to a power model [18]. The recolored version of a UI screenshot can serve as a reasonable approximation of what an optimized alternative UI would look like.
This can be generated by using a color transformation technique [5-8]. Given these insights, it should be feasible to design an automated approach to detect energy optimizable UIs.

1.2.2 Insight 2: Combination of both types of program analyses can be more effective to gather UI information

To resolve the challenges of automation and completeness of UI information, my second insight is that both static analysis and dynamic analysis can be used in a complementary way. The precondition of automating the recoloring via code rewriting is to understand how colors are defined and identify the mapping between identified colors and the relevant program points that will be modified. A static analysis, by its design, is able to analyze the code, interpret the UI mechanisms, compute the changing points in the program, and perform the code rewriting. However, a static analysis alone still cannot ensure the completeness of UI information. It may miss some important UI information for complicated expressions in the code. For example, if a color variable depends on runtime data, such as a network response, the static analysis can only approximate the relationship between the color variable and the runtime data, but cannot resolve the color variable as a constant value, which is needed to design the new color scheme. Similarly, commonly used size settings for a view, such as WRAP_CONTENT and MATCH_PARENT, only describe the relative relationship between a view and its parent. The actual size of a view cannot be extracted by a static analysis, but this information is required when determining the power consumption of the new color scheme. At the same time, a dynamic analysis can overcome this limitation since it collects the UI information about the rendered UI with specific values. However, a dynamic analysis also has two main drawbacks: (1) it cannot build the mapping between identified colors and the relevant program points; and (2) it cannot guarantee the completeness of the UI information.
According to my study [19], a dynamic crawler could only achieve 49% activity coverage on average per app. Given the discussion above, one analysis's limitation can be resolved by the other. Moreover, both analysis results can be described as view hierarchy trees of the Android UIs. Therefore, it is possible to merge the two analysis results into a more complete result by using a probabilistic matching algorithm [20-22]. In summary, combining both types of program analysis is possible and yields more complete UI information.

1.2.3 Hypothesis

Based on the above two insights, the hypothesis statement of my dissertation is:

Energy optimizable UIs in mobile apps can be detected and repaired with high effectiveness and efficiency using approaches based on program analysis.

To evaluate the hypothesis, I designed and implemented two techniques that are based on different types of program analysis. The first technique mainly employs dynamic analysis to detect energy optimizable UIs in a mobile app. The technique employs a dynamic analysis to capture the UI screenshots, and uses power modeling and color transformation to determine whether a UI is optimizable. I evaluated the effectiveness of this technique by computing its detection accuracy and generalizability. I also evaluated the technique's efficiency by computing its analysis time. The empirical evaluations demonstrated that my technique is highly effective and efficient in detecting energy optimizable UIs.

The second technique I designed and implemented uses both types of program analysis techniques to repair the energy optimizable UIs in an app. This technique employs a hybrid (static and dynamic) program analysis to model the layout and color information of UIs. Then the technique generates a new color scheme using a search-based technique. Last, based on the previous program analysis results, the technique rewrites the app to apply the new color scheme.
I evaluated the effectiveness of this technique by computing the power savings after repair and by conducting a user study to quantify the impact of the new color scheme on the attractiveness and readability of mobile UIs. To evaluate its efficiency, I measured the running time of the technique. The empirical evaluation showed that the technique could repair the energy optimizable UIs effectively and efficiently.

1.3 Contributions

The contributions of my dissertation include the design and development of two program analysis based approaches that aid developers in detecting and repairing energy optimizable UIs, and an empirical study of UI implementations in Android apps that motivates the design of the hybrid analysis in my repair technique.

1. Detection technique: I designed and developed an approach, dLens, that combines dynamic analysis, power modeling, and color transformation to detect energy optimizable UIs. To the best of my knowledge, my approach was the first program analysis based technique to automate the detection of energy optimizable UIs. As part of this contribution, I also conducted an extensive evaluation to demonstrate the effectiveness and the efficiency of my approach on real-world mobile apps. Moreover, the evaluation results indicated that although some excluded content, such as ads, accounts for a small portion of the UIs, it can have a significant impact on the rankings of energy optimizable UIs. The finding that many apps in the Android market are not optimized in terms of display energy efficiency motivated me to develop an automated technique to help developers repair energy optimizable UIs. I discuss this contribution in Chapter 2.

2. UI implementation study: I designed a study examining the coding practice changes in implementations of Android apps' UIs. The study results have uncovered several interesting observations that impede the functionality of state-of-the-art program analysis techniques.
Moreover, this study discovered several important code practice trends that should be considered in future UI analysis techniques. These findings impacted my design of the static analysis part of my repair technique, and motivated the decision to utilize a hybrid analysis in my repair technique. To the best of my knowledge, my study was the first empirical study to investigate the coding practice trends in UI implementations of Android apps. I discuss this contribution in Chapter 3.

3. Repair technique: I designed and developed an approach, AIRES, that employs dynamic analysis, static analysis, a search-based technique, and the app rewriting technique to model and recolor Android UIs. This design choice of using hybrid program analysis was made based on my UI implementation study results. To the best of my knowledge, my approach was the first hybrid program analysis based technique to automatically repair energy optimizable UIs for mobile apps. As part of this contribution, I also conducted an extensive evaluation to demonstrate the effectiveness and the efficiency of my approach on real-world mobile apps. Furthermore, the evaluation results regarding the comparison among different types of program analysis techniques demonstrated the benefit of the hybrid program analysis used in AIRES. I discuss this contribution in Chapter 4.

1.4 Overview of Publications

In this section, I provide an overview of the publications that I have written during the course of this dissertation. The dissertation work is composed of three bodies of work corresponding to my contributions. Each of the chapters is based on one or more papers, which have been published or are under submission. The papers are listed below. For each of the papers, I was the primary author (or one of the primary authors), with contributions including design, implementation, and evaluation of the work. All of the papers were co-authored with my Ph.D. advisor, Prof. William G. J. Halfond.
Chapter 2: Detecting Display Energy Hotspots

In this chapter, I discuss the detection technique, dLens, which I designed for detecting energy optimizable UIs in mobile apps. This work was originally published in the research track of the IEEE International Conference on Software Testing, Verification and Validation (ICST) in 2015 [23]. An extended version of this work was published in the Journal of Software: Testing, Verification and Reliability, Volume 27, Issue 6, in 2017 [24]. The paper describing the work was co-authored with Yuchen Jin, an undergraduate student at USC, and Ding Li, a Ph.D. student at USC. Its extension has two more co-authors: Jiaping Gui, a Ph.D. student at USC, and Sonal Mahajan, a Ph.D. student at USC.

1. [23] Mian Wan, Yuchen Jin, Ding Li, and William G. J. Halfond. Detecting Display Energy Hotspots in Android Apps. In 2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST), pages 1-10, April 2015
2. [24] Mian Wan, Yuchen Jin, Ding Li, Jiaping Gui, Sonal Mahajan, and William G. J. Halfond. Detecting Display Energy Hotspots in Android Apps. Software Testing, Verification and Reliability, 27(6):e1635, 2017. e1635 stvr.1635

Chapter 3: UI Implementation Study

In this chapter, I describe the study that I designed for investigating the coding practice trends in UI implementations of Android apps. This work was originally published in the research track of the IEEE International Conference on Software Maintenance and Evolution (ICSME) in 2019 [19]. The paper describing this work was co-authored with Negarsadat Abolhassani, a Ph.D. student at USC, and Ali Alotaibi, a Ph.D. student at USC.

3. [19] Mian Wan, Negarsadat Abolhassani, Ali Alotaibi, and William G. J. Halfond. An Empirical Study of UI Implementations in Android Applications. In 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 65-75, Sep. 2019

Chapter 4: Repairing Display Energy Hotspots

In this chapter, I discuss the repair technique, which I designed for repairing energy optimizable UIs in mobile apps. This work is currently under submission [25]. The paper describing this work was co-authored with Ali Alotaibi, a Ph.D. student at USC, and Paul Chiou, a Ph.D. student at USC.

4. [25] Mian Wan, Ali Alotaibi, Paul T. Chiou, and William G. J. Halfond. Automated Optimization of the Display Energy for Android Apps. In Submission

Chapter 2

Detecting Display Energy Hotspots

In this chapter, I give a detailed description of my approach to detect Display Energy Hotspots (DEHs). I also provide an empirical evaluation of the approach to demonstrate its effectiveness and efficiency in detecting energy optimizable UIs.

Despite the high impact of focusing on display energy, developers lack techniques that can help them identify where in their apps such savings can be realized. For example, the well-known Android battery monitor only provides device level display energy consumption and cannot isolate the display energy per app or per UI screen. Other energy related techniques have focused on surveys to identify patterns of energy consumption [4], design refactoring techniques that improve energy consumption [2, 26], programming language level constructs to make implementation more energy aware [27], energy visualization techniques [28], or energy prediction techniques [29]. Although helpful, the mentioned techniques do not account for display energy nor are they able to isolate display related energy. Existing work on display energy has focused on techniques that can transform the colors in a UI (e.g., Nyx [5] and Chameleon [16]).
But these techniques do not guide developers as to where they should be applied. Therefore, they must be (1) used automatically for the entire app, which means that although colors will be transformed automatically into more energy-efficient equivalents, the color transformation may be less aesthetically pleasing than a developer guided one; or (2) applied based solely on developers' intuition as to where they would be most effective, which means that some energy-inefficient UIs may be missed.

To address these limitations, I propose a novel approach to assist developers in identifying the UIs of their apps that can be improved with respect to energy consumption. My insight is that the display energy optimization potential of a UI can be quantified by analyzing the captured screenshot and computing the energy consumption difference between the screenshot and an optimal color transformed version. To do this, I designed a program analysis based approach combining display energy modeling and color transformation techniques to identify DEHs, that is, UIs of a mobile app whose energy consumption is higher than that of energy-optimized but functionally equivalent UIs. The approach is fully automated and does not require software developers to use power monitoring equipment to isolate the display energy, which, as explained in Section 2.2, requires extensive infrastructure and technical expertise.

The approach to identify DEHs performs three general steps. First, the approach traverses the UIs of an app and takes screenshots of the app's UIs when they change in response to different user actions. Second, for each screenshot, the approach calculates an estimate of how much energy and power could be saved by using a color optimized version of the screenshot. Finally, the approach ranks the UIs based on the magnitude of these differences.
The approach reports these results, along with detailed power and energy information, to the developer, who can target the most impactful UIs for energy-optimizing transformations.

I also present the results of an empirical evaluation of the approach on a collection of real-world and popular mobile apps. The results showed that the approach was able to accurately estimate display power consumption to within 14% of the measured ground truth and identify the most impactful DEHs. Furthermore, the results generated by the approach can be generalized from one hardware platform to others. Overall, the high accuracy and good generalizability demonstrate the high effectiveness of detecting energy optimizable UIs, while a low runtime of 12 to 66 seconds per screenshot demonstrates the efficiency of the detecting process.

2.1 Approach

The goal of the approach [23, 24] is to assist developers in identifying UIs that can be improved with respect to energy consumption. More specifically, the approach detects DEHs, which are UIs that consume more display energy than their energy-optimized versions would. To detect these, the approach automatically scans each UI of a mobile app and then determines if a more energy-efficient version could be designed. It is important to note that DEHs are not necessarily energy bugs, as the DEHs may not be caused by a fault in the traditional sense. Instead, DEHs represent points where code is energy-inefficient with respect to an optimized alternative. After detecting the DEHs, the UIs are ranked in order of the potential energy improvement that could be realized via energy optimization, and then reported to the developers.

To achieve complete automation and not require developers to have power monitoring equipment, there are two significant challenges to be addressed. The first challenge is to determine how much display energy will be consumed by an app at runtime without physical measurements.
To address this, the insight is that power consumption can be estimated by a display power model that takes UI screenshots as input. The second challenge is to determine whether a more energy-efficient version of a UI exists and to quantify the difference between these two versions. To address this, the insight is that automated energy-oriented color transformation techniques can be used to recolor the screenshots and then calculate the difference between the original and the more efficient version. Based on these two insights, the approach can automatically detect DEHs without requiring power monitoring equipment.

[Figure 2.1: Overview of the detection approach]

An overview of the approach is shown in Figure 2.1. The approach requires three inputs: (1) a description of the workload for which the developers want to evaluate the UIs, (2) a Display Energy Profile (DEP) that describes the hardware characteristics of the target mobile device platform, and (3) the mobile app to be analyzed. Using these inputs, the approach performs the detection in five steps. In the first step, the approach instruments the app to record runtime information about the UI layout. This information is used to identify certain types of components, such as ads, that should not be part of the DEH identification. The second step is to run the app based on the workload description and capture screenshots of the different UIs displayed. In the third step, the approach processes these screenshots and generates energy-efficient versions via a color transformation technique. Next, in the fourth step, a hardware model based on the DEP is used to predict the display energy that would be consumed by each of the screenshots and their energy-optimized versions. Finally, the fifth step compares the energy consumption of each UI with that of its optimized version and gives the developers a list of UIs ranked according to the potential energy impact of transforming each UI. Each of these steps is now explained in more detail.
2.1.1 Step 1: Gather UI Layout Information

The goal of the first step is to facilitate the detection of content displayed in the app's UI that should not be considered for the purpose of detecting DEHs. This type of content is called Excluding Content (EC) and, broadly, it includes UI elements whose appearance will vary between executions. ECs are very common in mobile apps. For example, mobile ads are present in over 50% of all apps [30]. Since the colors present in an EC should not or cannot be changed by a developer, yet they could occupy a potentially significant amount of the screenspace of a UI, they must be identified and removed from the screenshots to preserve the usefulness of the calculated display energy for a UI.

The primary challenge in detecting ECs is that they are mostly indistinguishable from static content. For example, there is no visual difference between an image that is static versus one that is dynamically loaded, nor is there a difference in terms of the APIs used to display them. A notable exception is mobile ads, which invoke special APIs to visually render themselves in the UI. Based on this insight, the approach instruments the app so that when the workload is executed during step 2, the ads' sizes and locations are recorded.

The process to identify ad related information is as follows. First, the approach identifies invocations in the app's bytecode that call the app's ad network API. Then, certain ad related event handlers and callbacks are instrumented. The exact set of invocations to be instrumented varies by ad network. For example, for the Google ad network, AdMob, this set would include onAdLoaded and onReceiveAd, which can be defined by an ad listener, and loadAd, which is defined in the ad library and can be called by an Activity. At runtime, the instrumentation records timestamps, position and size information about each of the ads displayed.
This information is used to populate a sequence of tuples F, in which each tuple is of the form $\langle t, a \rangle$, where t is the time at which the content was displayed and a represents the location and size of the occupied area.

The approach provides a mechanism by which other types of ECs can be excluded from the DEH detection as well. Tuples may be manually added to F. This allows developers to specify image or text areas that they know to be dynamic and that should be excluded from the energy and power analysis in step 3.

2.1.2 Step 2: Workload Execution and Screenshot Capture

The goal of the second step is to convert the workload description into a set of screenshots that can drive the display energy analysis in the subsequent steps. Strictly speaking, a workload description is not a necessary input to the approach since an automated UI crawler (e.g., PUMA [31]) could navigate an app and execute a fixed or random distribution of actions over the UI elements. However, the use of a workload description allows the approach to analyze the app using realistic user behaviors or a particular workload of interest to the developers. For example, developers could collect execution traces of real users interacting with their app and use this to define a workload for the energy evaluation.

The inputs to this step are the target app and its workload description. The app A is the Android Package Kit (APK) file that can be executed on a mobile device. The workload W is represented as a sequence of event tuples in which each tuple is of the form $\langle e, t \rangle$, where e is a representation of the event (e.g., "OK button pressed") and t is a timestamp of when the event occurs relative to the first event (i.e., $t = 0$ for the first event $e_1$). The approach does not impose a specific format or syntax on W except that it must be reproducible. In other words, it must be specified in a way that allows for some mechanism to replay the workload.
In the current implementation of the approach (Section 2.3.2) the RERAN tool [32] is used to record and then replay a workload description, so the exact syntax and format of W is dictated by that tool.

Given W and A, the approach captures screenshots of the different UIs displayed on a device's screen during the execution of W on A. The general process is as follows. The replay mechanism executes each event at its specified time. A monitor mechanism executes in the background of the device and captures a screenshot of the display every time it changes. This is done by hooking into the refresh and repaint events of the underlying device. The execution of the workload continues until all event tuples have been executed. Once the screenshots have been captured, developers may manually analyze the screenshots to identify areas (in addition to the EC areas that are automatically identified) that should be excluded from the DEH identification. The output of the second step is a sequence of tuples S, in which each tuple is of the form $\langle s, t \rangle$, where s is a screenshot and t is the time at which the screenshot was taken (i.e., when the display changed), and F, which contains the EC information collected via the mechanisms described as part of step 1 and the areas marked by the developer.

To capture screenshots, the implementation uses a modified version of an existing tool called Ashot. Ashot periodically captures screenshots of the currently displayed UIs. Ashot has a maximum sampling frequency that is fast enough to catch user speed events (e.g., clicks), but will not sample videos or animations at their full refresh rate. This sampling frequency does not affect the accuracy of the approach; it only reduces the overall number of screenshots captured. Furthermore, to reduce storage overhead, Ashot drops consecutive screenshots that are identical. The use of Ashot did not introduce any observable delay in the execution of W.
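The construction of S from periodic samples can be illustrated with a minimal sketch. It assumes that each sample is a pair of raw frame bytes and a timestamp; the function name `build_screenshot_sequence` and the byte-for-byte comparison are illustrative assumptions, not the behavior of Ashot itself.

```python
from typing import List, Tuple

# Hypothetical representation: one captured frame as raw bytes.
Screenshot = bytes


def build_screenshot_sequence(
    samples: List[Tuple[Screenshot, float]],
) -> List[Tuple[Screenshot, float]]:
    """Build S from periodic samples, dropping consecutive identical frames.

    Each kept tuple (s, t) records the first time t at which the display
    showed screenshot s, i.e., the moment the display changed.
    """
    sequence: List[Tuple[Screenshot, float]] = []
    for frame, timestamp in samples:
        # Skip a sample whose content matches the most recently kept frame.
        if sequence and sequence[-1][0] == frame:
            continue
        sequence.append((frame, timestamp))
    return sequence


if __name__ == "__main__":
    samples = [(b"login", 0.0), (b"login", 0.5), (b"feed", 1.0), (b"login", 2.0)]
    for frame, t in build_screenshot_sequence(samples):
        print(t, frame)
```

Only consecutive duplicates are dropped, so a UI that reappears later in the workload is kept as a separate entry with its own timestamp.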
Note that a necessary condition of both the replay and screenshot capture mechanisms is that their use does not alter the functionality of the app or the UI's appearance when it is rendered on the device. Both of these conditions were met by RERAN and Ashot.

2.1.3 Step 3: Generate Energy-Efficient Alternative UIs

The goal of the third step is to generate an optimized version of each screenshot tuple in S so that the fourth step can calculate estimates of the energy consumption for each screenshot and its optimized version. However, this optimized alternative does not exist, so the approach must first generate a reasonable approximation of what such an alternative would look like. A guiding insight is that prior work has shown that darker colors on OLED screens are more energy-efficient than lighter colors [15]. To take advantage of this insight, one could invert the colors of UIs with a white-colored background or systematically shift colors to make them darker, and then use this transformation as the optimized version. However, these approaches neglect the fact that both color inversion and linear color shifts do not maintain color difference, which is the visual relationship that humans perceive when they look at a colored display [5]. Therefore, although the color-adjusted UIs would be more energy-efficient, they would not represent a reasonable approximation of optimized UIs as the resulting UIs would not be aesthetically pleasing.

To address this challenge, the approach leverages a color transformation technique, Nyx, that was developed in prior work [5–8]. A key aspect of Nyx is that the color scheme it generates represents a reasonably aesthetically pleasing new color scheme. Nyx statically analyzes the structure of the HTML pages of a web application and generates a Color Transformation Scheme (CTS) that represents a more energy-efficient color scheme for the web application.
Nyx does this by first creating a Color Conflict Graph (CCG), where each node in the graph is a color that appears in a web page and each edge represents the type of visual relationship (e.g., "next to", "enclosing", or "not touching") that any two colors in the CCG have. The edges in the CCG are weighted by the type of visual relationship, with higher weights given to edges so that "enclosing" > "next to" > "not touching". Then, Nyx solves for a recoloring of the CCG that is energy-efficient, and also maintains, as much as possible, the color distances between colors in the original page that have a visual relationship. The weighting allows Nyx to prioritize maintaining certain types of color distances over others. Empirical studies show that the resulting color schemes can reduce display power consumption of web apps by over 40%. Additionally, user studies of the UIs generated by Nyx and other similar color transformation techniques [5, 9, 15, 16] have shown that the transformed UIs have high end-user and developer acceptance while only minimally affecting the resulting UIs' aesthetics.

The approach adapts the CTS generation process of Nyx. There are three primary challenges to be addressed to conduct this adaptation: generation of the CCG, accounting for areas in the screenshots occupied by EC, and scalability. Nyx generates the CCG by statically analyzing the server-side code that dynamically generates web pages. In contrast, the approach only has screenshots available. Therefore, the adapted CCG models the color relationships between adjacent pixels, which are identified by analyzing each pixel of each screenshot and identifying its color and the colors of its surrounding pixels. To handle ECs, the approach only builds a CCG for the pixels of the screenshot that are not in an area defined in F. Entries in F can be matched to screenshots by matching the time intervals specified by the timestamps.
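The pixel-based CCG construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the adapted Nyx code: a screenshot is assumed to be a row-major grid of RGB tuples, EC areas are assumed to be axis-aligned rectangles already matched to the screenshot, and the edge weight is assumed to be the number of times two distinct colors touch. The names `build_pixel_ccg` and `in_any_rect` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Color = Tuple[int, int, int]       # (R, G, B)
Rect = Tuple[int, int, int, int]   # (left, top, width, height), assumed EC area format


def in_any_rect(x: int, y: int, rects: List[Rect]) -> bool:
    """Return True if pixel (x, y) lies inside any excluded rectangle."""
    return any(l <= x < l + w and t <= y < t + h for (l, t, w, h) in rects)


def build_pixel_ccg(
    pixels: List[List[Color]], excluded: List[Rect]
) -> Dict[FrozenSet[Color], int]:
    """Count adjacencies between distinct colors, ignoring pixels in EC areas.

    Keys are unordered color pairs (edges); values are adjacency counts,
    used here as an assumed stand-in for edge weights.
    """
    edges: Dict[FrozenSet[Color], int] = defaultdict(int)
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    for y in range(height):
        for x in range(width):
            if in_any_rect(x, y, excluded):
                continue
            # Look only right and down so each adjacent pair is visited once.
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx >= width or ny >= height or in_any_rect(nx, ny, excluded):
                    continue
                a, b = pixels[y][x], pixels[ny][nx]
                if a != b:
                    edges[frozenset((a, b))] += 1
    return dict(edges)


if __name__ == "__main__":
    W, K, R = (255, 255, 255), (0, 0, 0), (255, 0, 0)
    screenshot = [[W, W, K],
                  [W, R, K]]
    print(build_pixel_ccg(screenshot, excluded=[]))
    print(build_pixel_ccg(screenshot, excluded=[(1, 1, 1, 1)]))
```

Excluding the rectangle that covers the red pixel removes every edge that involves it, which mirrors how EC areas are kept out of the graph.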
Using the screenshot to construct the CCG leads to the third challenge, scalability. The recoloring of the CCG is an NP-hard problem with respect to the number of colors. The rendering kits of mobile devices use anti-aliasing and color shading to smooth lines and curves. This means that even a simple image, such as a black circle over a white background, would be rendered with many additional colors, such as grays, to smooth out transitions between adjacent colors. Because of these extra colors, the time needed to generate a CTS would make the approach's analysis time impractical. To address this scalability challenge, the approach maps each color in a screenshot to the closest of the 140 standardized UI colors [33], and then uses the resulting reduced set of colors to create the CCG. The approach then converts the color with the largest cluster to black, which is the most energy-efficient color, and solves the CCG recoloring problem using a simulated-annealing algorithm to find the CTS [5]. Guided by the newly generated CTS, the approach recolors the original screenshot, except for the areas in F, so that every color in the cluster is replaced with its corresponding color in the CTS.

This process is repeated for every screenshot tuple $\langle s, t \rangle \in S$. For each such s, the approach generates an $s'$, which is the alternate version of the screenshot recolored as described above. The output of this step is a function O that maps each s to its corresponding $s'$.

Since the approach uses an approximation algorithm, the generated CTS may not reflect the optimal recoloring. Instead, the recolored UI represents a lower bound on the potential savings a color optimization could achieve. Additionally, the use of clustering means the approach performs its analysis on simpler versions of the screenshots with fewer colors. This can also introduce inaccuracy into the power estimation of the color-optimized screenshot.
However, unless the screenshots differ significantly in the amount of anti-aliasing used, this inaccuracy is small. To confirm this, the power consumption between a set of screenshots and the versions of the screenshots using the results of the clustered colors was compared and the average difference was found to be below 2%, thus indicating the simplified version was a reasonable proxy for the full-color version.

2.1.4 Step 4: Display Energy Prediction

The fourth step of the approach computes the display power and energy of the screenshots and their energy-efficient alternatives. The approach does this by analyzing each screenshot obtained in the second step and its optimized version generated in the third step with cost functions that estimate the energy consumption based on the colors used in the screenshot. The inputs to this step are F, populated in the second step; the screenshot tuples, S, generated by the second step; O, generated by the third step; and the cost function, C, provided by the DEP. (The development of the cost function provided by the DEP is explained in Section 2.2 and an evaluation of its accuracy is in Section 2.3.3.) The outputs of this step are two functions that map each screenshot tuple in S or O to its power (P) and energy (E).

$$P(s, t) = \sum_{k \in |s|} C(R_k, G_k, B_k) - \sum_{a \in F(t)} \sum_{k \in |a|} C(R_k, G_k, B_k) \tag{2.1}$$

$$E(s, t_s, t_e) = P(s) \cdot (t_e - t_s) \tag{2.2}$$

The formulas for calculating the output functions are shown in Equations (2.1) and (2.2). Here, s can be replaced with $O(s)$ as needed. To calculate $P(s, t)$ for all $\langle s, t \rangle \in S$, the approach first sums the power cost of each pixel in s, which is calculated by the cost function C that takes the values associated with the red (R), green (G), and blue (B) values of the pixel's color. From the calculated power value, the approach subtracts the power values calculated for each of the EC areas contained in s.
As in step 3, the approach identifies the EC areas corresponding to the screenshots using the timestamp information, represented as $F(t)$, and then uses C to calculate the power of each pixel in each EC area a. The sum of the power for all of the EC areas is subtracted from the screenshot's overall power value. The value returned by P is in Watts. Recall that energy is equal to power multiplied by time. Therefore, E is equal to the power associated with the screenshot ($P(s)$) multiplied by the amount of time the screenshot is displayed. The display time is calculated by subtracting the time the screenshot is displayed ($t_s$) from the time the next screenshot is displayed ($t_e$), or in other words, subtracting the timestamp associated with screenshot $s_i$ from the timestamp associated with screenshot $s_{i+1}$, which would be of the form $E(s_i, t_i, t_{i+1})$. The value returned by E is in Joules.

2.1.5 Step 5: Prioritizing the User Interfaces

The goal of the fifth step is to rank the UIs in order of their potential power and energy reduction. To do this, the approach calculates the power and energy of each color-transformed screenshot and compares it to the power and energy of the original screenshot. The inputs to the fifth step are S, P, E, and O.

$$D_P(s) = P(s) - P(O(s)) \tag{2.3}$$

$$D_E(s, t_s, t_e) = E(s, t_s, t_e) - E(O(s), t_s, t_e) \tag{2.4}$$

Given these inputs, the approach calculates the power and energy difference according to the formulas shown in Equations (2.3) and (2.4). For the difference in power ($D_P$), the approach subtracts the power of the corresponding $O(s_i)$ from that associated with each $s_i \in S$. The resulting number is in Watts and represents the power that could be saved by using $s'_i$, the color-optimized version of $s_i$. For the difference in energy ($D_E$), the approach subtracts the energy of the corresponding $O(s_i)$ from that associated with each $s_i \in S$. The resulting number is in Joules and represents the energy that could be saved by using $s'_i$ instead of $s_i$.
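Equations (2.1) through (2.4) can be made concrete with a small sketch. It is an illustration under stated assumptions rather than the dLens implementation: screenshots are assumed to be row-major grids of RGB tuples, EC areas are assumed to be non-overlapping rectangles that lie inside the screenshot, and the cost function passed in is a stand-in for the one supplied by the DEP. The names `power`, `energy`, and `savings` are hypothetical.

```python
from typing import Callable, List, Tuple

Color = Tuple[int, int, int]
Rect = Tuple[int, int, int, int]            # (left, top, width, height)
CostFn = Callable[[int, int, int], float]   # per-pixel power cost C(R, G, B)


def power(pixels: List[List[Color]], ec_areas: List[Rect], cost: CostFn) -> float:
    """Equation (2.1): power of all pixels minus the power of pixels in EC areas."""
    total = sum(cost(*px) for row in pixels for px in row)
    for left, top, width, height in ec_areas:
        total -= sum(
            cost(*pixels[y][x])
            for y in range(top, top + height)
            for x in range(left, left + width)
        )
    return total


def energy(p: float, t_start: float, t_end: float) -> float:
    """Equation (2.2): power multiplied by the time the screenshot is displayed."""
    return p * (t_end - t_start)


def savings(
    original: List[List[Color]],
    optimized: List[List[Color]],
    ec_areas: List[Rect],
    cost: CostFn,
    t_start: float,
    t_end: float,
) -> Tuple[float, float]:
    """Equations (2.3) and (2.4): power and energy differences D_P and D_E."""
    p_orig = power(original, ec_areas, cost)
    p_opt = power(optimized, ec_areas, cost)
    d_power = p_orig - p_opt
    d_energy = energy(p_orig, t_start, t_end) - energy(p_opt, t_start, t_end)
    return d_power, d_energy


if __name__ == "__main__":
    # Toy cost function for illustration only; a real DEP supplies C.
    toy_cost = lambda r, g, b: float(r + g + b)
    gray, black = (1, 1, 1), (0, 0, 0)
    original = [[gray, gray], [gray, gray]]
    # The EC pixel at (0, 0) is left untouched by the recoloring.
    optimized = [[gray, black], [black, black]]
    print(savings(original, optimized, [(0, 0, 1, 1)], toy_cost, 2.0, 5.0))
```

Because the EC area is subtracted from both versions, the unchanged ad or dynamic region contributes nothing to either difference.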
The output of the fifth step is two sequences, $R_P$ and $R_E$. Each sequence is composed of tuples $\langle s, D \rangle$, where s is the screenshot and D is either the difference in power ($D_P$) or the difference in energy ($D_E$). By choosing the metric $D_P$ or $D_E$, the developers can choose whether or not to take the time spent on each screenshot into consideration. The sequences are ordered by each tuple's D value from highest to lowest. This ranking is the output of the approach and represents a prioritization of the screenshots that appear during the workload's execution in order of their potential power and energy reduction if they were to be color optimized.

2.1.6 Discussion of Usage Scenarios

The output of the approach allows developers to identify the UIs of their app that could save the most energy with color optimization. Although the technique provides developers with a CTS, it does not automatically transform the app to use these colors. Nonetheless, the color mapping information can be useful. Developers may choose to use this CTS, build on it as a starting point for graphic designers to create a new palette, or leverage other automated techniques for identifying energy-efficient and aesthetically pleasing color schemes [9]. To close the loop and use the new color scheme, developers must modify points in the code where colors are defined and/or modify the Android UI layout XML file color specifications so that the app uses the new colors in places where the old colors would have been used.

2.2 The Display Energy Profile

The DEP provides a pixel-based power cost function for a target mobile device. The use of the DEP allows the approach to analyze display power for multiple devices by simply providing different DEPs as input. It is expected that, in the future, a DEP will be developed and provided as part of a device's Software Development Kit (SDK). However, this is currently not common in practice, so this section discusses the steps required to develop a DEP.
At a high level, the DEP provides a cost function that can predict how much power an OLED screen will consume when displaying a particular UI. Prior research work has shown that the power consumption of a pixel in an OLED screen is based on its color [18]. Therefore, the input to the cost function is the RGB value that defines a pixel's color. The output of the cost function is the power, in Watts, that will be consumed by the display of the pixel on the target device.

$$C(R, G, B) = rR + gG + bB + c \tag{2.5}$$

The general form of the cost function is shown in Equation (2.5). R, G, and B represent the red, green, and blue components of a pixel's color, respectively. The coefficients r, g, b, and c represent empirically determined constants. The value for each constant varies by mobile device. Note that the power model does not account for screen brightness. This is generally controlled by the user or OS, not the software developer. Furthermore, savings incurred by adjusting brightness would apply uniformly across all UIs.

DEPs were constructed for four mobile devices: a 2.83" OLED-32028-P1T display (OLED) from 4D Systems, a Samsung Galaxy SII (S2), a Samsung Galaxy Nexus (Nexus), and a Samsung Galaxy S5 (S5). For all of these displays, power consumption was measured using the Monsoon Power Monitor (MPM) from Monsoon Solutions Inc. [34]. The MPM allows voltage to be held constant while supplying a current that may be varied from its positive and negative terminals. The MPM samples the voltage and the current supplied and outputs the power consumption with a frequency of 5 kHz. This sampling frequency is sufficient for the development of the DEP, since the average duration of screenshots is on the order of seconds.

Each power model was built by roughly following the process outlined by Dong and colleagues [18]. First, the power consumption of a completely black screen was measured to define a baseline power usage for an active screen.
To determine the parameters for each RGB component, the power consumption of the screen was measured while displaying solid-colored pages. The intensity of each color component was varied while holding the other two components at zero, and data points for 16 intensities of each component (R, G, and B) were collected. In total, 48 data points for each device were obtained. After taking measurements for each color component, the baseline power usage was subtracted from these measurements to isolate the power consumption of each R, G, and B component.

The relationship between the power consumption and the RGB value is non-linear due to a gamma encoding of the screen. Gamma encoding is a digital image editing process that defines the relationship between pixel values and the colors' luminance. It allows for human eyes to correctly perceive the shades of color of images that are captured digitally and displayed on monitors. To account for this encoding, the RGB values were raised to the 2.2 power to decode the image. While the gamma value can vary from 1.8 to 2.6, 2.2 is the standard image gamma of screens adopted by industry. After gamma decoding of the RGB values, linear regression was used to determine the coefficients for Equation (2.5). The linear relationship between the RGB values and the power consumption was very strong. The average $R^2$ value for the four models was 0.99288. The detailed coefficients for each device can be found on the project webpage [35].

One particular problem with the S2, Nexus, and S5 was that the measured energy also included the energy consumed by background processes and other hardware components of the smartphone. To properly isolate the display energy, two measurements were taken. For the first, the flex cable that provided power, as well as data and signals, between the display and the CPU was disconnected. This offered a baseline measurement of the power consumption of the phone without the display.
By disconnecting the cable, the phone still maintains its background processes instead of suspending them and going into sleep mode. This baseline value was subtracted from the second measurement, the power of the phone with the display cable attached, to calculate the display power for each of the colored pages.

2.3 Evaluation

This section presents the results of an evaluation of the approach. The approach was implemented in a tool called dLens, and it was used to answer the following research questions:

RQ 1: How accurate is the dLens analysis?
RQ 2: How generalizable are the dLens results across devices?
RQ 3: What is the impact of ads on the rankings?
RQ 4: How long does it take to perform the dLens analysis?
RQ 5: What is the potential impact of the dLens analysis?

2.3.1 Subject Applications

The top ten most popular free Android market apps in the United States, as of August 2014, were selected as subject applications. However, since these apps either did not contain ads or their ad invocations could not be instrumented due to obfuscation, another five apps that used mobile ads without any obfuscation were added to the evaluation. The subject applications are from different developers and have different features. For each of the subjects, a workload was manually generated by exercising the primary features and functions of each app. The average duration of the workloads was 272 seconds and each workload resulted in an average of 67 captured screenshots.

Table 2.1: Subject application information for the detection technique

Name                             Size (MB)   Screenshots   Time (s)
Facebook                         23.7        116           554
Facebook Messenger               12.9        55            268
FaceQ                            17.9        96            470
Instagram                        9.7         93            429
Pandora internet radio           8.0         75            278
Skype                            19.9        65            254
Snapchat                         8.8         142           465
Super-Bright LED Flashlight      5.1         20            51
Twitter                          13.7        101           388
WhatsApp Messenger               15.3        65            242
Arcus Weather                    4.0         36            143
Drudge Report                    2.8         25            105
English for kids learning free   7.9         43            181
Retro Camera                     29.0        29            106
Restaurant Finder                5.3         41            148
Information about the apps is listed in Table 2.1. For each app, the size of its APK file, the number of screenshots captured as part of its workload, and the time duration (in seconds) of the recorded workload are reported. The first ten rows contain information about the top ten apps and the remaining five rows contain information about the unobfuscated apps.

2.3.2 Implementation

The approach was implemented in a prototype tool called dLens. The implementation of dLens leveraged several other libraries and tools. To gather the mobile ad UI layout information, Apktool [36] and the dex2jar [37] tool were used to reverse engineer the APK files. The apps were instrumented using the ASM library [38]. Workloads were recorded and replayed using the RERAN tool [32]. The Ashot [39] tool was used to record screenshots of the different UIs displayed. Ashot was also modified to associate a timestamp with each generated screenshot. As described in Section 2.1.3, the color transformation scheme generation was based on the code developed in the Nyx project [5–8]. Nyx was adapted to build CCGs using color information obtained from screenshots instead of static analysis. The energy consumption of each screenshot was measured on an MPM. Finally, the experiments were performed on four different platforms: a 2.83" OLED-32028-P1T display (OLED) from 4D Systems, a Samsung Galaxy SII smartphone (S2), a Samsung Galaxy Nexus smartphone (Nexus), and a Samsung Galaxy S5 smartphone (S5).

2.3.3 RQ1: Accuracy

This research question deals with the accuracy of the dLens approach. The accuracy of dLens was evaluated with two metrics. First, the Error Estimation Rate (EER), which is the accuracy of the power estimate produced by dLens for a given screenshot, was calculated. Note that the EER is different from the accuracy reported in Section 2.2, which is the Pearson coefficient that expresses the closeness of the fit between the power measurements of the solid-color screenshots and the regression-based model.
In contrast, the EER evaluates how close dLens's estimates are to the ground truth power of the screenshots from the subject applications. The second accuracy metric was Device Ranking Accuracy (DRA), which is the accuracy of the UI screenshot ranking provided by dLens compared to the UI screenshot ranking provided by the ground truth power measurements. Note that if the EER was 0, the ranking of the screenshots would be 100% accurate. However, since the approach is dealing with physical systems, a certain amount of estimation error is to be expected. Therefore, the measurement of the DRA reflects the ability of the dLens approach to rank the UI screenshots accurately despite the underlying EER of the approach. The accuracy metrics were only calculated for power since the corresponding energy measurements could simply be obtained by multiplying the power estimates by the length of time the screenshot was displayed.

The EER was calculated by the following process. First, dLens was run on each of the subject applications to generate the list of screenshots ranked by power. To compute the EER, five screenshots and their corresponding color-transformed versions were randomly selected from each app. For each pair, their ground truths were measured using the MPM on each of the four devices. The screenshots were sampled instead of completely measured because the process for isolating a screenshot's power and energy (see Section 2.2) was manual and, therefore, time-intensive. Then the power values estimated by dLens were compared to the ground truths for the selected screenshots.

Figure 2.2: The error estimation rate of the power model

Figure 2.2 shows, for each of the subject applications and for each of the four devices, the average EER for the five screenshots. Across all of the applications, the average EER for the OLED, S2, Nexus, and S5 was 3%, 5%, 8% and 8%, respectively. The EER for the corresponding color-transformed versions was 5%, 7%, 6% and 8%.
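The EER comparison can be illustrated with a minimal sketch. The text does not give the exact formula, so the sketch assumes the EER of one screenshot is the absolute difference between the estimated and measured power, relative to the measured power. The function names and the sample values are hypothetical and are not taken from dLens.

```python
def estimation_error_rate(estimated_mw, measured_mw):
    """Relative error of one power estimate (assumed definition of the EER)."""
    if measured_mw <= 0:
        raise ValueError("ground-truth power must be positive")
    return abs(estimated_mw - measured_mw) / measured_mw


def average_eer(pairs):
    """Average EER over (estimated, measured) pairs, e.g. five sampled screenshots."""
    rates = [estimation_error_rate(est, meas) for est, meas in pairs]
    return sum(rates) / len(rates)


def energy_mj(power_mw, seconds):
    """Energy follows from power and display time: mW * s = mJ."""
    return power_mw * seconds


# Hypothetical (estimated, measured) power values in mW for five screenshots.
sample = [(510.0, 500.0), (380.0, 400.0), (305.0, 300.0),
          (450.0, 450.0), (620.0, 640.0)]
print(f"average EER: {average_eer(sample):.1%}")
```

The `energy_mj` helper mirrors the remark that energy values follow directly from the power estimates and the time a screenshot was displayed, which is why only power needs to be validated.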
Overall, the OLED had a lower EER than the other three devices. A possible reason for this is that the OLED is only a display device and therefore does not have any noise introduced into its power measurements and models by background components or processes that would be experienced by the three smartphones.

Table 2.2: Average Spearman's correlation coefficient of rankings between devices

Base Device  OLED    S2      Nexus   S5
OLED         -       0.9874  0.9849  0.9888
S2           0.9874  -       0.9985  0.9990
Nexus        0.9849  0.9985  -       0.9942
S5           0.9888  0.9990  0.9942  -

Table 2.3: Average common screenshots in top 5 and top 10 between devices

Device pair     Top 5  Top 10
OLED vs. S2     4.33   9.47
OLED vs. Nexus  4.6    9.67
OLED vs. S5     4.47   9.2
S2 vs. Nexus    4.47   9.33
S2 vs. S5       4.07   9
Nexus vs. S5    4.27   9

To compute the DRA, a process similar to that of calculating the EER was followed. However, instead of a random sample, the top-five ranked screenshots were chosen. The top five were used because this represented a group of screenshots that would likely be similar in terms of power consumption, and therefore, their relative ranking accuracy would be more sensitive to the underlying EER than a random subset. Also, the top five represented the set most likely to be used to guide the developers, and therefore is the most representative of the accuracy that an end user of dLens would experience in real usage. For these top five, the ground truths of each screenshot and its color-transformed version were calculated, and then ranked by their power consumption. Then this ranking was compared against the ranking computed by the dLens tool. To compare the rankings, each ranking of the five screenshots was treated as a vector and Spearman's rank correlation coefficient was used to compute the closeness of the vectors. The coefficient gives a -1 or 1 when the two rankings are perfectly negatively or positively correlated, and is 0 when the two rankings are not correlated.
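The ranking comparison described above can be sketched as follows. This is an illustrative implementation of Spearman's rank correlation coefficient for two rankings of the same items, assuming no ties, so the closed form 1 - 6 * sum(d^2) / (n * (n^2 - 1)) applies. The function name and the screenshot identifiers are hypothetical, and the sketch is not the code used in the evaluation.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman's rank correlation for two tie-free rankings of the same items.

    Each ranking is a list of item identifiers ordered from rank 1 downwards.
    """
    if len(set(ranking_a)) != len(ranking_a) or sorted(ranking_a) != sorted(ranking_b):
        raise ValueError("rankings must contain the same distinct items")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two items to correlate")
    position_in_b = {item: index for index, item in enumerate(ranking_b)}
    # d is the difference between an item's rank in the two rankings.
    sum_d_squared = sum((index - position_in_b[item]) ** 2
                        for index, item in enumerate(ranking_a))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Hypothetical top-five rankings: ground truth versus tool output.
ground_truth = ["s1", "s2", "s3", "s4", "s5"]
estimated = ["s2", "s1", "s3", "s4", "s5"]  # the first two screenshots swapped
print(spearman_rho(ground_truth, estimated))
```

With only five items, a single swap of adjacent screenshots already moves the coefficient noticeably away from 1, which is why closely spaced power values in the top five can lower the DRA.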
Across all of the applications, the average DRA for the OLED, S2, Nexus and S5 was 0.83, 0.72, 0.6 and 0.89, respectively. These results indicate that the rankings were not an exact match. The cases where the ranking was not an exact match were investigated and were found to be likely caused by the closeness of the power consumption measurements of the screenshots in the top five. For example, it was typical to see the top five separated by about a 2% difference in their overall power consumption. This was well within the possible error range that was measured for the EER for each device and was likely the reason for this ranking variation.

Overall, the results for RQ1 show that the dLens tool is very accurate. For the EER, the estimated power was within 14% of the ground truth for all devices. Even with a non-zero EER, dLens was able to accurately rank the UIs as verified by the ground truth measurements. The minor variations seen in the rankings could be attributed to the small size of the EER. Both of these accuracy measurements are important for developers, as they indicate that their design changes can be made with a high degree of confidence in both the actual estimates and the relative ranking of the UI screenshots.

2.3.4 RQ2: Generalizability

This research question evaluates the generalizability of the results computed by dLens. This research question addresses a potential limitation to the usage of the approach. Namely, the screenshots captured by the approach and the corresponding DEP reflect the power and energy usage of UIs displayed on a particular set of devices. In practice, this set would be the device(s) the developer has available for testing purposes.
However, other devices are likely to vary in terms of screen resolution and power consumption characteristics. To better understand this potential limitation, this research question evaluates how well the rankings for one device match the rankings that would be computed for other devices using their own DEPs. If the results of the dLens approach are generalizable across mobile devices, developers can use the results from their own devices as a proxy for other or similar devices.

Table 2.4: The differences between rankings with and without excluding ads

Name                             Top 5 Overlap  Top 10 Overlap  Rank Correlation
Arcus Weather                    4              6               0.8263
Drudge Report                    5              9               0.9687
English for kids learning free   3              9               0.9822
Retro Camera                     2              7               0.3908
Restaurant Finder                4              8               0.9745

To answer this research question, the similarity of the rankings generated by dLens for each of the mobile devices was compared. For this experiment, dLens was run for each device (i.e., using its own DEP) on the set of all screenshots for each app. For each app, its screenshot rankings were compared against those computed for the other devices. To compare the rankings, the Spearman's rank correlation coefficient, explained in Section 2.3.3, was used. A pair-wise comparison of the rankings generated for each of the four devices was performed. The average Spearman's correlation coefficient across all apps for each of the devices is shown in Table 2.2. The similarity of the top n entries of each ranking was also investigated, since developers may only check the top entries instead of the entire list. To do this, the top n of each device's ranking were treated as a set and its overlap with that of the other devices was computed. For each such comparison, the cardinality of the intersection of the two sets was computed. This result is shown in Table 2.3 for n equal to 5 and 10. The results show a high similarity in the top n of the rankings.
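The top-n overlap used for Table 2.3 (and again for the ad comparison in RQ3) reduces to a set intersection. A minimal sketch follows; the function name and the screenshot identifiers are hypothetical.

```python
def top_n_overlap(ranking_a, ranking_b, n):
    """Number of items shared by the top n entries of two rankings."""
    return len(set(ranking_a[:n]) & set(ranking_b[:n]))


# Hypothetical screenshot rankings for one app on two devices.
device_a = ["s3", "s1", "s7", "s2", "s9", "s4", "s5"]
device_b = ["s1", "s3", "s2", "s8", "s7", "s9", "s4"]
print(top_n_overlap(device_a, device_b, 5))
```

Because the top n entries are treated as sets, the measure ignores the order inside the top n. It complements the rank correlation, which is sensitive to order, rather than replacing it.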
The average overlap between rankings is over 4 for the top 5 and over 9 for the top 10, demonstrating that this similarity applies to the portion of the ranking most likely to be used by developers.

Overall, the results show that the rankings generated by dLens for each of the mobile devices were, in fact, highly similar. The implication of this finding is that the results of running dLens on one device are similar to those for other devices. More broadly, this indicates that energy-reducing redesigns undertaken by developers based on results from one device are likely to also reduce energy on other devices.

2.3.5 RQ3: Ad Impact

This research question evaluates the impact of mobile ads on the rankings reported by the dLens approach. Essentially, it is measuring the impact on the rankings of the mechanism to exclude advertisements described in step 1 (Section 2.1.1). To measure this impact, the dLens approach was run twice, the first time excluding the ad portions of the screenshots and the second time including the ad portions. The rankings generated by the two variations of the dLens approach were then compared.

The results of this analysis are shown in Table 2.4. First the rankings of all of the screenshots were compared using the Spearman's rank correlation coefficient. This is shown in the table as "Rank Correlation." The amount of overlap in the top n of the rankings was also computed by treating the top n screenshots as a set and computing the cardinality of the intersection of the two rankings. The results of this comparison are shown in the columns labeled "Top N Overlap" where N was set to 5 and 10. The results show that for all apps and comparisons except for one (top 5 for Drudge Report) ads could affect the rankings. The magnitude of this impact varied significantly. For Arcus Weather and Retro Camera, the impact was significantly higher than for the other apps, such as Drudge Report and Restaurant Finder.
The results differed due to a number of reasons, including the prevalence of ads and the appearance of the rest of the UI. These results indicate that although ads account for a small portion of the UI display, for some apps, they can have a significant impact on the rankings of the DEHs and thus it is useful to have a mechanism to exclude them from the DEH calculations.

2.3.6 RQ4: Analysis Time

This research question addresses the time needed to analyze an app using the dLens approach. The overall time needed to run the dLens approach and the time for three of the steps (instrumentation, power estimation, and color transformation) was measured. Note that for this RQ, the time to replay the workload for each app was not included as this time is under developer control.

The analysis times are shown in Table 2.5. Only the analysis time for the S2 DEP was reported because the results were very similar for all four devices. The time for the color transformation is shown as "T_C" and the time for power estimation is shown as "T_E". For the five apps with ads, which required instrumentation, the required time is shown as "T_I". The total time is shown as "All UIs", which also includes time for file processing, ranking, etc. Since each app varies in terms of the number of screenshots in its workload, the time measurement was also normalized by dividing the total time by the number of screenshots captured in each app's workload. This value is shown in the column labeled "Per UI." All time measurements are shown in seconds.
Table 2.5: Analysis time of the detection approach

Name                             T_C (s)  T_E (s)  T_I (s)  All UIs (s)  Per UI (s)
Facebook                         1,470    7        -        1,477        12
Facebook Messenger               997      3        -        1,001        18
FaceQ                            1,145    5        -        1,151        12
Instagram                        2,799    6        -        2,806        30
Pandora internet radio           1,418    4        -        1,423        19
Skype                            871      3        -        875          13
Snapchat                         1,444    8        -        1,453        10
Super-Bright LED Flashlight      863      1        -        865          43
Twitter                          1,316    6        -        1,323        13
WhatsApp Messenger               897      3        -        901          13
Arcus Weather                    879      2        1        883          25
Drudge Report                    1,192    2        1        1,195        48
English for kids learning free   1,377    4        1        1,382        32
Retro Camera                     1,899    3        1        1,903        66
Restaurant Finder                1,320    3        1        1,324        32

The average time for dLens to analyze an app was 22 minutes and ranged from 14 minutes to 46 minutes. Although this amount can be considered high, it was directly dependent on the overall size of the set of screenshots, which ranged from 20 to 142. Therefore, the per screenshot number, which ranged from 12 to 66 seconds, is informative. For each screenshot, it was observed that most of the time was taken by the generation of the CTS. The runtime of the CTS algorithm is exponential with respect to the number of colors present in a screenshot. The Nyx approach uses an approximation algorithm to solve this problem. Therefore, this aspect of the approach can be sped up, if needed, by accepting lower quality approximations. However, this may have a tradeoff in terms of accuracy of the computed results and rankings.

2.3.7 RQ5: Potential Impact

This research question investigated the potential impact of dLens in two ways. The first was by determining how many market apps contained DEHs and the second was by computing the amount of energy savings that could be realized by transforming the apps that contained DEHs. This analysis was performed on both the subject apps listed in Table 2.1 and on a much larger sample of 1,082 random Android market apps. Because of the large number of apps, the evaluation process was fully automated.
The execution of the apps and the capture of their initial home page were automated by using the ADB [40] tool from the Android SDK. One challenge was to automatically detect when the initial page of an app was valid (i.e., finished loading). This problem was solved with the heuristic that the screenshot was captured five seconds after the app started. For almost all of the apps, this was sufficient time for the initial UI to load and display. To ensure that all the apps had been executed successfully, all screenshots were manually checked, and invalid screenshots, such as those representing crashed apps, were removed. In total, screenshots of 962 apps were valid and dLens was run on this set.

Figure 2.3: The average estimated power savings of the subject apps

For the subject apps, the potential savings are shown in Figure 2.3. As the figure shows, the savings varied from 0% to 28%. Flashlight's potential savings were low because it has an almost all black background, which means it was already close to optimal and there were only minuscule improvements to be realized.

Figure 2.4: The number of apps with display energy hotspots

Figure 2.5: Transformed and original screenshots of the most energy-inefficient app: (a) Original screenshot, (b) Transformed screenshot

For the market apps, dLens found that 398 of the 962 apps contained DEHs. That means that for 41% of the examined apps, their main page consumes more display power than the optimized version. On average, the optimized versions would consume 30% less energy than the original versions. For some apps, this number was as large as 50%. For the apps with DEHs, Figure 2.4 shows how much energy could be saved by using the optimized version. In this figure, a point (X,Y) on the line means that there are X apps that could consume at least Y% less energy than their original version.
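The reading of Figure 2.4 can be made concrete with a short sketch that derives such (X, Y) points from per-app savings. The savings values below are hypothetical and only illustrate the construction; they are not the measured data.

```python
def apps_saving_at_least(savings_percent, threshold):
    """Number of apps whose potential saving is at least `threshold` percent."""
    return sum(1 for saving in savings_percent if saving >= threshold)


def savings_curve(savings_percent):
    """Points (X, Y): the X apps with the largest savings each save at least Y%."""
    ordered = sorted(savings_percent, reverse=True)
    return [(rank, saving) for rank, saving in enumerate(ordered, start=1)]


# Hypothetical potential savings (in percent) for five apps with DEHs.
savings = [30, 50, 10, 49, 49]
for x, y in savings_curve(savings):
    print(f"{x} apps could save at least {y}%")
```

Sorting the savings in descending order yields a monotonically non-increasing curve, which matches the way the figure is described.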
Table 2.6: The ten apps with the largest display energy hotspots

App Package Name                                                 Potential Power Savings (%)
biblereader.olivetree                                            50
com.amirnaor.happybday                                           49
com.adp.run.mobile                                               49
com.airbnb.android                                               49
com.darkdroiddevs.blondJokes                                     49
com.chapslife.nyc.traffic.free                                   49
com.al.braces                                                    48
appinventor.ai_vishnuelectric.notextwhiledriving2_checkpoint1    48
appinventor.ai_freebies_freesamples_coupons.StoreCoupons         48
bazinga.emoticon                                                 48

The app with the largest potential energy saving was "Bible Study." This app's original design and optimized design are shown in Figure 2.5. In Bible Study, the original design uses a significant amount of white as the background color, which consumes more energy than other colors. However, by using black as the background color, as shown in Figure 2.5b, a significant amount of energy could be saved. Information about the top 10 most energy-inefficient apps is shown in Table 2.6.

The category distribution of the apps, per their app store category, was also analyzed. The categories that involved extensive reading or presented textual information, such as Communication and News & Magazines, tended to have a higher ratio of apps with DEHs (above 43%). The categories that contained more video and graphical information tended to have fewer apps with DEHs (below 20%). A possible explanation for this is that developers of text or reading oriented apps prefer to use light colored backgrounds (e.g., white) to mimic the traditional print media colors (e.g., newspapers) or because this color combination is considered more readable. The increased readability of light-colored apps is supported by evidence from user studies in prior work [5], but this same study also showed that users would prefer energy savings over a small decrease in readability. This suggests that developers of these types of apps might improve user satisfaction by optimizing the color of their apps' UIs and explaining to end users the energy related benefits of the revamped design.
The UI color choices for the apps were also investigated. The apps that did not contain DEHs tended to use black (#000000) as the main color. Here a color is considered the main color of a screenshot if it covers the most pixels in the screenshot. On average, black covered 79.3% of the pixels of the screenshots of those apps. Dimgray (#696969), darkslategray (#2F4F4F), darkgray (#A9A9A9), and gray (#808080) were also commonly used as secondary colors in these apps. These colors covered 8%, 5%, 3%, and 2% of the pixels, respectively, of the screenshots. The color combination black:darkgray:gray:white:dimgray with the ratio 794:32:13:10:6 was the most commonly used combination in apps without DEHs. It was used in 23% of apps without DEHs and on average comprised 86% of the screenshots. In the apps with DEHs, white, dimgray, and whitesmoke (#F5F5F5) were the three most popular main colors. They were used as the main colors in 42%, 12%, and 10% of the apps, respectively. On average, each of these colors covered 65%, 56%, and 55% of the pixels on the screen.

The results of this RQ showed that many apps in the Android market are not optimized in terms of display energy efficiency and that significant savings could be realized through their optimization. The results of this RQ also revealed basic trends and UI design patterns in both apps with and without DEHs. These results show that dLens can generate useful and actionable information that can have a large potential impact in terms of helping developers optimize the display energy for their apps above and beyond the detection of DEHs.

2.3.8 Threats to Validity

A possible threat to the external validity of the results is the selection of only the most popular apps and the workload generation.
For app selection, although the most popular apps may not be representative of all apps, choosing subjects in this way eliminated the threat of selection bias for the subjects and made it possible to argue that the approach is useful for even well-engineered apps. Furthermore, the results of RQ5 show the general applicability of the approach to a wide range of other apps. Even though the workloads were generated by this paper's authors, the workloads used the primary features of the apps, which were easy to identify in all cases. It is important to note that the usefulness of the approach does not depend on accurately identifying typical or representative workloads, as the approach can be used for any workload of interest.

A threat to the internal validity of the results could exist if the mechanism of establishing ground truth or the approximation of the CTS generated by the Nyx-based method was inaccurate. To ensure the accuracy of the ground truth measurements, the protocols were developed and tested to ensure that they reliably and accurately captured power measurements. The protocols are also based on prior approaches [5, 18, 41]. The accuracy of the Nyx-based CTS has been established in prior work [5].

A threat to internal validity is that the screenshots may contain additional ECs whose colors should not be optimized. To determine the magnitude of this threat to validity, a study was performed to quantify the potential impact of not excluding dynamic images and text. In this study, the screenshots of nine apps (representing 298 screenshots) were manually modified to remove the display area associated with EC from the display energy calculation. Then the difference in power between the original and modified versions of the screenshots was calculated. The median power difference was 0.81% and the average difference was 1.8%. Although these numbers are small, they are high enough to affect the rankings of some apps.
Therefore, a failure to properly exclude EC could lead to misranked screenshots.

A final threat to internal validity is that the approach assumes that the CTS generated by Nyx represents an aesthetically acceptable version of the UIs. This assumption is reasonable based on prior studies that measured end users' assessment of the aesthetics of the generated color schemes. In an end user study of the color schemes generated by Nyx, it was found that although the readability and attractiveness of the transformed UIs were rated slightly lower than the originals, users overwhelmingly preferred the new color schemes when made aware of the energy tradeoffs [5–8]. More recent approaches that incorporate additional aesthetic constraints into the generation of color schemes report even better preference results [9–11]. Taken together, these results indicate that the automatically generated color schemes represent reasonable proxies for an aesthetically acceptable version of the analyzed UIs.

For RQ1, a threat to validity is that the accuracy of only the top-five ranked screenshots and a random sample of the remaining screenshots was measured. Sampling was used since the screenshot isolation process described in Section 2.2 is very time consuming and it was not feasible to measure the power and energy of every screenshot. Nonetheless, the experimental evaluation included over 1,800 such measurements to evaluate the accuracy of the four devices for all of the experiments required by the RQs. The subset was chosen at random to eliminate any potential selection bias, and there is no reason to believe that the results would differ with a larger subset. The raw data is available via the dLens project page [42].

For RQ2, there are two threats to validity. The first of these is a threat to criterion validity for the selection of ranking as the metric that was used to evaluate consistency across devices. An alternative would have been to use power measurements.
However, ranking is the more appropriate metric since developers would be likely to prioritize their work based on rank instead of actual power differences. Second, a threat to external validity for RQ2 is that the power models of only four devices were used, and three of those devices are manufactured by Samsung. Despite the results that showed generalizability across devices, it is likely that some phones may have DEP cost functions that will result in different rankings due to different underlying hardware mechanisms. Therefore, the results will not always generalize to all devices. Future investigations into what enables more reliable result generalizability will enable stronger conclusions to be made on this aspect.

For RQ3, a threat to validity is that only the ranking differences for five apps were computed. This was due to the limitations of reverse engineering tools and the use of obfuscation techniques in the apps. However, these five apps cover different app categories, and it is likely that the results will be similar when considering a larger set of apps containing ads.

2.4 Conclusion

In this chapter, I presented a new program analysis based technique for detecting DEHs in mobile apps and conducted an empirical evaluation of it to demonstrate its effectiveness and efficiency. First, my technique is designed to detect the DEHs in mobile apps, which conforms to the problem domain in the hypothesis. Second, my technique could estimate the display power to within 14% of the ground truth and the Spearman's rank correlation coefficient for the top 5 screenshots was at least 0.6, which indicates there is only minor variation in the rankings compared to ground truth. Therefore, my technique is accurate in detecting DEHs. In addition, the top 5 and 10 of the rankings among different devices showed a high similarity. Thus, I find that the results are generalizable among different devices. The high accuracy and good generalizability can be interpreted as high effectiveness.
Third, the average analysis time for each screenshot varied from 12 to 66 seconds, which is within a reasonable range. Thus, I can conclude that my technique is efficient. To sum up, I can infer that my technique can detect the DEHs with high effectiveness and efficiency, which partially confirms my dissertation hypothesis.

The evaluation results also indicated that although ads account for a small portion of the UIs, for some apps they can have a significant impact on the rankings of the DEHs. Therefore, when applying dLens, the developers should develop a mechanism to exclude contents (e.g., ads, WebViews) that are of no interest to them. Designing automated ways to identify the content to exclude can be a future work direction. Furthermore, the finding that many apps in the Android market are not optimized in terms of display energy efficiency motivated the development of an automated repair technique. Finally, since dLens adapts the color transformation technique Nyx, a drawback of dLens is that it only considers the recolored version of UIs whose background colors are always black. This color transformation may not reflect the needs of the developers. Another future work direction is to improve the color transformation technique so that it allows developers to customize the color scheme or generates more color schemes as suggestions to developers.

Chapter 3 UI Implementation Study

In this chapter, I describe the details and findings of a study that investigates the code practice trends in UI implementations of Android apps. The observations from this study inform and motivate the analysis design decisions of my repair technique described in Chapter 4.

To automate my repair technique, a static analysis should be employed, since it analyzes the code and can compute the points to be changed in the program. Currently, many static analysis techniques extract the layouts of UIs by analyzing the layout XML files that are loaded in the UIs.
However, modern versions of the Android SDK provide more flexible mechanisms to create or modify UI elements in an activity's UI. For example, a fragment may define a part of a UI loading its own layout XML file and this can be shared by different activities. Therefore, it is not clear whether the previous static analyses are still applicable in identifying the layout and color information of an activity.

Dynamic analysis, another type of program analysis, is a good candidate for use by my repair technique. It is widely used by developers, and relies on app crawlers, such as Monkey [43], Dynodroid [44], and PUMA [31], that automatically interact with an app's UIs and collect the UI information. A key assumption underpinning these dynamic analysis techniques is that an app's UIs can be completely identified by a crawler. However, in cases where an activity's UI can only be accessed if the crawler interacts with the app in a certain way or enters inputs that satisfy specific constraints, the UIs of some activities may never be fully interacted with. While it is conceptually clear that crawlers have limitations when UIs are built dynamically, based on code, it is not clear in practice how often such situations occur.

In this chapter, I report the results of my large-scale investigation of the mechanisms developers use to build UIs in Android apps. The goal of this investigation is to determine whether the ways of building UIs can cause completeness and applicability problems for existing program analyses, such as dynamic and static analyses. To carry out this investigation, I studied a large set of apps using a combination of crawling, lightweight static analysis, and manual analysis. My study characterized the types of implementation mechanisms used for UI construction and discussed their impacts on the two types of program analysis techniques.
I also investigated these practices over time, allowing me to ascertain if these trends were increasing or decreasing, thus giving an indication of which practices have become more popular in creating UIs. Overall, my results found that existing static and dynamic analysis techniques can be incomplete in collecting UI information since the contribution of code to define UIs is increasing and the way of setting style properties is complex. These findings motivated the decision to utilize a hybrid analysis in my repair technique.

3.1 Background

In the Android platform, each UI window displayed on a screen is called an activity. A developer can customize a UI by implementing a subclass of the Activity class. Each activity will execute a series of lifecycle callbacks when it reaches a certain lifecycle state. For example, when an activity is created, it will first execute the onCreate callback to do some initialization. In an Android app, all the activities are registered in the AndroidManifest.xml file.

Each activity consists of a set of widgets, which are called Views because every widget is a subclass of the View class. In most cases, a widget can respond to some user operations (e.g., click) by executing corresponding event handlers. There is another kind of view, the ViewGroup, which is usually not visible and serves as the container of widgets. Both views and viewgroups determine the layout of an activity. There are two different ways to customize the layout of an activity. The first way is to use a layout XML file to represent the layout, and load it in the application code by calling certain APIs (e.g., setContentView(int)). A widget maps to an XML tag, and the attributes of a tag can define many features of the widget. For example, the android:id property assigns an ID to the widget. The other way is to instantiate a View class and insert the view object into the activity by calling some APIs (e.g., ViewGroup.addView(View)).
In Android 3.0, fragments were introduced to support tablets; a fragment represents an arrangement of views that may be reused. Fragments share some features with both Activity classes and View classes. Like views, fragments can be declared as XML tags in layout XML files, and instantiated as objects in the program. Similar to activities, a fragment has its own lifecycle callbacks to do data processing. One of them, onCreateView, initializes the details of UI elements, and returns the root view of the fragment layout. Even though the purpose of introducing fragments is reusing UIs, a fragment may not necessarily be a UI element. If the return value of a fragment's onCreateView callback is null, then this fragment is not a UI element.

To help developers manipulate the views flexibly, the Android SDK enables developers to specify an attribute of a widget both in the XML tag and in the code. As mentioned earlier, a developer can directly set a property value for a view tag in a layout XML file. In the code, since each widget is a view object, the first step is to retrieve the widget object. A common way to do that is to call findViewById(int id), whose parameter corresponds to the value of android:id in the widget tag. This mapping is maintained by the special class R.id. Once the developer obtains the view object, they set the properties of this object by using different API calls. For example, setBackgroundColor(int) sets the background color of a widget, achieving the same effect as an android:background property.

In some cases, a developer may want to reuse a certain UI configuration across activities. Similar to CSS files, the Android SDK provides styles and themes to reduce the complexity and repetition in the layout XML files. A style is a set of attributes that define the appearance of a view. A theme is a style that is applied to an activity or the whole app. If a theme is not specified for an activity, the app's theme will be used.
Just like CSS styles, a style or a theme can inherit attribute settings from a parent and override some of them by modifying the related property values. To apply a style, developers need to set the style attribute of a widget tag. Similarly, a theme is specified by setting the android:theme property of an <activity> tag or the <application> tag. As with attributes of views, developers can also make API calls to set a style or a theme in code.

3.2 Research Questions

To start my study, I began with the most foundational Research Question (RQ):

RQ1: How well does dynamic analysis work for UI identification?

In this RQ, I explored whether crawlers are complete with respect to identifying all of the UI elements of an app. Although it is known in theory that there are situations in which a crawler would be incomplete, in this RQ I investigated whether this happens in practice. To address this RQ, I employed two different methods. First, I compared the traversed activities against the activities defined in the app, and then I compared the UI elements that could be identified by a crawler against the apps' ground truth of layouts. Second, I used a static analysis to identify code patterns that could potentially be problematic for crawlers. The results showed that crawlers were markedly incomplete. This motivated the subsequent RQs, which focused on identifying and understanding the programming practices that could pose problems for both crawlers and alternative solutions, such as static analyses.

The next RQ focused on obtaining a high level overview of the kinds of UI related actions developers were performing in code:

RQ2: How do developers use APIs to define UIs in Android apps?

In this RQ, I systematically classified all invocations in the code based on their types. For example, void addView(android.view.View) is used to add a view to a layout dynamically, while void setContentView(int) is used for setting the activity content from a layout resource file.
Besides defining the UI, many invocations may modify the look of existing UI elements. For example, developers may use void setTextColor(int) to change the text color of a TextView to a specific color. Exploring this RQ enabled me to get a better understanding of the types of actions that might be missed by crawlers or that static analyses would need to handle to get a more thorough model of the possible UIs.

The results of this RQ helped me to identify types of UI related actions that could cause problems for crawlers and static analyses. One of the types of actions that caught my attention was the usage of fragments. Broadly, fragments allow developers to define UI snippets in an XML file and then include these snippets in the UIs of different activities. While a crawler would see the results of including fragments the same as any other dynamically defined UIs, fragments pose special challenges for static analyses. In particular, fragments, unlike views, have unique lifecycle callbacks similar to those of activities. Therefore, fragment usage may create new challenges because it introduces new semantics to Android apps, which requires additional analysis. So in this RQ, I examined the following question:

RQ3: Do developers use fragments frequently in Android apps?

The results showed that the use of fragments had consistently increased over time. Since many static analysis based techniques do not consider fragments, the next RQ tried to determine the impact of omitting fragments. Specifically, I asked:

RQ4: How many views are defined in fragments and activities, respectively?

In this RQ, I counted the number of views that were being created in each activity versus the number created in fragments. Overall, the results showed a non-trivial usage of fragments, both in terms of frequency and in terms of the number of views they contained, motivating the importance of handling these in a static analysis based approach.
Broadly, two types of views may be incorporated into an activity's UI: atomic views and adapter-based views. Atomic views (e.g., a button) are the basic building blocks of a UI, whose layouts are predefined in their corresponding classes. These types of views may be created and configured either in code or via the static XML based layout files. Detecting and analyzing these views is straightforward for both crawlers and static analysis based approaches, since for crawlers they appear in the UI and for static analyses, semantics can be attached to a single invocation associated with an atomic view. Adapter-based views are views used to display a set of child views provided by an Adapter, which is the input to a setAdapter call. Adapter-based views do not know details about the contained children, and a developer has to invoke setAdapter to render those views in the UIs. The setAdapter configuration can only happen via code. For crawlers, these more complex views do not pose a problem, since when they are displayed, they can be identified easily like any other views. However, for static analyses, they pose significant challenges because an analysis needs to first identify the corresponding adapter and then track where and how each child view is defined and added to the adapter. For this reason, I was interested in understanding how prevalent these are and how difficult they might be for static analyses to handle. Therefore, I investigated the following question:

RQ5: How do developers use views to customize the Android UIs?

Another kind of action that caught my attention was the large number of invocations used to set or modify the look of UI elements. Results showed that transparency, color, and size properties were defined frequently. For changing the transparency of views, only a few APIs are used and the arguments provided to them are mostly constants. On the other hand, more complicated expressions are often used for setting colors and sizes.
Identifying and interpreting these actions is important for static analyses since both of these properties directly affect the visual representation of the UI elements and are used by many higher-level analyses, such as checking UI equivalence and detecting GUI changes in evolving mobile apps [45]. However, interpreting these actions could also be very difficult for static analyses, depending on how the arguments provided to those APIs are defined. For example, if they are simple string or numeric constants, then their values can be trivially found by a static analysis. However, if they are defined inter-procedurally or via complex expressions, then it would be non-trivial to identify these values and, absent a likely sophisticated mechanism for handling these, a static analysis may generate a large amount of inaccurate style information. Therefore, I investigated the following question:

RQ6: How do developers set the UI style properties of views?

Using a static analysis, I looked at the data flows and control flows of the arguments of the API invocations and classified the patterns used to define each of these.

3.3 Data Collection

In this section, I discuss the experiments that were designed to address each of the research questions mentioned in Section 3.2. The general steps are as follows. First, I downloaded a large pool of real-world Android apps from the Google Play app store. To enable analyses of the UI implementation practices over time, I downloaded apps that were published at different times in the Google Play Store. Specifically, I looked at three time ranges: late 2013, early 2017, and early 2019. To extract the implementation of those apps, I leveraged Soot [46] to analyze the APK file of each app. For RQ1, which required dynamic analysis, I employed Monkey [43] to interact with the apps and dumped the UI layouts using UI Automator [47]. In the manual analysis, I used jadx [48] to analyze the decompiled Java source code of APK files.
I discuss the details of the common data collection steps in the remaining part of this section. The approach dedicated to each RQ is presented in Section 3.4.

3.3.1 Select Subject Applications

I had three criteria for selecting the subject applications. They (1) are from different categories, to make sure the results can be generalized to a broad set of apps; (2) can be successfully analyzed by the tools I used, to ensure I can gather complete information from an app's APK; and (3) were published at different points in time, to allow me to observe developer trends over time. To meet the above criteria, I downloaded free apps from the Google Play Store. I collected three groups of apps: one of apps that were published in late 2013, another of apps published in early 2017, and the last of apps published in early 2019. Within each group, I downloaded 9,000 apps and then randomly selected a subset of 600 apps for analysis in my study. From each subset, I excluded only apps whose bytecode or resource files could not be analyzed. Overall, this gave my study a subject pool comprising 1,671 apps and representing over 23 app store categories.

3.3.2 Identify Developer Code

Since I am studying developers' practices, I wanted the results to account for only developer written code. However, the subject apps contain both library code and developer code. Furthermore, some parts of an app may be obfuscated, which makes it more difficult to distinguish the two types of code. The research community has developed several techniques for addressing aspects of this problem (e.g., [49, 50]). However, these techniques focus on a precise identification of library code, not developer code. Therefore, these techniques could mistakenly classify unidentified library code as developer code.
My preference was to have a more precise identification of developer code, i.e., I would prefer to miss some developer code rather than erroneously include library code in my analysis. The reason for this preference was that library code is generally quite large and may contain a broad range of implementation practices that are not necessarily representative of regular developer code. To consider developer code only, my insight was that the names of many developer defined classes (e.g., the main activity class) and of library classes are generally not completely obfuscated. The package names of those classes can be used as keywords to locate the corresponding library projects. Based on this insight, I identified developer code as follows. First, I manually checked nearly 180 top ranked apps and summarized a list of library package names. Then, I extracted an app's package name and the main activity class package name from the AndroidManifest.xml file. Based on these two package names, I then created a string pattern that was sufficient to uniquely represent the developer classes' packages. For example, for an app whose main activity class name is "com.thepuzzle.activity start", the summarized pattern is "com.thepuzzle", because "com" is very common. For classes whose name matched the summarized pattern and did not match any entry in the library package list, I considered them as developer classes.

3.3.3 Create UI Related API List

In order to study the developers' practices, it is essential to create a complete list of UI related APIs in the Android SDK. To do that, I first summarized the relevant packages (e.g., "android.widget") and method name keywords (e.g., "color"). Then I read through a list of the methods defined in each class under the relevant packages in the android.jar file in the Android SDK. If a method's name matched the keywords, then I added it to my API list.
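The keyword-based selection step can be sketched as a simple filter over (class, method) pairs. Only the package "android.widget" and the keyword "color" come from the text above; the remaining packages, keywords, and the function name is_candidate are assumptions made for illustration, and the sketch covers only this first automatic pass, not any manual review.

```python
# "android.widget" and "color" are the examples given in the text;
# the other entries are hypothetical additions for illustration.
RELEVANT_PACKAGES = ("android.widget", "android.view")
KEYWORDS = ("color", "alpha", "size")


def is_candidate(class_name, method_name):
    """True if the class is in a relevant package and the method name has a keyword."""
    in_package = any(class_name.startswith(pkg + ".") for pkg in RELEVANT_PACKAGES)
    has_keyword = any(kw in method_name.lower() for kw in KEYWORDS)
    return in_package and has_keyword


# A few (declaring class, method name) pairs as they might be listed from android.jar.
methods = [
    ("android.widget.TextView", "setTextColor"),
    ("android.widget.TextView", "setTextSize"),
    ("android.view.View", "setAlpha"),
    ("android.view.View", "getId"),
    ("java.lang.String", "length"),
]

candidates = [m for m in methods if is_candidate(*m)]
print(candidates)
```

A filter like this over-approximates on purpose, since a substring match can admit unrelated methods; it only narrows the set of methods that need to be inspected by hand.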
Later, I went over the Android SDK documentation to confirm that each method was correctly identified and belonged to the right category. At the same time, I examined the documentation for any missing methods. In total, I identified 272 API calls from three categories in the list. The three categories were defined as APIs that are used for: (1) setting the UI's look (e.g., setTextColor()), (2) inflating XML layouts (e.g., setContentView()), and (3) modifying the UI in code (e.g., addView()).

3.4 Experiments, Results, and Discussions

3.4.1 RQ1: How Well Does Dynamic Analysis Work for UI Identification?

Approach: To address this research question, I investigated the amount of incompleteness of dynamic analysis at different levels of granularity: (1) activity coverage, (2) view coverage, and (3) coverage of certain invocation statements. For the first two metrics, I designed a methodology to show the coverage differences between dynamic analysis and the ground truth. For the third metric, I used another methodology to statically determine how straightforward it might be for crawlers to trigger the execution of certain invocations of UI related APIs.

In the first methodology, I directly evaluated the completeness of a representative crawling technique against the ground truth for a set of apps. Defining the ground truth at the activity level was relatively straightforward: I could obtain a list of all the activities from the AndroidManifest.xml file of an APK file. Defining the ground truth at the view level was more challenging and required extensive manual analysis. Therefore, for a small set of eight apps, I manually analyzed each app to develop the view ground truth. I interacted with the app, systematically exploring all UIs, clicking on all buttons and settings, and manually inspecting all code and resource files. I then ran Monkey [43] on each of a larger set of 71 apps for about 30 minutes to represent the results I would expect when using a crawler.
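The activity-level ground truth and the resulting coverage metric can be sketched as follows. This is an illustration under stated assumptions, not the tooling used in the study: the manifest is assumed to be available in decoded text form, the app and activity names are hypothetical, and the set of visited activities stands in for what a crawler run would report.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical decoded manifest, used only for illustration.
MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <application>
    <activity android:name=".MainActivity" />
    <activity android:name=".SettingsActivity" />
    <activity android:name="com.example.app.about.AboutActivity" />
    <activity android:name=".LoginActivity" />
  </application>
</manifest>"""


def declared_activities(manifest_xml):
    """Return the fully qualified names of all activities declared in the manifest."""
    root = ET.fromstring(manifest_xml)
    package = root.get("package", "")
    names = set()
    for activity in root.iter("activity"):
        name = activity.get("{%s}name" % ANDROID_NS)
        if name.startswith("."):
            # A leading dot is shorthand for "relative to the app package".
            name = package + name
        names.add(name)
    return names


def activity_coverage(declared, visited):
    """Fraction of declared activities that the crawler reached."""
    if not declared:
        return 0.0
    return len(declared & visited) / len(declared)


declared = declared_activities(MANIFEST)
# Hypothetical crawler result: the activities observed during the run.
visited = {"com.example.app.MainActivity", "com.example.app.SettingsActivity"}
coverage = activity_coverage(declared, visited)
print(f"activity coverage: {coverage:.0%}")
```

View coverage can be computed in the same way per activity, with the manually built view ground truth in place of the manifest and the dumped UI layouts in place of the visited set.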
I chose Monkey because of its widespread popularity, because results in the research community consistently show it among the top performing crawlers [51], and because it could robustly run against all apps. I used only eight apps in the view coverage study because of the long amount of time needed for each app. Additionally, I filtered out apps for which Monkey got stuck at certain screens, for example those requiring a login. I then compared the ground truth against the Monkey results and calculated the activity coverage and view coverage. For calculating view coverage, I only considered activities that Monkey could successfully interact with.

In the second methodology, I wanted to determine whether UI elements were created in ways that would require the crawler to interact with the app in a certain way in order for them to be displayed. To calculate this number, I implemented a static analysis that found the invocations in the app's bytecode that modified the UI by creating a new view or changing the style properties of a view, and then determined whether these invocations were control dependent on another node. Informally, a statement is control dependent on another node if its execution depends on the outcome (i.e., true or false) of the condition evaluated at that node. So my analysis determined how many of the UI related API invocations in the app were dependent on a switch or if statement. Essentially, this gave me an approximation of the amount of UI related behavior that would only be triggered if some conditions were met during execution (e.g., satisfied by the crawler). My analysis for this method was based on the classic intra-procedural control dependency analysis defined by Ferrante, Ottenstein, and Warren [52].

Results: I found that the crawler was able to achieve 49% activity coverage on average per app and 93% view coverage on average per activity. More specifically, for 63/71 apps, the crawler missed at least one activity, and for 6/8 apps, the crawler missed at least one view.
In terms of the control dependency of UI related APIs, I found that in the 2013 app set 39% of the calls were conditionally guarded; in the 2017 set this number was 42%; and in the 2019 set, this number was 41%.

Table 3.1: Distribution of average API calls building UI per activity over time

| Year | Transparency | Position | Color | Size | General | XML Inflators |
|------|--------------|----------|-------|------|---------|---------------|
| 2013 | 4.83 | 0.08 | 3.42 | 1.73 | 0.77 | 2.19 |
| 2017 | 10 | 0.30 | 5.46 | 3.10 | 1.56 | 3.35 |
| 2019 | 10.94 | 0.42 | 4.84 | 2.49 | 1.67 | 3.11 |

Table 3.2: Distribution of average code usage for modifying UI per app over time

| Year | Fragment Related Adapters | Fragment Related Calls | View Related Adapters | View Related Calls | View Instantiation | Fragment Instantiation |
|------|------|------|------|------|------|------|
| 2013 | 0.29 | 2.13 | 7.65 | 27.78 | 13.30 | 4.64 |
| 2017 | 0.63 | 3.45 | 13.97 | 49.67 | 23.47 | 17.35 |
| 2019 | 0.98 | 5.58 | 18.10 | 59.29 | 22.67 | 18.55 |

Discussion: Overall, the results confirm that apps are built in ways that can cause problems for crawlers. More broadly, these results indicate that crawlers face challenges in terms of completeness, which has significant implications for the soundness of verification techniques built on top of them. This finding of incompleteness motivates further investigation into the practices used to develop UIs, to identify the potential challenges that alternative techniques, such as static analyses, may face in analyzing UI related code in apps.

3.4.2 RQ2: How Do Developers Use APIs to Define UIs in Android Apps?

Approach: To answer this research question, I investigated the types of APIs invoked by developers to define the UIs of an app. To do this, I developed a static analysis to analyze each app's bytecode and count invocations to the UI related APIs. I compared the signature of the target method of each invocation in the app's code against the list of API calls described in Section 3.3. I divided API calls into three categories based on their usage and impact. The first category is API calls used for setting the content of the UI (e.g., android.view.View inflate(int, android.view.ViewGroup)).
The second category contains API calls used to manipulate the UI by adding, removing, or replacing UI elements (e.g., void removeViewAt(int)). This category is divided into two subcategories: (1) view related API calls and (2) fragment related API calls. The third category comprises API calls that modify the look of UI elements. I divided this third category into five subcategories based on the impact of each API call on the UI. These five subcategories are setting the (1) transparency, (2) position, (3) color, (4) size, and (5) general look. For example, I considered setMargin(float, float) as an API call for setting the size of views. For general look settings, I considered views' constructors, setTheme, and setStyle, which are able to set different UI properties from the different categories described above. For the third category, I only considered invocations on the View and Window classes, since these two are responsible for creating the look of a UI.

To compare the number of invocations over time, I used two metrics. The first metric is the average number of API calls used in each activity, and the second is the average number of API calls used in each app. For the first and third categories, I used the first metric, as their usage in apps is prevalent. However, for the second category, I used the second metric because (1) their usage in apps is optional and (2) some API calls in this category (e.g., adapter-based API calls) are defined for specific purposes. For computing these metrics, I excluded apps with no activities, such as service apps, and apps whose UIs are defined in native methods. In total, I excluded 15, 50, and 73 apps from the 2013, 2017, and 2019 app sets, respectively.

Results: The number of API calls related to transparency increased from almost 5 calls per activity in 2013 to 10 and 11 calls in 2017 and 2019, respectively. The same observation is valid for the other look related API calls.
For the color property, the number of related API calls increased from 3 in 2013 to 5 in both 2017 and 2019. Size related API calls were also used around three times in each activity in 2017 and 2019, while the number was less than two in 2013. The API usage for setting the position and general attributes of views did not change significantly. The usage of adapters in apps for both views and fragments was 8 in 2013, 14 in 2017, and 19 in 2019. The total number of fragments and views instantiated in code per app was 17 in 2013 and 40 in both 2017 and 2019.

Discussion: Table 3.1 indicates that the contribution of code to setting the look of the UI in each activity is increasing over time. Therefore, analyzing XML files alone is not enough to understand the nature of UIs. Also, as the number of XML inflating calls (e.g., setContentView()) per activity is increasing, determining the possible looks of an app is getting more complicated, since the use of these APIs indicates that a UI can change to another layout based on a set of conditions. In such situations, crawlers might miss UI elements and their arrangement in layouts. Table 3.2 shows an increasing trend over time of using code to build UIs in apps, by modifying views and fragments both with the help of adapters and with API calls alone. The dramatic increase in the average number of views and fragments instantiated in code per app implies that crawlers will miss a considerable number of views if they only account for the layout XML files. However, this aspect also adds complications to static analyses, as these analyses must consider code to identify the views used in a layout and their properties both from XML files and code.

[Figure 3.1: Distribution over time of the number of apps that use fragments. Vertical axis: number of apps; horizontal axis: year (2013, 2017, 2019). Data labels: with fragments 160, 266, 300; without fragments 397, 287, 253.]

3.4.3 RQ3: Do Developers Use Fragments Frequently in Android Apps?
Approach: To answer this research question, I analyzed each app to determine how many fragments it used. To do this, I built a static analysis to inspect all of the application classes defined in each app by the developer code and determine how many were subclasses of the Fragment class. The analysis determined whether a particular class was a subclass by transitively following each class's parent class to see whether an ancestor was the Fragment class. My analysis then counted the number of apps that defined at least one Fragment class and compared it against the number of apps without any Fragment classes.

It is possible that some fragments do not create any views. For example, this may happen when a developer wants to provide a feature to an activity without being interrupted by the activity re-rendering caused by UI configuration changes. I was interested in whether these situations might affect the results, so I also calculated the subset of fragments that did not create any views, i.e., those that would not have any impact on the UI. The analysis identified these by looking at the return value of the onCreateView() callback in each Fragment class found and determining whether it returned a view object or a non-view value (i.e., null). I counted those that returned a non-view value as non-UI fragments.

Results: Figure 3.1 shows the usage of fragments over time. The results show that out of 557 apps each year, 160 apps used fragments in 2013, 266 used fragments in 2017, and 300 apps used fragments in 2019. The results also indicated that UI fragments dominated the fragment usage, averaging between 92% and 97% in each year.

Discussion: The results show that fragments have become more popular over time and that the vast majority of these cause changes in the UI of an activity. The increasing usage of fragments poses significant challenges for Android UI analyses, in particular static analysis based approaches. The layout of a fragment is defined via a unique set of lifecycle callbacks.
To know the layout information, it is necessary to analyze the code of the lifecycle callbacks and understand their semantics. Of special note, the binding relationship between fragments and activities can be many to many, since a fragment can be shared across different activities. Beyond identifying the possible bindings, it is also essential to identify which fragment is introduced in which activity. For example, when a Collection is used in a fragment related adapter, this mapping is difficult to find. All of these aspects would require sophisticated analyses to handle accurately.

3.4.4 RQ4: How Many Views Are Defined in Fragments and Activities, Respectively?

Approach: To address this research question, I developed an analysis to determine how many views, on average, would be missed if fragments were not accounted for by an analysis. Since I do not have a path sensitive analysis that is able to compute exact UI layouts, my methodology computes an estimate of this number. To do this, I first determined how many views on average were created in each activity and fragment. Then, because multiple fragments may be included in an activity, I determined the average number of fragments included in each activity. My analysis computed averages for each of these numbers on a per-app basis and then used the product of those averages to compute a final estimate.

To find the average number of views in an app's fragments and activities, I developed a static analysis to examine the bytecode of the methods of an app. For each app, the analysis counted the number of views that were defined in each activity and fragment of the app. This analysis considered views defined in both code and XML layout files. To count the number of views defined in the XML layout files, the analysis extracted the layout resource ID from the API calls that included XML content (e.g., inflate(int)) in Fragment or Activity classes.
Specifically, the analysis identified calls to such APIs that appeared in the body of onCreateView() and onCreate(), since these are the methods responsible for initializing fragments and activities. Then, for each XML file identified in this way, the analysis used an XML parser to count the number of views defined in the file. To count the number of views defined by code, the analysis scanned the method bodies of all of the Fragment and Activity classes and counted the number of views created in each. Specifically, the analysis identified view creation by identifying instantiations of objects of type View. For instantiations in loops, my static analysis assumed that the loop would be unrolled once, which means the calculated number is likely a lower bound on the average number of views for a fragment or activity.

Next, my analysis computed the average number of fragments included in each activity of the app. To do this, the analysis identified invocations to APIs that modify fragments in an activity and computed how many of these appeared, on average, in the activities of the app. My analysis considered two ways of modifying fragments: (1) calls to the add, replace, or remove methods of the FragmentTransaction class, and (2) calls to the getItem() method of the PagerAdapter class. Since the latter can add an indeterminate number of fragments in a single call, I conservatively assumed that these calls only added one fragment. This means that my estimate of the average number of fragments per activity is a lower bound.

[Figure 3.2: The average number of views in fragments and activities. Vertical axis: number of views (0 to 200); horizontal axis: year (2013, 2017, 2019); series: activity, fragment.]

Finally, for each app, I estimated the average number of views defined in fragments of an activity by multiplying the average number of views in a fragment by the average number of fragments in an activity.
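The per-app estimate is the product of two averages, which can be written out as a short sketch. The input numbers and the function name estimate_views_in_fragments are hypothetical; in the study, the per-fragment view counts and per-activity fragment counts come from the static analysis described above.

```python
from statistics import mean


def estimate_views_in_fragments(views_per_fragment, fragments_per_activity):
    """Estimate, for one app, the average number of views per activity defined in fragments.

    views_per_fragment: view count for each UI fragment of the app.
    fragments_per_activity: fragment-adding call count for each activity of the app.
    """
    if not views_per_fragment or not fragments_per_activity:
        return 0.0
    return mean(views_per_fragment) * mean(fragments_per_activity)


# Hypothetical app with three fragments and four activities.
views_per_fragment = [10, 20, 30]        # average: 20 views per fragment
fragments_per_activity = [0, 1, 2, 1]    # average: 1 fragment per activity
estimate = estimate_views_in_fragments(views_per_fragment, fragments_per_activity)
print(estimate)
```

Because both inputs are lower bounds (loops unrolled once, one fragment assumed per adapter call), the product is best read as a conservative estimate rather than an exact count.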
The resulting number served as my estimate of how many views, on average, could be missed by analyses that did not account for fragments.

Results: The calculations indicated that, on average, for each activity 13 views in 2013, 14 views in 2017, and 11 views in 2019 may be missed by analyses that do not account for fragments. I illustrate some of the intermediate values that enabled me to reach these results. Specifically, Figure 3.2 shows the distribution of views defined in UI Fragment classes and Activity classes. The median number of views defined in activities was around 25 in 2013 and 2017, and it reached over 40 in 2019. The median number of views defined in fragments increased from less than 20 views in 2013 to 30 in 2017, and then to 40 in 2019. I also show results regarding the number of fragments added per activity. In Table 3.2, the column labeled Fragment Related Calls shows the number of fragments added, on average, by the FragmentTransaction class, and the column labeled Fragment Related Adapters shows the number of adapters that can create at least one fragment. Together these two columns show the lower bound of the total number of fragment manipulations, on average, per activity.

Discussion: Figure 3.2 shows an increasing trend of defining more views in the majority of fragments, while the trend is more stable for activities. This shows an increasing chance of missing views in more recent apps in terms of total views per app. However, as the number of activities increased dramatically in 2019, the number of missing views per activity decreased. The results from this RQ and the previous one emphasize the importance of fragments and their impact on the UI. I observed activities in which developers define only an empty layout and populate it using fragments. Although crawlers can extract information about fragments, they might miss triggering an event responsible for manipulating a fragment in an activity.
Therefore, missing fragments in crawlers means missing many views in the majority of cases. Static analysis techniques also face complications in analyzing fragments in code, similar to those discussed for the previous RQ.

[Figure 3.3: Distribution of the usage of different view types over time. Bars for Total, Code, and XML views, each split into SDK and non-SDK, on an axis from 0 to 100, for 2013, 2017, and 2019.]

3.4.5 RQ5: How Do Developers Use Views to Customize the Android UIs?

Approach: To address this research question, I analyzed the apps to identify the number of views that represented standard views versus developer-customized views. Since customized views can pose additional challenges for static analyses, depending on whether these views can have children, I also looked at the prevalence of this programming practice. To address both aspects, I developed a static analysis that examined the bytecode of the apps and classified views based on their types and then based on the views' ability to have children. This analysis considered views created in code and in XML based layout files.

The analysis to classify views based on types worked as follows. First, the analysis scanned the bytecode of the app and checked the type of new instances of objects in the code to determine if they were instances of the View class. Once a view object was identified, it was classified as either a standard view defined by the SDK or a custom view. To determine if a view was a standard view, the analysis looked at the package name to determine if it was from the Android SDK (i.e., the package name starts with "android"). The analysis classified the view as non-standard if its package name was not part of the Android SDK. The analysis also extracted and classified all views defined in XML based layouts by analyzing XML files referenced in invocations to API calls such as setContentView() or inflate().
The analysis used an XML parser to extract the names of views defined in layouts and applied the same approach to classify views in XML. The only difference was that some SDK views are used without their package name in layout files, and these can be identified by that characteristic.

My analysis also identified views that could have children. In general, any view that extends AdapterView is able to have content that could be a set of views (i.e., have children). My analysis determined whether an identified view extends AdapterView, ViewPager, or RecyclerView, and if so counted it as a view that could have children. Note that this represents a lower bound on the number of classes that could have children, as I noticed many customized adapter-based views that did not extend any of the above mentioned adapter-based classes but still provided some capability to have child views.

Results: Figure 3.3 shows the results of my analysis by type. In this graph, I show the breakdown of standard (SDK) views versus non-standard (non-SDK) views for each year. I further break down the results based on whether the view was defined in code or via an XML based layout file. Each bar in the figure shows the proportion of views in total, in code, or in XML based on their types. Overall, in 2013 customized views accounted for 5% of all defined views, while in 2017 and 2019 they accounted for 13% of all views. In code, non-standard views accounted for 38%, 43%, and 46% in 2013, 2017, and 2019, respectively. For my investigation into the second aspect, views with children, the column labeled View Related Adapters in Table 3.2 shows the number of such views over time. The table reports the number of setAdapter() calls, which I treat as the total number of adapter-based views that can be used.

Discussion: I found that there is an increasing trend in using non-SDK views in apps, specifically in defining views in code, and that adapter-based views are becoming more prevalent among developers.
Both customized views and adapter-based views create complications for analyzing UIs. First, customized views define new semantics and can have unique appearances when rendered. Absent an analysis that can understand rendering rules, such views may require developer intervention to define their semantics. Second, I observed that views that have children often define these children inter-procedurally and track children using collections. Inter-procedural analyses and analyses that target collections are known for introducing complexity and imprecision in static analyses.

3.4.6 RQ6: How do Developers Set the UI Style Properties of Views?

Approach: In this RQ, I investigated how arguments to style related APIs were set in code. To do this, I developed a static intra-procedural data-flow analysis that evaluated each of the style related arguments (e.g., width and height) to APIs that set color and size properties. The analysis built definition-use (DU) chains for each argument then analyzed the DU chains to determine how the definitions were created. For example, the argument value can be the result of a mathematical expression, or defined using a parameter provided to the method enclosing the API invocation. The analysis examined the call site of each API invocation that was related to color and size. For each of the relevant arguments, the analysis calculated its backwards DU chains until no new uses or definitions could be found in the method enclosing the call site (i.e., the analysis reached a fixed point). The analysis then classified the argument based on the operations that were used to create each of the definitions in the DU chain.
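The backward walk over definitions can be sketched as a worklist loop. This is a simplified illustration and not the tool built for the study: ArgumentClassifier, Def, and Kind are hypothetical names, and control flow is ignored, so every definition of a variable in the method is treated as reaching its uses. The kinds (constant, API call, parameter, and so on) mirror the categories the analysis reports.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of the backward DU-chain classification. */
final class ArgumentClassifier {
    enum Kind { CONSTANT, PARAMETER, API_CALL, DEVELOPER_METHOD,
                DEVELOPER_FIELD, LIBRARY_FIELD, INSTANTIATED_OBJECT }

    /** One definition "lhs = ..." of the given kind that uses the operand variables. */
    static final class Def {
        final String lhs;
        final Kind kind;
        final List<String> operands;

        Def(String lhs, Kind kind, String... operands) {
            this.lhs = lhs;
            this.kind = kind;
            this.operands = Arrays.asList(operands);
        }
    }

    /**
     * Follows definitions backwards from the argument variable until no new
     * variable is discovered (the fixed point) and collects the kinds seen.
     * Simplification: control flow is ignored.
     */
    static Set<Kind> classify(String argument, List<Def> methodBody) {
        Set<Kind> kinds = EnumSet.noneOf(Kind.class);
        Set<String> visited = new HashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.push(argument);
        while (!worklist.isEmpty()) {
            String variable = worklist.pop();
            if (!visited.add(variable)) {
                continue; // this variable was already expanded
            }
            for (Def def : methodBody) {
                if (def.lhs.equals(variable)) {
                    kinds.add(def.kind);
                    for (String operand : def.operands) {
                        worklist.push(operand);
                    }
                }
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        // color = Color.parseColor("#00ffff"); bgColor = 0xffffff - color;
        List<Def> body = Arrays.asList(
                new Def("color", Kind.API_CALL),
                new Def("bgColor", Kind.CONSTANT, "color"));
        System.out.println(classify("bgColor", body));
    }
}
```

In this sketch the literal in the second definition contributes CONSTANT, and the operand color is followed one step further to the API call, so a single argument can be counted under several categories.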
The broad classifications used by my analysis were: instantiated objects (e.g., created by new LayoutParams()), parameter placeholders (e.g., defined by the i-th parameter of a method), developer defined fields, library fields, API calls (i.e., the invoked methods are defined by library code), constants (e.g., the RGB value of red color), and developer defined method calls (i.e., the invoked methods are defined by a developer). For a given argument, my analysis counted all of the different ways used to define it.

Results: For the color related APIs, the frequency of calls using constant parameters was 9,535 (1.2/activity) in 2013, 12,979 (1.4/activity) in 2017, and 17,832 (1.4/activity) in 2019. The frequency of color related API calls using variable parameters was 11,675 (1.5/activity) in 2013, 20,782 (2.3/activity) in 2017, and 32,962 (2.5/activity) in 2019. Compared to all color related API calls, invocations using constant parameters accounted for 45% in 2013, 38% in 2017, and 35% in 2019, while invocations using variable parameters accounted for 55% in 2013, 62% in 2017, and 65% in 2019. For the size related APIs, the frequency of API calls using constant parameters was 3,266 (0.4/activity) in 2013, 3,484 (0.4/activity) in 2017, and 4,775 (0.4/activity) in 2019. The frequency of size related API calls using variable parameters was 9,084 (1.2/activity) in 2013, 17,074 (1.9/activity) in 2017 and 21,475 (1.6/activity) in 2019. Compared to all size related API calls, invocations using constant parameters accounted for 26% in 2013, 17% in 2017, and 18% in 2019, while invocations using variable parameters accounted for 74% in 2013, 83% in 2017, and 82% in 2019. For those invocations using variables as parameters, Figure 3.4 shows the frequency of different categories of expression components for color and size parameter expressions.
As a whole, the most frequent expression components for color parameters were APIs (53%), developer fields (38%), developer methods (24%), and parameter placeholders (20%). For size parameters, the most popular expression components were APIs (36%), instantiated objects (30%), developer methods (29%), developer methods (19%), and parameter placeholders (16%).

Figure 3.4: Different component distribution in parameter expressions for style arguments. (a) Color parameter's distribution over time; (b) Size parameter's distribution over time. Categories: instantiated object, parameter, developer field, library field, API, developer method; years 2013, 2017, and 2019.

Discussion: In general, the frequency of the color and size calls using variable parameters has been growing from 2013 to 2019 and represents a sizable majority of all such definitions of style related APIs. This indicates that a technique for estimating the parameter values of API calls is necessary for the accuracy of a static analysis. As for expression components, APIs are the top components in the expressions. This implies the semantics of API calls must be accounted for when statically analyzing the color and size values. Developer fields also play an important role in defining the values. This fact shows that a field analysis is required to estimate the values. Developer methods and parameter placeholders are quite popular for both color and size parameters. The prevalence of these two components shows that an inter-procedural analysis is indispensable to accurately approximate the values. It is worth noting that size parameters commonly use instantiated objects to derive the value. This indicates that size parameters are usually objects that consist of fields mapping to size properties (e.g., width).
To know a specific size property value, an intra-procedural analysis is required to analyze how that field's value is defined in the constructors. If the value originates from a formal parameter, then an inter-procedural analysis is needed to provide the runtime value of that parameter.

3.5 Threats to Validity

External Validity: A potential threat is that my study did not consider paid apps, which may require a different design of experiments. Therefore, the conclusions may not hold for paid apps. However, it is worth noting that my results are representative, since 94% of all Android applications in Google Play Store are free apps [53]. To ensure that the apps in my study are representative of real-world apps, I randomly selected 557 apps for each of the three app sets from the Google Play Store. These apps were from over 23 categories and had various functionalities. The dynamic analysis tool I used is Monkey, to which I did not apply input heuristics for my large scale study. Therefore, the results in RQ1 may not apply to some advanced dynamic analysis tools that are optimized for some apps. To alleviate this threat, I excluded those apps that have certain UIs (e.g., login UI) that Monkey cannot bypass. Note that Monkey is confirmed to achieve higher average code coverage and trigger more failures than many existing dynamic analysis tools [51].

Internal Validity: Even though I designed a mechanism to make sure the UI related API list is complete, it is still possible that the API list is not exhaustive. This would only impact the results of RQ2 and RQ6. To mitigate this, I employed keyword matching and manual analysis to create the API list. Even if the list is incomplete, it does not prevent me from drawing the conclusion that developers frequently use UI related APIs to define UIs in Android apps.

3.6 Conclusion

In this chapter, I presented the results of an extensive empirical study I conducted on the mechanisms developers use to build UIs.
For this study, I investigated over 1,600 apps from Google Play Store, which were collected in 2013, 2017, and 2019, by applying a combination of crawling, lightweight static analysis, and manual analysis. In this study, I formulated six research questions regarding dynamic analysis techniques' performance, new coding practice trends for developers to implement the mobile UIs, and potential problems that existing program analysis techniques may encounter. My main goal is to bridge the gap between existing program analysis techniques and the potential problems incurred by the new coding practices, and to provide guidelines for building future program analysis techniques in this area. I found that the contribution of code to define UIs of mobile apps is increasing over time. This has directly invalidated the assumption of many previous program analysis techniques, that most UI elements are defined in static files, such as the layout XML files in Android apps. To analyze more recent apps' UIs, a program analysis technique has to take the code into account. In addition, my study shed light on the need to carefully consider fragments in Android apps for two reasons. First, I found that fragments have become more popular over time and that the vast majority of these cause changes in the UI of an activity. Second, the views defined in a fragment have also been increasing over time. Some existing program analysis techniques (e.g., GATOR) that neglect fragments may miss many UI elements in an Android UI. Therefore, a future program analysis should consider fragments. My study also revealed that the way of setting color and size properties is complex, and it may involve APIs, developer fields, developer methods, and parameter placeholders. Thus, it requires researchers to design a new program analysis that can handle these complex cases. APIs are the top components in the expressions.
This implies the semantics of API calls must be accounted for when statically analyzing the color and size values. Developer fields also play an important role in defining the values. This fact shows that a field analysis is required to estimate the values. Developer methods and parameter placeholders are quite popular for both color and size parameters. The prevalence of these two components shows an inter-procedural analysis is indispensable to accurately approximate the values.

Chapter 4
Repairing Display Energy Hotspots

In this chapter, I describe my repair approach in detail. I also present the results of an empirical evaluation to demonstrate its effectiveness and efficiency in repairing energy optimizable UIs. A large body of work has focused on optimizing the colors used for an app's UI in order to reduce its energy consumption [5–11, 15–17, 54–56]. This line of work is of particular importance and relevance for mobile devices that use OLED based screens. This type of display consumes more energy with brighter colors (e.g., white) than with darker colors (e.g., black). This has led to the development of techniques that can subtly adjust UI color schemes to improve energy [54–56], create completely new color schemes for an app [5, 6, 8–11, 17], or change colors on the fly [15, 16]. These techniques have been able to achieve energy improvements of anywhere from 5.9% to 66%, which represents a significant savings for mobile device users. Although these techniques have made significant achievements in terms of reducing display energy consumption, they have limitations that prevent them from being applied to mobile apps in a fully automated manner. Broadly, I can describe these techniques as having limitations in the following aspects. Existing techniques are incomplete with regard to identifying relevant UI information.
For example, crawling based techniques do not account for all possible UI variations of an activity and static techniques do not account for XML based layout information in fragments. As I discuss in Section 4.1, this can reduce both the aesthetics of the resulting UI and energy savings. Previous well-known transformation techniques [5, 6, 8] focused on static analysis of server-side web applications that used string-based operations to create web pages, and the techniques are not applicable to Android UIs. Other existing techniques for mobile app color transformation are not fully automated (e.g., GEMMA [9–11], CSA algorithm [17]) and require developers to identify where color changes should be made and carry out the required changes. In this chapter, I describe my approach for optimizing the display energy used by the UIs of an Android app. My approach addresses the aforementioned limitations of prior techniques and provides a fully automated end to end transformation to optimize Android app UIs. To address the issues of completeness, I designed a novel hybrid analysis that takes into account multiple sources of information about a user app, including static XML layout files, fragments, and actions carried out dynamically on the UI in the program. To fully automate the approach, my hybrid analysis identifies and changes the locations in the code that need to be changed to realize the color optimization. Lastly, for selecting an aesthetically pleasing energy-efficient color scheme from a Pareto frontier, I built my own weighted metric with respect to readability, attractiveness, and power consumption. Specifically, the weight values regarding readability and attractiveness can be predicted based on a regression model customized for a specific user group. To demonstrate the effectiveness of my technique, I evaluated it against a set of real world Android apps.
The results of my evaluation showed that the energy optimizations realized by my technique were significant, as my approach reduced the display energy by an average of 43%. Furthermore, the resulting color schemes were well rated by end users and 81% of users preferred the new color schemes generated by my approach over the original color schemes for scenarios when the mobile device's battery level is critically low or above. These results about energy savings and user acceptance demonstrated that my repair approach could repair the energy optimizable UIs effectively. The average runtime of 63 seconds per app shows the efficiency of my repair technique.

4.1 Background and Motivating Example

In the Android platform, the code implementing a UI is referred to as an activity, and is implemented as a subclass of the Android SDK class Activity. The UI of an activity consists of a collection of widgets, which are called views (e.g., Button). A ViewGroup (e.g., LinearLayout) is another kind of view, which is usually invisible and serves as the container of other views. There are two different ways to customize the UI of an activity. The first way is to use an XML file to represent the UI and load it in the code. For example, the API call at line 3 of Program 4.1 loads activity_main.xml. Each individual XML tag maps to a view and defines its properties. The other way is to instantiate a View class (line 6) and insert the view object into an existing viewgroup of the activity by calling specific APIs (e.g., line 19). Fragments were introduced to represent an arrangement of views that may be reused and are very popular with app developers, being used by over 50% of apps [19]. Like views, a fragment can be declared via XML or in the code (e.g., line 11). Similar to activities, a fragment has its own lifecycle callbacks to do data processing. A callback defined by the fragment, onCreateView (lines 22-24), initializes the details of views and returns the root view of the fragment.
A developer may attach a fragment to a viewgroup of an activity (lines 12-14).

activity_main.xml
<LinearLayout android:id="@+id/root" android:background="#fafafa" ... >
  <LinearLayout android:id="@+id/frag_container" ... />
</LinearLayout>

frag_pay.xml
<LinearLayout ... >
</LinearLayout>

 1 class MainActivity extends Activity {
 2   void onCreate() {
 3     setContentView(R.layout.activity_main);
 4     LinearLayout l = (LinearLayout) findViewById(R.id.root);
 5     int balance = searchBalance();
 6     TextView tv = new TextView(this);
 7     if (balance < 0) {
 8       int color = Color.parseColor("#00ffff");
 9       int bgColor = 0xffffff - color;
10       tv.setBackgroundColor(bgColor);
11       Fragment frag = new PayFragment();
12       FragmentTransaction ft = getSupportFragmentManager().beginTransaction();
13       ft.add(R.id.frag_container, frag);
14       ft.commit();
15     } else {
16       tv.setBackgroundResource(R.color.black);
17     }
18     tv.setText(value + "");
19     l.addView(tv); } }
20
21 class PayFragment extends Fragment {
22   View onCreateView(LayoutInflater inflater, ViewGroup container, Bundle savedInstanceState) {
23     View v = inflater.inflate(R.layout.frag_pay, container, false);
24     return v; } }

Program 4.1: Example code of an activity

The style of a view can be set by using an XML attribute or an API call. For example, the background color of the button in frag_pay.xml is set as #eeeeee. A background can also be set by using APIs (lines 10, 16). The Android SDK also provides style and theme attributes to reduce the complexity and repetition in layout XML files. A style is a set of attributes that define the appearance of a view. A theme is a style that is applied to an activity or the whole app. If a theme is not specified for an activity, the app's theme will be used. A style or a theme can inherit attribute settings from a parent and override settings by modifying the related property values. To apply a style, developers need to set the style attribute of a widget tag.
Similarly, a theme is specified by setting the android:theme property of an <activity> tag or the <application> tag.

Figure 4.1: Overview of the repair approach. Phases: Hybrid UI Analysis, Adaptive Color Transformation, Automated App Rewrite. Steps shown: Extract Layout Information in Code, Extract Color Information in Code, Extract UI Information From Runtime, Merge, Build the CCG, Generate a New Scheme, Rewrite the App. Input: Target Android App; output: Transformed Android App.

Although Program 4.1 represents a small and simple example, it would be challenging for Android GUI analysis approaches to completely identify all view and color information, which is necessary information for my repair approach. A crawler, such as Monkey [43], is unlikely to be able to guess the value of balance that is necessary to execute lines 7-14. A failure to do so would mean that fragment and color information defined in those lines would not be found by a crawler. My UI implementation study found that more than 39% of UI related API calls are control dependent on conditions [19], thus these situations can occur frequently. Existing static analysis based approaches (i.e., GATOR [57, 58], BACKSTAGE [59], and StoryDroid [60]) also have limitations that would prevent them from finding all view and color information. GATOR does not account for the views defined within fragments nor identify the color setting operations associated with any views. BACKSTAGE does model two of the three possible ways to incorporate fragments, but omits a popular third option, via an adapter, and also cannot identify the color setting operations associated with views. StoryDroid takes all three ways of adding fragments into account, and generates a static layout file to represent the initial state of an activity. Therefore, StoryDroid cannot identify views defined in non-initial states (e.g., click a button) and the color setting operations performed on views. This would mean that the information at lines 8-14 and 16 would not be accounted for by these approaches.
Given the widespread usage of conditional UI actions and fragments, the limitations of these approaches would significantly undermine the success of my repair approach and directly motivate the design and development of the analyses described in Section 4.3.

4.2 Overview of the Repair Approach

The goal of my repair approach is to reduce the display energy consumption of the UIs in an Android app. My approach can be described as three phases. An overview of my approach is shown in Figure 4.1. The first phase is Hybrid UI Analysis (Section 4.3). In this phase, the approach employs both a static analysis and a dynamic analysis to estimate the possible layout and color information of an app. This kind of information is built as a model called View Tree (VT) for an activity of an Android app. After running these two types of analyses, my approach utilizes a merging process to combine their results to achieve better completeness. The second phase, Adaptive Color Transformation (Section 4.4), computes a new color scheme for the app, which is both energy-efficient and visually attractive. The third and final phase is Automated App Rewrite (Section 4.5) that rewrites the app to apply the new color scheme. The output of my repair approach is an APK file of the app with the transformed UIs.

4.3 Hybrid UI Analysis

The goal of this phase is to generate information about the layout and colors of an activity's UI. At a high level, this phase first performs a static analysis of the app's implementation and a dynamic crawl of its activities. The results of these analyses are then merged to give a more complete summary of the possible layouts and colors employed by the app. To carry out this goal, I first extend a well-known static analysis technique, GATOR [57, 58], so that its underlying representation of the app also includes information about fragments and color setting operations carried out on view objects.
Given this information, my approach extracts the app's static VT, which is a tree-based representation of the layout information derived from the app via the static analysis. The extension to GATOR and extraction of the VT is described in Section 4.3.1. With the VT, the second step of my approach then analyzes each of the color setting operation nodes to identify what values they may have at runtime. The details of this part of the approach are described in Section 4.3.2. Then my approach runs a dynamic analysis discussed in Section 4.3.3 to collect the necessary UI information. Lastly, in Section 4.3.4 I describe how my approach merges the layout and color information identified in the second step with information derived from a crawler to produce the final version of the app's VT, which is then used for the color transformation described in Section 4.4.

4.3.1 Fragment and Color Setting Operation Modeling

This step identifies and models the mapping between views and different invocations performed on those views. Related work, GATOR [57, 58], has shown that extracting this information can be done by calculating points-to information for references to views and viewgroups using a Constraint Graph (CG) based approach. However, as mentioned in Section 4.1, GATOR does not account for the content of fragments, which given that over 50% of Android apps use fragments [19], means that a significant source of view definitions will not be accounted for, nor does GATOR include modeling for color setting operations on views, which is necessary for my problem. To accomplish this, there are several challenges to address.

Figure 4.2: Partial constraint graph for the example code (legend: GATOR; Our Extension)
First, a fragment is attached to a view using an invocation on a view reference, which means my analysis must be able to accurately identify which view the reference is pointing to. Second, the fragment itself is also a reference, so my analysis must be able to accurately track which fragments this reference is pointing to as well. Third, the fragment represents a collection of views, which themselves require an analysis to determine their layout. Fourth, color setting operations are also typically invoked on a reference to a view, which means my analysis must identify the possible views that could be operated on by a particular color setting operation. My key insight is that these challenges can be mostly addressed by calculating points-to information for references to views and fragments. Therefore, my analysis extends GATOR by adding constraints to represent fragments and color setting operations in the code. The key GATOR abstraction that I extend in my approach is the CG. The nodes of the GATOR CG are constants, variables, heap allocations or predefined operations. The edges of the CG represent the traditional flowsTo relationship, which indicates that the value set represented by the source variable flows to the target variable's value set. A partial CG for Program 4.1 is shown in Figure 4.2. In Figure 4.2, I augment the nodes with subscripts that indicate the line numbers from which they are derived.

Table 4.1: Three common ways of attaching fragments to views in Android apps

Category          Code or XML Tag
Via an XML tag    <fragment android:name="MyFragment">
Via API call 1    FragmentTransaction.add(int, Fragment)
                  FragmentTransaction.add(int, Fragment, String)
                  FragmentTransaction.replace(int, Fragment)
                  FragmentTransaction.replace(int, Fragment, String)
Via API call 2    ViewPager.setAdapter(PagerAdapter)

In the CG, there are four types of nodes: constant nodes, variable nodes, allocation nodes, and operation nodes.
For a statement in the format of a = b, the CG contains a variable node for a. Since the right hand side b can be a constant, a variable or an allocation of a heap location, the source node may be a constant node (e.g., the integer value of the layout ID activity_main), a variable node (e.g., tv), or an allocation node (e.g., TextView6) for b. A flowsTo edge connects from b to a. When a statement contains related API calls like lhs = obj.m(para), the CG contains an operation node m. Then, obj and para link to m, and m points to lhs. Broadly, an operation node can represent three kinds of operations: (1) Inflate: render an XML layout file via a layout ID and return a root view, which is the root tag of the XML layout; (2) AddView: a parent viewgroup adds a child view; (3) FindView: retrieve a view via a view ID. To model fragments, I extend the GATOR CG by defining constraints that model all three common ways of attaching fragments to views listed in Table 4.1. First, for a fragment attached as an XML tag, my approach creates a new node in the CG representing the fragment tag, and the class of the fragment is specified in the "android:name" attribute. Figure 4.3a shows the code snippet of a layout XML file and its corresponding nodes in the CG. Second, a fragment can be attached to a container view via APIs of the FragmentTransaction class. An example of this is shown in line 13 of Program 4.1. In the example, the first parameter is the parent view ID, and the second parameter is the Fragment object to attach. My insight is that this way of attaching can be modeled in the CG as two existing operations: a FindView and an AddView. Therefore, my approach creates nodes and edges just as GATOR does in Figure 4.2. There, frag_container is the container ID, frag is the fragment variable, and c is an artificial view variable. Third, a collection of fragments contained in a PagerAdapter can be attached to a ViewPager. My approach identifies the base object of the invocation as the container viewgroup (i.e., the ViewPager).
To identify which fragments are attached, my approach must determine which fragments are defined in the adapter. My insight for doing this is that PagerAdapter contains getter methods (e.g., getItem(int position)) that return a fragment corresponding to a given position. Therefore, to know which fragments are contained in the adapter, my approach tracks which objects may flow to the return value of the getter methods. This is done by adding nodes and edges to the CG for the node that represents the value returned by all getter methods. For example, in the partial CG shown in Figure 4.3b, the return variable ret can be an instance of either FragmentA or FragmentB.

Figure 4.3: Nodes and edges for two ways of attaching fragments not used in the example code. (a) Via an XML tag; (b) Via API call 2. Edge types: flowsTo, parent-child.

Finally, since a fragment's layout is defined in an overridden onCreateView callback, my extension creates a node in the CG for the variable assigned the callback's return value. This allows the points-to analysis to identify the view objects that could be the root view of the fragment. For Program 4.1, Figure 4.2 shows that PayFragment's root view is v, which is the return view of an Inflate operation. I also extend the GATOR CG to include information about the relationship between views and color setting operations. In general, a color setting operation is invoked on a reference to a view. Therefore, my extension creates a node in the CG to represent the operation. The base view object and the parameter(s) are also modeled as nodes in the CG, and they are connected to the color setting operation node via flowsTo edges. The bottom part of Figure 4.2 shows new nodes and edges created for the color setting operations in lines 10 and 16 of Program 4.1.
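The flowsTo semantics can be illustrated with a miniature graph in which value sets are pushed along edges until nothing changes. This is a sketch under stated assumptions and does not reproduce GATOR's data structures: MiniConstraintGraph and its methods are hypothetical, nodes are plain strings, and the receiver of an operation is modeled as a node named after the operation (e.g., "SetBgColor10.receiver").

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical miniature constraint graph; not GATOR's actual data structures. */
final class MiniConstraintGraph {
    private final Map<String, Set<String>> flowsTo = new HashMap<>();
    private final Map<String, Set<String>> pointsTo = new HashMap<>();

    /** Marks a node as an allocation: its value set contains the object itself. */
    void addAllocation(String objectNode) {
        pointsTo.computeIfAbsent(objectNode, k -> new LinkedHashSet<>()).add(objectNode);
    }

    /** Adds a flowsTo edge: the value set of source flows into that of target. */
    void addFlowsTo(String source, String target) {
        flowsTo.computeIfAbsent(source, k -> new LinkedHashSet<>()).add(target);
    }

    /** Propagates value sets along flowsTo edges until no set grows any more. */
    void solve() {
        Deque<String> worklist = new ArrayDeque<>(pointsTo.keySet());
        while (!worklist.isEmpty()) {
            String node = worklist.pop();
            Set<String> values = pointsTo.getOrDefault(node, Collections.<String>emptySet());
            for (String target : flowsTo.getOrDefault(node, Collections.<String>emptySet())) {
                Set<String> targetValues =
                        pointsTo.computeIfAbsent(target, k -> new LinkedHashSet<>());
                if (targetValues.addAll(values)) {
                    worklist.push(target); // the set grew, so revisit its successors
                }
            }
        }
    }

    /** Returns the objects that may reach the given node (empty if none). */
    Set<String> valuesOf(String node) {
        return pointsTo.getOrDefault(node, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        MiniConstraintGraph cg = new MiniConstraintGraph();
        cg.addAllocation("TextView6");                 // new TextView(this), line 6
        cg.addFlowsTo("TextView6", "tv");
        cg.addFlowsTo("tv", "SetBgColor10.receiver");  // line 10
        cg.addFlowsTo("tv", "SetBgColor16.receiver");  // line 16
        cg.solve();
        System.out.println(cg.valuesOf("SetBgColor10.receiver"));
    }
}
```

With the edges from Program 4.1, both color setting operations resolve to the same TextView allocation, which is the information needed to attach color values to views.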
My extension also includes colors set in XML files. For a color attribute in a view tag, my approach locates the view node in the CG corresponding to the view tag. Then my extension annotates the CG node with the color value. For the button of the frag_pay.xml in Program 4.1, my approach annotates its background color as "#eeeeee". My approach analyzes the CG, as defined by the GATOR approach, to extract the VT. First my approach performs an iterative fixed point analysis on the CG in order to propagate the constraint information and identify which views and fragments may be pointed to by a reference in the CG. Second, my approach analyzes the CG to identify two types of VT relationships based on the semantics of the modeled API invocations and operations in the CG. These include the parent-child relationship, which I extend to include the same relationship involving fragments, and the connection between an activity or a fragment and its root view. The root view can be defined from either an Inflate operation or a View instantiation. Both kinds of relationships are added to the CG as additional specially annotated edges, which is shown in Figure 4.4 for the running example. Finally, my approach extracts the VT by performing a breadth-first tree traversal starting from the root view of the activity in the CG and then transitively visiting the children via the annotated VT relationship edges.

Figure 4.4: Partial constraint graph annotated with view tree relationships (edge labels: root, child, layout id, view id, bg color)

The five nodes with non-white background color and the parent-child edges among them are the VT for the MainActivity. The two green nodes and the parent-child edges among them are the VT for the PayFragment. Note that the parent-child relationship between the LinearLayout with an ID frag_container and the PayFragment is replaced with a new parent-child relationship between that LinearLayout and the LinearLayout with green background color, which is the root view of the PayFragment.

4.3.2 Color Value Analysis

The goal of this analysis is to identify the values of color setting operations in the VT. A key challenge is that color values may be assigned to a view in multiple ways based on API invocations and XML attributes. For a given color setting operation, my approach identifies these multiple sources and determines the effective color value used for the corresponding view.

4.3.2.1 Value Analysis for Color Setting API Calls

My color value analysis accounts for view colors that are set in an app's code via invocations of color setting APIs. At a high-level, the approach performs a global inter-procedural analysis to determine the possible values of the arguments for all color setting operation nodes in the CG. I explain the algorithm for computing these values in more detail below. Once the possible color values have been identified, my approach identifies all views in the VT that will be affected by the color setting operation and updates their associated color value annotations with the values computed by the analysis. The affected views can be identified by performing a reachability analysis on the CG starting from the CG node representing the analyzed color setting operation. For example, as is shown in Figure 4.4, two SetBgColor operations at two different program points can assign the background color of the TextView tv.

Figure 4.5: Expression tree input and output for the Apply() function

Before explaining the details of my value analysis, I first introduce the Expression Tree (ET), an abstraction that my approach uses for representing color values.
An ET is an n-ary tree that represents the value of an argument. An ET's leaf nodes can be constants, variables, or class types. Its non-leaf nodes describe operators or invocations. The operator nodes can be unary or binary operators in the Java language, or the new operator in object instantiation expressions like new Color(). The children of an operator node represent the corresponding operands. The invocation nodes represent the target method of an invocation, with child nodes representing the base object and parameter(s). The left part of Figure 4.5 shows the ETs for the values of color and bgColor, respectively.

The core part of my color value analysis is an inter-procedural analysis of the app's implementation. Specifically, the color value analysis determines the possible values of arguments supplied to a color setting API invocation. To achieve context-sensitivity and scalability in the analysis, my approach uses method summarization. My approach generates two summaries of each method called by an activity and then integrates these summaries at places where the summarized method is called. The first summary is the return value summary, which tracks the return value of a method as an ET to address situations where the return value of a method is used to calculate a color value. The second summary, the side effect summary, tracks field definitions inside a method.
This second summary is necessary because I found that UI related code frequently uses fields to store color values that are then used by many UI elements in an activity. During the summarization, my approach uses the field-based approach [61] to model fields. This approach records fields as variables identified by name and does not differentiate their base objects.

Algorithm 1: Construct a summary for a method m
Input: m: a method
Output: Rsummary: a global Map (K: method, V: definition set), SEsummary: a global Map (K: method, V: definition set)
 1 foreach s ∈ getStatements(m) do
 2     if s is a definition stmt then
 3         if rhs(s) calls a summarized method n then
 4             GEN[s] ← {(lhs(s), Rsummary[n])}
 5         else
 6             GEN[s] ← {(lhs(s), ET(rhs(s)))}
 7     else
 8         GEN[s] ← ∅
 9     if s contains an invocation of a summarized method o defining fields then
10         GEN[s].add(SEsummary[o])
11     if GEN[s] ≠ ∅ then
12         foreach (v, et) ∈ GEN[s] do
13             KILL[s].add((v, *))
14     else
15         KILL[s] ← ∅
16 do
17     foreach s ∈ getStatements(m) do
18         IN[s] ← ∪ {OUT[p] : p ∈ pred[s]}
19         NEO[s] ← Apply(GEN[s], IN[s])
20         OUT[s] ← (IN[s] − KILL[s]) ∪ NEO[s]
21 while OUT has changed
22 Rsummary[m] ← {(v, et) | s ∈ RetStmt(m), (v, et) ∈ IN[s] ∧ v = returnValue(s)}
23 SEsummary[m] ← {(v, et) | s ∈ RetStmt(m), (v, et) ∈ IN[s] ∧ isField(v)}

The algorithm for computing a summary is shown in Algorithm 1. The input is a method m called by an activity of an app. The output is two summaries of m: a return value summary and a side effect summary. They are stored in the global maps Rsummary and SEsummary, respectively. The summary building process roughly follows the iterative data flow analysis algorithm. First, my analysis computes the GEN and KILL sets for each statement of method m (lines 1-15). If a statement s is a definition statement in the format a = exp, the GEN set stores the definition as a tuple (a, b) where a is the left hand side variable and b is the ET of the right hand side exp. If exp calls a summarized method n, b is obtained from n's entry in Rsummary.
If s calls a summarized method defining fields, my analysis also considers the field definitions, which are retrieved from SEsummary. Those field definitions are added to the GEN set as well. If exp is a non-invocation expression, b is created by a function ET(). Correspondingly, the KILL set contains all the definitions of the variables that are defined in the GEN set.

After initializing the GEN and KILL sets, the iteration starts (lines 16-21). First, the IN set of a statement is computed as the union of the OUT sets of its predecessor statements. Note that when calculating IN sets, my analysis ignores the back edges for loops in the control flow graph (i.e., unraveling each loop once). Next, my approach computes a NEO set, which contains definition tuples that are generated by a function Apply(). If the ETs of definition tuples in the GEN set contain unknown variables, the Apply() function replaces those unknown nodes with ETs describing the values from the IN set. As shown in Figure 4.5, the right ET after the replacement describes the value of bgColor in Program 4.1. Finally, the OUT set of the current statement is the union of the NEO set and the difference of the IN and KILL sets. The iteration continues until the OUT sets reach a fixed point. Once the iteration terminates, my analysis records the ET definitions of return variables and defined fields as entries in Rsummary and SEsummary (lines 22-23). While summarizing the caller method of a color setting API call, my approach extracts the color parameter's value(s) from the IN set of the statement containing that API invocation.

4.3.2.2 Value Analysis for XML Based Color Settings

My color value analysis also accounts for the multiple ways view colors can be set via XML layout files. Broadly, XML based color settings can be done in various ways, each of which carries a different priority.
For example, a view's background can be set directly using an "android:background" attribute (high priority) or indirectly using a "style" attribute (low priority). The situation can be even more complicated when a view tag does not have any of the aforementioned attribute settings. For some views (e.g., LinearLayout), no color settings means the color value is null. But this may not be true for other views. In fact, a default style setting (which is defined in a theme) containing color settings may be applied. In addition, styles and themes can have inheritance relationships. To identify the effective value of the color set for a view via XML, my approach defines an XML color analysis.

My XML color analysis consists of two parts. The first part analyzes all of the XML based theme and style settings, then identifies and models the relationships among them. My approach represents the theme and style relationships using a directed graph, with nodes corresponding to different style/theme tags and directed edges representing an inheritance relationship between two tags' definition information. My approach builds this graph by parsing the Android SDK's and the app's style and theme XML files. Once this graph is constructed, my approach can propagate style and theme information (i.e., attribute definitions) by traversing the graph in topological order.

The second part of the XML color analysis then determines the effective color based on style priorities and Android rendering rules, specifically the style precedence rules [62] defined by the Android SDK. For a node in the VT, my approach first searches for the direct color attribute, which has the highest priority. If there is no direct color attribute, my approach checks whether the "style" attribute is specified. If yes, my approach retrieves the effective color from that specified style's definition.
Otherwise, my approach obtains the color setting from the default style defined in the effective theme for the activity. For example, a Button's default style is specified as the attribute entry whose name is "buttonStyle" in the effective theme's definition. The effective theme is usually specified in the AndroidManifest.xml file. If no theme is assigned to an activity, then a default theme is used based on the Android OS version. Finally, from the effective style definition, my approach retrieves the effective color value. If the value refers to an attribute value defined in the effective theme, then my approach further resolves the value until there are no further cross-references defining the color value.

4.3.3 Extracting UI Information From Dynamic Analysis

The goal of this step is to collect the necessary UI information by running a dynamic analysis. The dynamic analysis can be offline since the later step will mainly process the screenshots. When the dynamic analysis (e.g., an app crawler) visits an activity displayed on a device (or emulator), it can extract the view hierarchy (i.e., the structure of the layout) and the features of a view, such as location, dimension, type, and ID, via the UI Automator tool [47]. Meanwhile, the displayed colors are recorded in a screenshot captured via the ADB tool [40]. The output of this step is a set of tuples, each consisting of an XML file describing an activity's layout and an image file representing the screenshot of the activity.

4.3.4 Merging Both Types of Analysis Results

The goal of this step is to supplement the statically generated VTs with the required information from the dynamic analysis results. The information includes: (1) layouts of views that may be missed by my static analysis; (2) concrete values of colors; (3) the actual locations and sizes of views. There are several challenges to address when combining the two kinds of analysis results. First, my approach should be able to extract the color values from the screenshots.
Color values in a screenshot are known to be distorted at times by rendering optimizations (e.g., anti-aliasing). Second, for an activity, matching the VTs of both analyses is difficult, since their structures may not be isomorphic. Therefore, a node in a VT may have multiple candidate nodes in another VT. My approach needs to be able to discover the correct matching node of each node efficiently. Third, for information only from the dynamic analysis, my approach needs to find its definition point information in the app. Otherwise, the colors captured in the dynamic analysis may not be transformed, which can impact the energy consumption and user experience of the app.

To address the challenges, my approach first preprocesses the dynamic analysis results to add color values, and represents them as VTs. Second, my approach merges the two sets of VTs from both analyses into one set of VTs based on a probabilistic matching algorithm. Third, to find the definition point information for a dynamic-only VT, my approach matches that VT with VTs generated from Inflate operations in the CG. If a best match is found, the definition point information is discovered as well.

Before doing the merging, I need to make sure that the dynamic analysis results contain the necessary UI information. However, the dynamic analysis results for a UI include only an XML-based representation of the view hierarchy of the UI (including location, dimension, type, and ID of a view), and a screenshot. The layout structure is described in the XML file, but the mapping between a view and its detailed color information (e.g., background and text colors) is missing.
To determine the missing color information of a view, I applied two heuristics [63–65]: (1) for a view that does not display images, the color with the maximal pixel count in the color histogram of the view's display area in the screenshot is the background color of this view; (2) the color in the histogram having the maximal color distance from the identified background color is the text color of this view. Recall that the XML-based representation of the view hierarchy is a tree-structured projection of the UI layout; therefore, it is straightforward to parse an XML layout file into a VT. The features of a view (e.g., dimension) are attached as attributes of a view node in a VT.

After retrieving the color information, the UI information of an activity is modeled as one or more VT(s) from my static and dynamic analyses. To combine the information modeled in these VTs, my approach needs a mechanism to merge two VTs. I define a function merge($t_1$, $t_2$) to merge two VTs. This function adapts a probabilistic matching algorithm [20–22] based on the XPath of each node and copies the unmatched nodes in $t_2$ to $t_1$. The matching is determined based on the IDs and XPaths of nodes. If the ID of a $t_1$ node is equal to the ID of a $t_2$ node, then the match index is 1. If a node a in $t_1$ does not have an ID, then when it is compared to a node b in $t_2$, the match index is calculated according to Equation (4.1). For a node m in $t_1$, the algorithm traverses each unmatched node in $t_2$ and computes a match index for it. Then, if the node n has the maximum match index value and that value is greater than the threshold value, (m, n) is a matching tuple. The output of the matching algorithm is a map between the nodes in $t_1$ and their corresponding nodes in $t_2$. For each pair of matching nodes, the attributes of the node in $t_2$ are copied to its counterpart in $t_1$. Nodes (i.e., subtrees) that exist only in tree $t_2$ are copied directly to their parents' corresponding nodes in tree $t_1$.
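The match index computation can be sketched as follows. This is an illustrative sketch rather than the adapted probabilistic matching algorithm itself: nodes are plain dictionaries, the Levenshtein distance is computed over XPath strings character by character, and the threshold is a caller-supplied parameter whose value is not given here. Falling back to the XPath formula when both nodes have different IDs is an assumption of this sketch.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def match_index(a, b):
    """Match index of two nodes, each a dict with 'xpath' and optional 'id'."""
    if a.get("id") is not None and a.get("id") == b.get("id"):
        return 1.0  # equal IDs give a match index of 1
    longest = max(len(a["xpath"]), len(b["xpath"]))
    if longest == 0:
        return 1.0
    # 1 - LevenshteinDistance / max(length), as in Equation (4.1)
    return 1.0 - levenshtein(a["xpath"], b["xpath"]) / longest


def best_match(m, unmatched, threshold):
    """Return the unmatched node with the highest match index above threshold."""
    best, best_score = None, threshold
    for n in unmatched:
        score = match_index(m, n)
        if score > best_score:
            best, best_score = n, score
    return best
```

For instance, two nodes whose XPaths differ in one character out of four get a match index of 0.75, so they match under a threshold of 0.5 but not under 0.9.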
In the merged tree $t_1$, a node can come from the static analysis only, from both analyses, or from the dynamic analysis only. At this point, the dynamic-analysis-only nodes do not have definition point information.

$$1 - \frac{\mathit{LevenshteinDistance}(a.\mathit{xpath},\, b.\mathit{xpath})}{\max(\mathit{length}(a.\mathit{xpath}),\, \mathit{length}(b.\mathit{xpath}))} \tag{4.1}$$

For nodes from the dynamic analysis only, my approach tries to find their definition points in the code by matching with my static analysis results. My insight is that many of the missed views are defined using Inflate operations, which are already modeled in the CG. Therefore, for each such operation, my approach can extract a VT by traversing from its return view as described in Section 4.3.1. Then my approach tries to match each subtree formed by dynamic-only nodes with each of the extracted VTs. The extracted VT having the largest matching map is the best match. Then my approach calls the merge function again so that the node information (e.g., definition point) from the best match tree is copied to those dynamic-only nodes of the merged tree.

4.4 Adaptive Color Transformation

The goal of this phase is to compute a new color scheme for the app. The new color scheme should be both energy-efficient and visually attractive. My approach produces the new color scheme by combining the steps of existing approaches. First, my approach builds a CCG to describe the color relationships that are determined by the visual layout of the views showing colors. Second, based on the CCG, my approach computes a new recoloring of the CCG. This mapping of old colors to the transformed colors is the output of this phase. In the remainder of this section, I describe how my approach derives the CCG from the VTs, the fitness functions I consider, and finally the process that generates the new color scheme.

4.4.1 Building the Color Conflict Graph

Given the context of Android GUIs, my approach adapts the definition of the CCG used in Nyx [5–8].
Each node in the graph is a color that appears in an activity and each edge represents the type of visual relationship (e.g., "next to", "enclosing", or "not touching") that any two colors in the CCG have. The edges in the CCG are weighted by the type of visual relationship, with the weights ordered so that "enclosing" > "next to" > "not touching". Unlike Nyx, which has three variants of the CCG, my approach has only two. The first variant, the general CCG, models all background and text colors together with all three types of relationships above. Specifically, it considers the "enclosing" relationship between the text color and its enclosing view's background color. The second variant, the Image CCG (ICCG), considers the "enclosing" relationship between colors in an image and the background color of the view displaying the image.

Before building the CCG, my approach needs to propagate the identified background color of a node to its descendant nodes in a VT. By doing that, my approach can determine the enclosing relationships between background colors, text or image colors, and their specific enclosing colors. This propagation can be done by a standard tree traversal. A parent node propagates its background color(s) to a child if the child does not have any background colors. For example, the second LinearLayout in activity_main.xml in Program 4.1 has the background color "#fafafa" propagated from the root LinearLayout.

The creation of the two variants of the CCG is slightly different. In the general CCG, the nodes are created for each unique color (e.g., background or text) that is annotated in the VT nodes. In the ICCG, the nodes include the general CCG nodes and the nodes created for identified colors in local images. Note that for the statically identified colors that are not constants, my technique uses the corresponding colors identified from the dynamic analysis (if available). Otherwise, my approach ignores these colors.
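The background color propagation described above can be sketched as a short pre-order traversal. This is only an illustration under simplifying assumptions: each VT node is a dictionary with a single optional background color under the hypothetical key `bg`, whereas a real VT node may carry several candidate colors.

```python
def propagate_background(node, inherited=None):
    """Push a parent's background color down to children that have none.

    node: dict with 'bg' (a color string or None) and 'children' (a list
    of nodes of the same shape). The tree is modified in place.
    """
    if node.get("bg") is None:
        node["bg"] = inherited
    for child in node.get("children", []):
        propagate_background(child, node["bg"])
    return node


# Hypothetical tree shaped like the running example: the inner layout has
# no background of its own, while the button sets one explicitly.
tree = {
    "bg": "#fafafa",
    "children": [
        {"bg": None, "children": [
            {"bg": "#eeeeee", "children": []},
            {"bg": None, "children": []},
        ]},
    ],
}
propagate_background(tree)
```

After the call, the inner layout and the child without a color both carry "#fafafa", while the button keeps "#eeeeee", so every text or image color can be paired with a concrete enclosing background color.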
If a developer wants to keep a certain color (e.g., the theme color of the app), my approach allows the developer to mark the corresponding CCG node as unchanging. Given that the CCG is a complete graph, there is always an edge between each pair of nodes. The weight of each edge is assigned based on the type of relationship. In addition, unlike Nyx, during the node creation of the general CCG, the display area of each color node is accumulated by adding the actual sizes of views with this background color. I assume that the size of a view is the background color's display area in this view, since the background color dominates the pixels in the view's region.

4.4.2 Fitness Functions

The requirements of the new color scheme can be interpreted as low power consumption and good visual appeal. Therefore, my approach uses three fitness functions as metrics to check whether a color scheme meets the requirements in these two aspects. I describe the details of the fitness functions as follows.

The first fitness function (to be minimized) is the Power Consumption Fitness (PCF), which is also used in previous work [9–11]. Given a color scheme, my approach first computes the power consumption of a pixel based on the power model shown in Equation (4.2), whose coefficients can be provided as a display energy profile [24] for a smartphone. It then sums the product of each color's power and its display area according to Equation (4.3).

$$P(\mathit{color}) = rR + gG + bB + c \tag{4.2}$$

$$\mathit{PCF} = \sum_{i=1}^{n} P(\mathit{color}_i) \cdot A_{\mathit{color}_i} \tag{4.3}$$

The second function (to be maximized) is the Contrast Fitness (CF) [9–11]. For each color in the solution, if it has an enclosing (i.e., parent-child relationship) color, my approach computes the color contrast between two colors a and b using the function Con(a, b) defined by W3C [66]. Then it sums the contrast of each color pair having an enclosing relationship according to Equation (4.4). The third function (to be minimized) is the Design Fitness (DF) [5–8].
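The first two fitness functions, PCF and the contrast used by CF, can be sketched as follows. Two assumptions are made here: the power model coefficients (r, g, b, c) are passed in by the caller, since real values come from a device-specific display energy profile, and Con(a, b) is taken to be the WCAG contrast ratio based on relative luminance, which is the contrast definition published by W3C.

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' to an (R, G, B) tuple of integers in 0..255."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def pixel_power(color, coeffs):
    """Linear per-pixel power model P(color) = rR + gG + bB + c."""
    r, g, b, c = coeffs
    R, G, B = hex_to_rgb(color)
    return r * R + g * G + b * B + c


def pcf(areas, coeffs):
    """Sum of per-pixel power times display area over all colors.

    areas: dict mapping a color string to its accumulated display area.
    """
    return sum(pixel_power(color, coeffs) * area for color, area in areas.items())


def relative_luminance(color):
    """WCAG relative luminance of an sRGB color."""
    def linear(v):
        v /= 255.0
        return v / 12.92 if v <= 0.03928 else ((v + 0.055) / 1.055) ** 2.4
    R, G, B = (linear(v) for v in hex_to_rgb(color))
    return 0.2126 * R + 0.7152 * G + 0.0722 * B


def contrast(a, b):
    """WCAG contrast ratio; ranges from 1 (same color) to 21 (black on white)."""
    la, lb = relative_luminance(a), relative_luminance(b)
    return (max(la, lb) + 0.05) / (min(la, lb) + 0.05)
```

Because the power model is linear in the RGB channels, darker colors covering large areas lower PCF, while the contrast ratio penalizes schemes in which text and its enclosing background drift toward the same luminance. This tension is why the two are treated as separate objectives.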
In Equation (4.5), $w(i,j)$ returns the weight of the edge between colors i and j. $C_i$ is the original color, and $C'_i$ is the corresponding transformed color in the new color scheme. The color distance between two colors c and d is calculated by the function Dist(c, d), which is defined as the CIEDE2000 [67] formula.

$$\mathit{CF} = \sum_{(i,j)\,:\,w(i,j)\ \text{is the ``enclosing'' weight}} \mathit{Con}(C_i, C_j) \tag{4.4}$$

$$\mathit{DF} = \sum_{i=1}^{n} \sum_{j=1}^{n} w(i,j) \cdot \left| \mathit{Dist}(C_i, C_j) - \mathit{Dist}(C'_i, C'_j) \right| \tag{4.5}$$

4.4.3 Generating a New Color Scheme

My approach chooses the Non-dominated Sorting Genetic Algorithm-II (NSGA-II) to search for a new color scheme considering the three objectives. NSGA-II is a well-known search-based algorithm for solving multi-objective optimization problems and has been confirmed to be the best Genetic Algorithm (GA) for optimizing the three objectives when computing the new color scheme [11]. My approach adapts NSGA-II as shown in Algorithm 2. The input of Algorithm 2 is the original color scheme S and the CCG ccg. They are mainly used when evaluating the three objective values. The output of Algorithm 2 is the new color scheme $S'$.

Algorithm 2: NSGA-II based algorithm
Input: S: the original color scheme, ccg: the general CCG constructed from VTs
Output: S': the new color scheme
 1 Generate the initial population P_0
 2 Evaluate individual fitness values and sort P_0 based on non-domination
 3 New offspring population Q_0 ← createNewPop(P_0)
 4 g ← 0
 5 do
 6     R_g = P_g ∪ Q_g
 7     Evaluate individual fitness values and sort R_g based on non-domination
 8     Select the first |P_0| individuals as next population P_{g+1}
 9     Q_{g+1} ← createNewPop(P_{g+1})
10     g ← g + 1
11 while g < maxGen
12 Select S' from the final population P_maxGen

At first, my approach randomly selects a set of values from the whole RGB color space (i.e., from #000000 to #ffffff) as an individual of the initial population $P_0$. For colors marked as unchanging by a developer, my approach keeps their values unchanged in each individual of $P_0$.
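The initialization step can be sketched as follows. This is a simplified illustration, not the jMetal-based implementation: an individual is represented as a list of hex color strings aligned with the original color scheme, and the function and parameter names are hypothetical.

```python
import random


def random_color(rng):
    """Draw a color uniformly from #000000 to #ffffff."""
    return "#{:06x}".format(rng.randrange(0x1000000))


def initial_population(original, unchanging, size, seed=None):
    """Build `size` individuals, each a recoloring of `original`.

    original: list of color strings in the original scheme.
    unchanging: set of colors the developer marked as unchanging; these
    keep their original value in every individual.
    """
    rng = random.Random(seed)
    population = []
    while len(population) < size:
        individual = [
            color if color in unchanging else random_color(rng)
            for color in original
        ]
        population.append(individual)
    return population


# Hypothetical scheme with one color pinned by the developer.
pop = initial_population(["#fafafa", "#eeeeee", "#00ffff"], {"#eeeeee"}, 5, seed=1)
```

Keeping each individual positionally aligned with the original scheme makes it straightforward to evaluate the fitness functions, since position i in an individual is the transformed counterpart of original color i.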
This step keeps generating individuals until the population size is reached (line 1). Next, the algorithm evaluates the three objective values (line 2). Based on the objective values, the population is partitioned into different fronts. The first front, assigned rank value 1, is the completely non-dominated set in the current population. The second front, assigned rank value 2, is dominated only by the individuals in the first front, and so on. Each individual is assigned a rank value based on the front to which it belongs. Aside from the rank, NSGA-II introduces another metric called crowding distance, which is a measurement of how close an individual is to its neighbors. After this step, the individuals are ordered based on their ranks and crowding distances. An individual with a lower rank precedes one with a higher rank, and when both ranks are equal, the individual with the higher crowding distance precedes the one with the lower crowding distance. Then my approach generates the first offspring population $Q_0$ from the initial population $P_0$ (line 3). The function createNewPop uses genetic operators to generate the new offspring. The operators include Simulated Binary Crossover (SBX) [68] and polynomial mutation [69]. Note that during this process, the unchanging colors in each individual stay the same.

After that, my approach iterates until g reaches the maximum generation maxGen (lines 5-11). In each iteration, my approach combines the parent population and the offspring population as $R_g$. As discussed earlier, the algorithm sorts the individuals in $R_g$ based on their ranks and crowding distances. Then my approach chooses individuals as the next parent population $P_{g+1}$ using binary tournament selection until the population size is met. Next, similar to line 3, the new offspring population $Q_{g+1}$ is generated. Finally, my approach selects an individual $S'$ from the final population $P_{maxGen}$, which forms a Pareto frontier.
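The ranking used in lines 2 and 7 of Algorithm 2 can be sketched as follows. This is a compact illustration of non-dominated sorting and crowding distance, not the jMetal code: it peels fronts with a simple quadratic scan and assumes every objective is minimized, so a maximized objective such as CF would be negated first.

```python
def dominates(p, q):
    """p dominates q if it is no worse everywhere and better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def fronts(objs):
    """Split individual indices into fronts; the first front has rank 1."""
    remaining = set(range(len(objs)))
    result = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)
        )
        result.append(front)
        remaining -= set(front)
    return result


def crowding_distance(objs, front):
    """Crowding distance of each index in one front (boundaries get inf)."""
    dist = {i: 0.0 for i in front}
    for k in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][k])
        lo, hi = objs[ordered[0]][k], objs[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (objs[nxt][k] - objs[prev][k]) / (hi - lo)
    return dist


def crowded_order(objs):
    """Order indices by rank, then by descending crowding distance."""
    ranked = []
    for rank, front in enumerate(fronts(objs), start=1):
        dist = crowding_distance(objs, front)
        ranked.extend((rank, -dist[i], i) for i in front)
    return [i for _, _, i in sorted(ranked)]
```

Taking the first |P_0| entries of `crowded_order` corresponds to selecting the next population in line 8 of Algorithm 2. Preferring larger crowding distances within a front favors individuals in sparsely populated regions, which keeps the final Pareto frontier spread out.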
To guarantee that the new color scheme is energy-efficient and maintains the aesthetics, my approach has to strike a balance between these two aspects. To facilitate the process, I create two new functions to quantify the attractiveness and readability aspects of aesthetics. The formulas for the Readability Score (RS) and the Attractiveness Score (AS) are given in Equation (4.6) and Equation (4.7). The independent variables are derived from the existing fitness scores, CF and DF. To consider both aspects, my approach computes a weighted sum for each individual color scheme solution in the final population according to Equation (4.8), and selects the individual with the maximum sum. The coefficients of the equations for RS and AS can be obtained by applying a regression analysis to a training data set. To collect the data set, I conducted a survey asking participants to rate the readability and attractiveness of different UI screenshots of an app. Each UI of the app has several transformed versions of screenshots, which correspond to different value combinations of the tuple (CF, DF). The coefficients in Equation (4.8) are empirically assigned.

$$\mathit{RS} = a\,\mathit{CF}^2 + b\,\mathit{CF} \cdot \mathit{DF} + c\,\mathit{DF}^2 + d\,\mathit{CF} + e\,\mathit{DF} + f \tag{4.6}$$

$$\mathit{AS} = x\,\mathit{CF}^2 + y\,\mathit{CF} \cdot \mathit{DF} + z\,\mathit{DF}^2 + u\,\mathit{CF} + v\,\mathit{DF} + w \tag{4.7}$$

$$\mathit{sum} = i\,\mathit{RS} + j\,\mathit{AS} + k\,\mathit{PCF} \tag{4.8}$$

Once the new color scheme for the colors of the general CCG is determined, my approach computes the transformation for the images modeled in the ICCG. The transformation of an image is a recolored image in which the color of each pixel maintains its color distance from the new background color(s) of its enclosing view to ensure readability.

4.5 Automated App Rewrite

The goal of this phase is to rewrite the app so that it applies the new color scheme. In my approach, the rewrite consists of two aspects. First, for resource files, my approach modifies their contents based on the new color scheme. The resource files can be further divided into layout files and color files.
If a VT node has an XML color setting, then the corresponding view tag can be located based on the layout file and the XPath of the node. If this view tag's color setting does not reference color resources, my approach modifies the color setting via color attributes: (1) my approach modifies the view tag's color attribute value if the original attribute is set explicitly; (2) otherwise, my approach inserts a new color attribute into this view tag to apply the new color. But if this view tag's color setting refers to a color resource, my approach changes the contents of that color resource instead: (1) if the color resource is defined in a color resource XML file, my approach modifies the referenced color definition; (2) if the color resource refers to a local image file, my approach recolors this image based on the new color scheme. Note that if the app uses an SDK resource file, my approach creates a local copy and modifies its content accordingly.

Second, for the source code, my approach may modify the code if there are color setting operations (i.e., API calls) carried out on VT nodes. For an API call that passes a resource via a resource ID, my approach does not change the argument value unless the referred resource is from the Android SDK and has a modified local copy. In that case, the argument value becomes the resource ID of the local copy. For an API call that passes a color value, my approach performs the modification in one of two ways. If the value is a constant, my approach directly modifies the color value in the API call; if the value is represented by a variable, my approach retrieves the definition point of the variable and redefines this variable at that definition point. For example, my approach rewrites the definition of bgColor in line 9 of Program 4.1 to apply the new color.
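The layout-file part of the rewrite can be illustrated with a short sketch. It is not how AIRES performs the rewrite (the tool works on files disassembled by Apktool); it only shows the attribute-level logic on an in-memory layout using Python's standard library. The function name is hypothetical, only the background attribute is handled, and the path argument uses the limited XPath subset supported by ElementTree.

```python
import xml.etree.ElementTree as ElementTree

ANDROID_NS = "http://schemas.android.com/apk/res/android"
BACKGROUND = "{%s}background" % ANDROID_NS


def recolor_view(layout_xml, path, new_color):
    """Set android:background on the view tag found at `path`.

    An explicit color value is overwritten, a missing attribute is
    inserted, and a resource reference (starting with '@') is rejected
    because the referenced resource should be edited instead.
    """
    ElementTree.register_namespace("android", ANDROID_NS)
    root = ElementTree.fromstring(layout_xml)
    view = root if path == "." else root.find(path)
    if view is None:
        raise ValueError("no view tag at " + path)
    current = view.get(BACKGROUND)
    if current is not None and current.startswith("@"):
        raise ValueError("color setting references a resource")
    view.set(BACKGROUND, new_color)
    return ElementTree.tostring(root, encoding="unicode")


# Hypothetical layout: the root sets a color explicitly, the button does not.
layout = (
    '<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android" '
    'android:background="#fafafa"><Button android:id="@+id/pay" />'
    "</LinearLayout>"
)
rewritten = recolor_view(layout, "Button", "#101010")
```

Rejecting resource references rather than overwriting them mirrors the distinction drawn above: a referenced color may be shared by several views, so it has to be changed at its definition.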
4.6 Evaluation

To evaluate my approach, I designed experiments to determine the advantages of the hybrid analysis, the energy savings and user ratings of the transformed UIs, and the time cost. The specific research questions I answered were:

RQ1: Is my hybrid analysis better than other types of analyses alone?
RQ2: How much energy is saved by the transformed UIs?
RQ3: How do users rate the appearance of the transformed UIs?
RQ4: How much time does it take to optimize an app?

4.6.1 Implementation

I implemented my approach in Java as a prototype tool called AIRES (Android guI coloR schEme tranSformer). My implementation uses Apktool [36] to disassemble APK resource files and repack the modified files into a new APK file. I developed the static UI analysis on top of the GATOR [70] framework to compute the layout and color information of the UIs. For performing the color analysis, I used Soot [46] to provide the Jimple code of an APK and basic app information, such as the control flow graph and the call graph. To collect UI information at runtime, I used UI Automator [47] and ADB [40] to dump the layout hierarchy files and capture the screenshots when interacting with an app on a Samsung Galaxy S5 smartphone running Android 5.0. I adapted the NSGA-II algorithm implementation in jMetal [71] to generate a new color scheme. As mentioned earlier, I adapted the probabilistic matching algorithm [20–22] to merge the VTs. For app rewriting, I used Soot to modify the Dalvik bytecode.

To answer these RQs, I considered four different types of program analysis techniques. The detailed features of these four approaches are shown in Table 4.2. Columns "Approach" and "Analysis Type" give the name of each approach and the type of program analysis it represents. Column "Fragment" indicates whether each approach can handle fragments defined in an app. Column "View Source" indicates whether each approach can track the definition points of views in source code.
Column "Color Collection" denotes from what inputs each approach collects the color information. Column "Color Source" shows on which underlying step(s) each approach relies to track the color definition points. Within these columns, the values "✗", "✓", and "N/A" mean no, yes, and not applicable. The first approach is the original GATOR, which is a static analysis technique for modeling views, therefore it can track views' definition points. The second approach is GATOR with my extension described in Section 4.3.1 to handle fragments (GATOR+), and it has one more feature of considering fragments compared to GATOR. Neither GATOR nor GATOR+ can model color information. The third approach is a baseline approach: dynamic analysis plus a naive VT matching (D+M). To enable the dynamic analysis based technique to track the view and color sources, I introduced a naive tree matching algorithm, which matches two tree nodes only when their XPaths are identical.

Table 4.2: Different approaches' feature comparison

Approach   Analysis Type   Fragment   View Source   Color Collection          Color Source
GATOR      static          ✗          ✓             N/A                       N/A
GATOR+     static          ✓          ✓             N/A                       N/A
D+M        dynamic         ✓          ✓             screenshots               naive matching
AIRES      hybrid          ✓          ✓             screenshots, XMLs, code   color value analysis, merging

Table 4.3: Subject apps for the repair approach

ID   Package Name                             Size (MB)   Activity#   Class#   Jimple#
1    android.nachiketa.ebookdownloader        1.7         3           26       1,461
2    be.brunoparmentier.openbikesharing.app   1.5         5           61       3,422
3    be.brunoparmentier.wikeyshare            2.0         4           69       3,810
4    ch.ihdg.calendarcolor                    0.05        2           18       385
5    com.danhasting.radar                     2.6         5           88       4,496
6    com.danielkim.soundrecorder              1.2         2           56       2,339
7    com.dosse.bwentrain.androidPlayer        1.9         8           73       5,271
8    community.peers.internetradio            16.4        2           50       2,105
9    com.ibrahimyousre.resumebuilder          2.1         2           62       2,535
10   com.javierllorente.adc                   1.3         3           31       1,000
11   com.fproject.cryptolitycs                2.3         9           116      6,363
This allows the dynamic analysis based technique to find the source locations of views by matching the view hierarchies with the layout XML files in the APK file. The dynamic analysis captures the displayed views no matter how they are defined (e.g., in a fragment). It collects colors from screenshots, and assigns color information to the matched views in the layout XML files identified by the naive matching. It assumes that a color source is the matched view tag. AIRES uses a hybrid analysis, which includes a static analysis modeling fragments and estimating color values via a color value analysis, a dynamic analysis capturing screenshots, and the merging of the results of the two types of analyses. For my experiments, I ran each approach on a Dell XPS 8700 desktop running 64-bit Ubuntu 16.04 with an Intel Core i7-4770 CPU and 32GB memory.

4.6.2 Subject Apps

For my experiments, I used 11 open source apps collected from F-Droid [72]. Since App 11 stopped working between the experiments for RQ1 and RQ2–4, I used the whole set only for RQ1, and Apps 1–10 for RQ2–4. The subject apps are listed in Table 4.3. Column "Activity#" refers to the number of activities defined by the app developers. Columns "Class#" and "Jimple#" are the numbers of application classes and Jimple statements in each app, respectively. These apps used a wide range of mechanisms to define their layout and color information. Specifically, they used various views defined by the Android SDK and defined these views in libraries, developer code, and fragments. The colors were also set in different ways: API calls, explicit XML settings, and implicit XML settings.

4.6.3 RQ1: Benefits of Hybrid Analysis

In this RQ, I evaluated whether my hybrid analysis is better than other types of analyses alone in supporting the goal of recoloring apps. As discussed in Section 4.1, the completeness of the identified information will influence the success of the recolored apps.
Similarly, the completeness of modifications will also impact the effect of the recolored apps. To address RQ1, I considered three metrics for completeness.

For the first metric, I used the number of distinct identified views. Colors are set for views, so missing a view may neglect several colors and thus harm the energy savings and quality of the new color scheme. To calculate the result for each of the four approaches, I counted the number of nodes in an activity's overall VT, which is the merged VT representing all the possible identified layouts.

Even though a view is captured by an approach, this does not imply that the view's color can be altered, which requires localizing the sources (i.e., definition points). It is important to know how many of the views have sources that can be accurately identified. Therefore, for the second metric, I compared the numbers of views whose source locations were correctly identified for each app. This metric indicates how many color modifications can be made at the view level by changing the XML attributes. This metric can be applied to the four approaches. To calculate this metric, I manually examined the merged VTs and the source code of each app, and counted the number of views that had correctly corresponding views defined in the layout XML files.

The two metrics above measure the completeness of the collected UI information, but it is unknown what impact the collected UI information will have on the color transformation of the app. Thus, I introduced the third metric, which measures how many color modifications were made when applying the new color scheme. Since the new color scheme requires the detailed color and size information of views, this metric is only applied to AIRES and D+M. To calculate this metric, I counted the color modifications performed by the different approaches in various aspects (e.g., layout XML files, images, colors in code, and colors in resource XML files).
Figure 4.6: View number results for four approaches (number of identified views for App1-App11 under GATOR, GATOR+, D+M, and AIRES)

The results of the first metric are shown in Figure 4.6. It is clear that the hybrid analysis in AIRES is effective in identifying views in Android UIs. The number of views identified by AIRES was never less than that of the other approaches. More identified views implies that more colors are considered when generating the new color scheme. Thus, my hybrid analysis is more helpful. I further manually inspected the views identified by the four approaches. In App 5 and App 8, AIRES identified 44 and 6 false positives respectively, which are included in the corresponding bars of Figure 4.6. These false positives were also found in the results of GATOR and GATOR+. Specifically, the 44 false positives in App 5 were identified by both GATOR and GATOR+, while the 6 false positives in App 8 were identified by GATOR+ only. These 6 false positives in App 8 belong to fragments. GATOR did not handle fragments, thus it did not report these 6 false positives. GATOR is a flow-insensitive analysis, therefore it does not differentiate whether certain views are added within a specific activity's methods. Since AIRES is based on GATOR, its hybrid analysis inherits this limitation. Excluding those false positives, the hybrid analysis (AIRES) beat the other analyses in identifying views in 9 out of the 11 apps and tied with the dynamic analysis (D+M) in the remaining 2 apps. Given that my hybrid analysis is a combination of static and dynamic analyses, I then investigated the scenarios in which either analysis was better at identifying views.
The dynamic analysis could capture more views when they are (1) complex SDK/library defined views (e.g., Toolbar's layout), which are not considered in the static analysis; (2) a WebView's contents, which are defined with different semantics not considered in the static analysis; and (3) contents that are set using complex Adapters, which are not handled in GATOR+ and GATOR. The static analysis (GATOR+) could collect more views when they are invisible or belong to unvisited activities. Compared to GATOR, GATOR+ could identify more views that are defined in fragments. The results indicate that my extension for processing fragments is useful. Given that the views identified by the two analyses are not identical, the merged results of my hybrid analysis are a superset of the results of each type of analysis. To sum up, the merging process makes the hybrid analysis benefit from the advantages of both types of analyses.

Figure 4.7: View source results for four approaches (number of correct view sources for App1-App11 under GATOR, GATOR+, D+M, and AIRES)

Similarly, as shown in Figure 4.7, AIRES is more effective in tracing the source locations of views. AIRES correctly identified more views' source locations than the other approaches. I also investigated the reasons why the other approaches failed to find many views' source locations. The reasons are (1) since D+M uses the naive matching, the matching fails when a layout is loaded inside another (e.g., loading a fragment); (2) a layout may be loaded multiple times (e.g., contents of a ListView) while the static analysis only considers it once; and (3) views are invisible or unvisited and can only be identified by the static analysis. All these reasons justify the necessity of the static layout analysis and the merging part of my approach. I also inspected views whose sources were wrongly identified.
I found that some views reported only by the dynamic analysis were annotated with a wrong XML file that shares many identical tags (including XPaths) with the correct XML file. This represents a limitation of my XPath-based matching algorithm. Without considering more information, files with identical XPath nodes are indistinguishable when performing the matching. As shown in Figure 4.8, AIRES generates more modifications than D+M. AIRES modified more layout XML files in 9 out of 11 apps, and tied with D+M in the 2 remaining apps. More color modifications means that more UI parts in an app will be recolored. Therefore, AIRES is better than D+M. This is because the static analysis and the merging allow AIRES to identify more correct view sources. Besides layout XML files, AIRES could also modify images (set both in layout XML files and in code), colors in code, and colors in resource XML files, while D+M could only modify images that are explicitly referred to in layout XML files. My color value analysis enables AIRES to track images and colors set in XML files and in the code. Later, I inspected the layout XML file changes at the tag attribute level. In many activities, I discovered that AIRES yielded more complete recoloring than D+M with fewer attribute modifications. This is because AIRES analyzed the view background color source and propagated the background color from a parent view to its child views. On the other hand, D+M assumed that a view's displayed background color was defined in the corresponding XML tag.

Figure 4.8: Color modification results for AIRES and D+M approaches (number of modifications for App1-App11, by category: layout XML file, color in code, image, and color in resource XML, each for AIRES and D+M)

In summary, my hybrid analysis (AIRES) outperformed the static analysis (GATOR and GATOR+) and the dynamic analysis (D+M) in the three metrics above.
First, the hybrid analysis could identify more views when collecting the UI information. Second, due to the static analysis and the merging process, the hybrid analysis could accurately track more views' sources. Third, the hybrid analysis could generate more color modifications to ensure a better color transformation effect for the app. Despite the advantages of the hybrid analysis, my approach still has some limitations: (1) it inherits the limitations of the underlying GATOR framework. Therefore, AIRES could miss SDK/library defined views, web pages, and some views defined via Adapters. The first two types are out of the scope of my problem, and the third type can be alleviated by the merging process of AIRES and by improvements to the GATOR implementation; (2) due to the use of static analysis, AIRES may identify some false positive views. However, since my problem is to generate a new global color scheme for an app instead of an activity, these false positives only sacrifice some optimality of the new color scheme in exchange for better completeness and the ability to track color source information; (3) AIRES may assign the wrong layout XML sources to identified views due to the limitation of the probabilistic matching algorithm used in the merging step of my approach. However, this case only accounts for 3% of all the identified view sources.

4.6.4 RQ2: Energy Saving

The goal of this RQ is to assess the energy savings of the transformed UIs. To address this RQ, I ran AIRES and D+M on each subject 30 times to mitigate the non-determinism inherent in the underlying search-based algorithm. For this RQ, I evaluated the solutions with the lowest, median, and highest energy consumption scores from the Pareto frontier formed by the non-dominated set of solutions from the 30 runs. To measure the energy, I ran the original app and 6 modified versions on a Samsung Galaxy S5 smartphone connected to a Monsoon power monitor [34].
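The selection of the three representative solutions from the Pareto frontier can be sketched as follows. This is a minimal illustration under stated assumptions, not the search algorithm used by AIRES or D+M: each solution is represented as a tuple of objective scores that are all minimized, with the energy score assumed to be the first element, and the second objective (labeled here as a hypothetical aesthetic penalty) stands in for whatever non-energy objectives the search optimizes.

```python
from typing import List, Sequence, Tuple

Solution = Tuple[float, ...]  # (energy_score, other objectives...), all minimized (assumption)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(solutions: List[Solution]) -> List[Solution]:
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


def pick_representatives(solutions: List[Solution]) -> Tuple[Solution, Solution, Solution]:
    """Return the lowest, median, and highest energy solutions on the frontier.

    For a frontier with an even number of members, the upper median is used;
    this tie-breaking choice is an assumption of the sketch.
    """
    front = sorted(pareto_front(solutions), key=lambda s: s[0])
    return front[0], front[len(front) // 2], front[-1]


# Solutions pooled from several runs: (energy score, hypothetical aesthetic penalty)
pooled = [(1.0, 9.0), (2.0, 5.0), (3.0, 4.0), (3.0, 6.0), (5.0, 1.0), (6.0, 2.0)]
lowest, median, highest = pick_representatives(pooled)
```

Pooling the solutions from all runs before filtering means that a solution that was non-dominated within its own run can still be discarded if a different run found something better in every objective.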
When measuring each version of an app UI's power consumption, I set the screen brightness level to the maximum and sampled the power measurement for 20 seconds after the phone was in an idle state. For each visited UI, I took measurements of the UI screen 10 times and reported the average power consumption. The power difference between the original UI and a modified version serves as the actual power saving for that version. Note that I excluded the UIs where unchangeable views (e.g., WebView, MapView) dominate the display area. The results are shown in Figure 4.9. The Y-axis shows the percentage of power saving of each color scheme solution. On average, the lowest, median, and highest energy consumption solutions generated by AIRES decreased the original UI's power consumption by 54%, 44%, and 30% respectively. Meanwhile, the lowest, median, and highest energy consumption solutions generated by D+M decreased the power consumption by an average of 32%, 27%, and 20% respectively. Overall, these are strong indications that my approach can result in significant energy savings for app users, and is better than D+M regarding energy savings. I further inspected the transformed UIs and investigated the reason why AIRES's transformation saved more energy. I found that the root cause was that D+M could not find the correct source locations for many views, therefore many views' colors remained unchanged when applying the new color scheme. For example, for a UI in App 9, the transformed version generated by AIRES (Figure 4.10b) contained more color changes than the one generated by D+M (Figure 4.10c). I made an interesting observation by comparing the three versions of UIs in Figure 4.10.
For a view whose displayed background color is inherited from its parent view, changing its parent view's background color should be conceptually the same as changing its own background color, but layout distortion may occur unexpectedly. Therefore, AIRES's transformation, which realizes more complete recoloring with fewer attribute modifications, may avoid potential layout distortion. From this finding, I can conclude that the static analysis and the merging component of my approach have a positive impact on the energy savings of transformed UIs.

Figure 4.9: Power savings for six color scheme solutions

Figure 4.10: Three versions of screenshots of a UI in App 9: (a) Original, (b) Transformed by AIRES, (c) Transformed by D+M

4.6.5 RQ3: User Acceptance

In this RQ, I try to understand whether the new UIs are appealing from an aesthetics perspective. To address this RQ, I conducted an end-user survey in which users were asked to compare and rate two aspects of the original and transformed UIs. As in RQ2, I used the solutions with the lowest, median, and highest energy consumption scores from the Pareto frontier generated by 30 runs of AIRES and D+M. For each subject, the survey presented, in random order, a screenshot of the original and transformed UIs as displayed on a mobile device. Similar to existing techniques [5, 15, 16], I asked each participant four questions to gauge users' acceptance: (1) select which of the two versions (original or transformed) they would prefer to use on their mobile devices; (2) rate the readability of each version of the UI on a scale of 1-10, where 1 means low and 10 means high; (3) rate the attractiveness of the UI on a scale of 1-10, where 1 means low and 10 means high; and (4) if one version could save them X% energy, at what battery level would they choose to use it (where X is replaced by the actual energy saving for each app UI).
The available responses to the last question were "Always - regardless of battery level", "Most of the time", "Only when the battery level is low", "Only when the battery level is critical", and "Never". The answers to the first and fourth questions reflect users' preferences in general and special situations. The second and third questions help us to understand the impact of the new color scheme on readability and attractiveness. I conducted the survey on the Amazon Mechanical Turk (AMT) platform. AMT allows requesters to anonymously publish jobs, which anonymous workers complete to earn money. To filter out workers with records of haphazardly completing tasks, I restricted the participants to workers that have high approval ratings (over 98%) and have completed over 1,000 approved tasks. In general, this is a fair selection criterion widely used on AMT. In total, I had 120 anonymous participants, and each solution for the 10 subject apps had 20 completed surveys. Each participant was paid $0.65 for completing a survey.

Figure 4.11: General preference results for six color scheme solutions (percentage of participants preferring the original or the transformed version for the highest, median, and lowest energy consumption solutions of AIRES and D+M)

For the median energy consumption solution generated by AIRES, the mean differences of attractiveness and readability scores between transformed and original apps were -1.92 and -1.47. For the median energy consumption solution generated by D+M, the mean differences of attractiveness and readability scores between transformed and original apps were -1.88 and -1.89. Therefore, the score decrease for AIRES was very close to D+M's in attractiveness and was 78% of the score decrease for D+M in readability. For the highest energy consumption solutions, the attractiveness and readability score drops for AIRES were also close to the score drops for D+M.
However, for the lowest energy consumption solutions, the mean attractiveness and readability score differences for AIRES were -3.33 and -3.60, while the same score differences for D+M were -1.86 and -1.87. I then inspected the modified UIs and found that some of the modified UIs generated by D+M were only partially recolored. For instance, in Figure 4.10, the original color scheme was partially modified in the UI generated by D+M while the UI generated by AIRES was completely recolored using dark colors. This may explain why AIRES's score drops are much larger than D+M's among the lowest energy consumption solutions. The preference question results are shown in Figure 4.11. The X-axis represents the six different color scheme solutions and the Y-axis is the percentage of participants preferring the corresponding version of the UI. On average, 84% of users preferred the original version among these six solutions. More specifically, more participants preferred AIRES's transformation over D+M's transformation for the highest energy consumption solutions. This can also be explained by the fact that several D+M generated apps were partially recolored.

Figure 4.12: Preference results considering the battery level for six color scheme solutions (percentage of participants choosing "Never", "Only when critical", "Only when low", "Most of the time", or "Always" for the highest, median, and lowest energy consumption solutions of AIRES and D+M)

The preference question results when considering the battery level are shown in Figure 4.12. The X-axis represents the six different color scheme solutions and the Y-axis is the percentage of participants preferring the transformed version at different battery levels. When asked to consider the energy savings, there was a dramatic shift in user preference.
On average, 31% of users chose to use the transformed UI for general usage (i.e., "Always" and "Most of the time") among these six solutions. Overall, more participants preferred AIRES's transformed UIs (81%) than those of D+M (78%) before the battery becomes critically low. In summary, I believe that the results for user acceptance are positive. Although users generally gave lower attractiveness and readability ratings to my transformed apps, when they were informed of the energy savings, they predominantly switched to the transformed version.

4.6.6 RQ4: Analysis Time

To answer this research question, I measured the running time of AIRES on the subject apps. The results are shown in Table 4.4. The analysis time is divided into the static analysis (Column "Static"), merging the dynamic analysis results (Column "Merge"), the time to compute the color scheme (Column "Transform"), and the time to rewrite the app (Column "Rewrite"). Each result reported is the average run time of 30 executions. The analysis time ranged from 29 to 139 seconds, with an average of 63 seconds per app. These results suggest that the analysis time is reasonable and will not be a barrier to the user acceptance of my tool.

Table 4.4: The repair technique's analysis time for subject apps

ID   Static (s)   Merge (s)   Transform (s)   Rewrite (s)   Total (s)
1    3.5          18.8        13.6            8.7           44.5
2    8.5          35.1        5.5             28.4          77.5
3    9.0          29.5        5.9             10.6          54.9
4    2.4          7.7         17.3            1.5           28.9
5    6.9          47.7        20.4            63.6          138.6
6    6.5          15.9        8.9             17.1          48.3
7    9.3          50.4        39.4            9.6           108.7
8    4.2          21.6        6.2             11.2          43.1
9    3.7          27.8        4.6             9.7           45.8
10   3.0          19.3        7.0             6.7           35.9

4.6.7 Threats to Validity

External Validity: A potential threat is that the subject apps may not be representative. To mitigate this threat, I randomly selected a set of apps from F-Droid, which catalogs real-world open source Android apps. Furthermore, these apps covered common mechanisms of defining layouts and colors (e.g., API calls, explicit and implicit XML settings).
Since my approach does not consider complex SDK/library views and WebViews, it cannot save energy for UIs in which these kinds of views cover most of the screen area. Note that these limitations are either inherited from the underlying static analysis (i.e., GATOR) or out of the problem scope. As a result, I excluded these UIs from the RQ2 experiment.

Internal Validity: A threat to the internal validity of the results could exist if the mechanism for establishing the energy consumption was inaccurate. To ensure the accuracy of the power measurements, the protocols were developed to ensure that they reliably and accurately captured power measurements. The protocols are also based on prior approaches [11, 18, 23, 24, 41, 73].

Construct Validity: A potential threat is that the attractiveness and readability objectives used in my approach are quite subjective among users. To address this threat, I conducted user-based studies to quantitatively understand users' ratings of the effect of color transformation on the visual appeal of a mobile UI. I used the relative score difference to quantify the impact of color transformation on the attractiveness and readability aspects of a mobile UI.

4.7 Conclusion

In this chapter, I introduced a novel technique to automatically optimize the display energy for Android apps and conducted an empirical evaluation to demonstrate its effectiveness and efficiency. First, my technique is designed to repair the DEHs in mobile apps, which conforms to the problem domain in the hypothesis. Second, my technique was able to reduce the display energy by an average of 43%. My user study also showed that 81% of users were in favor of adopting my approach's transformed Android UI before the battery level drops to critically low. The significant energy savings and the good user acceptance can be interpreted as high effectiveness. Third, the average runtime of my approach was 63 seconds per app, which is within a reasonable range.
Thus, I can conclude that my technique is efficient. To sum up, I can conclude that my approach could repair DEHs with high effectiveness and efficiency. In the evaluation, the results of the comparison among different types of program analysis techniques demonstrated the benefit of the hybrid program analysis used in AIRES. At the same time, the hybrid analysis inherits the limitations of the underlying techniques. Consequently, the hybrid analysis will benefit from future improvements of these underlying techniques. For example, future research can focus on analyzing the complex adapter-based views and interpreting how HTML elements are translated into Android views when displayed in a WebView. The more UI elements can be identified statically, the more elements can be recolored. Moreover, if a dynamic analysis can visit more UI windows or more UI states of a window, it can provide more UI information for the hybrid analysis to merge.

Chapter 5
Related Work

In this chapter, I discuss related work regarding the detection and repair of energy optimizable UIs in mobile apps. I also discuss different approaches in the literature that are related to the underlying techniques of my approaches.

5.1 Reducing Display Energy by Recoloring

Due to the unique feature of OLED displays, an important way of saving display energy is to apply an energy-efficient color scheme to the display contents. These approaches can be broadly divided into two categories as described below.

Recoloring the UI

There has been extensive work on optimizing display energy by recoloring the UIs. Dong et al. [15, 16] proposed a color-adaptive browser, Chameleon, which changes the color rendering of web pages based on predefined color transformations. Li et al. [5-8] presented another optimizing technique, Nyx, which computes a color transformation for a web page and directly applies the new color scheme by modifying the source code of the web page.
Both techniques target web apps, which consist of HTML pages and CSS settings. Linares-Vásquez et al. [9-11] proposed GEMMA, which utilizes a multi-objective genetic algorithm to compute an energy-efficient color scheme for Android GUIs. Lee et al. [17] proposed a dynamic programming algorithm to generate an energy-efficient color scheme while maximizing users' color preference. Based on a screen space variant energy model, Chuang et al. [74] presented an approach to generate energy-efficient color designs by using iso-lightness colors. Wang et al. [75] proposed a technique to find an energy-saving color scheme for sequential data on OLED screens. Kamijoh et al. [76] reduced the display energy of the IBM Wristwatch by reducing the number of white pixels. My repair approach differs in the following aspects: (1) my approach utilizes a hybrid analysis to collect the color information while the aforementioned techniques rely on static analysis or dynamic analysis only; (2) my approach automatically modifies the app to apply the new color scheme, while these techniques either cannot be applied to mobile apps or only provide a new color scheme without changing the app's color design.

Recoloring Non-UI Content

Many researchers have also proposed color transformation techniques for non-UI contents (e.g., images, videos) to save energy for OLED screens. Lin et al. [77, 78] proposed an image pixel scaling technique taking visual attention into account to reduce power and maintain the visual quality of an image. Li et al. [79] presented a technique for modifying an image to save display energy by dimming the region of non-interest. Chang et al. [80] proposed a real-time image color transformation technique based on structural similarity assessment to save energy for OLED displays. Chondro et al. [81] presented a pixel dimming transformation for an image to darken the luminance and save power for OLED displays. Jin et al.
[82] developed a low-power color transformation approach for images, which balances visual satisfaction and power consumption. Stanley-Marbell et al. [54] developed a runtime system called Crayon to reduce display power via acceptable shape and color transforms. Kim et al. [55] presented a technique called Blind to replace an image's color space with another, imperceptibly different color space with higher energy efficiency. Wang et al. [75] proposed an approach to generate energy-saving color schemes for sequential data visualization. All the techniques above directly extract the colors from the input files (e.g., images) and modify the colors by rewriting the input files. In contrast, my repair approach analyzes the code of an app, models different ways of setting colors, and applies the new color scheme by changing the color settings in the code. My repair approach is not designed to optimize non-UI contents, but all of the above techniques can be adopted together with my approach by developers to save more display energy when introducing non-UI contents in their apps.

5.2 Reducing Display Energy by Darkening

Another classic way to save display energy is to lower the brightness level of the screen. Iyer et al. [83] proposed a method to reduce the energy consumption of OLED screens by darkening the user-unfocused areas. Wee et al. [84] designed an approach to reduce the power consumption of gaming on OLED displays by dimming non-interesting parts. Tan et al. [85] also proposed a tool called Focus to reduce OLED display power by dimming less important regions. Chen et al. [86] reduced OLED display power by using a dimming scheme to eliminate undesired details. Shim et al. [87] proposed a hardware solution called EDLS to save energy for LCD displays by reducing the backlight luminance. Iranli et al. [88, 89] proposed a technique to maximize backlight dimming in an image while maintaining a pre-specified distortion level for an LCD display. Bhowmik et al.
[90] presented another technique that changes the backlight brightness and refresh rate of an LCD panel to enhance battery life. My repair approach does not take the brightness of a screen into account, and the aforementioned techniques can be complementary approaches to save more display energy of a mobile app.

5.3 Detecting Energy Bugs in Mobile Apps

Recently, many approaches have been proposed to detect different types of energy bugs in mobile apps. Pathak et al. defined energy bugs [91] and proposed an automatic technique to detect energy bugs on smartphones [2]. Zhang et al. [92] developed a tool called ADEL to detect energy leaks from unnecessary network communication. Oliner et al. [93] employed a black-box approach to detect energy bugs in mobile devices. Guo et al. [94] proposed a light-weight static analysis tool called Relda to detect resource leaks in Android apps. Following that, Wu et al. [95] presented another light-weight and precise static resource leak detection tool, Relda2, which is based on a resource table. Yan et al. [96] proposed a comprehensive testing approach focusing on the coverage criteria related to resource leaks. Liu et al. [97, 98] proposed an automated approach to detect energy bugs in which apps fail to deactivate sensors or misuse sensor data. Linares-Vásquez et al. [99] studied energy-greedy API usage in Android apps. Banerjee et al. [100] proposed a test generation framework to detect energy bugs/hotspots that prevent the smartphone from becoming idle in Android apps. Gui et al. [101] conducted an empirical study on ads in mobile apps, and found that the ads introduce hidden costs including more energy consumption. Ma et al. [102] presented a tool called eDoctor to diagnose abnormal battery drain issues on smartphones. Liu et al. [103] developed a static analysis tool called Elite to detect the two most common patterns of wake lock misuse in Android apps. Zhang et al.
[104] presented an automated test generation technique for resource leaks in Android apps. Li et al. [105] presented a technique called Bouquet to detect and optimize Android apps' HTTP requests that can be bundled to reduce energy consumption. Jiang et al. [106] presented a static analysis technique called SAAD to detect two types of energy bugs (resource leak and layout defect) in Android apps. Lyu et al. [107] proposed an approach to detect and repair the Repetitive Autocommit Transaction pattern in Android apps to improve energy cost and runtime performance. Abbasi et al. [108] developed a tool to detect application tail energy bugs in Android apps. None of the techniques discussed above is designed to detect display related energy bugs in mobile apps.

5.4 Power Modeling Techniques for Mobile Devices

There is a body of work related to the power modeling technique that is used in my detection and repair approach. Existing techniques target different components of a mobile device. Those techniques can be categorized as below.

Power Model for OLED Screens

There are some techniques proposed to build different power models for OLED displays. Dong et al. [18, 41] constructed a power model for a commercial QVGA OLED display module. In their power model, they demonstrated the linear relationship between power consumption and sRGB value. Zhang [12] built a quadratic model for OLED screens, which increased estimation accuracy. Kim et al. [109] modified Dong et al.'s model by considering the brightness and sum of RGB values for AMOLED screens. Mittal et al. [110] refined a power model based on thresholds of RGB component values. Even though the DEP in Section 2.2 and Section 4.4 was built from Dong's power model, other power models could also be used to implement the DEP. However, the above work only focused on developing new techniques for modeling the power consumption of OLED screens and did not detect DEHs or provide optimization guidance to developers.
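As a rough illustration of what a pixel-level linear power model looks like, the sketch below estimates the power of a screen as the sum of per-pixel terms that are linear in the R, G, and B components. It is not Dong et al.'s calibrated model: the coefficients and the constant term are made-up placeholders, the function names are hypothetical, and a real model is fitted per device from power measurements and may apply a gamma correction to the color values first.

```python
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]

# Placeholder per-channel coefficients (arbitrary units per intensity level).
# These are NOT measured values; a real model is calibrated per device.
ASSUMED_COEFFS = (1.0, 1.5, 2.0)


def pixel_power(rgb: RGB, coeffs=ASSUMED_COEFFS, constant: float = 0.0) -> float:
    """Power of one pixel as a linear function of its color components."""
    r, g, b = rgb
    cr, cg, cb = coeffs
    return cr * r + cg * g + cb * b + constant


def screen_power(pixels: Iterable[RGB], coeffs=ASSUMED_COEFFS, constant: float = 0.0) -> float:
    """Power of a screenshot as the sum of its pixels' power."""
    return sum(pixel_power(p, coeffs, constant) for p in pixels)


def saving_percent(original: float, transformed: float) -> float:
    """Relative saving of the transformed UI with respect to the original."""
    return 100.0 * (original - transformed) / original


# A four-pixel "screen" recolored from white to pure blue and to black.
white = [(255, 255, 255)] * 4
blue = [(0, 0, 255)] * 4
black = [(0, 0, 0)] * 4
```

Under such a model, a black pixel contributes only the constant term, which is why color schemes that darken large background areas tend to yield the largest estimated savings.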
Power Model for Non-Display Hardware and Mobile Software

Aside from the display component, many approaches have been proposed to construct power models for other hardware components and mobile software. Shye et al. [111] obtained power models for all components of an Android G1 based on measurements of a logger they developed. Negri et al. [112] treated applications as Finite State Machines, and built power models through measurement of selected states. Zhang et al. [12, 113] collected the power consumption records and employed a regression-based approach to derive the power model for each hardware component of a mobile device. Dong et al. [114] proposed a self-constructive energy modeling system called Sesame to build a model from self power measurement data generated from the smart battery interface. Xiao et al. [115] proposed a methodology to build system-level power models for different components of mobile devices without measurements. Hao et al. [29, 116] combined program analysis and per-instruction energy modeling to provide fine-grained energy consumption estimates at the code level. Li et al. [28] developed a technique to collect power measurements for paths via program analysis and trained a power model to predict energy consumption at the source line level. Tiwari et al. [117, 118] modeled the CPU energy of hardware instructions to quantitatively estimate the software energy consumption. Eprof [119] modeled the energy using a state machine. Wang and colleagues [120] estimated the power consumption of mobile applications with profile-based battery traces. Li et al. [121] proposed Bugu, an application-level power profiler and analyzer for mobile phones. Tsao et al. [122] estimated the energy consumption of I/O requests in application processes.

5.5 Modeling Mobile GUIs

There exist techniques for constructing models for mobile GUIs, and the models are widely used in the research area of automated mobile testing.
The GUI information mainly has two focuses: UI transitions and UI content. Therefore, I classify the techniques into these two categories.

Modeling UI Transitions

An important aspect of mobile GUI information is UI window transitions. There are many techniques modeling UI transitions via dynamic analysis. Takala et al. [123] proposed a keyword-driven Android test automation tool, which models the UI transitions as state machines. Amalfitano et al. [124] presented an Android testing tool, A2T2, to build a GUI model through a crawling process and automatically generate test cases. Amalfitano et al. [125] developed another tool called AndroidRipper, extended from a previous ripping technique, to fire new events when traversing Android GUIs. The test suite generated by AndroidRipper could reveal undocumented bugs. Based on the idea of GUI ripping, Amalfitano et al. [126] proposed a new testing framework, MobiGUITAR, to generate test cases automatically. Choi et al. [127] presented a machine learning based testing algorithm to maximize the test coverage while avoiding app restarts. Cao et al. [128] proposed a new technique that develops a feedback-based exploration strategy to execute actions that are likely to lead to new UI states. Gu et al. [129] developed a GUI model that is refined based on the runtime information during testing to achieve better test coverage. Dong et al. [130] proposed a time-travel testing technique for Android apps to identify progressive states and travel to them for maximizing code coverage and state exploration. The GUI transition model can also be built from static analysis. Azim et al. [131] presented A3E, which constructs the activity transitions statically and employs a depth-first strategy to explore the activities. Yang et al. [132, 133] employed static analysis to model the transitions as a Window Transition Graph (WTG) including GUI windows, and their associated callbacks and events. Mirzaei et al.
[134] proposed a combinatorial technique called TrimDroid to automate GUI testing. This technique uses static analysis to model the activity transitions and partitions dependent widgets so that it can prune test combinations during test generation. Zhang et al. [135] proposed a technique called CHIME to model UI transitions while differentiating the launch modes and contexts of the activity instances. Chen et al. [136] presented a technique, FragDroid, that models the transitions between Android activities and fragments. Huang et al. [137] proposed a new GUI model that includes an enhanced WTG [132, 133] (considering NavigationView, Fragment, and RecyclerView) and transition constraint information. Lai et al. [138] proposed a new analysis to generate a Screen Transition Graph, which models activities, fragments, menus/drawers, and dialogs as nodes. This model is more fine-grained than the previous WTG, which helps dynamic exploration hit more targets during testing. Yan et al. [139] proposed a multiple-entry testing framework that statically builds activity launching models and generates complete launching contexts of activities. All the static analysis techniques above focus on modeling API calls that trigger UI window transitions in order to cover more UI windows. In contrast, the static analysis part of my repair approach models the API calls and XML-based settings that define the layout and color information of a UI. Moreover, my static analysis identifies the points in the code that need to be modified to realize the color optimization.

The GUI transition model can also be constructed via hybrid analysis. Zheng et al. [140] presented SmartDroid, which employs hybrid analysis to detect the UI-based trigger conditions of sensitive behaviors in Android apps. Yang et al. [141] proposed a grey-box approach that uses crawling to build the UI transition model and static analysis to identify supported events. Su et al.
[142] proposed a stochastic self-mutated model to guide test generation so as to achieve high code and model coverage. This technique employs dynamic analysis to build the UI state model and static analysis to extract the fireable events. Behrang et al. [143] employed hybrid analysis to model GUI transitions and then computed similarity scores for open-source apps so as to find references for a developer's app sketch. Liñán et al. [144] proposed an augmented GUI model that intersects the models of both static analysis and dynamic analysis to represent richer information to guide the testing process. All the aforementioned techniques use dynamic analysis to retrieve the layout of the visited UI and employ static analysis to collect the fireable events that guide the traversal in the dynamic analysis. The purpose of their hybrid analysis is to model more UI windows. In the hybrid analysis of my repair approach, the dynamic and static analyses are both designed to collect layout and color information. Once both types of program analysis terminate, a merging process combines the collected UI information. The purpose of my hybrid analysis is to glean more complete layout and color information of a UI.

All the techniques above mainly care about whether more UIs can be visited; therefore, they model each UI state at different levels of granularity (from keywords to screenshots). None of them can guarantee the completeness of the UI information or identify the locations in the app that must be changed to realize the color transformation. However, those techniques can help guide the dynamic analysis part of my repair approach to traverse more UIs.

Modeling the UI Content

There are some techniques that model the content of a UI window. Rountev et al. [57] and Yan [58] presented a static reference analysis framework, GATOR, which can be used to model the layout of an activity. Kuznetsov et al.
[59] presented a static analysis to identify the UI elements and their callbacks in an Android app. Chen et al. [60] presented a tool, StoryDroid, that generates snapshots of rendered Android UI pages together with their implementation code, so that it can assist people in different roles in reviewing apps efficiently. As described in Section 4.1, the dynamic and static analyses of the aforementioned techniques have their own limitations in modeling UI layouts. The hybrid analysis in my repair approach overcomes these limitations and collects color values and their definition points. In addition, Xiao et al. [145] proposed a technique called IconIntent, which employs a static analysis to identify the association between views and icons, and classifies icons to identify sensitive UI widgets. Besides the icon-view association, the static analysis of my repair approach also considers the association between views and their background and text colors. While IconIntent handles only explicit XML settings and API calls with constant argument values, the static analysis of my approach can handle more complex cases (e.g., implicit XML settings and non-constant values). Moreover, none of the aforementioned techniques is designed to transform the colors automatically.

Chapter 6
Conclusion and Future Work

In this chapter, I first present the conclusions of my dissertation and then discuss directions for future work.

6.1 Conclusion

This dissertation has broadly demonstrated, through three major components, how program analysis techniques can help in the new area of optimizing the display energy of mobile UIs.

To demonstrate the effectiveness of program analysis for detecting energy optimizable UIs, Chapter 2 presented dLens, the first program analysis based technique that uses dynamic analysis, power modeling, and color transformation to automate the detection of energy optimizable UIs. DLens confirms the hypothesis in the following aspects.
First, the approach targets Android app UIs, which is consistent with the problem domain in the hypothesis. Second, to automate the detection process, I define an energy optimizable UI as a DEH: a UI of a mobile app whose energy consumption is higher than that of an energy-optimized but functionally equivalent UI. Third, the approach itself employs dynamic analysis, a type of program analysis, to capture the UI screenshots that are analyzed. Fourth, the detection approach can estimate power precisely and rank the screenshots accurately based on their optimization potential, and the results generalize across different devices. This high accuracy and good generalizability can be interpreted as high effectiveness. Last, my approach can analyze a UI screenshot in a reasonable amount of time, which indicates that it is efficient. In summary, I conclude that my approach can detect DEHs with high effectiveness and efficiency. Furthermore, the finding that many apps in the Android market are not optimized in terms of display energy efficiency motivated the third component discussed later.

To help determine the design of the repair technique, I conducted an empirical study to explore the new trends in the coding practices of app developers and whether these trends cause problems for existing program analysis techniques. Chapter 3 described the details of the empirical study, whose results uncovered several interesting observations about practices that impede the functionality of state-of-the-art program analysis techniques. Moreover, this study discovered some important coding practice trends that should be considered in future UI analysis techniques. These findings also directly guided the design of the hybrid analysis used in the third component.
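To make the DEH definition recalled in this section concrete, the following is a minimal sketch of ranking screenshots by optimization potential. It is not the dLens implementation. The linear per-channel power model, the weights in `CHANNEL_WEIGHTS`, and the use of plain RGB inversion as the color transformation are all assumptions chosen for illustration; real coefficients are device specific.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0..255

# Hypothetical per-channel weights in arbitrary units; real coefficients
# are device specific and would come from measurements.
CHANNEL_WEIGHTS = (0.5, 0.75, 1.0)


def estimate_power(pixels: List[Pixel]) -> float:
    """Sum an assumed linear per-pixel cost over a screenshot."""
    wr, wg, wb = CHANNEL_WEIGHTS
    return sum(wr * r + wg * g + wb * b for r, g, b in pixels)


def invert_colors(pixels: List[Pixel]) -> List[Pixel]:
    """Stand-in color transformation: plain RGB inversion."""
    return [(255 - r, 255 - g, 255 - b) for r, g, b in pixels]


def rank_by_potential(screens: Dict[str, List[Pixel]]) -> List[Tuple[str, float]]:
    """Rank screenshots by estimated savings, largest first."""
    ranked = []
    for name, pixels in screens.items():
        original = estimate_power(pixels)
        # Never report a negative saving: keep the original if it is cheaper.
        optimized = min(original, estimate_power(invert_colors(pixels)))
        ranked.append((name, original - optimized))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    screens = {
        "dark_settings": [(0, 0, 0)] * 4,
        "light_inbox": [(255, 255, 255)] * 4,
    }
    for name, saving in rank_by_potential(screens):
        print(name, saving)
```

Under these assumptions, a screenshot counts as a hotspot candidate when its estimated saving is positive, and the ranking simply orders candidates by that saving.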
To show the effectiveness of repairing energy optimizable UIs, Chapter 4 described AIRES, the first hybrid program analysis based technique that uses static and dynamic analyses, a search-based technique, and app rewriting to automate the repair of energy optimizable UIs. AIRES confirms the hypothesis in the following ways. First, the approach targets Android app UIs, which is consistent with the problem domain in the hypothesis. Second, the approach itself employs both types of program analysis, static and dynamic, to model the layout and color information of UIs. Third, my approach can reduce the display energy of UIs with significant savings and good user acceptance. These two findings demonstrate the high effectiveness of my approach. Last, the average run time of my approach for an app is short, which shows that my approach can optimize an app efficiently. In summary, I conclude that my approach can repair DEHs with high effectiveness and efficiency. Beyond that, the results of the comparison among different types of program analysis techniques demonstrated the benefit of the hybrid program analysis used in AIRES.

To sum up, this dissertation has broadly explored the applicability of program analysis techniques to optimizing energy optimizable UIs in mobile apps. I first developed a dynamic analysis based technique (Chapter 2) to show the feasibility of applying program analysis to the detection of energy optimizable UIs. Then I conducted an empirical study (Chapter 3) of UI implementation trends in Android apps. This study's observations helped me understand the problems of existing program analysis techniques and motivated the decision to use a hybrid analysis in my repair technique. Finally, I developed a hybrid program analysis based technique (Chapter 4) to automatically repair the energy optimizable UIs.
The two approaches I developed have been demonstrated to be effective in detecting and repairing energy optimizable UIs in the mobile app domain.

6.2 Future Work

Even though it focuses on optimizing the display energy of mobile apps, this dissertation can have a broader impact on several research areas in computer science, including UI color design evolution, software testing, and program analysis. This section discusses the future research directions that this dissertation can inspire.

6.2.1 UI Color Design Evolution

The most direct impact of this dissertation is that it establishes a foundation for automatically collecting and modifying the color design of an app's UIs. Even though the goal of my dissertation is to reduce display energy, this work, especially the underlying program analyses, can help developers in a wide range of scenarios during the entire lifecycle of their apps. When developers want to change the UI color design, they can do so by feeding AIRES a new color scheme. Moreover, with the capability of collecting and modifying the color design, this dissertation provides fundamental insights to assist subsequent research in the area of UI color design evolution. I describe three example directions below.

Altering UI Color Design. The motivation of my approach, AIRES, is that an energy-efficient color scheme can reduce display energy. However, there are other useful scenarios that require altering the UI color design. For example, the theme color of an app may be switched during different phases of the app's lifecycle. To meet new requirements for changing the UI color design, developers can replace the color transformation component in my repair technique with a customized component that generates the requested color scheme. Furthermore, another direction is to migrate AIRES to problem domains other than mobile apps, particularly IoT (Internet of Things) apps with GUIs, such as smartwatch apps.
Many smartwatches (e.g., Fitbit Versa 3) use OLED screens. In addition, Android Wear OS apps share many common features with Android phone apps, such as UI rendering and interaction mechanisms. Therefore, the hybrid analysis can be adapted to extract the UI information of a Wear Activity. More specifically, the dynamic analysis and the merging process can be directly applied to analyzing a Wear OS app. As for the static analysis, the color value analysis is general and can accept any program point of a color setting API call in a Wear OS app as input. However, the layout analysis needs to be modified to overcome some challenges specific to the Wear OS app domain. These challenges include modeling new APIs and their semantics, incorporating new rules for defining the layouts and colors of widgets, and designing a feasible way to implement the repair accordingly.

Tracking UI Design Changes. As mentioned earlier, an app's UIs may have different designs at different phases of its lifecycle. Given that the hybrid analysis of my approach can be used to collect UI information, including layouts and colors, researchers can study how developers make changes to the UI design. Currently, a related technique [45] detects the UI changes of an app only from dynamically captured screenshots. The hybrid analysis of my repair approach can detect GUI changes that may be missed by dynamic analysis alone and can also track the definition point of each widget in a UI. This extra information can help summarize the UI design changes and analyze why developers made them.

Helping UI Design. With the capability of collecting UI information, researchers can employ my hybrid analysis to profile the UI design of apps on a large scale.
Based on the gleaned data, researchers can classify the layout or color designs by the apps' functionalities or app store categories, and extract good examples of UI design for each category as a set of benchmarks. When developers want to create a new app, they can search for reference apps based on its functionalities and app category, and directly import the example UI designs from the benchmark apps as app sketches. At present, GUIfetch [143] takes app sketches as input and recommends apps' layout designs and implementations by downloading a large set of open-source apps filtered by keywords and computing similarity scores between the apps and the sketches. With the help of the benchmarks, developers would need less effort to design the app sketches. As for the implementation of color design, my hybrid analysis could also extend GUIfetch to cover the code that implements the color design of an app, so that developers spend less effort on implementing the color design.

6.2.2 Software Testing

Given that the hybrid analysis has the advantage of identifying more UI information and the mapping between the UI information and its source, the hybrid analysis in Chapter 4 can also impact the area of software testing. First, the VT model constructed by my hybrid analysis can serve as the basis for developing new evaluation metrics, especially for GUI testing. Traditionally, UI window coverage is a coarse-grained metric, and classic code coverage is too fine-grained to help test generation for GUI testing. Since GUI testing is more concerned with UI elements and their UI properties, my hybrid analysis can provide new metrics for GUI testing. Second, by connecting UI elements with their definition sources, my hybrid analysis enables developers to debug UI issues in mobile apps.
The dynamic analysis part of my hybrid analysis captures the actual rendering of a mobile UI, which can be used to detect issues based on different criteria. The static analysis part of my hybrid analysis associates the rendered UI elements with their definition sources. Based on this information, researchers can develop automated ways to fix certain UI issues. I highlight two research directions in the software testing area next.

Test Evaluation Metrics. As discussed above, code coverage, which is a widely used metric in software testing, may not be a good indicator of the exhaustiveness of GUI testing. A code coverage result does not indicate whether the UI-related code has been tested thoroughly. Thus, future research could develop more informative metrics to fulfill specific testing goals. For example, for testing the GUI of an activity, activity coverage is too coarse-grained. However, based on the VT identified by my hybrid analysis, I can define a new metric called view coverage, which is based on how many view nodes in the VT are visited by executing test cases. By checking the unvisited views, the tester can focus on triggering the code that causes UI changes in the activity.

Debugging UI Issues. As mentioned earlier, my hybrid analysis provides the information necessary to debug UI issues. Future research can analyze the collected UI information and detect UI issues based on patterns. For repairing a UI issue, my hybrid analysis bridges the gap between the displayed UI information and how it is programmed by the developers. Thus, future research can try different code modifications to the localized code snippet, automatically or manually. An example is size-related issues in mobile UIs. Even though the static analysis of AIRES is designed for color properties, it can be easily extended to analyze size properties.
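As a small illustration of the view coverage metric proposed under Test Evaluation Metrics, the sketch below computes the fraction of view nodes visited by a test run and lists the unvisited ones. It assumes the VT can be treated as a tree of nodes with unique identifiers; the `ViewNode` class, the identifiers, and the way visited views are recorded are hypothetical and not taken from AIRES.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class ViewNode:
    """Hypothetical VT node: one view with a unique id and child views."""
    view_id: str
    children: List["ViewNode"] = field(default_factory=list)


def all_view_ids(root: ViewNode) -> Set[str]:
    """Collect the ids of every node in the tree rooted at `root`."""
    ids = {root.view_id}
    for child in root.children:
        ids |= all_view_ids(child)
    return ids


def view_coverage(root: ViewNode, visited: Set[str]) -> Tuple[float, List[str]]:
    """Return (covered fraction, sorted unvisited ids) for one activity."""
    ids = all_view_ids(root)
    covered = ids & visited  # ignore visited ids that are not in this tree
    return len(covered) / len(ids), sorted(ids - covered)


if __name__ == "__main__":
    tree = ViewNode("root", [
        ViewNode("toolbar", [ViewNode("title")]),
        ViewNode("list"),
    ])
    ratio, unvisited = view_coverage(tree, {"root", "title"})
    print(f"view coverage: {ratio:.2f}, unvisited: {unvisited}")
```

The list of unvisited identifiers is the part a tester would act on, since it points at views that no test case has reached yet.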
Then, if a certain size-related issue (e.g., an abnormal size on certain devices) is detected, the developer can adapt my analysis to localize the buggy code snippet. Researchers can also design techniques to compute a fix by trying different modified settings.

6.2.3 Program Analysis

As demonstrated in Section 4.6.3, the hybrid analysis benefits from the advantages of both types of program analysis techniques. The success in my problem domain may inspire future researchers to use the two types of program analysis in a complementary way in more problem domains. Previously, the static and dynamic analyses in a hybrid analysis were mainly designed to model different facets of the required information. For instance, a crawler relying on a hybrid analysis uses a dynamic analysis to interact with an app and record the transitions between UI windows while employing a static analysis to extract fireable events. However, it is also reasonable to employ the two types of program analysis to model the same kind of information. On the surface, this looks like wasted effort, since static analysis is obviously more complete than dynamic analysis. However, as shown in my repair technique, many expressions in the code cannot be resolved by static analysis, and the static analysis results alone fail to provide informative or actionable information for solving research problems such as modeling the color design of an app. When static analysis results are combined with dynamic analysis results, many unresolved results become more meaningful and easier to process. Thus, the strategy of using runtime results to resolve complex expressions can be applied to more research problems in software engineering, such as monitoring network communication and analyzing SQL queries. The prerequisite for doing this is that the required information can be captured using both types of program analysis.
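The idea of using runtime results to fill in values that static analysis could not resolve can be sketched as follows. This is an illustrative simplification, not the merging process of AIRES: it assumes that both analyses already report facts under a shared key (here a hypothetical "view.property" string), and all class names, keys, and definition points are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StaticFact:
    """What a static analysis might report for one property."""
    definition_point: str   # where the value is set (illustrative format)
    value: Optional[str]    # None when the expression could not be resolved


@dataclass
class MergedFact:
    value: Optional[str]
    definition_point: Optional[str]
    source: str             # "static", "dynamic", or "unresolved"


def merge_results(static_facts: Dict[str, StaticFact],
                  dynamic_values: Dict[str, str]) -> Dict[str, MergedFact]:
    """Prefer resolved static values; fall back to observed runtime values."""
    merged: Dict[str, MergedFact] = {}
    for key, fact in static_facts.items():
        if fact.value is not None:
            merged[key] = MergedFact(fact.value, fact.definition_point, "static")
        elif key in dynamic_values:
            # Runtime value resolves the expression; keep the static location.
            merged[key] = MergedFact(dynamic_values[key], fact.definition_point, "dynamic")
        else:
            merged[key] = MergedFact(None, fact.definition_point, "unresolved")
    # Facts seen only at runtime have a value but no known definition point.
    for key, value in dynamic_values.items():
        if key not in merged:
            merged[key] = MergedFact(value, None, "dynamic")
    return merged


if __name__ == "__main__":
    static_facts = {
        "title.textColor": StaticFact("res/layout/main.xml:12", "#FFFFFF"),
        "header.background": StaticFact("MainActivity.java:42", None),
    }
    dynamic_values = {"header.background": "#202020"}
    for key, fact in merge_results(static_facts, dynamic_values).items():
        print(key, fact)
```

The sketch sidesteps the hardest part by assuming the shared key exists; in practice, establishing that correspondence between runtime observations and program points is itself a substantial problem.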
The remaining challenge is to map the dynamic analysis results, which are based on execution traces, to the static analysis results, which are derived from the code.

References

[1] J. Clement. Number of Mobile App Downloads Worldwide From 2016 to 2019. https://www.statista.com/statistics/271644/worldwide-free-and-paid-mobile-app-store-downloads/, 2020.
[2] Abhinav Pathak, Abhilash Jindal, Y. Charlie Hu, and Samuel P. Midkiff. What Is Keeping My Phone Awake?: Characterizing and Detecting No-Sleep Energy Bugs in Smartphone Apps. In Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services, MobiSys '12, pages 267–280, New York, NY, USA, 2012. ACM.
[3] Ding Li, Shuai Hao, Jiaping Gui, and William G. J. Halfond. An Empirical Study of the Energy Consumption of Android Applications. In Software Maintenance and Evolution (ICSME), 2014 IEEE International Conference on, pages 121–130, Sept 2014.
[4] Ding Li and William G. J. Halfond. An Investigation Into Energy-Saving Programming Practices for Android Smartphone App Development. In Proceedings of the 3rd International Workshop on Green and Sustainable Software, GREENS 2014, pages 46–53, New York, NY, USA, 2014. ACM.
[5] Ding Li, Angelica Huyen Tran, and William G. J. Halfond. Making Web Applications More Energy Efficient for OLED Smartphones. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 527–538, New York, NY, USA, 2014. ACM.
[6] Ding Li, Angelica Huyen Tran, and William G. J. Halfond. Nyx: A Display Energy Optimizer for Mobile Web Apps. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pages 958–961, New York, NY, USA, 2015. ACM.
[7] Ding Li and William G. J. Halfond. Optimizing Energy of HTTP Requests in Android Applications. In Proceedings of the 3rd International Workshop on Software Development Lifecycle for Mobile, DeMobile 2015, pages 25–28, New York, NY, USA, 2015. ACM.
[8] Ding Li, Angelica Huyen Tran, and William G. J. Halfond. Optimizing Display Energy Consumption for Hybrid Android Apps (Invited Talk). In Proceedings of the 3rd International Workshop on Software Development Lifecycle for Mobile, DeMobile 2015, pages 35–36, New York, NY, USA, 2015. ACM.
[9] Mario Linares-Vásquez, Gabriele Bavota, Carlos Eduardo Bernal Cárdenas, Rocco Oliveto, Massimiliano Di Penta, and Denys Poshyvanyk. Optimizing Energy Consumption of GUIs in Android Apps: A Multi-Objective Approach. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pages 143–154, New York, NY, USA, 2015. ACM.
[10] Mario Linares-Vásquez, Carlos Bernal-Cárdenas, Gabriele Bavota, Rocco Oliveto, Massimiliano Di Penta, and Denys Poshyvanyk. GEMMA: Multi-Objective Optimization of Energy Consumption of GUIs in Android Apps. In 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C), pages 11–14, May 2017.
[11] Mario Linares-Vásquez, Gabriele Bavota, Carlos Bernal-Cárdenas, Massimiliano Di Penta, Rocco Oliveto, and Denys Poshyvanyk. Multi-Objective Optimization of Energy Consumption of GUIs in Android Apps. ACM Trans. Softw. Eng. Methodol., 27(3):14:1–14:47, September 2018.
[12] Lide Zhang. Power, Performance Modeling and Optimization for Mobile System and Applications. PhD thesis, University of Michigan, 2013.
[13] Alex Barredo. A Comprehensive Look at Smartphone Screen Size Statistics and Trends. Medium.
[14] Daniel Petrov. What's the Ideal Android of Today? A Benchmark Report Has the Answer. https://www.phonearena.com/news/Average-phone-screen-size-resolution-storage-RAM-report-AnTuTu_id106725, July 2018.
[15] Mian Dong and Lin Zhong. Chameleon: A Color-Adaptive Web Browser for Mobile OLED Displays. In Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services, MobiSys '11, pages 85–98, New York, NY, USA, 2011. ACM.
[16] Mian Dong and Lin Zhong. Chameleon: A Color-Adaptive Web Browser for Mobile OLED Displays. IEEE Transactions on Mobile Computing, 11(5):724–738, 2012.
[17] Yeongju Lee and Minseok Song. Adaptive Color Selection to Limit Power Consumption for Multi-Object GUI Applications in OLED-Based Mobile Devices. Energies, 13(10), 2020.
[18] Mian Dong and Lin Zhong. Power Modeling and Optimization for OLED Displays. IEEE Transactions on Mobile Computing, 11(9):1587–1599, Sept 2012.
[19] Mian Wan, Negarsadat Abolhassani, Ali Alotaibi, and William G. J. Halfond. An Empirical Study of UI Implementations in Android Applications. In 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 65–75, Sep. 2019.
[20] Shauvik Roy Choudhary, Husayn Versee, and Alessandro Orso. WEBDIFF: Automated Identification of Cross-Browser Issues in Web Applications. In 2010 IEEE International Conference on Software Maintenance, pages 1–10, Sep. 2010.
[21] Shauvik Roy Choudhary, Mukul R. Prasad, and Alessandro Orso. CROSSCHECK: Combining Crawling and Differencing to Better Detect Cross-Browser Incompatibilities in Web Applications. In 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation, pages 171–180, April 2012.
[22] Shauvik Roy Choudhary, Mukul R. Prasad, and Alessandro Orso. X-PERT: Accurate Identification of Cross-Browser Issues in Web Applications. In Proceedings of the 2013 International Conference on Software Engineering, ICSE '13, pages 702–711, Piscataway, NJ, USA, 2013. IEEE Press.
[23] Mian Wan, Yuchen Jin, Ding Li, and William G. J. Halfond. Detecting Display Energy Hotspots in Android Apps. In 2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST), pages 1–10, April 2015.
[24] Mian Wan, Yuchen Jin, Ding Li, Jiaping Gui, Sonal Mahajan, and William G. J. Halfond. Detecting Display Energy Hotspots in Android Apps. Software Testing, Verification and Reliability, 27(6):e1635, 2017.
[25] Mian Wan, Ali Alotaibi, Paul T. Chiou, and William G. J. Halfond. Automated Optimization of the Display Energy for Android Apps. In Submission.
[26] Cagri Sahin, Furkan Cayci, Irene Lizeth Manotas Gutiérrez, James Clause, Fouad Kiamilev, Lori Pollock, and Kristina Winbladh. Initial Explorations on Design Pattern Energy Usage. In Proceedings of the First International Workshop on Green and Sustainable Software, GREENS '12, pages 55–61, Piscataway, NJ, USA, 2012. IEEE Press.
[27] Irene Manotas, Lori Pollock, and James Clause. SEEDS: A Software Engineer's Energy-Optimization Decision Support Framework. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 503–514, New York, NY, USA, 2014. ACM.
[28] Ding Li, Shuai Hao, William G. J. Halfond, and Ramesh Govindan. Calculating Source Line Level Energy Information for Android Applications. In Proceedings of the 2013 International Symposium on Software Testing and Analysis, ISSTA 2013, pages 78–89, New York, NY, USA, 2013. ACM.
[29] Shuai Hao, Ding Li, William G. J. Halfond, and Ramesh Govindan. Estimating Mobile Application Energy Consumption Using Program Analysis. In Proceedings of the 2013 International Conference on Software Engineering, ICSE '13, pages 92–101, Piscataway, NJ, USA, 2013. IEEE Press.
[30] Israel J. Mojica Ruiz, Meiyappan Nagappan, Bram Adams, Thorsten Berger, Steffen Dienst, and Ahmed E. Hassan. Impact of Ad Libraries on Ratings of Android Mobile Apps. IEEE Software, 31(6):86–92, Nov 2014.
[31] Shuai Hao, Bin Liu, Suman Nath, William G. J. Halfond, and Ramesh Govindan. PUMA: Programmable UI-Automation for Large-Scale Dynamic Analysis of Mobile Apps. In Proceedings of the 12th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '14, pages 204–217, New York, NY, USA, 2014. ACM.
[32] Lorenzo Gomez, Iulian Neamtiu, Tanzirul Azim, and Todd Millstein.
RERAN: Timing- and Touch-Sensitive Record and Replay for Android. In Proceedings of the 2013 International Conference on Software Engineering, ICSE '13, pages 72–81, Piscataway, NJ, USA, 2013. IEEE Press.
[33] Recognized Color Keyword Names. http://www.w3.org/TR/SVG/types.html.
[34] Monsoon Solutions, Inc. Power Monitor. https://www.msoon.com/LabEquipment/PowerMonitor/.
[35] DEP Coefficients. https://sites.google.com/site/dlensproject/dep-coefficients.
[36] A Tool for Reverse Engineering Android Apk Files. https://ibotpeaches.github.io/Apktool/.
[37] dex2jar. https://github.com/pxb1988/dex2jar.
[38] ASM. http://asm.ow2.org/.
[39] Android Screenshots and Screen Capture. http://sourceforge.net/projects/ashot/.
[40] Google Inc. Android Debug Bridge. http://developer.android.com/tools/help/adb.html.
[41] Mian Dong, Yung-Seok Kevin Choi, and Lin Zhong. Power Modeling of Graphical User Interfaces on OLED Displays. In Proceedings of the 46th Annual Design Automation Conference, DAC '09, pages 652–657, New York, NY, USA, 2009. ACM.
[42] DLens. https://sites.google.com/site/dlensproject/.
[43] Google LLC. UI/Application Exerciser Monkey. https://developer.android.com/studio/test/monkey.
[44] Aravind Machiry, Rohan Tahiliani, and Mayur Naik. Dynodroid: An Input Generation System for Android Apps. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pages 224–234, New York, NY, USA, 2013. ACM.
[45] Kevin Moran, Cody Watson, John Hoskins, George Purnell, and Denys Poshyvanyk. Detecting and Summarizing GUI Changes in Evolving Mobile Apps. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, pages 543–553, New York, NY, USA, 2018. ACM.
[46] Raja Vallée-Rai, Phong Co, Etienne Gagnon, Laurie Hendren, Patrick Lam, and Vijay Sundaresan. Soot: A Java Bytecode Optimization Framework.
In CASCON First Decade High Impact Papers, CASCON '10, pages 214–224, Riverton, NJ, USA, 2010. IBM Corp.
[47] Google LLC. UI Automator. https://developer.android.com/training/testing/ui-automator.
[48] skylot. JADX - Dex to Java Decompiler. https://github.com/skylot/jadx.
[49] Ziang Ma, Haoyu Wang, Yao Guo, and Xiangqun Chen. LibRadar: Fast and Accurate Detection of Third-Party Libraries in Android Apps. In 2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C), pages 653–656, May 2016.
[50] Michael Backes, Sven Bugiel, and Erik Derr. Reliable Third-Party Library Detection in Android and Its Security Applications. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS '16, pages 356–367, New York, NY, USA, 2016. ACM.
[51] Shauvik Roy Choudhary, Alessandra Gorla, and Alessandro Orso. Automated Test Input Generation for Android: Are We There Yet? (E). In 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 429–440, Nov 2015.
[52] Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. The Program Dependence Graph and Its Use in Optimization. ACM Trans. Program. Lang. Syst., 9(3):319–349, July 1987.
[53] Statista. Distribution of Free and Paid Android Apps in the Google Play Store From 3rd Quarter 2017 to 1st Quarter 2018. https://www.statista.com/statistics/266211/distribution-of-free-and-paid-android-apps/.
[54] Phillip Stanley-Marbell, Virginia Estellers, and Martin Rinard. Crayon: Saving Power Through Shape and Color Approximation on Next-Generation Displays. In Proceedings of the Eleventh European Conference on Computer Systems, EuroSys '16, New York, NY, USA, 2016. Association for Computing Machinery.
[55] Seikwon Kim, Shinyeong Hyun, Taekyung Heo, Daegil Im, and Jaehyuk Huh. Blind: Power Saving Color Transform Method for OLED Displays. In 2016 IEEE International Conference on Consumer Electronics (ICCE), pages 500–501, 2016.
[56] Tedis Agolli, Lori Pollock, and James Clause. Investigating Decreasing Energy Usage in Mobile Apps via Indistinguishable Color Changes. In 2017 IEEE/ACM 4th International Conference on Mobile Software Engineering and Systems (MOBILESoft), pages 30–34, 2017.
[57] Atanas Rountev and Dacong Yan. Static Reference Analysis for GUI Objects in Android Software. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO '14, pages 143–153, New York, NY, USA, 2014. Association for Computing Machinery.
[58] Dacong Yan. Program Analyses for Understanding the Behavior and Performance of Traditional and Mobile Object-Oriented Software. PhD thesis, Ohio State University, July 2014.
[59] Konstantin Kuznetsov, Vitalii Avdiienko, Alessandra Gorla, and Andreas Zeller. Analyzing the User Interface of Android Apps. In Proceedings of the 5th International Conference on Mobile Software Engineering and Systems, MOBILESoft '18, pages 84–87, New York, NY, USA, 2018. ACM.
[60] Sen Chen, Lingling Fan, Chunyang Chen, Ting Su, Wenhe Li, Yang Liu, and Lihua Xu. StoryDroid: Automated Generation of Storyboard for Android Apps. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pages 596–607, 2019.
[61] David J. Pearce, Paul H. J. Kelly, and Chris Hankin. Efficient Field-Sensitive Pointer Analysis of C. ACM Trans. Program. Lang. Syst., 30(1), November 2007.
[62] Google. Styles and Themes. https://developer.android.com/guide/topics/ui/look-and-feel/themes.html.
[63] Sonal Mahajan and William G. J. Halfond. Finding HTML Presentation Failures Using Image Comparison Techniques. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, ASE '14, pages 91–96, New York, NY, USA, 2014. Association for Computing Machinery.
[64] Sonal Mahajan and William G. J. Halfond. Detection and Localization of HTML Presentation Failures Using Computer Vision-Based Techniques.
In 2015 IEEE 8th International Conference on Software Testing, Verication and Validation (ICST), pages 1{10, April 2015. [65] Sonal Mahajan, Bailan Li, Pooyan Behnamghader, and William G. J. Halfond. Using Visual Symptoms for Debugging Presentation Failures in Web Applications. In 2016 IEEE International Conference on Software Testing, Verication and Validation (ICST), pages 191{201, 2016. [66] W3C. Contrast Ratio Denition. https://www.w3.org/WAI/ER/WD-AERT/ #color-contrast. [67] M. R. Luo, G. Cui, and B. Rigg. The Development of the CIE 2000 Colour-Dierence Formula: CIEDE2000. Color Research & Application, 26(5):340{350, 2001. [68] Kalyanmoy Deb, Ram Bhushan Agrawal, et al. Simulated Binary Crossover for Continuous Search Space. Complex systems, 9(2):115{148, 1995. [69] M.M. Raghuwanshi1and O.G. Kakde. Survey on Multiobjective Evolutionary and Real Coded Genetic Algorithms. In Proceedings of the 8th Asia Pacic symposium on intelligent and evolutionary systems, pages 150{161, 2004. [70] GATOR: Program Analysis Toolkit For Android. http://web.cse.ohio-state.edu/ presto/software/gator/, Aug 2018. 103 [71] Juan J. Durillo and Antonio J. Nebro. JMetal: A Java Framework for Multi-Objective Optimization. Advances in Engineering Software, 42:760{771, 2011. [72] F-Droid Limited. F-Droid, Dec 2019. URL https://f-droid.org/en/. [73] Li Li, Alexandre Bartel, Tegawend e F. Bissyand e, Jacques Klein, Yves Le Traon, Steven Arzt, Siegfried Rasthofer, Eric Bodden, Damien Octeau, and Patrick McDaniel. IccTA: Detecting Inter-Component Privacy Leaks in Android Apps. In Proceedings of the 37th International Conference on Software Engineering - Volume 1, ICSE '15, pages 280{291, Piscataway, NJ, USA, 2015. IEEE Press. [74] Johnson Chuang, Daniel Weiskopf, and Torsten M oller. Energy Aware Color Sets. In Computer Graphics Forum, 2009. [75] Ji Wang, Xiao Lin, and Chris North. GreenVis: Energy-Saving Color Schemes for Sequential Data Visualization on OLED Displays. 2012. 
[76] Noboru Kamijoh, Tadanobu Inoue, C. Michael Olsen, M. T. Raghunath, and Chandra Narayanaswami. Energy Trade-os in the IBM Wristwatch Computer. In Proceedings of the 5th IEEE International Symposium on Wearable Computers, ISWC '01, Washington, DC, USA, 2001. IEEE Computer Society. [77] Chun-Han Lin, Chih-Kai Kang, and Pi-Cheng Hsiu. Catch Your Attention: Quality- Retaining Power Saving on Mobile OLED Displays. In 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC), pages 1{6, 2014. [78] Chun-Han Lin, Chih-Kai Kang, and Pi-Cheng Hsiu. CURA: A Framework for Quality- Retaining Power Saving on Mobile OLED Displays. ACM Trans. Embed. Comput. Syst., 15 (4), August 2016. [79] Deguang Li, Bing Guo, Yan Shen, Junke Li, and Yanhui Huang. Make Image More Energy Ecient for Mobile OLED Displays. In 2016 13th International Conference on Embedded Software and Systems (ICESS), pages 143{147, 2016. [80] Teng-Chang Chang, Sendren Sheng-Dong Xu, and Shun-Feng Su. SSIM-Based Quality-on- Demand Energy-Saving Schemes for OLED Displays. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 46(5):623{635, 2016. [81] Peter Chondro and Shanq-Jang Ruan. Perceptually Hue-Oriented Power-Saving Scheme With Overexposure Corrector for AMOLED Displays. Journal of Display Technology, 12 (8):791{800, 2016. [82] Jeong-Chan Jin, Jae-Hyeok Lee, E. Kim, and Y. Kim. OPT: Optimal Human Visual System- Aware and Power-Saving Color Transformation for Mobile AMOLED Displays. Multimedia Tools and Applications, 77:16699{16720, 2017. [83] Subu Iyer, Lu Luo, Robert Mayo, and Parthasarathy Ranganathan. Energy-Adaptive Dis- play System Designs for Future Mobile Environments. In Proceedings of the 1st International Conference on Mobile Systems, Applications and Services, MobiSys '03, pages 245{258, New York, NY, USA, 2003. ACM. [84] Tan Kiat Wee and Rajesh Krishna Balan. Adaptive Display Power Management for OLED Displays. 
In Proceedings of the First ACM International Workshop on Mobile Gaming, MobileGames '12, pages 25{30, New York, NY, USA, 2012. ACM. 104 [85] Kiat Wee Tan, Tadashi Okoshi, Archan Misra, and Rajesh Krishna Balan. FOCUS: A Usable & Eective Approach to OLED Display Power Management. In Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp '13, pages 573{582, New York, NY, USA, 2013. ACM. [86] Haidong Chen, Ji Wang, Weifeng Chen, Huamin Qu, and Wei Chen. An Image-Space Energy-Saving Visualization Scheme for OLED Displays. Computers & Graphics, 2014. [87] Hojun Shim, Naehyuck Chang, and Massoud Pedram. A Backlight Power Management Framework for Battery-Operated Multimedia Systems. IEEE Design Test of Computers, 21(5):388{396, Sept 2004. [88] Ali Iranli and Massoud Pedram. DTM: Dynamic Tone Mapping for Backlight Scaling. In Proceedings. 42nd Design Automation Conference, 2005., pages 612{616, June 2005. [89] Ali Iranli, Hanif Fatemi, and Massoud Pedram. HEBS: Histogram Equalization for Backlight Scaling. In Proceedings of the Conference on Design, Automation and Test in Europe - Volume 1, DATE '05, pages 346{351, Washington, DC, USA, 2005. IEEE Computer Society. [90] Achintya K. Bhowmik and Robert J. Brennan. System-Level Display Power Reduction Technologies for Portable Computing and Communications Devices. In Portable Informa- tion Devices, 2007. PORTABLE07. IEEE International Conference on, pages 1{5, May 2007. [91] Abhinav Pathak, Y. Charlie Hu, and Ming Zhang. Bootstrapping Energy Debugging on Smartphones: A First Look at Energy Bugs in Mobile Devices. In Proceedings of the 10th ACM Workshop on Hot Topics in Networks, HotNets-X, pages 5:1{5:6, New York, NY, USA, 2011. ACM. [92] Lide Zhang, Mark S. Gordon, Robert P. Dick, Z. Morley Mao, Peter Dinda, and Lei Yang. ADEL: An Automatic Detector of Energy Leaks for Smartphone Applications. 
In Pro- ceedings of the Eighth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS '12, pages 363{372, New York, NY, USA, 2012. ACM. [93] Adam J. Oliner, Anand Iyer, Eemil Lagerspetz, Sasu Tarkoma, and Ion Stoica. Collabo- rative Energy Debugging for Mobile Devices. In Proceedings of the Eighth USENIX Con- ference on Hot Topics in System Dependability, HotDep'12, page 6, USA, 2012. USENIX Association. [94] Chaorong Guo, Jian Zhang, Jun Yan, Zhiqiang Zhang, and Yanli Zhang. Characterizing and Detecting Resource Leaks in Android Applications. In 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 389{398, 2013. [95] Tianyong Wu, Jierui Liu, Zhenbo Xu, Chaorong Guo, Yanli Zhang, Jun Yan, and Jian Zhang. Light-Weight, Inter-Procedural and Callback-Aware Resource Leak Detection for Android Apps. IEEE Transactions on Software Engineering, 42(11):1054{1076, 2016. [96] Dacong Yan, Shengqian Yang, and Atanas Rountev. Systematic Testing for Resource Leaks in Android Applications. In 2013 IEEE 24th International Symposium on Software Relia- bility Engineering (ISSRE), pages 411{420, 2013. [97] Yepang Liu, Chang Xu, and S. C. Cheung. Where Has My Battery Gone? Finding Sen- sor Related Energy Black Holes in Smartphone Applications. In 2013 IEEE International Conference on Pervasive Computing and Communications (PerCom), pages 2{10, 2013. 105 [98] Yepang Liu, Chang Xu, S. C. Cheung, and Jian L u. GreenDroid: Automated Diagno- sis of Energy Ineciency for Smartphone Applications. IEEE Transactions on Software Engineering, 40(9):911{940, Sept 2014. [99] Mario Linares-V asquez, Gabriele Bavota, Carlos Bernal-C ardenas, Rocco Oliveto, Massi- miliano Di Penta, and Denys Poshyvanyk. Mining Energy-Greedy API Usage Patterns in Android Apps: An Empirical Study. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pages 2{11, New York, NY, USA, 2014. ACM. 
[100] Abhijeet Banerjee, Lee Kee Chong, Sudipta Chattopadhyay, and Abhik Roychoudhury. Detecting Energy Bugs and Hotspots in Mobile Apps. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2014, page 588{598, New York, NY, USA, 2014. Association for Computing Machinery. [101] Jiaping Gui, Stuart Mcilroy, Meiyappan Nagappan, and William G. J. Halfond. Truth in Advertising: The Hidden Cost of Mobile Ads for Software Developers. In Proceedings of the 37th International Conference on Software Engineering - Volume 1, ICSE '15, pages 100{110, Piscataway, NJ, USA, 2015. IEEE Press. [102] Xiao Ma, Peng Huang, Xinxin Jin, Pei Wang, Soyeon Park, Dongcai Shen, Yuanyuan Zhou, Lawrence K. Saul, and Georey M. Voelker. EDoctor: Automatically Diagnosing Abnormal Battery Drain Issues on Smartphones. In Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation, nsdi'13, page 57{70, USA, 2013. USENIX Association. [103] Yepang Liu, Chang Xu, Shing-Chi Cheung, and Valerio Terragni. Understanding and De- tecting Wake Lock Misuses for Android Applications. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016, pages 396{409, New York, NY, USA, 2016. ACM. [104] Hailong Zhang, Haowei Wu, and Atanas Rountev. Automated Test Generation for Detection of Leaks in Android Applications. In Proceedings of the 11th International Workshop on Automation of Software Test, AST '16, page 64{70, New York, NY, USA, 2016. Association for Computing Machinery. [105] Ding Li, Yingjun Lyu, Jiaping Gui, and William G. J. Halfond. Automated Energy Opti- mization of HTTP Requests for Mobile Applications. In Proceedings of the 38th Interna- tional Conference on Software Engineering, ICSE '16, pages 249{260, New York, NY, USA, 2016. ACM. [106] Hao Jiang, Hongli Yang, Shengchao Qin, Zhendong Su, Jian Zhang, and Jun Yan. 
Detecting Energy Bugs in Android Apps Using Static Analysis. In Formal Methods and Software Engineering, pages 192{208, Cham, 2017. Springer International Publishing. [107] Yingjun Lyu, Ding Li, and William G. J. Halfond. Remove RATs From Your Code: Au- tomated Optimization of Resource Inecient Database Writes for Mobile Applications. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2018, pages 310{321, New York, NY, USA, 2018. ACM. [108] Abdul Muqtadir Abbasi, Mustafa Al-Tekreeti, Kshirasagar Naik, Amiya Nayak, Pradeep Srivastava, and Marzia Zaman. Characterization and Detection of Tail Energy Bugs in Smartphones. IEEE Access, 6:65098{65108, 2018. [109] Dongwon Kim, Wonwoo Jung, and Hojung Cha. Runtime Power Estimation of Mobile AMOLED Displays. In Design, Automation Test in Europe Conference Exhibition (DATE), 2013, pages 61{64, March 2013. 106 [110] Radhika Mittal, Aman Kansal, and Ranveer Chandra. Empowering Developers to Estimate App Energy Consumption. In Proceedings of the 18th Annual International Conference on Mobile Computing and Networking, Mobicom '12, pages 317{328, New York, NY, USA, 2012. ACM. [111] Alex Shye, Benjamin Scholbrock, and Gokhan Memik. Into the Wild: Studying Real User Activity Patterns to Guide Power Optimizations for Mobile Architectures. In Proceedings of the 42Nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 42, pages 168{178, New York, NY, USA, 2009. ACM. [112] Luca Negri, Domenico Barretta, and William Fornaciari. Application-Level Power Manage- ment in Pervasive Computing Systems: A Case Study. In Proceedings of the 1st Conference on Computing Frontiers, CF '04, pages 78{88, New York, NY, USA, 2004. ACM. [113] Lide Zhang, Birjodh Tiwana, Robert P. Dick, Zhiyun Qian, Z. Morley Mao, Zhaoguang Wang, and Lei Yang. Accurate Online Power Estimation and Automatic Battery Behavior Based Power Model Generation for Smartphones. 
In Hardware/Software Codesign and Sys- tem Synthesis (CODES+ISSS), 2010 IEEE/ACM/IFIP International Conference on, pages 105{114, Oct 2010. [114] Mian Dong and Lin Zhong. Self-Constructive High-Rate System Energy Modeling for Battery-Powered Mobile Systems. In Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services, MobiSys '11, pages 335{348, New York, NY, USA, 2011. ACM. [115] Yu Xiao, Rijubrata Bhaumik, Zhirong Yang, Matti Siekkinen, Petri Savolainen, and Antti Yl a-J a aski. A System-Level Model for Runtime Power Estimation on Mobile Devices. In Green Computing and Communications (GreenCom), 2010 IEEE/ACM Int'l Conference on Int'l Conference on Cyber, Physical and Social Computing (CPSCom), pages 27{34, Dec 2010. [116] Shuai Hao, Ding Li, William G. J. Halfond, and Ramesh Govindan. Estimating Android Applications' CPU Energy Usage via Bytecode Proling. In Proceedings of the First Interna- tional Workshop on Green and Sustainable Software, GREENS '12, pages 1{7, Piscataway, NJ, USA, 2012. IEEE Press. [117] Vivek Tiwari, Sharad Malik, and Andrew Wolfe. Power Analysis of Embedded Software: A First Step Towards Software Power Minimization. pages 384{390, 1994. [118] Vivek Tiwari, Sharad Malik, Andrew Wolfe, and Mike Tien-Chien Lee. Instruction Level Power Analysis and Optimization of Software. In VLSI Design, 1996. Proceedings., Ninth International Conference on, pages 326{328, Jan 1996. [119] Abhinav Pathak, Y. Charlie Hu, and Ming Zhang. Where Is the Energy Spent Inside My App?: Fine Grained Energy Accounting on Smartphones With Eprof. In Proceedings of the 7th ACM European Conference on Computer Systems, EuroSys '12, pages 29{42, New York, NY, USA, 2012. ACM. [120] Chengke Wang, Fengrun Yan, Yao Guo, and Xiangqun Chen. Power Estimation for Mobile Applications with Prole-Driven Battery Traces. In Low Power Electronics and Design (ISLPED), 2013 IEEE International Symposium on, pages 120{125, Sept 2013. 
[121] Youhuizi Li, Hui Chen, and Weisong Shi. Power Behavior Analysis of Mobile Applications Using Bugu. Sustainable Computing: Informatics and Systems, 2014. 107 [122] Shiao-Li Tsao, Cheng-Kun Yu, and Yi-Hsin Chang. Architecture of Computing Systems { ARCS 2013: 26th International Conference, Prague, Czech Republic, February 19-22, 2013. Proceedings, chapter Proling Energy Consumption of I/O Functions in Embedded Applications, pages 195{206. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013. [123] Tommi Takala, Mika Katara, and Julian Harty. Experiences of System-Level Model-Based GUI Testing of an Android Application. In 2011 Fourth IEEE International Conference on Software Testing, Verication and Validation, pages 377{386, March 2011. [124] Domenico Amaltano, Anna Rita Fasolino, and Porrio Tramontana. A GUI Crawling- Based Technique for Android Mobile Application Testing. In Software Testing, Verication and Validation Workshops (ICSTW), 2011 IEEE Fourth International Conference on, pages 252{261, March 2011. [125] Domenico Amaltano, Anna Rita Fasolino, Porrio Tramontana, Salvatore De Carmine, and Atif M. Memon. Using GUI Ripping for Automated Testing of Android Applications. In Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, ASE 2012, pages 258{261, New York, NY, USA, 2012. ACM. [126] Domenico Amaltano, Anna Rita Fasolino, Porrio Tramontana, Bryan Dzung Ta, and Atif M. Memon. MobiGUITAR: Automated Model-Based Testing of Mobile Apps. IEEE Software, 32(5):53{59, Sept 2015. [127] Wontae Choi, George Necula, and Koushik Sen. Guided GUI Testing of Android Apps With Minimal Restart and Approximate Learning. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Ap- plications, OOPSLA '13, pages 623{640, New York, NY, USA, 2013. ACM. [128] Yuzhong Cao, Guoquan Wu, Wei Chen, and Jun Wei. CrawlDroid: Eective Model-Based GUI Testing of Android Apps. 
In Proceedings of the Tenth Asia-Pacic Symposium on Internetware, Internetware '18, pages 19:1{19:6, New York, NY, USA, 2018. ACM. [129] Tianxiao Gu, Chengnian Sun, Xiaoxing Ma, Chun Cao, Chang Xu, Yuan Yao, Qirun Zhang, Jian Lu, and Zhendong Su. Practical GUI Testing of Android Applications via Model Ab- straction and Renement. In Proceedings of the 41st International Conference on Software Engineering, ICSE '19, page 269{280. IEEE Press, 2019. [130] Zhen Dong, Marcel B ohme, Lucia Cojocaru, and Abhik Roychoudhury. Time-Travel Testing of Android Apps. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, ICSE '20, page 481{492, New York, NY, USA, 2020. Association for Computing Machinery. [131] Tanzirul Azim and Iulian Neamtiu. Targeted and Depth-First Exploration for Systematic Testing of Android Apps. In Proceedings of the 2013 ACM SIGPLAN International Con- ference on Object Oriented Programming Systems Languages & Applications, OOPSLA '13, pages 641{660, New York, NY, USA, 2013. ACM. [132] Shengqian Yang, Hailong Zhang, Haowei Wu, Yan Wang, Dacong Yan, and Atanas Rountev. Static Window Transition Graphs for Android. In Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), ASE '15, pages 658{ 668, Washington, DC, USA, 2015. IEEE Computer Society. [133] Shengqian Yang, Haowei Wu, Hailong Zhang, Yan Wang, Chandrasekar Swaminathan, Da- cong Yan, and Atanas Rountev. Static Window Transition Graphs for Android. Interna- tional Journal of Automated Software Engineering, 25(4):833{873, December 2018. 108 [134] Nariman Mirzaei, Joshua Garcia, Hamid Bagheri, Alireza Sadeghi, and Sam Malek. Re- ducing Combinatorics in GUI Testing of Android Applications. In Proceedings of the 38th International Conference on Software Engineering, ICSE '16, pages 559{570, New York, NY, USA, 2016. ACM. [135] Yifei Zhang, Yulei Sui, and Jingling Xue. 
Launch-Mode-Aware Context-Sensitive Activity Transition Analysis. In Proceedings of the 40th International Conference on Software En- gineering, ICSE '18, page 598{608, New York, NY, USA, 2018. Association for Computing Machinery. [136] Jia Chen, Ge Han, Shanqing Guo, and Wenrui Diao. FragDroid: Automated User Interface Interaction With Activity and Fragment Analysis in Android Applications. In 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pages 398{409, June 2018. [137] An Huang, Minxue Pan, Tian Zhang, and Xuandong Li. Static Extraction of IFML Models for Android Apps. In Proceedings of the 21st ACM/IEEE International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings, MODELS '18, pages 53{54, New York, NY, USA, 2018. ACM. [138] Duling Lai and Julia Rubin. Goal-Driven Exploration for Android Applications. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 115{127, 2019. [139] Jiwei Yan, Hao Liu, Linjie Pan, Jun Yan, Jian Zhang, and Bin Liang. Multiple-Entry Testing of Android Applications by Constructing Activity Launching Contexts. In Proceedings of the 42nd International Conference on Software Engineering, ICSE '20, New York, NY, USA, 2020. Association for Computing Machinery. [140] Cong Zheng, Shixiong Zhu, Shuaifu Dai, Guofei Gu, Xiaorui Gong, Xinhui Han, and Wei Zou. SmartDroid: An Automatic System for Revealing UI-Based Trigger Conditions in Android Applications. In Proceedings of the Second ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, SPSM '12, pages 93{104, New York, NY, USA, 2012. ACM. [141] Wei Yang, Mukul R. Prasad, and Tao Xie. A Grey-Box Approach for Automated GUI-Model Generation of Mobile Applications, pages 250{265. Springer Berlin Heidelberg, Berlin, Hei- delberg, 2013. [142] Ting Su, Guozhu Meng, Yuting Chen, Ke Wu, Weiming Yang, Yao Yao, Geguang Pu, Yang Liu, and Zhendong Su. 
Guided, Stochastic Model-Based GUI Testing of Android Apps. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, pages 245{256, New York, NY, USA, 2017. ACM. [143] Farnaz Behrang, Steven P. Reiss, and Alessandro Orso. GUIfetch: Supporting App Design and Development Through GUI Search. In Proceedings of the 5th International Conference on Mobile Software Engineering and Systems, MOBILESoft '18, page 236{246, New York, NY, USA, 2018. Association for Computing Machinery. [144] Santiago Li~ n an, Laura Bello-Jim enez, Mar a Ar evalo, and Mario Linares-V asquez. Auto- mated Extraction of Augmented Models for Android Apps. In 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 549{553, 2018. [145] Xusheng Xiao, Xiaoyin Wang, Zhihao Cao, Hanlin Wang, and Peng Gao. IconIntent: Auto- matic Identication of Sensitive UI Widgets Based on Icon Classication for Android Apps. 109 In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pages 257{268, 2019. 110
Abstract
Mobile apps and smartphones play an essential role in daily life, and an app’s energy consumption has become an important concern for its developers. Because an app’s display energy consumption can be optimized at the software level, many techniques have been proposed to help optimize apps’ display energy on OLED screens. However, there are no automated techniques for detecting and repairing energy optimizable User Interfaces (UIs) in Android apps. For detection, developers can only manually examine each UI’s colors and judge by intuition which UIs are optimizable. For repair, developers must manually analyze the app and modify its color settings to recolor the UIs.
In this dissertation, I aim to overcome these challenges and limitations by using program analysis based techniques to automate the detection and repair of energy optimizable UIs in mobile apps. The dissertation has three main components. First, I developed dLens, the first program analysis based technique that uses dynamic analysis, power modeling, and color transformation to automate the detection of energy optimizable UIs. This technique could estimate power precisely and rank the optimizable UIs accurately based on their optimization potential. Second, I conducted an empirical study to explore the new trends in app developers’ coding practices and whether these trends cause problems for existing program analysis techniques. This study provided empirical evidence on several important coding practice trends to guide the design of future UI analysis techniques. Finally, I devised AIRES, the first hybrid program analysis based technique that uses static and dynamic analyses, a search-based technique, and app rewriting to automate the repair of energy optimizable UIs. AIRES could reduce the display energy of mobile UIs with significant savings and wide user acceptance. In addition to these contributions, I discuss the lessons learned while constructing these techniques and point out future work that this dissertation can inspire.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Detecting SQL antipatterns in mobile applications
Energy optimization of mobile applications
Automated repair of presentation failures in Web applications using search-based techniques
Detection, localization, and repair of internationalization presentation failures in web applications
Utilizing user feedback to assist software developers to better use mobile ads in apps
Automated repair of layout accessibility issues in mobile applications
Side-channel security enabled by program analysis and synthesis
Toward understanding mobile apps at scale
Reducing user-perceived latency in mobile applications via prefetching and caching
Constraint-based program analysis for concurrent software
Detecting anomalies in event-based systems through static analysis
Static program analyses for WebAssembly
Techniques for methodically exploring software development alternatives
Improving efficiency, privacy and robustness for crowd‐sensing applications
A joint framework of design, control, and applications of energy generation and energy storage systems
Reducing inter-component communication vulnerabilities in event-based systems
Data-driven and logic-based analysis of learning-enabled cyber-physical systems
Analysis of embedded software architecture with precedent dependent aperiodic tasks
Differential verification of deep neural networks
Studying malware behavior safely and efficiently
Asset Metadata
Creator
Wan, Mian (author)
Core Title
Automatic detection and optimization of energy optimizable UIs in Android applications using program analysis
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science
Degree Conferral Date
2021-12
Publication Date
09/13/2021
Defense Date
05/14/2021
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
display, dynamic analysis, Energy, mobile applications, OAI-PMH Harvest, optimization, Power, static analysis, user interface
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Halfond, William (committee chair), Deshmukh, Jyotirmoy (committee member), Gupta, Sandeep (committee member), Medvidovic, Nenad (committee member), Wang, Chao (committee member)
Creator Email
crownwan@gmail.com,mianwan@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC15916710
Unique identifier
UC15916710
Legacy Identifier
etd-WanMian-10055
Document Type
Dissertation
Rights
Wan, Mian
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu