A framework for runtime energy efficient mobile execution
A FRAMEWORK FOR RUNTIME ENERGY EFFICIENT MOBILE EXECUTION

by Sangwon Lee

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE)

May 2016

Copyright 2016 Sangwon Lee

Abstract

As mobile applications demand more computing power, the limited battery energy on mobile devices will become an impediment to meeting application demands. In this scenario, computation offloading on mobile devices is emerging as one possible energy-saving approach. With computational offloading, some of the method calls from the mobile application are executed on a cloud server and the results are integrated back into the mobile application. For offloading to be useful, any offloading approach must provide seamless movement of computation between the cloud and the mobile device and must be transparent to the application developer and end user. When a method is remotely executed on a cloud server, it is necessary to transfer the required input data to the server. Hence, offloading trades off computing energy against communication energy. Existing approaches to offloading require a significant amount of method call state to be transferred to the remote server. Furthermore, existing approaches are unable to handle the increasingly popular native method calls that are embedded in most mobile applications. This thesis addresses these drawbacks. We present a framework named FREEME for efficient offloading of computations to a remote server. First, we motivate the need for computational offloading by presenting a comprehensive quantification of a mobile phone’s energy consumption using a field-deployed Wireless Body Area Network (WBAN) called KNOWME. We quantify the energy impact of different programming paradigms, sensing modalities, data storage, and conflicting computation and communication demands.
Based on the knowledge gained from the measurement studies, we propose an Active Energy Profiling strategy that uses short profiling periods to automatically determine the most energy efficient choices for running a WBAN. Driven by this motivational data, we propose a novel static analyzer to identify offloadable methods from a legacy Android application. The proposed analyzer automatically analyzes the Java class methods and user-defined native methods in Android applications to identify target methods for remote execution. FREEME’s static analysis identifies the minimum set of data elements that are necessary for remote execution, thereby shrinking the size of the data transferred to the server. The server also optimizes the amount of data it sends back to the mobile phone by eliminating transfers of unmodified data. FREEME implements these approaches within the Android framework by developing novel static analysis and object serialization approaches. We evaluated FREEME on Android phones and show that significant energy and latency reductions can be achieved with FREEME.

To my family.

Acknowledgements

First and foremost, I would like to express my respectful gratitude to my advisor, Prof. Murali Annavaram, for his continuous support of my PhD study. Without his patience, motivation, and immense knowledge, I could not have finished this journey. I would like to thank Prof. Bhaskar Krishnamachari, who is my co-advisor and led me to this journey of study. I can still clearly remember one day in summer 2007. It was the first day I met him for the GBR project, in which I was involved as a student volunteer. He said “Are you Sang?” and it changed my last six years entirely. I would like to thank my committee member, Prof. Aiichiro Nakano, for his comments and advice that helped enrich my dissertation. I am also grateful to my undergraduate advisor, Prof. Sunghwan Cho, who showed me the exciting world of computer science when I had very little software knowledge.
Without his support, I could not have begun this journey. I cannot thank my life-long friend in Korea, Dongnam Kang, enough. Interestingly, with him, I encountered many lucky chances that helped shape my current career path. I also want to thank my friends and supporters in Korea: Jihae, Eunjin, Seokwhwan, and Jinhwan.

Many thanks are also due to the members of the SCIP and ANRG: Sabya, Kumar, Daniel, Qiumin, Gunjae, Mehrtash, Mohammad, Waleed, Krishna, Bardia, Melina, Abdulaziz, Suvil, Parisa, Nachikethas, Keyvan, Quynh, Griffey, Kwame, Shagxing, Pradipta, Jason, Mahesh, Yi Gai, Ying, Majed, Marjan, Yi Wang, Scott, Hua, Amitabh, Sundeep, Pai-Han, and Avinash. I also thank my Korean friends at USC: Simon, Dukhee, Yoonsik, Deahyung, Joongheon, Eunsung, Dr. Son, and Dr. Ahn. Chong Lim Lee and Hong Ja Kang, who are the aunt and uncle of my best friend Dongnam, gave me family-like love that helped me settle down in the USA easily. I owe my deepest gratitude to my family: my parents, brother, sister, and family-in-law.

Last but not least, I give endless thanks to my best supporter, Hyeran Jeon, who is my lovely wife. The greatest piece of luck in my life was meeting her when I was working in Korea. I still cannot believe how lucky I was to meet her again at the same school in the USA, study together under the same advisor, and graduate together on the same day! What are the chances! Jealous, right?

Table of Contents

Abstract
Dedication
Acknowledgements
List of Figures
List of Tables
1 Introduction
  1.1 Motivation
  1.2 A Brief Introduction to Computational Offloading
    1.2.1 Thread Migration
    1.2.2 Method Migration
  1.3 Thesis Statement
  1.4 Thesis Contributions
2 Energy Impact of Design Choices
  2.1 KNOWME Platform
  2.2 Impact of Software Platform on Energy Consumption
  2.3 Sensitivity to Hardware Platform
  2.4 Energy Consumption of the Sensors
  2.5 Energy Consumption of GPS
  2.6 Storage Costs
  2.7 Compute vs. Communicate
  2.8 Optimal Configuration
  2.9 Active Energy Profiling Method
  2.10 Related Works
3 Characteristics of Android Applications
  3.1 Background
    3.1.1 Android OS
    3.1.2 Android Application Package
  3.2 Comprehensive Static Analysis for Legacy Android Application
    3.2.1 Call Graph Construction
  3.3 Dynamic Analysis Strategy
    3.3.1 Trace Method Calls
    3.3.2 Trace Native Method Calls
    3.3.3 Input Sources for Dynamic Profiling
  3.4 Evaluation
    3.4.1 Fundamental App Data
    3.4.2 Sample Applications
    3.4.3 Call graph Construction
    3.4.4 Analysis of Native Method Call
  3.5 Related Work
4 Framework for Runtime Energy Efficient MobileApp Execution
  4.1 FREEME System Architecture
    4.1.1 System Overview
    4.1.2 FREEME manager
    4.1.3 FREEME Package analyzer
    4.1.4 Modifying Dalvik Virtual Machine
    4.1.5 FREEME Service
    4.1.6 Minimizing data transmission size
    4.1.7 FREEME Handler
    4.1.8 Remote Procedure Call Server
    4.1.9 Putting it All Together
    4.1.10 Implementation
  4.2 Evaluation
    4.2.1 PassMark PerformanceTest Mobile
    4.2.2 Chess for Android
    4.2.3 Mobile Health Monitoring App
  4.3 Related Work
5 Conclusion & Summary
Reference List

List of Figures

1.1 Smartphone’s Battery Capacity
1.2 FREEME Overview
2.1 An Example 3-Tier WBAN System
2.2 KNOWME Application: (a) The Components of KMCore (b) A Screenshot of KMClient Running on N95 (c) ECG Signal
2.3 Execution Time of Three WBAN Functions on Three Programming Languages
2.4 Energy Cost of Positioning Methods
2.5 Battery Drain With Storage
2.6 Three Phases of Transmissions
2.7 Data Transmission Costs (Uplink)
2.8 Local vs Remote Computation of QDA
2.9 Comparing Energy Saving Method
2.10 Power usage of various cases
2.11 Comparison of Energy Consumption (a) Baseline (b) AEP with 3G Only (c) AEP with Two Wi-Fi APs & 3G
3.1 Android’s architecture diagram
3.2 Android Application Package Structure
3.3 ELF File Format and Sample
3.4 Process of Static Analysis
3.5 Activity Lifecycle redrawn from [1]
3.6 Detecting JNI Method Calls
3.7 Abstract memory space
3.8 Initialization sequence
3.9 Example for Z3 Solver: (a) Source code (b) ARM Thumb ASM (c) Flowchart of function test (d) Flowchart of function test with Z3 constrains
3.10 Static Data Flow
3.11 Process of Dynamic Analysis
3.12 Using ftrace to trace method calls
3.13 Trace JNI functions
3.14 Categories of Apps
3.15 NDK Library
3.16 Method Coverage with Various number of Input Events
3.17 Screenshot of Native Analyzer
4.1 A Logical flow of FREEME
4.2 FREEME Structure in Android
4.3 Exploring Method Causality
4.4 Identifying the largest sub-call chains for offloading
4.5 Inspection for Method Offloading
4.6 XML formatted Metadata
4.7 FREEME Method Passing on Dalvik
4.8 Energy Profiler
4.9 Data flow for Remote Execution
4.10 Dump and Restore of Native Memory
4.11 Local & remote instances in Stub
4.12 Energy Consumption For Applications
4.13 Screenshots of Applications
4.14 Energy Consumption Phases for Integer Math

List of Tables

2.1 Energy Consumption For Three Processing Functions
2.2 Specification of the Mobile Phones
2.3 Energy Cost of Sensor Readings
2.4 Connection and Tail Energy Costs
3.1 Selected Applications
3.2 Static vs. Dynamic Analysis
3.3 Native method analysis
4.1 Remote instances in FREEME
4.2 Profiled data for CPU and Wi-Fi Interface
4.3 Results of Static Analysis
4.4 Comparison of total transmitted data size for Remote Execution
4.5 Execution Time For CPU Benchmarks
4.6 Execution Time for KNOWME App

Chapter 1 Introduction

1.1 Motivation

The increasing capabilities of smartphones, including wide color screens, powerful audio, anywhere-anytime Internet access, and smart communication choices, lead people to a new world where these devices morph dynamically from a game machine to a navigation device to an e-book reader to a music player. Indeed, smartphones are rapidly replacing personal computers. As more components are integrated on a smartphone, it is becoming a platform of choice for creating unique software solutions where sensing, communication, and computation are seamlessly merged into a single application. Even today many smartphone applications rely on sensor data and web services to provide an automated and personalized service, such as context or location based services.

The capabilities of mobile devices will continue to increase with the integration of several computing, communication, and sensing functions. Current mobile devices support a number of wireless data communication standards, such as Bluetooth, Near Field Communication (NFC), LTE, 3G, Edge, and Wi-Fi. As the demand for higher data rates continues to outpace the installed capacity, new wireless standards for data communication will continue to evolve. Current on-board sensors on mobile phones include digital cameras, GPS, accelerometers, and gyroscopes. Additional sensors, such as bio-sensors for heart rate and ECG, may also be integrated into a smartphone using existing wireless standards. As mobile phones became feature-rich, the apps that run on these devices have also evolved to be more computationally intensive.
For instance, complex computation functions such as face recognition and on-device photo editing have become common in high-end smartphones. In the near future mobile devices will also play a key role in health care improvements by integrating wireless body area sensor networks with mobile phones to continuously monitor a user’s health-related factors. These tasks demand even higher compute and communication capabilities than what is provided in today’s mobile platforms.

As a programming platform, mobile devices have introduced a radically different environment compared to traditional desktop computing. By abstracting many of the hardware complexities, these platforms enable even a novice programmer to quickly churn out mobile apps that serve different user segments. Given the very low entry barrier for mobile application development, there are innumerable application developers in the field, as evinced by the number of mobile applications in the marketplace; the iPhone App Store and Google Play offer nearly a million apps. These application developers focus primarily on the stability and functionality of the application. In particular, the developers are unlikely to consider issues such as the interactions between the applications and the underlying hardware and the resulting impact on energy efficiency.

However, adding features to mobile devices is a challenge when each added feature places an incremental burden on the most precious resource, namely the mobile phone battery. For instance, in [2] the authors report that a Nokia N95 phone battery has nearly 200 hours of standby time, but when it is used for compute intensive context monitoring by sensing audio, GPS, and Wi-Fi access points, the battery drains in 4 hours.
As mobile applications become pervasive, the limited battery problem will only get worse as applications demand more computing power. Figure 1.1 shows how smartphone battery capacity has grown over the past several years. As can be seen, battery capacity does not grow at the same pace as mobile device capability. Battery capacity grows at a rate of 6-8% per year. One way to increase capacity is to increase the battery size along with the display size. But increasing battery size is fraught with various problems. First and foremost, large batteries can have an adverse impact on the form factor and customer appeal. Second, battery costs run about US$2 per watt-hour (a measure of battery capacity, which is more commonly rated in milliampere-hours at 3.7 volts), which is a substantial cost. Even if larger batteries are made commercially viable, increased energy consumption of mobile applications also leads to thermal problems. Since mobile devices are passively cooled, their heat extraction capability is also limited.

[Figure 1.1: Smartphone’s Battery Capacity. Battery capacity (mAh, axis from 0 to 3000) by year (2010 to 2015) for iPhone 4 (3.5 inch), iPhone 4s (3.5 inch), iPhone 5 (4.0 inch), iPhone 5s (4.0 inch), iPhone 6 (4.7 inch), iPhone 6s (4.7 inch), Galaxy S (4.0 inch), Galaxy S2 (4.3 inch), Galaxy S3 (4.8 inch), Galaxy S4 (5.0 inch), Galaxy S5 (5.1 inch), and Galaxy S6 (5.1 inch).]

Most mobile devices today come with built-in cloud support. For instance, the iPhone uses iCloud for a variety of functions, such as storing media content, syncing content across devices, and tracking misplaced devices. Current generation Windows phones use Microsoft SkyDrive to achieve similar functionality. In addition to built-in cloud support from device vendors, some gaming applications also use their own proprietary cloud support to enable multi-player online mobile games where a large amount of computation can be done in the cloud.
We anticipate that in the future built-in cloud support will continue to increase and it will be feasible to perform cloud-based computations within any mobile application. This increasing prevalence of cloud support provides new opportunities to improve the battery life of a mobile phone.

To support computationally intensive tasks within a constrained battery environment, one approach that this thesis explores is efficient computational offloading. Computation offloading essentially allows part of the computation to be performed on a remote node, with the results then integrated back into the application thread on the mobile phone. The concept of computation offloading is already widely used in real-world technologies such as Simple Object Access Protocol (SOAP), Enterprise JavaBeans (EJB), Remote Procedure Call (RPC), and Remote Method Invocation (RMI) [3–6]. But these prior approaches are targeted toward client-server computing where a desktop or laptop acts as a client that offloads some computation to a server. Computational offloading is either implicitly built into the runtime or explicitly programmed into these client-server applications. As such, these approaches are not designed for relatively novice mobile app developers. Even if a mobile app developer wants to adopt computational offloading within their app, there are several challenges, since there is a variety of options for trading off computation and communication to improve the battery life of a mobile phone.

The app designer has a wide range of options to trade off battery life against other important metrics in today’s mobile devices. During the system development phase the designer has a choice of programming platforms for improving software productivity. For example, the designer may use Python, Java, or a phone’s native programming model such as Symbian C++ or the iPhone SDK to implement the system.
The choice of language can trade off software productivity against the potential runtime overheads of managed environments that lead to energy inefficiency. During the data transmission phase, the designer has to make energy trade-off decisions on computation and communication costs. Storing data on flash may allow the mobile phone to compress large chunks of data and send compressed data to a server. Data compression places more demands on compute. On the other hand, compression may reduce the transmission energy cost. Local storage of data also allows the mobile phone to perform rudimentary computation locally first and only send interesting events to the back-end. For instance, detecting an abnormal heartbeat signal from ECG will reduce the need to continuously transmit normal operational data, thereby reducing transmission energy.

Hence, taking these trade-offs into consideration at application design time is a difficult challenge, since the energy efficient operating point of a mobile application itself drifts due to changing external conditions, such as the available data transmission networks and the size of the input data for a computation. Given the daunting list of trade-offs, it is imperative to provide a mobile application execution framework where the developer is relieved from making energy efficiency trade-off decisions, while the underlying framework implements computation and communication services in an energy efficient manner. In this thesis we propose an application execution platform called FREEME (Framework for Runtime Energy Efficient MobileApp Execution) that acts as an intermediary between the operating system, such as Android, and the application layer, as shown in Figure 1.2.

[Figure 1.2: FREEME Overview]

1.2 A Brief Introduction to Computational Offloading

Although mobile computational offloading has recently garnered significant attention, it originated in distributed systems built during the 1970s using support from local area networks, such as Ethernet [7].
For instance, the RPC protocol proposed in [5] enables a programmer to invoke a procedure call remotely by sending the necessary input data to the remote node. One example implementation that uses RPC extensively is the SETI project [8], where desktop computers around the world are used for scientific computation during idle time. Box et al. [3] proposed the Simple Object Access Protocol (SOAP), an XML-based protocol used for calling a web service by exchanging messages. Besides these prior studies, other protocols, such as EJB and RMI, were invented for remote execution on desktop and server computers. However, those methods are usually intended for wall-powered computing devices connected over wired networks. Thus they focus on increasing performance, fault tolerance, and access to remote resources rather than on energy issues. On the other hand, mobile devices are energy limited, with limited network bandwidth and highly erratic connectivity. For these reasons, many researchers have designed offloading approaches targeting mobile devices. Mobile offloading approaches can be broadly classified into two categories.

1.2.1 Thread Migration

Thread migration [9–12] clones the mobile application thread on a remote server. Once both threads’ contexts are synchronized by transferring the thread state, the remote thread can execute the local thread’s computation on the remote server. While the remote thread is executing the assigned code segment, the local thread is suspended. When the remote thread finishes execution, the local thread synchronizes with the updated remote thread’s context and then resumes from the end point of the remotely executed code segment. When thread migration is used, there is no restriction on which code segment may be offloaded. For example, either multiple methods or even just a part of a method can be offloaded at a time. However, thread migration requires a significant amount of state transfer.
Besides the heap objects, the thread stack frame should also be transferred between the client and server. The large data movement impedes broader applicability of this approach.

1.2.2 Method Migration

The second approach used for offloading is method migration [13], which is similar to remote procedure call (RPC). The input parameters and the data set that are accessed within the target method are sent to the remote server alongside the method id. Once the server finishes the method execution, the results are sent back to the mobile device. Since a method’s inputs are sent to the remote node, there is no need to synchronize the entire thread context as is necessary for thread migration. In general, method migration requires less data to be transferred than thread migration. However, method migration is done at the granularity of a method and it cannot be used to offload code at arbitrary boundaries. This thesis focuses on method migration.

1.3 Thesis Statement

The primary goal of this thesis is to design and implement techniques to extract a set of feasible candidate methods and the minimum necessary data elements from unmodified mobile applications for computational offloading, with the goal of improving battery life.

1.4 Thesis Contributions

This dissertation first presents a compelling study to motivate why managing various design choices for improving battery life is a challenging problem in mobile systems. We show a comprehensive quantitative evaluation of how much energy is consumed in the various modules of a wireless body area sensor network (WBAN) application that is used for health care monitoring. This evaluation is based on a field-deployed WBAN called KNOWME that is deployed in Los Angeles to monitor early teen obesity. We provide an energy consumption evaluation of the various WBAN design choices using Nokia Symbian S60 mobile phones.
We then compare the energy consumption of a subset of WBAN components across the Nokia N95, Nokia E75, and Apple iPhone 3GS to demonstrate that the qualitative observations made on S60 phones are applicable to a broader set of mobile platforms. The dissertation then uses the knowledge gained from the evaluations to design an Active Energy API (Application Programming Interface) that can be used by a designer to implement common WBAN functions in an energy efficient manner. Since the energy efficient operating point drifts with operating conditions, the APIs are implemented within an Active Energy Profiling (AEP) framework. AEP uses short profiling intervals to determine the most energy efficient method to sense, compute, and communicate data at runtime. The API reduces the burden on the WBAN designer by transparently selecting the energy efficient operating point for a WBAN. We use KNOWME as a demonstration vehicle to study the trade-offs and to implement the APIs within the AEP framework. Our initial approach requires a programmer to explicitly use these APIs while designing the application.

The lessons learned from KNOWME and AEP are then applied to develop FREEME, a novel framework for efficient computational offloading in the Android mobile application environment that requires no intervention from the application developer. Since Android is the world’s dominant mobile operating system, we strive to implement FREEME within Android. We present the challenges faced in the Android operating system while performing automatic analysis of Java and user-defined native methods to determine which data elements are necessary for each method call. In particular, we describe the constraints in Java serialization and remote procedure call methods on existing Android platforms that thwart our ability to efficiently perform computational offloading.
We describe the changes made to the Dalvik virtual machine and the Android OS on the mobile phone to implement novel Java serialization approaches that significantly reduce the data transfer size needed for remote procedure call execution. A fully functional prototype of FREEME was built which can run unmodified Android apps without an ARM emulator and can automatically offload computation whenever remote execution is more energy efficient than local execution.

This dissertation presents a hybrid analysis combining static and dynamic analysis to detect native memory references and JNI method calls from ARM binary code developed using the NDK (Native Development Kit). Using the hybrid analysis, we increase the number of method call chains that may be offloaded even when these call chains contain user-defined method invocations. We enable native method offloading by migrating the required native memory space and values inside the JVM.

Chapter 2 Energy Impact of Design Choices

To motivate the claim that different design choices can have significantly different energy consumption, we rely on a field-deployed, mobile phone based wireless body area network (WBAN) called KNOWME that we designed, developed, and deployed for obesity monitoring in Los Angeles. We provide a comprehensive energy consumption evaluation of the various WBAN design choices using Nokia N95 phones.

A typical mobile phone based WBAN consists of three layers, as shown in Figure 2.1. The first layer is the sensor layer, which measures physiological and even emotional signals and transmits this data wirelessly. The second layer is a mobile phone, which acts as a data collection hub and receives the external sensor data. It may further enrich the sensor data with GPS, audio, and video tags to get an accurate state of a person’s health and environmental conditions. The mobile phone may also process data locally. The last layer is a back-end server that processes and stores the data.
[Figure 2.1: An Example 3-Tier WBAN System. Body sensors (ECG, SpO2, ACC, GSR), a smartphone, and a back-end server reached over the Internet, with end-to-end encryption of sensitive data.]

[Figure 2.2: KNOWME Application: (a) The Components of KMCore (b) A Screenshot of KMClient Running on N95 (c) ECG Signal. Panel (a) shows the device manager (ACC, GPS, ECG, SpO2), data collector, analyzer (plug-in modules), local storage (user configuration, analyzed data, raw data), transmitter (encrypt/decrypt), and service manager, connected to a client application with GUI over a local socket or IPC. Panel (c) labels the P, Q, R, S, and T points of the ECG waveform.]

2.1 KNOWME Platform

In the current implementation, each KNOWME node consists of a Nokia N95 mobile phone (N95) and four sensors, namely tri-axial accelerometers (ACC), an electrocardiograph sensor (ECG), a blood oxygen saturation sensor (OXI), and GPS. The ACC, ECG, and OXI are Bluetooth enabled off-the-shelf sensors from Alive Technologies, while GPS is a built-in sensor on the N95. The ECG samples at 300Hz, OXI at 100Hz, and ACC at 30Hz, while GPS can be sampled at any user-specified rate.

KNOWME software runs as a mobile application on the N95. The mobile application has two components: a background process (KMCore) for data collection and a client interface application (KMClient) for configuring the sensors and visualizing data. The KMCore comprises seven components arranged in a four-tier hierarchy: (1) the device manager at the bottom-most tier, (2) the data collector at the second tier, (3) the data analyzer, local storage manager, and data transmitter at the third tier, and (4) a service manager at the top tier. Figure 2.2(a) shows how the various components in the KMCore interact with each other. There is one thread per sensor, called the device manager thread, that receives data from its associated sensor. By creating a separate thread per sensor, KNOWME continues to get data from working sensors without being hindered by a single non-functional sensor.
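The thread-per-sensor design described above can be illustrated with a minimal Python sketch. It is not KNOWME code (KNOWME runs on Symbian S60); the names `device_manager` and `run_device_managers`, the two toy sensors, and the use of `None` as a failure marker are all assumptions made for illustration. The point it shows is that a failing sensor ends only its own thread while the others keep delivering data.

```python
import queue
import threading


def device_manager(name, read_samples, out_queue):
    """Forward samples from one sensor; a failure stays inside this thread."""
    try:
        for sample in read_samples():
            out_queue.put((name, sample))
    except ConnectionError:
        # Hypothetical failure marker: only this sensor's thread stops.
        out_queue.put((name, None))


def working_sensor():
    """Toy sensor that delivers three samples."""
    yield from (1, 2, 3)


def broken_sensor():
    """Toy sensor whose link fails before the first sample."""
    raise ConnectionError("sensor link lost")
    yield  # unreachable; keeps this function a generator like the others


def run_device_managers(sensors):
    """Start one thread per sensor and return everything they produced."""
    out_queue = queue.Queue()
    threads = [
        threading.Thread(target=device_manager, args=(name, reader, out_queue))
        for name, reader in sensors.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    items = []
    while not out_queue.empty():
        items.append(out_queue.get())
    return items


results = run_device_managers({"ACC": working_sensor, "ECG": broken_sensor})
```

In this sketch `results` holds the three ACC samples in their original order plus one failure marker for ECG; how the ACC and ECG entries interleave depends on thread scheduling.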
The data collector thread receives sensor data from each device manager and synchronizes all the sensor data with the same timestamp into a single health record. It combines several such records into a single write buffer and sends it to the local storage manager to write the data to the flash storage. When the flash storage runs out of space, old data is replaced with new data in a FIFO manner.

The analyzer modules are designer-defined modules that perform domain-specific tasks. In the pediatric obesity management domain these modules perform user state classification using multi-modal signal processing algorithms. In the current implementation the analyzer classifies user state as either sedentary or non-sedentary using just ACC data, with a Sedentary Analyzer (SA) based on a Support Vector Machine classifier [14]. The transmitter module transfers data to the back-end and handles data compression and encryption for privacy and energy saving.

The back-end server runs a more comprehensive suite of classification algorithms to detect a range of user states such as walking, running, fidgeting, standing, and sitting, using ECG and ACC sensor data with multi-modal signal processing [15]. Such finer classification is used by the physicians at the back-end to get a comprehensive understanding of user behavior and to precisely measure the calories burned. The back-end server stores sensor data indefinitely. In other words, the data stored on the back-end server is a complete record of a person’s physical activity history, and the most recent time window of this history is stored on the mobile phone’s flash storage.

The last component of the KMCore application is a service manager thread that uses sockets or inter-process communication (IPC) to provide the sensor data to other mobile applications running on the phone, such as the data visualization application KMClient. Figure 2.2(b) shows a visualization screen of the KMClient that shows how long a user has been sedentary, as reported by SA.
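The data collector behaviour described in this section (merging samples that share a timestamp into one health record, and replacing the oldest data first when storage is full) can be illustrated with a minimal Python sketch. This is not the KMCore implementation: the class name `DataCollector`, the record layout, and the choice to emit a record only once every sensor has reported are assumptions made for illustration, and storage capacity is counted in records rather than bytes.

```python
from collections import deque


class DataCollector:
    """Hypothetical sketch: merge same-timestamp samples into one record."""

    def __init__(self, sensors, capacity):
        self.sensors = set(sensors)
        self.pending = {}  # timestamp -> {sensor name: value}
        # A bounded deque drops its oldest entry when full (FIFO replacement).
        self.storage = deque(maxlen=capacity)

    def on_sample(self, sensor, timestamp, value):
        record = self.pending.setdefault(timestamp, {})
        record[sensor] = value
        # Assumption: a record is complete once all sensors have reported.
        if record.keys() >= self.sensors:
            self.storage.append((timestamp, self.pending.pop(timestamp)))
```

A bounded `deque` is used here only because it gives the FIFO replacement policy for free; a real storage manager would also batch several records into one write buffer, as described above.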
Using the KNOWME framework described in the prior section, we now provide a comprehensive evaluation of the energy consumption of the various components in KNOWME. While KNOWME is one particular implementation of a WBAN, most WBANs that we are aware of have a very similar architecture [16,17]. Furthermore, where possible we quantify the energy consumption of basic operations without relying on KNOWME semantics. For instance, the energy consumed to compress and transmit a byte of data is independent of whether it is performed within the KNOWME framework or otherwise. In each subsection below we evaluate the energy cost of each component of the KNOWME design using multiple available choices.

2.2 Impact of Software Platform on Energy Consumption

Selecting an appropriate software development platform is arguably the most important factor in any system design. The programming platform choice can determine the development cost in terms of person-hours as well as the system performance. Hence, in this section we first evaluate the design choices available for mobile phone software development in terms of their energy efficiency.
There are three popular SDKs for programming Symbian Operating System based phones such as the N95, namely Symbian C++, Java 2 Micro Edition (J2ME), and S60 Python (PyS60), and we show their relative energy efficiency. For Android and iPhones the programming interfaces are similar to the J2ME interfaces on Symbian.

[Figure 2.3: Execution Time of Three WBAN Functions on Three Programming Languages. Bar chart of execution time (seconds, logarithmic scale up to 1000) for QDA, AES and Gzip on the Nokia N95, Nokia E75 and iPhone 3GS, with one series each for C++, J2ME and PyS60. Bar labels: 13.83, 0.07, 1.24, 13.82, 0.06, 0.79, 3.52, 35.33, 0.54, 25.97, 38.76, 0.42, 12.65, 0.04, 517.00, 147.88, 2.42, 377.19, 91.76, 1.74, 0.61.]

Table 2.1: Energy Consumption For Three Processing Functions (joules)

Model        QDA                       AES                      Gzip
             C++    J2ME    PyS60      C++    J2ME   PyS60      C++    J2ME    PyS60
N95          6.6    19.34   270.97     0.03   0.30   77.62      0.50   12.22   1.35
E75          5.75   15.95   151.63     0.02   0.16   36.89      0.33   4.93    0.70
iPhone 3GS   2.58   n/a     n/a        0.04   n/a    n/a        0.55   n/a     n/a

We selected three functions that are of particular interest to WBANs, namely a heart beat detection algorithm (QRS Detection [18]), Advanced Encryption Standard (AES) encryption, and GNU zip (Gzip) data compression. AES encryption is commonly used to protect the data transmitted from the mobile phone to the back-end server. Gzip is another commonly used function that compresses data before it is sent to the back-end server, in order to reduce transmission costs. The ECG signal is characterized using the P, QRS and T waves, as shown in Figure 2.2(c). The QRS Detection Algorithm (QDA) detects the R peak, which is used as the basis of reference in ECG segmentation and is necessary to recognize a heart beat. We selected the QDA implementation from the Open Source ECG Toolbox [19] and ported this algorithm to each of the three programming languages.

The first 9 bars in Figure 2.3 show the execution time of the three functions using the three programming platforms (the remaining bars will be discussed in the next section).
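As a reading aid for Table 2.1, the short Python sketch below recomputes how much more energy each platform uses relative to Symbian C++ on the N95. The numbers are copied from the N95 row of the table; the function name `relative_to_cpp` is a hypothetical helper introduced only for this illustration.

```python
# N95 row of Table 2.1, in joules.
N95_ENERGY_J = {
    "QDA":  {"C++": 6.6,  "J2ME": 19.34, "PyS60": 270.97},
    "AES":  {"C++": 0.03, "J2ME": 0.30,  "PyS60": 77.62},
    "Gzip": {"C++": 0.50, "J2ME": 12.22, "PyS60": 1.35},
}


def relative_to_cpp(energy):
    """Return each platform's energy as a multiple of the C++ energy."""
    return {
        function: {lang: joules / row["C++"] for lang, joules in row.items()}
        for function, row in energy.items()
    }


for function, row in relative_to_cpp(N95_ENERGY_J).items():
    print(function, {lang: round(ratio, 1) for lang, ratio in row.items()})
```

With these inputs the PyS60 ratio is roughly 41 for QDA, above 2500 for AES, and below 3 for Gzip.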
The row labeled N95 in Table 2.1 shows the corresponding energy consumption in joules. In this thesis, we used the Nokia Energy Profiler Tool to report energy consumption values. The tool itself is fairly lightweight and adds negligible overhead. We used 10 minutes of ECG data, which is 180KB of data, as input to each of the three functions QDA, AES and Gzip. Results from Figure 2.3 show that PyS60 performance on AES is three orders of magnitude worse than the C++ implementation, since PyS60 does not have a just-in-time (JIT) compiler, unlike PC platforms. For QDA, however, PyS60 is only about 40X slower than Symbian C++ (comparing Bars 1 and 3 in Figure 2.3) and has a correspondingly 40X higher energy consumption (Columns 1 and 3 in Table 2.1). The improved performance of PyS60 in the QDA case is attributed to the efficient implementation of several built-in library functions in PyS60, such as low-pass filtering. For Gzip, PyS60 performed nearly as fast as Symbian C++. Further analysis of the PyS60 Gzip function showed that PyS60 implements Gzip as a native function written in Symbian C++ and included in PyS60 as an extension module. In essence, the PyS60 implementation of Gzip is a Symbian C++ implementation of Gzip.

Table 2.2: Specification of the Mobile Phones

Spec      N95                    E75               iPhone 3GS
CPU       Dual ARM 11, 332MHz    ARM 11, 369MHz    ARM A8, 600MHz
Memory    128MB RAM              85MB RAM          256MB
OS        Symbian S60            Symbian S60       iOS4

2.3 Sensitivity to Hardware Platform

One may raise the obvious concern: are the results from the prior section specific to the underlying hardware platform of the N95? Do these applications behave differently on a different mobile phone? In order to address these concerns, we repeated the same set of experiments on a Nokia E75 and an Apple iPhone. Table 2.2 shows the relevant hardware and software specifications where the mobile phones differ. All platforms are based on ARM cores, but the iPhone processor is nearly twice as fast as the other two and also has twice the amount of memory.
Hence, memory related issues like garbage collection are likely to be more severe on the Nokia platforms than on the iPhone. The iPhone provides only one programming interface, namely the iPhone SDK, which is closer to Symbian C++ in terms of programming complexity. The second set of 9 bars in Figure 2.3 shows the execution time of the three functions while running on the E75. The last set of 3 bars in Figure 2.3 shows the execution time of the three functions while running on the iPhone. Similarly, the rows labeled E75 and iPhone in Table 2.1 show the corresponding energy consumption in joules. Comparing results between the N95 and E75, the relative impact of programming language on execution time (and energy) across all applications remains the same. Execution time also decreases in almost all cases on the E75. The primary reason for the improved performance and energy efficiency on the E75 is that these applications are all single threaded and the higher frequency ARM 11 processor on the E75 executes them faster. The iPhone executes these applications even faster than the Symbian C++ implementation on the E75. Again, the reason is that the iPhone processor is nearly 2X faster than that of the E75. All these applications have a similar code footprint, given that all platforms use the ARM ISA. Hence, the relative execution time differences of the three applications remain the same.

Table 2.3: Energy Cost of Sensor Readings

Sensor         Energy (J)   Sampling Rate (samples/s)   Xmit Rate (packets/s)
Built-in ACC   37.804       30                          30
ECG            114.846      300                         4
OXI            137.433      100                         10
ECG&OXI        156.419      300 & 100                   4 & 10
Assisted GPS   53.994       0.003                       0.003

A complete evaluation across multiple platforms for all the results is beyond the scope of this thesis since it requires significant additional resources. Hence, we restrict our evaluation in the rest of this thesis to the N95 and show results only on this phone model.

2.4 Energy Consumption of the Sensors

After establishing the energy impact of the software development choice, we now focus on the energy costs of sensing itself.
In KNOWME, built-in sensors such as ACC consume energy to perform the sensing operation, while external sensors (ECG and OXI) drain the mobile phone’s battery through Bluetooth communication. Table 2.3 shows the energy consumption of the built-in phone sensors and the energy consumption when reading data from external sensors using Bluetooth. This data was obtained while sensing for a 10 minute interval using the sampling rates shown in Table 2.3. A Symbian C++ application is used for reading the sensor information. ECG generates 300 samples per second while OXI generates 100 samples per second. However, these sensor samples are internally buffered in the sensor and bulk transmitted over the Bluetooth channel. Although ECG takes 300 samples per second it only transmits 4 packets per second, each packet holding 75 sensor samples. OXI transmits 10 packets per second. Hence, the energy consumption while receiving data from the OXI sensor is slightly higher than when communicating with ECG. Interestingly, when both ECG and OXI send data concurrently the mobile phone is more energy efficient, since the energy expenditure is much less than the sum of the energy spent when each sensor transmits data in isolation. The energy expended in putting the Bluetooth radio in an active listening mode is amortized over both sensors when hearing from the two sensors simultaneously. The built-in ACC consumes 38 Joules for 10 minutes while generating 30 samples per second.

[Figure 2.4: Energy Cost of Positioning Methods. Four panels (Bluetooth GPS, Assisted GPS, Integrated GPS, Network based), each plotting power (W, 0 to 2.5) against time (seconds, 0 to 100).]

2.5 Energy Consumption of GPS

Table 2.3 also shows the GPS energy consumption using the assisted GPS technology used in the N95.
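The Bluetooth rows of Table 2.3 can be cross-checked with a little arithmetic. The Python sketch below, with hypothetical helper names, derives the ECG packet rate from the stated 75 samples per packet and estimates how much energy the concurrent ECG and OXI run saves compared with the sum of the two isolated runs. All input values are taken from Table 2.3 and the accompanying text.

```python
# Table 2.3, energy in joules for a 10-minute sensing interval.
ENERGY_J = {"ECG": 114.846, "OXI": 137.433, "ECG&OXI": 156.419}


def packets_per_second(samples_per_second, samples_per_packet):
    """Packet rate implied by sensor-side buffering."""
    return samples_per_second / samples_per_packet


def concurrency_saving(energy):
    """Energy saved by receiving from both sensors at once.

    Returns (joules saved, fraction of the isolated-run total).
    """
    separate = energy["ECG"] + energy["OXI"]
    saved = separate - energy["ECG&OXI"]
    return saved, saved / separate


saved_j, saved_fraction = concurrency_saving(ENERGY_J)
print(packets_per_second(300, 75), round(saved_j, 2), round(saved_fraction, 2))
```

With these inputs the concurrent run uses about 96 J less than the two isolated runs combined, a reduction of roughly 38%.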
In the 10 minute interval we made two GPS readings and hence the sample rate is 0.0033 samples per second. GPS consumes significantly more energy per sample than any of the other sensing functions in KNOWME. Given that GPS is an energy-intensive operation, mobile phones provide multiple options for obtaining location information. In particular, when using the N95 the user has four choices: Bluetooth based external GPS, Assisted GPS, traditional GPS with no assist, and network-based GPS that uses only cell towers to provide approximate position information. Figure 2.4 shows the energy consumption when GPS is processing a first request using all four approaches. Generally, once the position is known, the cost of the next GPS reading is reduced. For example, in the assisted GPS mode, the first reading costs 29 Joules per sample, whereas from the second reading onward the cost is reduced to 25 Joules. The figure shows the trade-offs between the power consumption and the time to obtain the first GPS reading using all four methods.

In Figure 2.4 the Bluetooth based external GPS curve shows only the mobile phone power consumed in establishing a Bluetooth link and reading the GPS coordinates. There is a spike in the power consumption when the connection is established and the data is read over the Bluetooth channel, after which the power consumption drops down to the idle power drawn when Bluetooth is ON but not actively sending or receiving data. The Assisted GPS curve shows a large power spike when the phone communicates with the cell tower to get the rough location first. Once the cell tower provides the orbital data of the GPS satellites, the GPS receiver on the mobile phone can narrow the search for satellite signals and quickly obtain the position information. In our measurements this approach took roughly 15 seconds to receive the position information from the network and a further 35 seconds to retrieve the precise position from a cold start.
The third curve, labeled Integrated GPS, is the basic non-assisted GPS. In this mode the GPS receiver continuously scans for the satellite information. This approach uses lower peak power but takes nearly 100 seconds to obtain the position information due to the absence of any assistance from the cellular network; the total energy consumed by Integrated GPS is 33.9 Joules. The last curve shows the power consumption of Network based position information which, like Assisted GPS, communicates with the cell tower to triangulate its approximate position, but no further data is retrieved from satellites. Hence network-based GPS also consumes roughly the same power as Assisted GPS initially, when communicating with the cell tower for the first 15 seconds. The power consumption then drops to idle power since it does not try to compute accurate position information by communicating with the satellites. Based on the total energy consumption (power consumption × time to get a GPS reading), our evaluations show that Network-based GPS is best for saving energy in a WBAN (13 Joules per sample), even though the location data is only approximate. Network-based GPS is more than 2X as energy efficient as Assisted GPS. When precise location is needed, Assisted GPS is the best option.

[Figure 2.5: Battery Drain With Storage. Battery life (%, 0 to 100) against time (minutes, 0 to 350) for two curves: All Unbuffered and All Buffered.]

2.6 Storage Costs

Once the sensor data is received, the mobile phone may write the data to the phone’s flash memory for further analysis. Figure 2.5 shows the battery level of the mobile phone as we continuously sense data from ECG and OXI and write the sensor data to the flash memory. The curve labeled All Unbuffered shows the battery level on the mobile phone as we write each packet of data immediately to the local flash without any buffering. In other words, every sensor sample received is immediately written to the flash memory.
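The unbuffered policy just described, and a buffered alternative that accumulates packets in memory before writing, can be contrasted with a small Python sketch. This is a minimal illustration and not the KNOWME storage manager: `FlashLog` is a hypothetical stand-in that only counts write calls, `BufferedWriter` and its `packets_per_flush` parameter are assumed names, and no energy model is included.

```python
class FlashLog:
    """Hypothetical flash stand-in that counts write calls and bytes."""

    def __init__(self):
        self.writes = 0
        self.bytes_written = 0

    def write(self, data):
        self.writes += 1
        self.bytes_written += len(data)


class BufferedWriter:
    """Collect packets in memory and write them to flash in batches."""

    def __init__(self, flash, packets_per_flush):
        self.flash = flash
        self.packets_per_flush = packets_per_flush
        self.buffer = []

    def on_packet(self, packet):
        self.buffer.append(packet)
        if len(self.buffer) >= self.packets_per_flush:
            self.flush()

    def flush(self):
        # One large write replaces many small ones.
        if self.buffer:
            self.flash.write(b"".join(self.buffer))
            self.buffer.clear()
```

Setting `packets_per_flush` to 1 reproduces the unbuffered behaviour, so the same class can model both curves in Figure 2.5; a larger value trades a short delay for far fewer flash writes.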
It is well known that flash energy efficiency is significantly compromised for small writes. Flash writes must be done at page granularity, typically 4KB pages. If a write smaller than the page size is performed, the page being modified must usually first be read from the flash into a DRAM buffer, and the bytes to be written are updated in the DRAM buffer. Then the entire DRAM buffer is written back to flash. Hence, small writes lead to a read-modify-write sequence in the flash memory, where even a write of a few bytes translates to a full page write; this is referred to as the write amplification effect [20]. The curve labeled All Buffered shows the battery level if we buffer the writes in DRAM and send large chunks to the flash. In this case we buffered sensor data until we had received at least 100 packets from a sensor, and then wrote the buffered data to the flash. As can be seen, buffering improves the battery life from 240 minutes to 299 minutes. As a note of caution, the battery lifetime here also includes the cost of using Bluetooth to receive the data. Hence, the 240 minutes of battery life is not due to flash writes alone. We therefore focus on the difference in battery lifetime with and without buffered writing to the flash, rather than on the absolute values. The difference of 59 minutes translates into roughly a 25% increase in the battery lifetime in this experiment.

[Figure 2.6: Three Phases of Transmissions. Power (W, 0 to 2.0) against time (seconds, 0 to 40) for Wi-Fi, EDGE and 3G, with the connection, transfer and tail phases marked.]

2.7 Compute vs. Communicate

The mobile phone in KNOWME is not just a sensor data collection node. As described in Section 2.1, there is an Analyzer module that can perform local analysis of the sensed data and a Transmitter module that transmits data to the back-end server, optionally using compression and/or encryption.
Hence, the last trade-off we consider in this study is the energy cost of performing data analysis locally versus performing it on the back-end server and paying for the data communication energy. This compute versus communication cost is not unique to KNOWME, as we expect most WBANs to make this fundamental trade-off.

Communication Costs

To quantify the energy costs of communication we created a testbed. The testbed can send data from the mobile phone to the back-end server using 3G, EDGE, or Wi-Fi. The AT&T broadband network is used for 3G and EDGE, while an 802.11g Linksys router is used for Wi-Fi. We varied the data size transmitted to the back-end from 100KB to 1000KB. As the performance of mobile wireless networks varies from place to place and from carrier to carrier, we made all of the data transfer measurements from the same location during the early morning, within one hour, to reduce network congestion caused by other users. We repeated this study multiple times, always at the same time of day, to measure day-to-day variations.

[Figure 2.7: Data Transmission Costs (Uplink). Stacked bars of connection, transfer and tail energy (joules, 0 to 70) against data size (100, 300, 600, 900 and 1000 KB) for EDGE, 3G and Wi-Fi. Bar labels for EDGE: 10.85, 23.89, 43.04, 63.74, 70.06; for 3G: 19.98, 26.72, 34.77, 43.83, 46.24; for Wi-Fi: 1.17, 1.99, 3.53, 5.36, 6.11.]

A data transmission consists of three phases, namely a connection phase, a transfer phase and a tail phase. Figure 2.6 shows the power consumption during the three phases for the three wireless data transmission approaches. In this experiment we used a 200KB data transfer. Due to its shorter range, even though Wi-Fi has much higher bandwidth, its peak power consumption is only slightly worse than that of the 3G and EDGE radios on the N95.
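The per-size totals labelled on the bars of Figure 2.7 allow a quick estimate of where the EDGE and 3G energy curves cross. The Python sketch below is a rough illustration only: it assumes that the three groups of bar labels correspond to EDGE, 3G and Wi-Fi in that order, it interpolates linearly between measured sizes, and the function name `crossover_kb` is hypothetical.

```python
# Bar labels read from Figure 2.7 (assumed order: EDGE first, then 3G), joules.
SIZES_KB = [100, 300, 600, 900, 1000]
EDGE_J = [10.85, 23.89, 43.04, 63.74, 70.06]
THREEG_J = [19.98, 26.72, 34.77, 43.83, 46.24]


def crossover_kb(sizes, a, b):
    """Size at which curve a first rises above curve b, or None.

    Uses linear interpolation between adjacent measurement points.
    """
    for i in range(len(sizes) - 1):
        d0 = a[i] - b[i]
        d1 = a[i + 1] - b[i + 1]
        if d0 < 0 <= d1:
            span = sizes[i + 1] - sizes[i]
            return sizes[i] + span * (-d0) / (d1 - d0)
    return None


print(round(crossover_kb(SIZES_KB, EDGE_J, THREEG_J)))
```

Under these assumptions EDGE is cheaper than 3G at 100KB and 300KB and more expensive from 600KB upward, with the interpolated crossover falling between 300KB and 600KB.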
Figure 2.7 shows the data transmission (uplink) energy costs of the three wireless interfaces as we increase the size of the data transfer from 100KB to 1000KB. Wi-Fi is the most energy efficient across all data sizes; its overall energy consumption is significantly less than that of either 3G or EDGE in our setup. However, due to limited Wi-Fi coverage, a practical WBAN implementation will most likely use either 3G or EDGE for real-time data transfer for mobile users. In this experiment, 3G consumes more energy than EDGE until about 400KB of data size. To explain the reason, Table 2.4 shows the average connection and tail energy costs of all three network interfaces. The connection and tail energy of 3G are much higher than those of EDGE, while the transfer energy is lower. Hence for small data sizes 3G consumes more energy than EDGE. As the data size increases beyond 400KB, the transfer energy dominates the overall energy cost and hence 3G consumes less energy than EDGE.

Table 2.4: Connection and Tail Energy Costs

Medium   Connection Energy (J)   Tail Energy (J)
EDGE     0.346                   2.987
3G       2.331                   10.752
Wi-Fi    0.132                   0.166

Local or Remote Computation

In KNOWME the most complex data analysis function is detecting user state in order to identify long phases of physical inactivity (sedentary behavior). The choice of whether to perform user state detection on the mobile phone or on the back-end depends on the total energy cost, taking several factors into account. Consider a simple case where we are interested in performing QDA on 10 minutes of ECG data, which is 180KB of data. Figure 2.8 shows the energy cost of this computation for local and remote computation. The first two bars in the graph show the energy cost of local computation when QDA is implemented in C++ or J2ME. The second set of three bars shows the transmission cost of performing data analysis on the remote server using three different transfer options: EDGE, 3G and Wi-Fi.
The last set of 6 bars shows the energy cost when data is compressed and then transmitted. The compression algorithm is implemented in C++ and J2ME and the transfer options are EDGE, 3G and Wi-Fi.

Let us consider the J2ME implementation of QDA. The local computation cost is 19.34 Joules. The remote computation cost (without Gzip) varies between 1.46 Joules using Wi-Fi and 22.72 Joules using 3G. On the other hand, the remote computation cost with Gzip implemented in J2ME varies from 13.46 Joules using Wi-Fi to 32.89 Joules using 3G. Now consider the C++ implementation of QDA. The local computation cost is 6.6 Joules. The remote computation cost without Gzip remains the same as before, since the transmission cost does not depend on the software platform. But when Gzip is also implemented in C++, the energy cost varies between 1.74 Joules using Wi-Fi and 21.17 Joules using 3G.

When Wi-Fi is available it is clearly more energy efficient to perform remote computation. But when the user is roaming, which will be a common case in WBAN operation, local computation costs less than transmission over EDGE or 3G when QDA is implemented in C++. If QDA is implemented using J2ME, however, then remote computation using EDGE without Gzip compression (15.86 Joules) is better.

Even in this simple scenario, the choice of remote versus local computation is a complex function of the software platform under which the application is developed, the wireless radio being used, and whether or not data is compressed. We even simplified the discussion by removing the network signal quality issues that may alter the energy costs dynamically; as explained earlier, in our setup we used the network during the least congested time and from a location with the best signal quality. Through this simple experiment we demonstrate that there is no single statically best choice when it comes to trading off the energy costs of computation and communication.
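The comparison above amounts to picking the minimum over a small table of energy costs. The Python sketch below illustrates this for the QDA example, assuming that Gzip is implemented on the same platform as QDA. The local, uncompressed, 3G and Wi-Fi figures are quoted in the text; the two compressed EDGE figures (12.6 and 24.32 Joules) are read from the bar labels of Figure 2.8 and their assignment is an assumption. The function name `cheapest_option` is hypothetical.

```python
# Energy in joules for QDA on 10 minutes (180KB) of ECG data.
LOCAL_J = {"C++": 6.6, "J2ME": 19.34}
XMIT_J = {"EDGE": 15.86, "3G": 22.72, "Wi-Fi": 1.46}
ZIP_XMIT_J = {
    # EDGE entries are read from Figure 2.8 bar labels (assumed mapping).
    "C++": {"EDGE": 12.6, "3G": 21.17, "Wi-Fi": 1.74},
    "J2ME": {"EDGE": 24.32, "3G": 32.89, "Wi-Fi": 13.46},
}


def cheapest_option(platform, radios):
    """Return ((mode, radio, compressed), joules) with the lowest energy."""
    options = {("local", None, False): LOCAL_J[platform]}
    for radio in radios:
        options[("remote", radio, False)] = XMIT_J[radio]
        options[("remote", radio, True)] = ZIP_XMIT_J[platform][radio]
    return min(options.items(), key=lambda item: item[1])


print(cheapest_option("C++", ["EDGE", "3G"]))
print(cheapest_option("J2ME", ["EDGE", "3G"]))
print(cheapest_option("J2ME", ["EDGE", "3G", "Wi-Fi"]))
```

The three calls correspond to the roaming C++ case, the roaming J2ME case, and the case where Wi-Fi is available. A static table like this is only valid for the measured conditions, which is the limitation that motivates a runtime approach.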
[Figure 2.8: Local vs Remote Computation of QDA. Bar chart of energy (joules, 0 to 35) with local computation bars for C++ and J2ME, remote XMIT bars for EDGE, 3G and WiFi, and remote ZIP_XMIT bars for EDGE, 3G and WiFi with Gzip in C++ and in J2ME. Bar labels: 6.6, 19.34, 15.86, 22.72, 1.46, 12.6, 21.17, 1.74, 24.32, 32.89, 13.46.]

2.8 Optimal Configuration

In this section we summarize our results and conclusions from the WBAN characterization experiments.

It is qualitatively obvious that interpreted languages are likely to be slower than languages that run natively. There is, however, a growing trend in traditional computing to use interpreted languages due to their programming simplicity and portability. In the WBAN domain, given the stringent battery constraints, native execution provides significant energy savings.

It is more energy efficient to communicate with multiple Bluetooth-enabled sensors concurrently rather than sequentially. Concurrent connections amortize the energy cost of connection establishment and also reduce energy through improved utilization of residual bandwidth.

For localization service, we considered four different types of services with different energy and location accuracy trade-offs. This trade-off is an application-dependent choice, but our results show at least a 2X improvement when using Network-based GPS compared to Assisted GPS.

Buffering data in the flash storage before transmission adds a small delay before the data is received at the back-end. Even for WBANs that need real-time data at the back-end, the additional delay due to buffering is minimal, while buffering improves energy efficiency by at least 25%.

For data transmission, Wi-Fi has the lowest energy cost. On the cellular network, however, 3G is not always more energy efficient than a 2G network: for small data sizes a 2G network is more energy efficient than 3G due to the smaller connection and tail costs of 2G.

2.9 Active Energy Profiling Method

The previous section quantified the energy cost of each sub-component within a WBAN.
It is clear that there is no single design choice that optimizes energy cost under all operating conditions. To improve energy efficiency even in the presence of dynamic changes in energy usage, we present a dynamic approach to energy management called the Active Energy Profiling (AEP) approach.

We first present a general approach for implementing AEP in any WBAN. Later we present our specific implementation of AEP within the limitations of the KNOWME implementation. An idealized AEP uses a short profiling phase in which it tests the various choices for WBAN operation. The profiling phase first decides how much sensor sampling is necessary to achieve the required user state detection accuracy. The optimal sensor sampling rate for user state detection may be computed using the approaches described in [21]. AEP then runs the sensor analysis algorithms, which are specific to the WBAN implementation, locally on the mobile phone, using sensor samples collected over a short profile time window, and measures the energy consumption of this local computation. Next, AEP measures the communication cost of transmitting the same sensor data to a back-end where the analysis algorithms may run on a server, and receives the results of the data analysis from the server. AEP measures the energy consumption of the data transmission using both uncompressed and compressed data.

Once the profiling phase is complete, AEP switches to a regular operating mode. During regular operation, sensors are sampled at the optimal rate calculated during the profiling phase for achieving the desired accuracy. If local computation is more energy efficient than remote computation, then AEP selects that option for data analysis. If remote computation is more efficient, AEP selects the wireless radio with the least energy consumption, as measured during the profiling phase based on network signal quality.
Thus, during regular operation the energy consumption is the same as that of the lowest energy option measured during the profile phase. However, the regular operating mode may drift from the optimal operating mode over time. Hence, the regular operating mode is interrupted whenever there is a change in the system state. A change can be triggered for three reasons: (1) when the user moves to a different location, (2) when an interesting event is detected during sensor data analysis, or (3) after a predefined time quantum has elapsed. AEP as described above is a simple and yet powerful approach to automatically optimizing a WBAN’s energy consumption.

[Figure 2.9: Comparing Energy Saving Method. (a) Baseline flowchart with the steps START, Get Activity Data, Get GPS Coordinate, Use 3G Networks, Transmit Data, END. (b) Profile Energy and Store Data flowchart with the steps START, Get Activity Data, Moved?, Scan Available Access Point, Changed?, Get GPS Coordinate, Changed?, Profiling data Exist?, Retrieval data, Energy Profiling and Store Data, Calculate energy cost and Choose most energy efficient method, Use previous config, Transmit Data and Update Profiled Data, END.]

AEP Implementation in KNOWME

The implementation of AEP within KNOWME is shown in the flowchart in Figure 2.9. In this implementation we use AEP to optimize the two most energy consuming operations in KNOWME, namely GPS and data transmission.

The flow chart on the left shows the sequence of steps in the baseline KNOWME. The system collects activity data by sampling the sensors. It buffers sensor samples for 10 minutes. KNOWME runs the sedentary analyzer routine (SA) once every minute; SA uses just ACC data to determine whether a user is sedentary or not.
It then gets position information using Network-based GPS once every 10 minutes. KNOWME then uploads the 10 minutes of geo-stamped sensor data to a back-end server using 3G.

The flow chart on the right in Figure 2.9 shows AEP applied to KNOWME. The system still collects sensor data and buffers sensor samples for 10 minutes, just as in the baseline KNOWME. At the start of the WBAN operation, AEP collects basic information regarding the compression energy cost per bit and how it scales with data size. It also computes the typical compression ratios obtained for the first few data samples collected. It stores the energy costs of compression and the compression ratios in a local database. AEP uses SA to decide whether the user has moved. In other words, if the user state is classified as sedentary, AEP assumes the user has not changed position. When the user has not moved, it simply operates the WBAN using the current operational settings. In our current implementation the operational settings include (1) whether or not to collect GPS information, (2) whether or not to do data compression, and (3) which wireless radio to use in the presence of multiple transmission options. If the user state is classified as not sedentary, then the system assumes that the user has moved. If the user has moved, AEP scans for Wi-Fi access points (APs) to detect a change in location based on the detected APs. If the APs have changed, the system requests position information using GPS. If a Wi-Fi AP is available, it then probes its internal database to see if there is any existing profile information for that Wi-Fi AP and GPS position. If such profile information exists, it is used to set the WBAN operational settings. If no such profile information exists, AEP runs a profile phase with 10 minutes of sensor data to measure the energy consumption with various wireless radios and data sizes, with and without compression.
Note that when multiple Wi-Fi APs are available, AEP profiles the energy cost of sending data through each of the Wi-Fi APs. It then selects the most energy efficient setting from the profile run and sets the WBAN’s operational settings. These settings are also stored in the profile information database, and that entry is associated with a key formed by combining the Wi-Fi AP and the GPS position.

[Figure 2.10: Power usage of various cases. Power (mW, 0 to 2.4) against time (minutes) in three segments: 3G w/o AEP, 3G w/ AEP, and Two WiFi APs & 3G w/ AEP. Annotated events: Bluetooth Communication with Two Alive Heart Rate Monitors, Positioning, 3G Upload, 3G Upload w/ Gzip, Energy Profiling, WiFi Scan, SA, Profiling with Three Networks, WiFi Upload w/ Gzip.]

With the above algorithm, energy savings come from multiple optimizations. First, AEP can skip GPS sensing whenever a user has not moved within a 10 minute interval. Second, AEP selects between the cellular network and a Wi-Fi AP. A few limitations are worth noting. First, the sampling rate of the external sensors in KNOWME is currently not programmable. Hence, we cannot optimize the external sensor sampling rate in the current KNOWME. Second, KNOWME runs SA on ACC data to provide real-time feedback to users regarding their sedentary state. SA consumes little energy for doing the classification. Hence, the SA computation is always run locally. The more complex user state detection using multi-modal signal processing is done on the back-end. The last limitation is that Symbian S60 does not allow user-level access for selecting the EDGE or 3G network. Hence, we could not select the EDGE network even in cases where the data size is small and EDGE uses less energy. We always ended up with 3G in our operating locations.
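The decision flow described in this section can be summarized in a compact Python sketch. It is an illustration under simplifying assumptions and not the KNOWME code: the class `AEP`, its `step` method, and the three callables for scanning APs, reading GPS, and running a profile are hypothetical names, settings are opaque values, and the profile database is an in-memory dictionary keyed by the pair of detected APs and GPS position.

```python
class AEP:
    """Hypothetical sketch of the per-interval AEP decision flow."""

    def __init__(self, scan_aps, read_gps, run_profile):
        self.scan_aps = scan_aps        # () -> frozenset of AP identifiers
        self.read_gps = read_gps        # () -> hashable position
        self.run_profile = run_profile  # (aps, position) -> settings
        self.profiles = {}              # (aps, position) -> settings
        self.last_aps = None
        self.settings = None

    def step(self, sedentary):
        """Return the settings to use for the next upload interval."""
        if sedentary and self.settings is not None:
            return self.settings        # user has not moved: reuse settings
        aps = self.scan_aps()
        if aps == self.last_aps and self.settings is not None:
            return self.settings        # same APs: same place, skip GPS
        self.last_aps = aps
        key = (aps, self.read_gps())
        if key not in self.profiles:    # unknown place: profile and store
            self.profiles[key] = self.run_profile(*key)
        self.settings = self.profiles[key]
        return self.settings
```

In this sketch the costly operations (GPS and profiling) run only when the AP scan indicates a new place, and a stored profile is reused when the user returns to a known place.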
Results from AEP Implementation

For the data presented in this section we ran KNOWME on two N95 phones (identical in all respects, including battery age). On one phone we ran the baseline KNOWME, and the second phone ran the AEP implementation of KNOWME, as described in the previous section. In this experiment the user moves between different locations throughout the experiment, is detected as non-sedentary by SA, and comes in contact with Wi-Fi APs as well. The sensor layer consisted of two external sensors, ECG and ACC sensors.

Figure 2.10 shows how the power consumption varies with time with both approaches for a 40-minute KNOWME run where data is transmitted to the back-end once every 10 minutes. The figure is divided into three segments. The first segment (leftmost) shows how power consumption varies with time in the baseline KNOWME. The first spike in the power consumption is the GPS and the second spike is the 3G upload. The small spikes that occur once every minute correspond to the power consumption of SA, which runs every minute. This pattern repeats every 10 minutes irrespective of user state.

The last two segments of Figure 2.10 show the power consumption with AEP. The second segment shows the case when the user is roaming and hence there is only a 3G network. The last segment shows power consumption when the user comes in proximity of two known Wi-Fi APs. The system first does a Wi-Fi scan to see if the user has moved. But at the start it has no position information, so it gets position information by GPS sensing just as in the baseline KNOWME. After GPS sensing, the new power spike corresponds to energy profiling. During this profile run the system tries to upload data using 3G with and without Gzip. Once the profiling phase is over, the system uses 3G to upload compressed data, as it was determined to be the best energy-saving option. At the next interval there is a short Wi-Fi scan power spike since the user is not sedentary.
In this case the user has not moved to a new location, and the Wi-Fi scan confirms this since no APs are detected. During this interval there is a clear absence of GPS and profiling power spikes, since the system detects that the user has stayed within the same area as during the profile phase. Hence AEP simply uses the WBAN settings that were selected in the previous profile phase.

[Figure 2.11: Comparison of Energy Consumption (a) Baseline (b) AEP with 3G Only (c) AEP with Two Wi-Fi APs & 3G. Total energy in Joules: (a) 1114.187, (b) 980.549, (c) 918.329. Components: Bluetooth, SA, transmission, GPS, energy profiling, Wi-Fi scan, Gzip.]

At minute 25 the user has moved to a new location where there are two Wi-Fi APs in addition to 3G. At minute 30, before starting the energy-intensive GPS and data transmission operations, AEP starts the profile phase. There is a power spike corresponding to the Wi-Fi scan, which now determines that the user has moved and that there are two new APs at this location. The system then uses GPS sensing, which confirms that the user has in fact moved to a new location. The profile phase now has three spikes that correspond to the system sending data using the two Wi-Fi APs and 3G. Once profiling is complete it selects Wi-Fi. It continues to use the Wi-Fi access point to reduce overall energy consumption as long as the user has not moved too far away. Note that this figure just demonstrates how AEP dynamically adapts to changing conditions. In a true WBAN operation the user does not move as frequently as we have shown in this figure. Hence, profiling is not done as frequently, and the system operates in optimal mode for long periods of time before needing a new profile run.

The net energy reduction after one hour of operation with AEP is shown in Figure 2.11. Each bar measures the total energy for one hour of KNOWME operation. The first bar is the baseline KNOWME. The second bar is AEP with only 3G.
The third bar is AEP with 3G and two Wi-Fi APs. The Bluetooth data transmission costs stay the same in all approaches, as is to be expected given that the external sensor sampling rate could not be programmed due to a hardware limitation in our current implementation with off-the-shelf components. The cost of profiling is less than 1% of the total energy cost. However, the benefits of profiling are clear. AEP with just 3G reduces the energy consumption from 1114 Joules to 980 Joules, a 12% improvement. AEP with two Wi-Fi APs and 3G reduces the energy cost to 918 Joules, an 18% energy reduction. targets for reduction it reduces the energy consumption to 118 joules, a 62% energy improvement.

2.10 Related Works

The popularity of WBANs for health monitoring has been increasing in recent years. They are being deployed to assist in physical rehabilitation [22], obesity monitoring [23, 24], and assisted living [25]. All these prior studies focused on the usability of the system in terms of the computer-human interface. But as shown in our research, understanding the energy implications of a WBAN, from design to operation, will significantly improve the battery life and benefit many of these prior WBAN implementations.

In [26] the authors described MAUI, an automated system that can switch between local and remote computation based on the round trip time of a request-response cycle. MAUI relies on the flexibility of a managed code environment to create two versions of any computational task, one that runs locally on a phone and a second that runs remotely on a server. During runtime it computes the energy cost of local computation versus transferring state to a remote site based on current network conditions. It then invokes remote computation using RPC whenever that option is more energy efficient. Our research differs in two notable ways.
First, we characterize the energy consumption of all aspects of a typical WBAN: software development platform, sensing, GPS, data buffering, and finally remote versus local computation. Furthermore, the AEP approach takes a broader set of criteria into consideration to automatically decide on the best WBAN operational point.

Viredaz et al. [27] discussed methods for improving energy consumption in hand-held devices. They detect periods of idle time in a mobile device and use voltage frequency scaling to reduce power consumption. Shih et al. [28] showed that using wake-on-wireless a PDA can be put into sleep mode and woken up only on an incoming call or when the user is actively using the device. Turducken [29] demonstrates how to use hierarchical power management to reduce energy consumption. In this context they attach a low-power mote to a mobile node and use the mote to continuously monitor incoming packets. The mote wakes up the mobile device when an incoming packet is detected. The notion of hierarchical energy management is exploited at a much finer granularity in EEMSS [2]. EEMSS categorizes all the sensors on a mobile phone into a hierarchy and then activates a low-energy sensor, which in turn decides when to activate a more energy-intensive sensor.

Energy efficiency in the context of communication is well studied. Energy-efficient algorithms for wireless sensor networks have been proposed in [30–32]. Recent studies have also measured the energy of wireless data transmission using various network interfaces [33]. They show that data transmission energy varies widely from one location to another and may also vary at the same location depending on the time. Hence, they also argue that dynamic routing is necessary for optimizing energy. In [34] the authors introduce Wiffler, which is designed for a vehicular network for optimizing data throughput. Wiffler can change the network interface between Wi-Fi and 3G depending on network conditions.
It uses historical data to predict future available Wi-Fi APs. Since Wiffler is geared toward vehicular networks it cares more about bandwidth, whereas a WBAN is sensitive to battery consumption. Breadcrumbs [35] tracks the user's movement to generate connectivity forecasts. Ra et al. [36] focused on dynamically selecting between various wireless radios. They introduce SALSA, which uses a Lyapunov optimization framework to automatically decide when to send data and when to defer data transmission and wait for better channel availability, so as to optimize the overall energy-delay trade-offs.

Again, many of these approaches use prediction of Wi-Fi APs or user movement to optimize the energy of at least one component that is used in KNOWME. We need to carefully develop these prediction models in the KNOWME context, and we can then take advantage of these prior studies to further reduce energy. Even without relying on user movement prediction, AEP provides significant benefits.

While there are disparate sources for some of the energy consumption information, none of the previous studies have done a systematic and comprehensive analysis of the energy consumption of a WBAN, starting with the initial system design choices and extending to data transmission. We believe that our research also sheds new light on issues such as programmability and energy efficiency, the energy costs of compressing sensor data, and the energy cost of storing data locally on mobile phones. Given the energy consumption uncertainty in a WBAN, it is necessary to use profile-based dynamic adaptation to minimize energy consumption across all layers of a WBAN.

Chapter 3

Characteristics of Android Applications

In the previous chapter we presented a comprehensive quantification and analysis of the energy consumption of the various components in KNOWME, which is one implementation of a WBAN. There are several trade-offs a designer has to make to reduce energy consumption.
At the software level our results showed that improving energy efficiency is as important a metric as programmer productivity in WBANs. We also quantified how energy consumption can be curtailed by reducing the sensor sampling rate or by using approximate sensing, such as network-based location sensing. Moreover, we proposed Active Energy Profiling (AEP), a dynamic approach that automatically optimizes energy consumption based on the real-time operating conditions of a WBAN. Given the uncertainty of energy consumption across various WBAN tasks, no single statically selected operating point can effectively reduce WBAN energy consumption. Hence, AEP uses a short profile run to dynamically compute the energy costs of various WBAN tasks and automatically selects the operating point that best reduces energy consumption during that time interval.

However, AEP still requires programmers to modify their source code to explicitly use the AEP APIs that enable computation offloading. Our ultimate goal is to entirely relieve the programmer of the burden of modifying code to take energy efficiency into consideration. To achieve this goal, we propose FREEME, a runtime environment that acts as an intermediary between the mobile operating system and the application layer, as shown in Figure 1.2. To effectively leverage the advantages of offloading, the offloadable parts of the program should be carefully determined, since legacy Android applications are not designed for computation offloading.
[Figure 3.1: Android's architecture diagram. Layers: Applications (Home, Contacts, Phone, Browser, …); Application Framework (Activity Manager, Window Manager, Content Providers, View System, Package Manager, Telephony Manager, Resource Manager, Location Manager, Notification Manager); Libraries (Surface Manager, Media Framework, SQLite, OpenGL | ES, FreeType, WebKit, SGL, SSL, libc); Android Runtime (Core Libraries, Dalvik VM); Linux Kernel (Display Driver, Camera Driver, Flash Memory Driver, Binder (IPC) Driver, Keypad Driver, WiFi Driver, Audio Driver, Power Management).]

3.1 Background

We provide brief background on the structure of the Android operating system (OS) and how applications are executed on Android.

3.1.1 Android OS

Android is a mobile operating system developed by Google [37]. Android consists of several layers, as outlined in Figure 3.1.

Kernel: The Android kernel is developed from a baseline Linux kernel. Mobile devices differ from traditional computer systems in the kinds of peripheral devices attached, and these vary widely across mobile devices. As such, the Android kernel has several device drivers to support the underlying hardware of each of the mobile platforms that it supports.

Libraries: Libraries are native code written in C or C++ that expose the functionality of the underlying hardware to the higher layers of application software. The various Android frameworks and Java APIs provided to the programmer are built on top of these libraries.

Runtime: Android applications are written using a mix of native and Java APIs. As such, each application runs on top of a virtual machine runtime. Dalvik is the virtual machine that executes applications developed for Android. The bytecode generated from the Java methods is interpreted by Dalvik. However, interpreting code at runtime has been shown to have high runtime overhead, even though bytecode provides interoperability across platforms.
Given the limited resources on a mobile platform, in both battery and CPU power, an alternative runtime environment named Android Runtime (ART) was introduced with Android 4.4. ART uses Ahead-Of-Time (AOT) compilation, which compiles an application's DEX-formatted bytecode to machine code when the application is installed. It has been the default runtime since Android 5.0.

[Figure 3.2: Android Application Package Structure. An Android package file (.apk) contains: assets (raw asset files); res (application resources, such as drawable files, layout files, and string values); lib (private libraries for various CPU architectures, e.g., armeabi, armeabi-v7a, x86); classes.dex (Dalvik Executable files); AndroidManifest.xml (control file).]

Framework: The framework is a set of services that manage the basic functions of a phone. For instance, the view system framework combines all the functionality necessary to manage the visual representation of an application, its user interfaces, and interactions with the application. The various components necessary to manage a given service are packaged into a set of Java classes, which are bundled into a single framework.

Applications: Applications are the programs written by developers to meet a user's computing needs. Each application is provided as an Android application package, which can be downloaded by users and installed on the phone.
ELF file layout: ELF header; .text (executable instructions); .rodata (read-only data); .bss (uninitialized data); .plt (procedure linkage table); .got (global offset data); .data (initialized data); .dynsym (dynamic linking symbol table); .symtab (global symbol table); .init_array (initialization routines); .fini_array (termination routines); … (optional sections, not required)

[Mac:~] objdump -h libmyjni.so
Sections:
Idx Name             Size      VMA       LMA       File off  Algn
  0 .interp          00000013  00000134  00000134  00000134  2**0
  1 .dynsym          000003e0  00000148  00000148  00000148  2**2
  2 .dynstr          000004ee  00000528  00000528  00000528  2**0
  3 .hash            00000194  00000a18  00000a18  00000a18  2**2
  4 .rel.dyn         00000058  00000bac  00000bac  00000bac  2**2
  5 .rel.plt         00000060  00000c04  00000c04  00000c04  2**2
  6 .plt             000000a4  00000c64  00000c64  00000c64  2**2
  7 .text            00000ea6  00000d08  00000d08  00000d08  2**2
  8 .ARM.extab       0000003c  00001bb0  00001bb0  00001bb0  2**2
  9 .ARM.exidx       00000110  00001bec  00001bec  00001bec  2**2
 10 .rodata          00000007  00001cfc  00001cfc  00001cfc  2**0
 11 .fini_array      00000008  00002e9c  00002e9c  00001e9c  2**2
 12 .init_array      00000004  00002ea4  00002ea4  00001ea4  2**0
 13 .dynamic         000000f8  00002ea8  00002ea8  00001ea8  2**2
 14 .got             00000060  00002fa0  00002fa0  00001fa0  2**2
 15 .data            00000008  00003000  00003000  00002000  2**2
 16 .bss             00000004  00003008  00003008  00002008  2**2
 17 .comment         00000010  00000000  00000000  00002008  2**0
 18 .ARM.attributes  00000034  00000000  00000000  00002034  2**0

Figure 3.3: ELF File Format and Sample

3.1.2 Android Application Package

Typically, an Android application is distributed in the APK (Android Application Package) file format, which consists of resource files, a classes.dex (Dalvik Executable) file, native libraries, and XML configuration files, as shown in Figure 3.2.

Dalvik Executable: classes.dex is an archive of the compiled Java bytecode that may be executed on the Dalvik virtual machine. The DEX file format currently has a limitation of 65,536 method references in a single DEX file.
Hence, applications that exceed this method limit may contain the multidex library to support additional DEX files on Android OS version 4.4 or older.

Resources: Resources are the additional files and static content that are accessed by the application. For instance, the images and layout definitions used by an application are stored as resource files.

Libraries: Libraries are usually implemented in either C or C++ and are compiled to run on a specific underlying instruction set architecture (ISA). In the library directory there are several directories, each matched to a specific ISA, such as armeabi or armeabi-v7a. Thus, a single APK may store multiple versions of the libraries for different underlying hardware platforms. Libraries are compiled as ELF-formatted shared library files. As shown in Figure 3.3, the ELF format consists of several sections and symbol tables containing the information a linker needs to load procedures into native memory space. For example, a section named .text keeps executable instructions, and .dynsym has a list of the functions exposed to other libraries. Typically, to reduce the final ELF file size, sections not needed for execution may be removed when packaging the binary. However, the removed sections may contain information necessary for debugging or code recovery, even though they are not needed for program execution. For example, the .symtab section stores information regarding local functions, such as size, start address, and type. Thus, without .symtab it is hard to find the start and end addresses of a function within the binary.

Assets: The asset directory contains the application's raw files, such as sound and data files.

Control File: AndroidManifest.xml keeps essential information about the application, such as the package name, the permissions needed to run the application, and the components in the application, such as the services it supports.
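Because an APK is a ZIP archive, the package layout described above can be inspected with ordinary archive tooling. The following is a minimal sketch that groups APK entries into the categories of this section; the function names `classify_apk_entries` and `make_demo_apk`, the category labels, and the file names in the demo archive are made up for illustration and are not part of FREEME.

```python
import io
import zipfile
from collections import defaultdict


def classify_apk_entries(apk_file):
    """Group the entries of an APK (a ZIP archive) by their role in the package."""
    groups = defaultdict(list)
    with zipfile.ZipFile(apk_file) as apk:
        for name in apk.namelist():
            if name == "AndroidManifest.xml":
                groups["control"].append(name)
            elif name.startswith("classes") and name.endswith(".dex"):
                # classes.dex, plus classes2.dex, ... when multidex is used
                groups["dex"].append(name)
            elif name.startswith("lib/") and name.endswith(".so"):
                abi = name.split("/")[1]   # e.g. armeabi, armeabi-v7a, x86
                groups["lib:" + abi].append(name)
            elif name.startswith("res/"):
                groups["resources"].append(name)
            elif name.startswith("assets/"):
                groups["assets"].append(name)
            else:
                groups["other"].append(name)
    return dict(groups)


def make_demo_apk():
    """Build a tiny in-memory archive with an APK-like layout (empty files)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name in [
            "AndroidManifest.xml",
            "classes.dex",
            "classes2.dex",
            "lib/armeabi/libmyjni.so",
            "lib/armeabi-v7a/libmyjni.so",
            "res/layout/main.xml",
            "assets/sound.ogg",
        ]:
            archive.writestr(name, b"")
    buf.seek(0)
    return buf


demo_groups = classify_apk_entries(make_demo_apk())
```

On the demo archive, the same library appears once per ISA directory, which is the situation an analyzer has to resolve by choosing one ISA before inspecting native code.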
[Figure 3.4: Process of Static Analysis. Labels: APK, Dalvik Executable (DEX), Native Libraries, Converted by Dexpler, Entry points (main()) generated by the analyzer, Call graph generated with VTA, Native Analyzer, Native Method (Jimple), Sequence of native method invocations, Soot framework; steps are numbered ① to ⑧.]

3.2 Comprehensive Static Analysis for Legacy Android Application

Recall that FREEME's goal is to offload method calls from within unmodified Android applications. As such, the first goal is to understand what method calls are available for potential offloading within an APK file. In this section, we describe the approach used in FREEME to analyze both Java and user-defined native methods in Android apps.

We first describe the approach we developed for analyzing the Java methods and then describe the more complex process of analyzing user-defined native methods. To examine and handle Java methods, we use the Soot framework [38], which is designed for optimizing Java bytecode. Figure 3.4 depicts the static analysis process for an Android package file. The details of the process are walked through in the following subsections.

3.2.1 Call Graph Construction

To identify connectivity between methods in the target Android app, the FREEME analyzer builds a call graph for the app using Soot with Variable-Type Analysis (VTA) [39]. At its core, Soot translates DEX-formatted files to the Jimple representation [40] and traverses the code to generate a call graph. However, Soot was developed for analyzing Java applications that run in traditional desktop computing environments. Hence there are several limitations in Soot that need to be overcome before it can be used for Android app analysis.

Challenge #1: Call graph root identification

Soot builds the call graph dependencies starting from the main method of a Java application as the call graph root. However, Android apps do not have such an explicit main method.
To solve this challenge, the analyzer first identifies the main activity that is invoked in the Android app. As described earlier, an Android app is usually packaged as an APK file. An Android app starts execution with a class inherited from the Activity class. The AndroidManifest.xml file specifies the starting activity for an app and all the other activities that the app can perform over its lifetime. Each of these activities goes through state transitions in the life cycle of the Activity class, as shown in Figure 3.5. An activity is first created by calling the onCreate() method, then moves through various transitions such as onStart(), onResume(), onPause(), onRestart(), and onStop(), and eventually enters the destroyed state through the onDestroy() method.

[Figure 3.5: Activity Lifecycle redrawn from [1]. States: Created, Started (visible), Resumed (visible), Paused (partially visible), Stopped (hidden), Destroyed. Transitions: onCreate(), onStart(), onResume(), onPause(), onStop(), onRestart(), onDestroy().]

The analyzer creates a dummy main method for the app. The dummy main method in turn calls each of the activity state methods, starting with the main activity that triggers the app execution as specified in AndroidManifest.xml. Thus, the dummy main method has a sequence of calls ActivityXX.onCreate() ... ActivityXY.onDestroy(). The purpose of this dummy main method is to transform the Android app into a Java application with a sequence of method calls to all the activities that are performed in the Android app. The Soot framework can then start with the dummy main function, follow each activity's call sequence, and trace all the functions that are invoked within each activity, essentially building the call graph for each activity.

However, some methods are not reachable in the call graph since they are not triggered by a traditional callee-caller relationship.
For instance, methods defined within callback functions or listeners are called only when the corresponding system event occurs, such as clicking a button or tapping the screen. Similarly, the run() method of a class that implements the Runnable interface will not be included in the call graph. Thus, before entering the call graph construction step, the FREEME analyzer inspects and modifies the function bodies of all Java methods in the app by inserting a set of instructions that manually invoke the implicit methods that may be called by such event-triggered methods, so that no potentially executable method is missed during call graph construction. Other app components such as Service, Content provider, and Broadcast receiver are also handled in the same manner.

Challenge #2: Handling native methods

The approach described above works well for generating call graphs for Java methods within an Android app. However, app developers are increasingly using the Native Development Kit (NDK) to develop native methods, which are then compiled to a shared library. These native methods can be invoked by Java methods via the Java Native Interface (JNI). Due to their performance advantages, many apps now have native methods. Once a Java method within the call graph invokes a native method, Soot is unable to trace the native method, because Soot cannot translate machine code into a high-level intermediate source code representation that is amenable to call graph analysis. One of the fundamental contributions of FREEME is to augment the Soot framework so that it traverses native methods to complete the call graph building process.

Note that native methods access resources within the Dalvik virtual machine through parameters such as JNIEnv (the JNI function pointer). When native methods store data in native memory, such data is accessible only to other native methods and not to Java methods. Thus, native methods and Java methods are intricately intertwined in mobile applications.
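The two challenges above can be illustrated together on a toy call graph. The sketch below is not Soot and does not use its API; it only models the idea with plain dictionaries: a dummy main calls every lifecycle method of every activity, implicit callback edges are added before reachability is computed, and native methods show up as leaves that a Java-only analysis cannot look inside. All function names, the lifecycle ordering, and the example method names are assumptions made for illustration.

```python
from collections import deque

# Lifecycle methods named in the text; the order used here is an assumption.
LIFECYCLE = ["onCreate", "onStart", "onResume", "onPause",
             "onStop", "onRestart", "onDestroy"]


def build_dummy_main(activities):
    """Callees of the synthetic root: every lifecycle method of every activity."""
    return [f"{activity}.{method}" for activity in activities for method in LIFECYCLE]


def reachable(edges, roots):
    """Breadth-first reachability over a caller -> [callees] mapping."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        method = queue.popleft()
        for callee in edges.get(method, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def analyze(activities, edges, implicit_edges, native_methods):
    """Return (reachable without callbacks, reachable with callbacks, native leaves)."""
    graph = {caller: list(callees) for caller, callees in edges.items()}
    graph["dummyMain"] = build_dummy_main(activities)
    without_callbacks = reachable(graph, ["dummyMain"])
    # Model the inserted "manual invocation" of listeners and run() methods
    # as extra edges from the method that registers them.
    for caller, callees in implicit_edges.items():
        graph.setdefault(caller, []).extend(callees)
    with_callbacks = reachable(graph, ["dummyMain"])
    native_leaves = sorted(m for m in with_callbacks if m in native_methods)
    return without_callbacks, with_callbacks, native_leaves


explicit = {
    "MainActivity.onCreate": ["Util.init"],
    "Listener.onClick": ["HelloWorld.getHello"],
}
implicit = {"MainActivity.onCreate": ["Listener.onClick"]}
before, after, native_leaves = analyze(
    ["MainActivity"], explicit, implicit, {"HelloWorld.getHello"}
)
```

In this example `Listener.onClick` is missing from `before` and present in `after`, and the native method it calls is reported as a leaf that still needs binary-level analysis.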
Rather than analyzing native methods separately from Java methods, we propose a unified static analysis in which native methods are inspected in the same way as Java methods to build the call graph using Soot. Note that existing ARM binary static analyzers, such as BAP [41] and Vine [42], do not support the mixed ARM/Thumb instructions that are prevalent in Android native methods. Due to these challenges, most prior work [43–45] ignores native methods in static analysis.

Detecting system and library function calls, e.g., malloc, free, and ioctl, within a native procedure using static analysis is fairly easy, since they are identified by a fixed address mapped to a unique name defined in the .plt section. However, a JNI method call is difficult to identify with static analysis, because it is likely to appear as an indirect branch where an architected register stores the destination address and that address is determined at runtime. Thus, during static analysis the target address of an indirect branch is unknown. Nevertheless, a JNI method call can still be detected under specific constraints, such as those based on function pointers and their offsets.

Our approach to analyzing native methods is best described using a simple illustrative example. Figure 3.6 describes how JNI methods can be detected using only static analysis. ① is a regular Java method that returns a string by concatenating Hello and the given input parameter string. ② does the same operation as a native method. ③ shows the C++ JNI code equivalent to ①; note that every JNI method has the string "Java_" in its function name. The native code in turn calls multiple functions: GetStringUTFChars, NewStringUTF, and ReleaseStringUTFChars. ⑤ is the disassembled code of ③. ④ is a trace of registers at certain addresses. Note that the parameter types of the method can be obtained from the definition of the native method, ②, with Soot.
In our example code, the address offset of each target function is marked. For instance, GetStringUTFChars is located at an address offset of 0x2A4 from JNIEnv. It is the responsibility of the analyzer to locate the branch-to-subroutine instruction (blx, a branch-and-link instruction in ARM) whose target address is at an offset of 0x2A4 from JNIEnv. In this example ARM binary, register r3 is used to store the target address of the blx instruction, and registers r0, r1, and r2 are used to store the parameters. The analyzer tracks the r3 value for each blx instruction to identify the address of any subroutine. For instance, in ⑤ r3 is loaded with 169 at c96 and then left-shifted by 2 at c98, giving 0x2A4 (169*4). At c9a, r3 is then loaded from that offset. Hence, the analyzer can reconcile this with the information it has obtained from jni.h to recognize that the first blx, located at address ca2, branches to the subroutine at offset 0x2A4. Note that jni.h is part of the NDK and contains the definitions of the JNI function pointers, which identify each function's address offset. In addition, the analyzer can also track the values of the r0, r1, and r2 registers to identify the input parameters and how they are consumed. ④ is a trace of register values at each blx instruction in the binary code.

Note that a JNI function generally appears as an indirect branch within the code, as above. Thus, it cannot be identified by static binary analysis alone without the parameter information given by a Java static analysis tool, such as Soot.

Thus, JNI method calls can be identified by first identifying the branch-to-subroutine instruction and then tracking the definition of the register that stores the target address and of the input parameters to the function, as follows:
R0 has the address of JNIEnv, given by a method parameter or by global memory space initialized by the JNI_OnLoad() function at library loading time.
R1 has the address of the second parameter of the target method.
It is usually a Java class or a Java object type.
The register that stores the branch target has the offset value within JNIEnv of the target native method.

① Java method:
    public static String getHello(String str) {
        return "Hello " + str;
    }

② Native method declaration (Java):
    public static native String getHello(String str);

③ C++ JNI implementation:
    jstring JNICALL Java_HelloWorld_getHello(JNIEnv *env, jclass, jstring jstr) {
        jstring retval;
        char *hello = "Hello ";
        const char *str = env->GetStringUTFChars(jstr, NULL);
        char *pRetStr = (char*)malloc(strlen(hello) + strlen(str) + 1);
        strcpy(pRetStr, hello);
        strcat(pRetStr, str);
        retval = env->NewStringUTF(pRetStr);
        env->ReleaseStringUTFChars(jstr, str);
        free(pRetStr);
        return retval;
    }

④ Registers:
    Addr    R0      R1       R2      R3
    0xc90   JNIEnv  class    jstr    -
    0xca2   JNIEnv  jstr     0       JNIEnv+0x2A4
    0xcce   JNIEnv  pRetStr  JNIEnv  JNIEnv+0x29C
    0xce0   JNIEnv  jstr     str     JNIEnv+0x2A8

⑤ Disassembly:
    00000c90 <Java_HelloWorld_getHello>:
     c90: b5f7  push {r0, r1, r2, r4, r5, r6, r7, lr}
     c92: 1c17  adds r7, r2, #0
     c94: 6802  ldr  r2, [r0, #0]
     c96: 23a9  movs r3, #169
     c98: 009b  lsls r3, r3, #2        ; 169*4 = 0x2A4
     c9a: 58d3  ldr  r3, [r2, r3]
     c9c: 1c39  adds r1, r7, #0
     c9e: 2200  movs r2, #0
     ca0: 1c04  adds r4, r0, #0
     ca2: 4798  blx  r3
     ......
     cc4: 23a7  movs r3, #167
     cc6: 009b  lsls r3, r3, #2        ; 167*4 = 0x29C
     cc8: 1c29  adds r1, r5, #0
     cca: 58d3  ldr  r3, [r2, r3]
     ccc: 1c20  adds r0, r4, #0
     cce: 4798  blx  r3
     cd0: 6822  ldr  r2, [r4, #0]
     cd2: 23aa  movs r3, #170
     cd4: 009b  lsls r3, r3, #2        ; 170*4 = 0x2A8
     cd6: 58d3  ldr  r3, [r2, r3]
     cd8: 1c39  adds r1, r7, #0
     cda: 1c32  adds r2, r6, #0
     cdc: 9001  str  r0, [sp, #4]
     cde: 1c20  adds r0, r4, #0
     ce0: 4798  blx  r3
     ......
     cea: bdfe  pop  {r1, r2, r3, r4, r5, r6, r7, pc}

Figure 3.6: Detecting JNI Method Calls

In the next few subsections, we present how the approach illustrated above is implemented within the FREEME call graph analyzer.
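The register tracking in the example of Figure 3.6 can be reproduced with a very small symbolic interpreter. The sketch below is an illustration only and is not the FREEME analyzer: it handles just the five opcodes that occur in the listing, represents instructions as hand-written tuples rather than decoding Thumb machine code, and takes its offset-to-name table from the three offsets shown in the figure. It assumes r0 holds JNIEnv on entry and treats r0 to r3 as clobbered after each call. The middle call (at cce) is left out because the listing elides the instruction that defines r2 for it.

```python
# Offsets from JNIEnv's function table, as marked in Figure 3.6 (32-bit ARM).
JNI_TABLE = {
    0x29C: "NewStringUTF",
    0x2A4: "GetStringUTFChars",
    0x2A8: "ReleaseStringUTFChars",
}

ENV = "JNIEnv"               # symbolic value: the JNIEnv pointer
TABLE = "JNIEnv.functions"   # symbolic value: what *JNIEnv points to


def find_jni_calls(listing):
    """Track register values through a listing and resolve each blx target."""
    regs = {"r0": ENV}       # assumption: r0 holds JNIEnv on method entry
    calls = []
    for addr, op, *args in listing:
        if op == "movs":                      # movs rd, #imm
            rd, imm = args
            regs[rd] = imm
        elif op == "lsls":                    # lsls rd, rn, #shift
            rd, rn, shift = args
            value = regs.get(rn)
            regs[rd] = value << shift if isinstance(value, int) else None
        elif op == "adds":                    # adds rd, rn, #imm (imm 0 is a copy)
            rd, rn, imm = args
            value = regs.get(rn)
            if isinstance(value, int):
                regs[rd] = value + imm
            else:
                regs[rd] = value if imm == 0 else None
        elif op == "ldr":                     # ldr rd, [rn, off]; off is imm or reg
            rd, rn, off = args
            base = regs.get(rn)
            offset = regs.get(off) if isinstance(off, str) else off
            if base == ENV and offset == 0:
                regs[rd] = TABLE              # loaded the function table pointer
            elif base == TABLE and isinstance(offset, int):
                regs[rd] = ("jni", offset)    # loaded a JNI function pointer
            else:
                regs[rd] = None
        elif op == "blx":                     # blx rt: indirect call
            (rt,) = args
            target = regs.get(rt)
            if isinstance(target, tuple):
                name = JNI_TABLE.get(target[1], hex(target[1]))
                calls.append((addr, name))
            for reg in ("r0", "r1", "r2", "r3"):
                regs[reg] = None              # caller-saved registers are lost
        # any other opcode (e.g. str) is ignored in this sketch
    return calls


LISTING = [
    (0xC94, "ldr", "r2", "r0", 0),
    (0xC96, "movs", "r3", 169),
    (0xC98, "lsls", "r3", "r3", 2),
    (0xC9A, "ldr", "r3", "r2", "r3"),
    (0xC9C, "adds", "r1", "r7", 0),
    (0xC9E, "movs", "r2", 0),
    (0xCA0, "adds", "r4", "r0", 0),
    (0xCA2, "blx", "r3"),
    (0xCD0, "ldr", "r2", "r4", 0),
    (0xCD2, "movs", "r3", 170),
    (0xCD4, "lsls", "r3", "r3", 2),
    (0xCD6, "ldr", "r3", "r2", "r3"),
    (0xCD8, "adds", "r1", "r7", 0),
    (0xCDA, "adds", "r2", "r6", 0),
    (0xCDC, "str", "r0", "sp", 4),
    (0xCDE, "adds", "r0", "r4", 0),
    (0xCE0, "blx", "r3"),
]

jni_calls = find_jni_calls(LISTING)
```

The second call is only resolved because the copy of JNIEnv into r4 at ca0 survives the first call, which is why the tracking has to follow register copies as well as the arithmetic on r3.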
ARM/Thumb mixed ISA code recovery

Instructions for procedures are located in the .text section of the ELF file format without any boundary tags. Instead, the ELF format has two tables, named .dynsym and .symtab, that keep the information for exported and local procedures. On the ARM platform, a procedure is compiled to the ARM or Thumb instruction set, and these two instruction sets can coexist within the same library file. Note that the Thumb instruction set is a more restrictive and hence more compact version of the full ARM instruction set. The information maintained in .dynsym and .symtab also indicates the ISA used for each method. However, .symtab and other debug information are stripped out by default at compile time, since they are not necessary for execution.

[Figure 3.7: Abstract memory space. The figure shows a procedure call graph of Dalvik methods (D1, D2), native methods (N1, N2), and procedures (P1 to P4); the Dalvik memory space; and the native memory space, consisting of registers (R0 to R15), a stack (including the stack for N1; address label 0xEFFFFFFF), and global memory, where the sections (.plt, .bss, .data, .text, .rodata, …) of the native libraries are placed starting at 0x400, 0x400 + S_1, 0x400 + S_1 + S_2, and so on up to 0x400 plus the sum of S_x for x = 1 to n.]

Thus, the FREEME analyzer uses the industry-standard IDA Pro Disassembler [46], developed by Hex-Rays, to recover .symtab, as shown in Figure 3.10. Although Chen [47] proposed an algorithm to discover the mixed ARM/Thumb ISA from binary code, its primary purpose is just to discover the instruction set used within the binary, and it does not describe how to detect the start and end of native methods. Note that objdump (version 2.24.90) [48], one of the most popular disassemblers, does not support the ARMv7 instruction set. Hence, we relied on IDA Pro, which can identify local functions in stripped binary code using reverse-engineering algorithms, such as pattern matching and Fast Library Identification and Recognition Technology [49]. It then regenerates .symtab with the addresses of the local functions.
We refer the reader to the prior work [49] for more details of the reverse engineering algorithms, which are outside the scope of this thesis.

Abstract Interpretation
Once the starting and ending addresses of each method have been identified by reconstructing .symtab, the next step is to identify the target address of each branch-to-subroutine instruction. The target address is then looked up in the symbol table to identify the method that is being called by that instruction. As illustrated in the earlier example, identifying the target address of a branch instruction may require tracking how the target address register is computed in the code. For this purpose we rely on abstract interpretation, which essentially mimics how the code would be executed on a real machine, but within the static call graph analyzer.

We implemented an abstract interpreter, inspired by Jakstab [50], to trace values within a native memory space. The interpreter has an abstract memory space, shown in Figure 3.7. There are three different types of memory space that a program may access: the architected register file, the stack memory, and the global memory space. Typically, on entering a new method, several architected registers may first be saved on the caller stack and then reset. These registers may be used by the callee for its internal computations. A new stack frame is also allocated at the start of each method call so that the method can use the stack memory to store its parameters and local variables. The global memory, on the other hand, is persistent across method calls and may be modified by any method. For instance, Figure 3.7 depicts the global memory space when the native method N1 is being processed, along with three libraries used by the native method. Note that a stand-alone application has only one entry point defined in .init, such as main(), whereas in a JNI library every native method can be regarded as an entry point.
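As a rough illustration of the three memory spaces just described, the sketch below models them as plain Python containers. The class name AbstractState, its method names, and the choice of r4 to r7 as the saved registers are assumptions for the example and are not taken from the FREEME implementation.

```python
class AbstractState:
    """Registers, per-call stack frames, and persistent global memory."""

    def __init__(self):
        self.registers = {f"r{i}": None for i in range(16)}  # None = unknown
        self.frames = []          # one frame per active method call
        self.global_memory = {}   # address -> value, shared by all methods

    def enter_method(self, saved=("r4", "r5", "r6", "r7")):
        # Save the listed registers in a new frame and reset them for the callee.
        frame = {"saved": {r: self.registers[r] for r in saved}, "locals": {}}
        self.frames.append(frame)
        for r in saved:
            self.registers[r] = None
        return frame

    def leave_method(self):
        # Drop the callee's frame and restore the caller's registers.
        frame = self.frames.pop()
        self.registers.update(frame["saved"])


state = AbstractState()
state.registers["r4"] = 7
state.global_memory[0x400] = "start of first library"
frame = state.enter_method()
frame["locals"]["tmp"] = 1          # visible only inside this call
value_inside_call = state.registers["r4"]
state.leave_method()
```

After the call returns, r4 holds 7 again and the frame-local value is gone, while the entry written to global memory is still present. That difference in lifetime is what distinguishes the stack from the global memory space in the description above.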
When a native method is invoked by a Java method, the analyzer collects parameter types or values from the Java method using the Soot framework. Note that the caller here is the Java method, and hence Soot can identify all the parameters that are sent to the native method. FREEME's abstract interpreter initializes the registers and the stack memory with the identified parameter values. For instance, on an ARM-based platform, the first four registers, r0 to r3, are used for passing parameters, and the fifth and subsequent parameters are stored on the stack in descending order. The r0 register is also used for returning the value from the callee to the caller.

Figure 3.8: Initialization sequence (flowchart from START to END with the decision nodes "Library Loaded?" and "Class Initialized?" and the steps "Load Library", "Procedures listed in [.init_array]", "JNI_OnLoad()", "Initialize Class", "Procedures listed in clinit()", "Initialize Registers & Stack", and "Native Function")

The abstract interpreter loads the libraries related to the target procedure into the global memory space and initializes the global memory space. Note that in a few cases, where an app loads libraries from a directory other than "lib" using its own custom approach instead of relying on the traditional loadLibrary() call, the analyzer will fail to load such libraries. For example, the Facebook app keeps libraries in a directory named "raw" and loads them using its own custom methods. In such cases FREEME will not be able to track these native methods. Figure 3.8 shows the memory initialization steps.

Initialize memory: When a library file is loaded, several procedures typically must be executed first. The first is the initialization routine listed in the .init_array section of the library file. The second is a static function named JNI_OnLoad() that is generally used to retrieve JNIEnv, the environment variable for the Java native interfaces. These environment variables are stored in a global variable.
The last step is to initialize native method calls within clinit(), if they exist. Note that clinit() is a static method generated at compile time and called by the JVM after class loading to initialize static variables in a Java class.

After initializing the global memory, FREEME begins the process of abstract interpretation, where the interpreter executes every instruction within the target native method that is necessary to track the call graph. As such, it executes only instructions that are used in address calculations for a branch to subroutine, e.g., add, sub, mov, push, and pop. In particular, the interpreter ignores many instruction categories that are usually not part of address calculations, such as floating-point instructions. In addition, the interpreter emulates well-known system calls and library functions that are usually related to memory management, such as malloc, free, and memcpy. During interpretation, all global variables, including dynamically allocated memory, are tracked, since they need to be migrated to a server for computation offloading. For non-deterministic conditional statements, the interpreter examines all possible cases by going down the different paths of the conditional statements. To handle unpredictable values in a forward data-flow analysis, the interpreter uses the Z3 Satisfiability Modulo Theories (SMT) solver [51], where unknown values are treated as Boolean expressions and BitVectors [52]. Specifically, for a conditional statement, several constraints can be added to an unknown value to make the statement True or False. Figure 3.9 illustrates how the interpreter deals with conditional statement execution. There are two functions, cal and test. cal is invoked by test only if the input parameter i is greater than 10.
Since the value of i is unknown at compile time, the interpreter traces both the YES and NO cases at 0x0D1E of Figure 3.9(c). Therefore, a Boolean constraint, ① or ②, is assigned to i instead of a fixed value before interpreting 0x0D1C, as shown in Figure 3.9(d), so that the following instructions in each case deal with the same constraint on the value of i. Note that the interpreter executes an instruction only when the result of the Z3 solver for all constraints related to its operands is true.

(a) Source code:
    int cal(int i) { return i+20; }
    JNIEXPORT jint JNICALL test(JNIEnv *env, jobject obj, jint i) {
        if (i > 10) { i = cal(i); }
        return i;
    }

(b) ARM Thumb ASM:
    0D14 EXPORT _Z3cali
    0D14 _Z3cali
    0D14 ADDS R0, #0x14
    0D16 BX LR
    0D18 test
    0D18 PUSH {R3,LR}
    0D1A SUBS R0, R2, #0
    0D1C CMP R0, #0xA
    0D1E BLE locret_D24
    0D20 BL _Z3cali
    0D24 locret_D24
    0D24 POP {R3,PC}

(c) Flowchart of function test: START, test "i <= 10" at 0x0D1E, "i = cal(i)" on the NO branch, RETURN.
(d) Flowchart of function test with Z3 constraints: the same flow, with the constraints mkBVLE(R0, 10) and mkBVGT(R0, 10), marked ① and ②, attached to parameter i.

Figure 3.9: Example for Z3 Solver: (a) Source code (b) ARM Thumb ASM (c) Flowchart of function test (d) Flowchart of function test with Z3 constraints

Trace global variables: Since global variables within a library can be shared by multiple procedures and modified at any time, they should be handled carefully. The .bss section of the ELF format holds the addresses of all global variables. Thus, the analyzer tags a value when it is stored into any address listed in .bss, so that the analyzer is aware of the memory space required by each native method. For dynamically allocated memory, indirect addressing is used: an address in .got points to an empty slot holding "0" within .bss, and that slot is later replaced with the address of the dynamically allocated memory. Therefore, the analyzer marks the return address of a memory-related function if the function's target address is related to .bss or .got.
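The branch handling described for Figure 3.9 can be imitated without an SMT solver. The sketch below deliberately swaps Z3 for a simple inclusive interval on the unknown parameter i, which is enough for a single CMP/BLE pair but far weaker than bit-vector constraints. The names Path, fork_on_ble, and analyze_test are invented for the illustration.

```python
from dataclasses import dataclass

INT_MIN, INT_MAX = -2**31, 2**31 - 1


@dataclass(frozen=True)
class Path:
    lo: int            # inclusive lower bound on the unknown parameter i
    hi: int            # inclusive upper bound
    calls: tuple = ()  # subroutines reached along this path

    def feasible(self):
        return self.lo <= self.hi


def fork_on_ble(path, bound):
    """Split a path at 'CMP R0, #bound; BLE label' into (taken, fall_through)."""
    taken = Path(path.lo, min(path.hi, bound), path.calls)            # i <= bound
    fall_through = Path(max(path.lo, bound + 1), path.hi, path.calls)  # i > bound
    return taken, fall_through


def analyze_test(start=Path(INT_MIN, INT_MAX)):
    """Explore both outcomes of the branch at 0x0D1E in Figure 3.9(b)."""
    taken, fall_through = fork_on_ble(start, 10)
    results = []
    if taken.feasible():
        results.append(taken)  # jumps to locret_D24, no call
    if fall_through.feasible():
        # falls through to 'BL _Z3cali'
        called = fall_through.calls + ("cal",)
        results.append(Path(fall_through.lo, fall_through.hi, called))
    return results


paths = analyze_test()
```

With i fully unknown, both paths are kept and only the one constrained to i > 10 records the call to cal. If the caller already guarantees, say, 20 <= i <= 30, then analyze_test(Path(20, 30)) discards the infeasible path, mirroring the rule that an instruction is interpreted only when its constraints are satisfiable.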
Figure 3.10: Static Data Flow (elements shown: the APK file processed by Soot, yielding DEX method info and resource info; the extracted binary library file processed by IDA Pro, yielding instructions, the symbol table, and the sections .got, .bss, .data, .rodata, ...; and the native analyzer, whose interpreter holds registers R0 to R15, the stack, and the loaded libraries; data flows numbered ① to ④)

Use of static values: Many JNI method calls also use parameters that are known at compile time. There are several sources of static or initial values in the APK file, as shown in Figure 3.10. For example, ① if a parameter is a numeric value and it is a resource id, then the actual value of the resource can be retrieved from the resource files within the APK file. ② When a method takes static values as parameters, such as a String or a Number, the actual values, rather than just their types, can be acquired directly from Soot or raw files. ④ When a procedure loads data from an address within the range of the .rodata section, the data is read from the .rodata section by indirect addressing. Thus, the analyzer can reduce uncertainty by relying on static values when possible.

Generate Jimple Code
In the final step, the analyzer generates Jimple code for the JNI method calls, as shown in ⑦ of Figure 3.4. The empty Jimple body of each native method is filled with Jimple code reflecting the detected JNI method calls, so that Soot handles a native method as an ordinary Java method. Thus, call graph construction is performed two or more times, to generate the input data for the native analyzer and the final call graph, which are denoted by ④ and ⑤ in Figure 3.4 respectively. The steps from ④ to ⑧ are performed repeatedly until all user-defined native methods within the call graph are converted to Jimple code.
Limitation
Although the abstract interpreter can efficiently generate Jimple code, it is sometimes not possible to perform the interpretation accurately due to the following limitations:

Multiple assigned global pointer values: When a global variable is a pointer type and it is assigned multiple times with a different type of value or structure each time, the interpreter cannot determine the type of the global value and will fail to produce a meaningful result.

Random method invocations: Sometimes it is hard to determine the right order of method invocations using a Control Flow Graph (CFG), especially when the method calls are enclosed within conditional statements. In this case, some values will be referenced without initialization, which generates wrong results.

A large number of method parameters: Every parameter value that cannot be statically determined must be handled by the Z3 solver. Thus, the more parameter values are unknown, the longer the Z3 solver needs to analyze the code with all possible values for those parameters. As such, the analysis time can increase exponentially, making it impractical to complete the code analysis.

Encrypted files: To protect files from unauthorized manipulation, some apps ship encrypted library files that are loaded by the app's own method at runtime. In this case, the analyzer performs static analysis for DEX files only.

3.3 Dynamic Analysis Strategy
It should be clear that the static analysis process described above is not completely fail-proof. There are instances when the abstract interpreter is unable to follow the code execution paths due to uncertainty in the values used during the interpretation step. Hence, FREEME relies on a dynamic analysis phase to detect call graph sequences that may go undetected in the static analysis phase. The dynamic analysis strategy essentially relies on running the application with a set of pre-defined input and event sequences.
As shown in Figure 3.11, the dynamic analysis system has three main components: Monkey [53], ftrace, and android.os.Debug. Monkey is a command-line tool that mimics the execution of an application by an end user by sending a set of Android system user events from diverse sources, including randomly generated values, to the running application. This tool is part of the Android OS and is typically used by developers to automate their application testing. We modified Monkey to handle the android.os.Debug class and ftrace, which trace Java method calls and user-defined native method calls at runtime.

Figure 3.11: Process of Dynamic Analysis (components shown: Android Monkey, Activity Manager Service, Zygote, Activity Thread, the Android Runtime, the android.os.Debug class, ftrace, IDA Pro, and the kernel; Monkey event types: Activity, Touch, Motion, Trackball, and Key events; numbered steps ① to ⑦)

3.3.1 Trace Method Calls
Figure 3.11 depicts the process of tracing method calls using the dynamic analysis approach. ① Monkey executes ftrace with the app's name. ② ftrace retrieves the symbol tables of the native libraries within the target application from IDA Pro and starts monitoring system calls invoked by Zygote. ③ Monkey then sends an Intent to the Activity Manager Service using startActivityAndWait() to start the target application. ④ Zygote forks a new process and then replaces its name with the application's name. ⑤ At this time, ftrace catches the fork() and the process id of the new process, and it then starts monitoring the new process with breakpoints generated from the symbol table. These breakpoints essentially trigger a break on each method invocation.
⑥ The android.os.Debug class sets a profile flag, and Dalvik then reports method information to the android.os.Debug class when it starts and finishes a method execution.

Figure 3.12: Using ftrace to trace method calls (Zygote forks child processes, shown with pids 1001, 1002, and 1003; ftrace attaches through ptrace_attach() and kernel signals and keeps the child whose prctl(PR_SET_NAME, ...) name matches the target application, here pid 1002, which becomes the Activity Thread; numbered steps ① to ③)

3.3.2 Trace Native Method Calls
ftrace is an extended version of ltrace [54] that uses the ptrace [55] function provided by the kernel for debugging. Once ftrace has been registered with the kernel using ptrace_attach, it receives signals when the kernel encounters a system call or a debug point, so that ftrace can trace procedure calls inside the native libraries. Figure 3.12 shows how ftrace works. ① To start monitoring the app right before the target activity starts, ftrace waits for a signal for fork() from Zygote. ② It then catches the renaming of the process performed by prctl and compares the given application's name with the second parameter of prctl. If the two names match, ftrace keeps monitoring the process. Otherwise, the process is ignored. In essence, ftrace is able to capture the invocation of the application that we are interested in analyzing while ignoring all other applications that may be executed. Once the application of interest is tagged for monitoring by ftrace, it then has to capture JNI method call invocations.
Figure 3.13: Trace JNI functions (memory layout of libdvm.so, loaded at 0x406F2000, showing the .text, .bss, and heap regions and the structures DvmJniGlobals, JavaVMExt, JNIEnvExt, and JNINativeInterface; example values include gDvmJni at 0x4079ded8, the JavaVM pointer 0x2a008e78 stored at 0x4079dee0, the JNIEnv pointer 0x2a0fd8d0 stored at 0x2a008e88, and function addresses such as GetVersion() at 0x4073b129 and DefineClass() at 0x4073b0fd; each hop is read with ptrace(PTRACE_PEEKTEXT, pid, address, ...); numbered steps ❶ to ❺)

Capture JNI Method Calls
To trace JNI calls, the addresses of all JNI functions in libdvm.so need to be registered with the kernel as debugging points using ptrace, so that whenever a JNI function is invoked, ftrace gets a signal from the kernel. Note that the debugging points of all JNI methods should be enabled only while user-defined native methods are executing, because JNI functions may be used by Java internal methods as well. However, it is not easy to figure out the addresses of JNI functions, since they are defined as static methods, which are not listed in the symbol tables of the ELF-formatted library file. Instead, a global variable named gDvmJni in libdvm.so can be used to find JNIEnv, which keeps the function pointers of all JNI functions. Note that JNIEnv is usually passed to JNI methods as the first parameter.

Figure 3.13 illustrates how to trace JNIEnv. In this example, libdvm.so is loaded at address 0x406F2000. Since gDvmJni is a global variable, it is placed in the .bss section of the ELF-formatted file and listed in the dynamic symbol table. ❶ ftrace finds the address of gDvmJni in libdvm.so and then adjusts it with the base address of libdvm.so in memory. ❷ According to the structure of DvmJniGlobals, gDvmJni + 8 holds the address of the JavaVMExt structure created in the heap memory space. ❸ The JavaVMExt structure has a pointer to JNIEnv. ❹ The second value of the JNIEnv structure points to JNINativeInterface, which keeps the function pointers of all JNI functions. ❺ Finally, ftrace gets the actual addresses of the JNI functions. It then registers them with the kernel using ptrace. Note that since libdvm.so belongs to the target application, ftrace cannot access the memory space of the library without ptrace(PTRACE_PEEKTEXT, pid, ...).
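The five pointer-chasing steps can be followed on a simulated memory image. In this sketch a dictionary stands in for the target process and peek() stands in for ptrace(PTRACE_PEEKTEXT, ...). The addresses are the example values from Figure 3.13, except TABLE_ADDRESS, which is invented. The offset of 0x10 from JavaVMExt to its JNIEnv pointer is inferred from the figure's addresses, and slots 4 and 5 for GetVersion and DefineClass follow the standard JNI function table. None of this is the actual ftrace code.

```python
WORD = 4  # pointer size on 32-bit ARM

LIBDVM_BASE = 0x406F2000
GDVMJNI_OFFSET = 0x4079DED8 - LIBDVM_BASE  # symbol offset inside libdvm.so
ENV_POINTER_OFFSET = 0x10                   # inferred from Figure 3.13
TABLE_ADDRESS = 0x40790000                  # hypothetical table location
JNI_SLOTS = {4: "GetVersion", 5: "DefineClass"}

# Simulated memory of the traced process: address -> 32-bit word.
memory = {
    0x4079DED8 + 8: 0x2A008E78,                   # gDvmJni + 8 -> JavaVMExt
    0x2A008E78 + ENV_POINTER_OFFSET: 0x2A0FD8D0,  # JavaVMExt -> JNIEnv
    0x2A0FD8D0 + WORD: TABLE_ADDRESS,             # second word -> function table
    TABLE_ADDRESS + 4 * WORD: 0x4073B129,         # GetVersion()
    TABLE_ADDRESS + 5 * WORD: 0x4073B0FD,         # DefineClass()
}


def peek(address):
    """Stand-in for ptrace(PTRACE_PEEKTEXT, pid, address, ...)."""
    return memory[address]


def locate_jni_functions(base, symbol_offset, slots):
    gdvmjni = base + symbol_offset                   # step 1: rebase the symbol
    java_vm = peek(gdvmjni + 8)                      # step 2: JavaVMExt address
    jni_env = peek(java_vm + ENV_POINTER_OFFSET)     # step 3: JNIEnv address
    table = peek(jni_env + WORD)                     # step 4: function table
    return {name: peek(table + index * WORD)         # step 5: function addresses
            for index, name in slots.items()}


functions = locate_jni_functions(LIBDVM_BASE, GDVMJNI_OFFSET, JNI_SLOTS)
```

The resulting dictionary maps each JNI function name to the address at which a debugging point would be registered. In a real tracer every peek() is a separate system call, which is one reason the chain is resolved once up front rather than on each JNI invocation.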
3.3.3 Input Sources for Dynamic Profiling
To perform dynamic analysis, the target application needs to be executed with a set of input events, such as touching the screen and clicking buttons. Monkey takes several input sources, including values generated automatically with a pseudo-random function and a script file containing a set of commands that are used by the application. The probabilities of events can be adjusted by parameters. In particular, using the same seed value generates the same sequence of events in each trial.

3.4 Evaluation
In this section, we show results from static and dynamic analysis of hundreds of apps downloaded from the top free application charts of Google Play. The static and dynamic analysis system is implemented in Java, and all experiments were performed on an Ubuntu Linux server with a quad-core Intel Xeon 2.5GHz CPU and 16GB of memory. For dynamic analysis, Android 4.1.2 running on an ARM emulator without a user interface is used.

3.4.1 Fundamental App Data
Categories
We categorized all the apps into 22 categories, as shown in Figure 3.14(a). Interestingly, Games are the most popular category of top free apps, constituting 47% of the total apps that we analyzed.

File Size
Figure 3.14(b) presents a histogram of file sizes. Typically, games tend to have a large file size due to resource files, such as graphic and multimedia files. The average file size of all the apps is around 20MB. Excluding games, the average file size is around 12MB. 7% of apps have extension files, and all these apps belong to the Game category, since Android has a package size limitation under which an app cannot exceed 50MB. Extension files are one way to get around this file size limitation, and games seem to routinely use this approach.

MultiDEX
11% of apps have multiple DEX files since they contain more than 65,536 methods, as mentioned in Section 3.1.2.
Figure 3.14: Categories of Apps
(a) Percentage of Apps in Category: Books & Reference 1.40%, Business 0.60%, Communication 4.60%, Education 0.80%, Entertainment 6.80%, Finance 1.40%, Game 46.60%, Health & Fitness 1.00%, Lifestyle 2.00%, Media & Video 1.40%, Music & Audio 3.60%, News & Magazines 0.20%, Personalization 1.40%, Photography 2.80%, Productivity 4.20%, Shopping 4.00%, Social 5.60%, Sports 1.00%, Tools 7.40%, Transportation 0.40%, Travel & Local 1.40%, Weather 1.40%.
(b) Histogram of File Size: x-axis File Size (MB), 2 to 50; y-axis Frequency, 0 to 35; one series per category.

Figure 3.15: NDK Library
(a) Percentage of NDK Library in Category: Books & Reference 43%, Business 67%, Communication 78%, Education 50%, Entertainment 65%, Finance 71%, Game 92%, Health & Fitness 40%, Lifestyle 30%, Media & Video 86%, Music & Audio 61%, News & Magazines 100%, Personalization 43%, Photography 93%, Productivity 62%, Shopping 45%, Social 54%, Sports 100%, Tools 62%, Transportation 50%, Travel & Local 71%, Weather 14%.
(b) Supported ISA: diagram over ARM v5, ARM v7a, and x86 with region values 17%, 7%, 20%, 2%, 0%, 23%, 6%, and 25%.

Native Libraries
Figure 3.15(a) shows the percentage of apps containing JNI libraries in each category. 75% of the apps contain native methods. In particular, more than 90% of games are developed with the NDK. Figure 3.15(b) shows the ISAs supported by the apps. Given the dominance of the ARM ISA in mobile devices, it is not surprising that none of the apps is developed for the x86 ISA only. 25% of the apps support both the ARM and x86 ISAs.

3.4.2 Sample Applications
For static and dynamic analysis, we selected 50 apps based on category and popularity. Since the test bed uses a headless (no screen) Android on an emulator, apps that have 3D features are excluded from the list.
Table 3.1 shows detailed information about the sample apps that were analyzed in detail.

3.4.3 Call graph Construction
We generated call graphs for all the target applications using static and dynamic analysis. For the dynamic analysis, every experiment is performed with the same seed (100) and a one-second interval. Table 3.2 shows the results of the static analysis and of the dynamic analysis with 200 random input events. Note that more random input events can be generated for testing, which increases the analysis time. Due to computational resource limitations, we limited the number of random input events to 200. Note that dynamic analysis has the limitation that only the methods triggered by the random input events are captured. Hence, in general, dynamic analysis has lower coverage of the method call graph than static analysis. However, dynamic analysis is good at identifying commonly accessed methods and also those method calls that evade the static analyzer due to the failure of the abstract interpreter while tracking the code execution.
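For readers who want to reproduce a comparable run with the stock Monkey tool, the experiment parameters above map naturally onto its command-line options. The sketch below only assembles the argument list. It assumes that the one-second interval corresponds to Monkey's --throttle option (in milliseconds) and that the tool is driven through adb; the thesis uses a modified Monkey, whose exact invocation is not given, and the helper name monkey_command is hypothetical.

```python
def monkey_command(package, events=200, seed=100, throttle_ms=1000):
    """Build an adb command line for one reproducible Monkey run."""
    return [
        "adb", "shell", "monkey",
        "-p", package,                    # restrict events to this package
        "-s", str(seed),                  # same seed -> same event sequence
        "--throttle", str(throttle_ms),   # delay between events, in ms
        "-v",                             # verbose event log
        str(events),                      # number of random events to inject
    ]


command = monkey_command("com.snapchat.android")
```

The list can be passed to subprocess.run on a machine with adb and a connected emulator. Keeping the seed fixed is what makes runs with 1, 5, 10, 20, 100, and 200 events comparable, since each shorter run replays a prefix of the same pseudo-random sequence.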
Moreover, static analysis fails to construct a call graph in some cases because of a lack of free memory or unspecified errors. Thus, the static and dynamic approaches have complementary objectives. Static analysis captures most of the call graph, while the dynamic approach may capture the commonly used and difficult-to-track methods.

Table 3.1: Selected Applications
# | Package | Size | Category | NDK Library (x86, ARM v5, ARM v7a) | DEX # of Class | DEX # of Method (Java) | DEX # of Method (Native) | # of Files
1 | com.google.android.chess | 0.4M | Game | | 42 | 427 | 0 | 1
2 | com.pandora.android | 8.6M | Music & Audio | | 7045 | 50639 | 0 | 1
3 | com.snapchat.android | 9.9M | Social | x x | 7104 | 43994 | 22 | 1
4 | com.surpax.ledflashlight.panel | 5.0M | Productivity | x | 6709 | 41224 | 10 | 1
5 | com.netflix.mediaclient | 12.6M | Entertainment | x | 7485 | 49316 | 40 | 1
6 | kik.android | 13.2M | Communication | x x x | 6809 | 32847 | 1 | 1
7 | com.amazon.mShop.android.shopping | 23.3M | Shopping | x | 15749 | 96513 | 714 | 5
8 | net.zedge.android | 5.7M | Personalization | | 7130 | 44364 | 0 | 2
9 | com.weather.Weather | 21.5M | Weather | | 8242 | 53870 | 0 | 1
10 | com.cleanmaster.security | 6.3M | Tools | x | 5690 | 38064 | 70 | 1
11 | com.ubercab | 14.4M | Transportation | x x x | 6919 | 45029 | 109 | 1
12 | com.yelp.android | 14.3M | Travel & Local | x x x | 6573 | 41527 | 11 | 1
13 | com.cheerfulinc.flipagram | 19.4M | Photography | x | 7311 | 49126 | 7 | 1
14 | com.chase.sig.android | 7.7M | Finance | x x x | 1887 | 11432 | 5 | 1
15 | org.wikipedia | 3.2M | Books & Reference | | 2178 | 15480 | 0 | 1
16 | com.nbadigital.gametimelite | 25.6M | Sports | x x | 4117 | 31834 | 87 | 1
17 | com.myfitnesspal.android | 21.4M | Health & Fitness | x x x | 8526 | 52815 | 21 | 1
18 | com.indeed.android.jobsearch | 2.4M | Business | | 4532 | 33732 | 0 | 1
19 | com.zillow.android.zillowmap | 7.7M | Lifestyle | | 5686 | 49396 | 0 | 1
20 | com.quvideo.xiaoying | 24.5M | Media & Video | x | 4623 | 36753 | 296 | 1
21 | org.pbskids.video | 10.4M | Education | | 5301 | 35131 | 0 | 1
22 | com.foxnews.foxnewselection | 19.8M | News & Magazines | x x | 8735 | 54356 | 131 | 1
23 | byrne.fractal | 0.07M | Entertainment | x x | 30 | 189 | 20 | 1
24 | com.greenecomputing.linpackpro | 0.16M | Tools | | 231 | 1399 | 0 | 1
25 | com.passmark.pt_mobile | 3.1M | Tools | | 91 | 424 | 0 | 1
26 | uk.co.aifactory.chess | 9.3M | Game | x x x | 3919 | 24328 | 26 | 1
27 | com.facebook.orca | 18.09M | Communication | x | 21092 | 122088 | 225 | 4
28 | com.prettysimple.criminalcaseandroid | 40.47M | Game | x | 3109 | 17869 | 59 | 1
29 | com.instagram.android | 7.92M | Social | x | 5991 | 34534 | 39 | 1
30 | com.cleanmaster.mguard | 13.63M | Tools | x | 13003 | 81081 | 84 | 4
31 | com.spotify.music | 19.14M | Music & Audio | x | 8747 | 50184 | 63 | 1
32 | com.jb.emoji.gokeyboard | 11.26M | Tools | x x x | 4536 | 32011 | 26 | 1
33 | com.whatsapp | 14.6M | Communication | x | 4058 | 28791 | 55 | 1
34 | com.yodo1.crossyroad | 28.54M | Game | x x | 7139 | 43579 | 60 | 1
35 | com.pinterest | 15.66M | Social | x x x | 7217 | 51181 | 355 | 1
36 | com.kiloo.subwaysurf | 42.42M | Game | x | 6351 | 39610 | 71 | 1
37 | com.wallapop | 9.25M | Shopping | | 6137 | 43889 | 1 | 1
38 | com.apusapps.launcher | 2.56M | Personalization | x x | 2410 | 15488 | 3 | 2
39 | com.machinezone.gow | 46.65M | Game | x | 5652 | 36103 | 38 | 1
40 | com.qisiemoji.inputmethod | 16.06M | Tools | x | 1827 | 11464 | 22 | 1
41 | com.dianxinos.dxbs | 6.18M | Productivity | | 3863 | 23508 | 0 | 1
42 | com.emoji.coolkeyboard | 19.37M | Productivity | x | 4439 | 28506 | 64 | 1
43 | com.yahoo.mobile.client.android.mail | 15.08M | Communication | x x x | 6239 | 36084 | 64 | 1
44 | com.dianxinos.optimizer.duplay | 8.69M | Tools | x x x | 3138 | 17540 | 11 | 1
45 | com.bigduckgames.flow | 7.89M | Game | x | 7012 | 43119 | 81 | 1
46 | com.tp.android.surgerysim | 18.75M | Game | x | 5057 | 30257 | 23 | 1
47 | com.sec.android.easyMover | 47.89M | Tools | x | 5869 | 44488 | 30 | 1
48 | com.pinkpointer.wordsearch | 9.5M | Game | | 3984 | 23771 | 0 | 1
49 | com.ebay.mobile | 12.99M | Shopping | x x x | 8305 | 47827 | 17 | 1
50 | com.google.android.keep | 9.94M | Productivity | | 5345 | 34873 | 0 | 2

Table 3.2: Static vs. Dynamic Analysis
# | Static: # of Class (Total) | Static: # of Class (DEX) | Static: # of Method, Total(Native) | Static: # of Method, DEX(Native) | Dynamic (200 Events): # of Class (Total) | Dynamic: # of Class (DEX) | Dynamic: # of Method, Total(Native) | Dynamic: # of Method, DEX(Native) | Login Req.
1 | 102 | 40 | 536(9) | 300(0) | 444 | 11 | 3173(249) | 84(0) |
2 | 3465 | 3041 | 16943(71) | 14760(0) | 648 | 231 | 2492(156) | 636(0) | x
3 | 2666 | 2279 | 12426(73) | 10682(3) | 1011 | 388 | 4348(212) | 1126(0) | x
4 | 3224 | 2852 | 15874(74) | 13973(10) | 904 | 176 | 4104(247) | 529(0) |
5 | 2168 | 1859 | 9400(67) | 7995(22) | 681 | 207 | 2880(200) | 395(3) | x
6 | 2487 | 2169 | 11655(50) | 10163(1) | 504 | 225 | 1552(104) | 444(0) | x
7 | 8414 | 7912 | 44391(212) | 41791(134) | 929 | 340 | 3972(218) | 980(0) |
8 | 2678 | 2319 | 11098(48) | 9348(0) | 602 | 50 | 3066(227) | 163(0) |
9 | 3931 | 3472 | 19614(64) | 17410(0) | 556 | 266 | 1979(125) | 750(0) |
10 | 3299 | 2915 | 15504(77) | 13795(12) | 814 | 216 | 4225(291) | 671(0) |
11 | 2310 | 1890 | 9967(71) | 8129(16) | 735 | 383 | 2429(139) | 933(0) | x
12 | 4016 | 3579 | 18971(73) | 16783(8) | 789 | 271 | 3344(194) | 797(0) |
13 | 2761 | 2389 | 14288(55) | 12321(0) | 842 | 251 | 3603(206) | 696(0) | x
14 | 1483 | 1202 | 6470(39) | 5229(4) | 241 | 15 | 1035(104) | 38(2) | x
15 | 1289 | 1011 | 5887(34) | 4625(0) | 1235 | 415 | 6329(315) | 1458(0) |
16 | 2028 | 1724 | 9893(50) | 8559(16) | 370 | 113 | 1370(114) | 265(0) |
17 | 4808 | 4383 | 22380(88) | 20182(16) | 832 | 347 | 3426(226) | 759(0) | x
18 | 902 | 703 | 4102(18) | 3246(0) | 793 | 84 | 4485(287) | 285(0) |
19 | 3389 | 3010 | 22445(46) | 20685(0) | 919 | 318 | 3791(227) | 788(0) |
20 | 2704 | 2296 | 14614(175) | 12780(111) | 698 | 93 | 3667(262) | 266(1) |
21 | 931 | 740 | 4108(23) | 3273(0) | 560 | 59 | 2938(221) | 148(0) |
22 | 3974 | 3560 | 17393(51) | 15315(1) | 768 | 398 | 2714(148) | 1000(0) |
23 | 66 | 21 | 288(31) | 150(20) | 234 | 18 | 1316(62) | 117(14) |
24 | 288 | 150 | 1056(19) | 659(0) | 272 | 19 | 1412(52) | 52(0) |
25 | 182 | 68 | 581(19) | 252(0) | 284 | 13 | 1569(55) | 44(0) |
26 | 513 | 379 | 2038(48) | 1496(30) | 329 | 107 | 1606(65) | 373(11) |
27 | 13865 | 13319 | 65997(151) | 62829(80) | 702 | 360 | 1976(159) | 465(6) | x
28 | 834 | 680 | 2769(57) | 2131(47) | 677 | 149 | 3659(239) | 472(3) |
29 | - | - | - | - | 1033 | 259 | 4754(300) | 234(0) | x
30 | - | - | - | - | 623 | 109 | 3046(222) | 157(0) |
31 | 5930 | 5421 | 28849(80) | 26003(4) | 1311 | 489 | 5417(318) | 840(0) | x
32 | - | - | - | - | 587 | 142 | 2576(185) | 453(0) |
33 | 2908 | 2424 | 15498(136) | 13163(36) | 569 | 110 | 2305(166) | 162(1) | x
34 | - | - | - | - | 800 | 118 | 3513(245) | 278(8) |
35 | 3626 | 3218 | 16579(67) | 14475(30) | 735 | 273 | 2806(170) | 622(0) | x
36 | 2985 | 2605 | 13166(71) | 11365(28) | 478 | 26 | 2755(233) | 65(10) |
37 | 2790 | 2437 | 14388(35) | 12389(0) | 1192 | 474 | 4823(260) | 1215(0) |
38 | 2104 | 1710 | 10280(40) | 8434(2) | 755 | 172 | 3842(291) | 305(0) |
39 | 1049 | 795 | 4733(38) | 3666(21) | 562 | 68 | 2777(203) | 202(10) |
40 | 1190 | 884 | 5964(42) | 4556(10) | 702 | 132 | 3041(209) | 302(0) |
41 | - | - | - | - | 870 | 284 | 2802(216) | 101(0) |
42 | 2535 | 2079 | 13153(73) | 10692(24) | 909 | 274 | 3593(210) | 626(0) |
43 | 4204 | 3704 | 20311(71) | 17819(26) | 689 | 134 | 2639(207) | 163(0) | x
44 | 2677 | 2296 | 12083(45) | 10337(7) | 925 | 299 | 4027(256) | 739(0) |
45 | 2187 | 1861 | 9065(48) | 7562(7) | 936 | 190 | 4911(280) | 665(7) |
46 | - | - | - | - | 999 | 266 | 4747(307) | 842(2) |
47 | 2682 | 2286 | 12772(27) | 11093(0) | 563 | 58 | 3182(235) | 149(0) |
48 | 1717 | 1501 | 7585(17) | 6668(0) | 829 | 306 | 3380(186) | 906(0) |
49 | 3954 | 3534 | 20163(40) | 18119(4) | 942 | 364 | 4596(213) | 1460(0) | x
50 | - | - | - | - | 798 | 209 | 4489(257) | 640(0) |

Figure 3.16: Method Coverage with Various number of Input Events (x-axis: Application's #, 1 to 26; y-axis: Method Coverage (%), 0 to 90; series: 1, 5, 10, 20, 100, and 200 input events)

In general, our dynamic approach captured about 9.4% of the method calls captured by the static approach, although the method calls captured by the two approaches are not necessarily the same. Dynamic analysis counts many more system methods than static analysis. There are several reasons for this: 1) The static analysis counts system methods in DEX files only, whereas the dynamic analysis counts all system methods invoked during execution, even if they are not included in the application package. 2) Not all methods in the DEX files are used in actual execution. Some of them are implemented for an inherited method or for exception handling. 3) There may be a loop formed by a few methods, and getting out of it requires a special sequence of inputs, such as authentication. For example, com.snapchat.android starts with a log-in screen that cannot be skipped without typing a valid email address.
Figure 3.16 shows the method coverage of dynamic analysis, as a percentage of the static analysis result, for different numbers of random input events. In particular, for apps requiring authentication, the variation in method coverage is very small across all numbers of input events, since the random events were not able to continue beyond the log-in screen.

Table 3.3: Native method analysis
package | # of Func(Native Method) | # of Indirect Branch | # of JNI Calls | # of Sys & Lib Calls
com.surpax.ledflashlight.panel | 58(10) | 24 | 13 | 75
byrne.fractal | 20(20) | 2 | 2 | 3
com.yelp.android | 22(8) | 1 | 1 | 67

3.4.4 Analysis of Native Method Call
We also analyzed user-defined native methods, as shown in Table 3.3. Each package has one native library compiled for the ARM ISA. Unlike other static analyzers such as IDA Pro, the analyzer can handle indirect branches by using an abstract interpreter in which all registers can be traced with virtual or actual values. A screenshot of the abstract interpreter is depicted in Figure 3.17. All parameters extracted from the definition of the target Java native method are handled as virtual values, which can be represented as logical formulas, as shown in the last line of Figure 3.17. A function for a native method can invoke subroutines and JNI calls to access resources of the JVM. For example, com.surpax.ledflashlight.panel has 10 functions for native methods, and they invoke 48 different subroutines belonging to the same library file. It also invokes 75 system and library functions from the kernel and Android libraries. Thus, our native method call analysis approach can vastly increase the reach of call graph analysis. By using the proposed FREEME call graph analyzer, a computational offloading system can now consider a much richer set of methods for computational offloading. Prior to this work, all previous approaches simply ignored native methods in their analysis, which may lead to sub-optimal offloading decisions.
Figure 3.17: Screenshot of Native Analyzer

3.5 Related Work
The static analysis of Android applications has been intensively studied in the security field. Schmidt et al. [56] proposed a framework to detect malware in a set of applications. They identify malware by reading the instructions from the application's Executable and Linkable Format (ELF) file and comparing the sequence of instructions with the typical instruction sequences of well-known malware. However, the framework simply references a list of the system function calls of the ELF file extracted by readelf rather than analyzing the binary code of the file. DroidChecker [57] is a static analyzer that detects the capability exploitable data paths in Android applications. DroidChecker analyzes the decompiled Java code of an application by using interprocedural control flow graph searching and taint checking, but it cannot handle native methods. DroidScope [58] is an Android analysis platform that detects malware while tracing the execution of instructions at run time. The native function analysis capability of DroidScope allows a detailed analysis. However, as their method for native function analysis can only be applied at run time, our static analysis is orthogonal to their approach. CloneCloud [10] performs a static analysis with jchord [59] to extract remotely executable methods.

Chapter 4
Framework for Runtime Energy Efficient Mobile App Execution
In the previous chapter we described the call graph generation and analysis process within FREEME. The call graph analyzer determines which method calls may be executed on a remote server and which methods must be executed only locally on a mobile phone. A method call is remotely executable if it meets all the following requirements: (1) A method should not access underlying hardware-specific functions, particularly sensors and I/O devices; such a method cannot be offloaded to a server.
(2) The method must not access local files other than the resource files within the app package file. In the current implementation of FREEME, an app running on the server cannot read or store files in the file system of the mobile phone. (3) The method must not access data from other threads within the system. For instance, when a method in thread T1 interacts with a different thread T2, there are concurrency and data race concerns that prevent relocation of T1 to a server. Once a given method is tagged as remotely executable, the sub-graph consisting of all the child nodes of that method is also remotely executed.

In this chapter we describe how FREEME decides which methods to offload and when to offload computations to the cloud. Recall that in Section 2.9 we described the AEP approach. The motivation for AEP is that energy-efficient operation drifts with external conditions, so a runtime approach must monitor the current operating conditions to decide when it is beneficial to offload. While AEP focuses on method calls that the programmer explicitly tags for offloading using AEP's API, FREEME uses the call graph analysis output to determine which methods to offload. Hence, FREEME is entirely transparent to the programmer.

Figure 4.1: A Logical flow of FREEME (boxes: Install Application, Choose Application, Analyze Application, Build FREEME Map, Build XML meta file, Run Application, Invoke Method; lanes: Phone, Server; phases: One-Time Registration, Application Execution)

4.1 FREEME System Architecture

4.1.1 System Overview

The overall system architecture of FREEME is shown in Figure 4.1. The figure shows the logical flow of how FREEME operates. The flow can be divided into two steps.

Application analysis: A one-time registration step encompasses several tasks that are executed only once for each chosen app on the mobile phone.
To begin using FREEME, any app installed by the user must be explicitly registered to execute under the FREEME environment. Once an app is selected, it is sent to the server, where a unified static analysis framework analyzes mobile apps that contain a mix of Java and native method calls. The static analysis builds a call graph and identifies all methods that can potentially be executed on the server, along with the data fields required for remote execution of each method call. This information is written to an XML metadata file and returned to the mobile phone. The Dalvik Virtual Machine (DVM) on the phone creates a FREEME map using the XML metadata received from the server. The FREEME map identifies all method calls that can potentially be offloaded.

Application execution: The second set of tasks is carried out during application execution. Once the app starts executing, if a remotely executable method is about to be invoked, the DVM consults a decision maker module that estimates the energy consumption of local computation versus remote computation. If the decision is to execute remotely, the FREEME service serializes the necessary data fields and invokes the remote execution process. Remote execution results are sent back to the mobile phone, which then integrates the results into the current execution context.

Figure 4.2 depicts the FREEME components for Android OS. The main components on the phone side are the FREEME manager, the DVM, and the FREEME service.
On the server side, there are four main components: an RPC server, the FREEME package analyzer, a DVM compiled for x86 Linux without an ARM emulator, and a method database for each app. Additionally, the Intel Houdini [60] library is used for executing ARM native methods included in an Android package file (APK). We describe these components in the logical order that an app typically follows from its installation to its execution.

Figure 4.2: FREEME Structure in Android (Android phone: Android Applications, FREEME Manager, Dalvik VM (ARM), FREEME Service with Profiler, Decision Maker, RPC Client and Stub, Profile Database, Android Linux Kernel; Server: RPC Server with Stub, Method Invocation Runtime, Package Analyzer, Method Inspector, Method Database, Dalvik VM (x86), Houdini ARM Binary, Ubuntu Linux Kernel; phone and server are connected through the network)

4.1.2 FREEME manager

The FREEME manager is an Android app that end users use to register apps and to configure system settings. Since the goal of FREEME is to offload computations of any mobile app to a cloud server, the user has to set the IP address and port number at which the server receives requests for offloaded computations. In our current implementation, we configured FREEME to execute on a server within our research facility. We envision that in the future this configuration will be managed automatically by either device vendors or phone service providers. In addition to configuring the IP address, the user can explicitly choose which apps are registered to run within the FREEME environment. By default, every app that is downloaded from Google Play [61] is registered for execution within FREEME. The user may optionally disable specific apps from running within FREEME.

Figure 4.3: Exploring Method Causality

    class C1 {
        int a(String s) {
            C2.b(s);
            C3 obj = new C3();
            obj.c(this);
            C4.d(….);
        }
    }
    class C2 {
        static int b(String s) {
            C5 obj = new C5();
            obj.e(….);
            C6.f(….);
        }
    }
    class C3 {
        int c(C1 obj) {
            Camera c = C7.g();
        }
    }
    class C7 {
        static Camera g() {
            Camera c = null;
            try {
                c = Camera.open(); // get a Camera instance
            } catch (Exception e) {
            }
            return c; // returns null if camera is unavailable
        }
    }
    ….

    Depth-first search order over methods a to h: a→b→e→h→f→c→g→d (legend: dynamic method, static method)

4.1.3 FREEME Package analyzer

Any registered app is first sent to the FREEME package analyzer on the server. The package analyzer generates the call graph and uses it to determine which method calls may be executed on a remote server and which methods must be executed only locally on the mobile phone. The call graph analysis has already been described in detail in Chapter 3.

Figure 4.4: Identifying the largest sub-call chains for offloading. (a) Call chain with a restricted method. (b) Partitioning call chain. (Both panels show methods a to h, with steps ① to ⑤ marked.)

Extract remote executable methods

As described earlier, since not all methods in a legacy Android app can be remotely executed, the package analyzer explores all methods inside the call graph generated by the static analysis to eliminate restricted method call chains. The package analyzer uses a depth-first search (DFS) algorithm to avoid infinite loops during the analysis, as shown in Figure 4.3.
When the analyzer meets a restricted method call, it partitions the call chain to extract the largest sub-call chains, as depicted in Figure 4.4. For example, ① when the analyzer visits the method g in Figure 4.4(a), it marks the method as restricted, since it accesses the Camera resource, which is not available on the server side. ② The analyzer therefore marks all callers of the restricted method as restricted as well. ③ When the analyzer reaches the root of the call chain, ④ it splits the child call chains into independent call chains. ⑤ It then continues by analyzing the next method call.

Figure 4.5: Inspection for Method Offloading (legend: dynamic method, static method, native method; markers ① to ④ annotate the statements discussed in the text)

(a) Java Code

    class Hello {
        static String b = "world";
        final static String a = "Hello";
        String c;
        void m(String s) { String s1 = s + c; ….. }
        String m1(String s) { String s1 = s + a; ….. return m3(s1); }
        native void m2(String s);
        static String m3(String s) { String s1 = s + b; ….. return s1; }
    }

(b) Jimple Code

    void m(String) {
        ....
        $r0 := @this: Hello;
        $r1 := @parameter0: String;
        ....
        $r2 = virtualinvoke $r2.< ... append(String)>($r1);
        $r3 = $r0.<Hello: String c>;
        ....
    }
    String m1(String) {
        ...
        $r0 := @this: Hello;
        $r1 := @parameter0: String;
        ...
        $r2 = virtualinvoke $r2.< ... append(String)>("Hello");
        $r1 = virtualinvoke $r2<…toString()>;
        $r1 = staticinvoke<…m3(String)>($r1);
        ...
        return $r1;
    }
    void m2(String) {
        ...
        $r0 := @this: Hello;
        $r1 := @parameter0: String;
        ...
    }
    static String m3(String) {
        ...
        $r0 := @parameter0: java.lang.String;
        ...
        $r1 = virtualinvoke $r1.<…append(String)>($r0);
        $r0 = <Hello: String b>;
        $r1 = virtualinvoke $r1.<… append(String)>($r0);
        ….
        $r0;
    }

(c) Call Chains (index map; m3 is passed through from m1)

    Callchain   From        Variable
    m           new         s1
                parameter   s
                this        c
    m1          new         s1
                parameter   s
                Hello       a
                new         s1 (m3)
                m1          s (m3)
                Hello       b (m3)
    m3          new         s1
                parameter   s
                Hello       b

Identifying minimum input data for each method

One of the main impediments to remote execution of unmodified apps is the amount of data that must be sent to enable remote execution. Since not every method within an app is designed for RPC, a naive remote execution approach may have to send all the objects related to the parameters, including the this object (the current object), for each remote execution, especially for dynamic methods. Transferring whole objects can be extremely expensive, since the size of an object can be nearly the same as the total memory state of a mobile application. To tackle this challenge, the FREEME analyzer inspects the instructions in each method that is marked for potential offloading and extracts the minimum set of parameters and variables that are necessary to invoke that method on a remote server.

Figure 4.5 shows Java code (a), its equivalent Jimple code (b), and the call chains for each method (c). During method inspection, the FREEME analyzer collects the class name and field name whenever it encounters a method invocation or a field access within a Jimple body. It then examines the base of each field and method to determine whether it is related to the input parameters of the method. The base and field names are obtained using the getBase() and getField() methods provided by Soot. For example, at ① in Figure 4.5, $r3 is assigned the field c, whose base is $r0. The value of $r0 points to the this object. Thus, the analyzer can obtain the field and base names and determine that the field c of the Hello class is required to execute method m.
In the absence of a detailed parameter usage analysis, the entire this object may have to be transferred to the server in this case. Similarly, at ② the string a is used. But this string is a static final field, and it is directly incorporated into the Jimple code as shown at ②, which means there is no need to send this field to the remote server at all. $r0 at ③ does not have a base, which means it is a static field of the Hello class, and hence only the static field needs to be transferred. For method m1 in particular, all essential fields of method m3 are also required, since method m1 invokes m3. In this case, however, the parameter of m3 is not required, since it is created by method m1 during execution as $r1 at ④. For this reason, the analyzer keeps an index map to trace the source of each variable, as depicted in Figure 4.5(c). In the same manner, all required fields for a call chain are combined and migrated to the server, as defined in Figure 4.6. The ability to identify and classify all the parameters for each method call reduces the size of the data needed for remote execution and thus enables FREEME to target many more method calls for offloading at runtime.
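The combination step for a call chain can be illustrated with a small sketch. It assumes that each method has already been summarized into the instance fields it reads through this, the non-final static fields it reads, and the methods it invokes; the class `MinimalInputSketch`, the `MethodSummary` type, and the string labels are hypothetical and are not Soot or FREEME APIs. Static final fields are simply not recorded, because their values are inlined in the Jimple code, and callee parameters are not recorded either, because the caller creates them during execution.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: the summary format is an assumption made for illustration.
final class MinimalInputSketch {

    /** What a single method body reads and calls. */
    static final class MethodSummary {
        final Set<String> instanceFields = new TreeSet<>(); // fields read through `this`
        final Set<String> staticFields = new TreeSet<>();   // non-final static fields read
        final Set<String> callees = new TreeSet<>();        // methods invoked from the body
    }

    /** Union of the fields needed by `root` and by every method on its call chain. */
    static Set<String> requiredFields(String root, Map<String, MethodSummary> summaries) {
        Set<String> required = new TreeSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            String method = stack.pop();
            MethodSummary summary = summaries.get(method);
            if (summary == null || !visited.add(method)) {
                continue; // unknown method, or already processed
            }
            for (String field : summary.instanceFields) {
                required.add("this:" + field);
            }
            for (String field : summary.staticFields) {
                required.add("static:" + field);
            }
            for (String callee : summary.callees) {
                stack.push(callee);
            }
        }
        return required;
    }

    /** Summaries for the Hello class of Figure 4.5 (m2 is native and omitted here). */
    static Map<String, MethodSummary> helloExample() {
        MethodSummary m = new MethodSummary();
        m.instanceFields.add("Hello.c");   // s + c reads this.c
        MethodSummary m1 = new MethodSummary();
        m1.callees.add("m3");              // `a` is static final, so nothing is recorded for it
        MethodSummary m3 = new MethodSummary();
        m3.staticFields.add("Hello.b");    // s + b reads the static field b
        Map<String, MethodSummary> summaries = new HashMap<>();
        summaries.put("m", m);
        summaries.put("m1", m1);
        summaries.put("m3", m3);
        return summaries;
    }

    public static void main(String[] args) {
        System.out.println("m  needs " + requiredFields("m", helloExample()));
        System.out.println("m1 needs " + requiredFields("m1", helloExample()));
    }
}
```

For the Hello example, the sketch reports only the field c for m and only the static field b for m1, in addition to each method's own parameter, which is in line with the metadata in Figure 4.6.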
    <?xml version="1.0" encoding="utf-8"?>
    <package name="edu.usc.freeme" id="3">
      <method id="-2123" class="edu.usc.freeme.Hello" name="m(Ljava/lang/String;)V" static="false">
        <params id="0" contains="1">
          <field name="c" type="java.lang.String"/>
        </params>
        <params id="1" contains="-1"/>
      </method>
      <method id="-2412" class="edu.usc.freeme.Hello" name="m1(Ljava/lang/String;)Ljava/lang/String;" static="false">
        <params id="0" contains="0"/>
        <params id="1" contains="-1"/>
        <static class="edu.usc.freeme.Hello" name="b" type="java.lang.String"/>
      </method>
      <method id="-2131" class="edu.usc.freeme.Hello" name="m2(Ljava/lang/String;)V" static="false">
        <params id="0" contains="0"/>
        <memory name="myjni" size="2">
          <address value="4bd4" size="0"/>
          <address value="4b14" size="0">
            <offset value="4" size="0"/>
          </address>
        </memory>
      </method>
      <method id="-2321" class="edu.usc.freeme.Hello" name="m3(Ljava/lang/String;)Ljava/lang/String;" static="true">
        <params id="0" contains="-1"/>
        <static class="edu.usc.freeme.Hello" name="b" type="java.lang.String"/>
      </method>
    </package>

Figure 4.6: XML formatted Metadata

FREEME analyzer output

The FREEME analyzer inspects each method body together with the generated call graph to precisely identify all the remotely executable methods and the minimum set of input parameters needed for their remote execution. Thus, the final result of the FREEME analyzer is a list of the remotely executable methods in a given APK file. It then builds an XML metadata file similar to that shown in Figure 4.6 to indicate which methods can be offloaded and which parameters must be passed along for remote execution. The XML file is stored on the server side, and a copy is sent back to the mobile phone. We describe how the map file is used to reduce the serialization overhead in Section 4.1.5.

4.1.4 Modifying Dalvik Virtual Machine

The DVM is a virtual machine that runs apps on the Android platform.
It takes any app consisting of .dex files and resources and executes it within the Android runtime. In FREEME, the DVM is made aware of all the methods that can potentially be offloaded to a remote server through the XML metadata built by the FREEME analyzer for each app. To identify method calls that may be offloaded, the DVM loads the FREEME map and stores it in a hashmap before starting an app. For every method identified in the FREEME map, the DVM sets a flag in the corresponding method structure of the bytecode when that method is initialized. This tagging is done only once for each app, during its first execution.

Efficient Method Call Trapping in DVM

In FREEME, method calls are trapped by the DVM and then executed in the most energy-efficient manner. The decision on whether to offload a method call is made by a separate module, the FREEME Decision Maker, based on runtime conditions (this process is described shortly). To implement method call trapping, we modified a few ARM7 assembly files and C files related to the bytecode interpreter of the DVM. Figure 4.7 describes the flow of the modified DVM.

During app execution, the DVM checks the tag to see whether a method call can be offloaded. To minimize overhead, we implemented tag detection in ARM7 assembly, where it needs only two instructions; the overhead of detecting tagged methods is therefore negligible compared to the total overhead of interpreting the bytecode.
When the DVM detects a FREEME tag inside a method structure, it suspends interpretation of the bytecode and calls the method status function to check whether the method is available for offloading. As described later, the FREEME service checks the network conditions to determine whether offloading is feasible. For instance, in the absence of a network connection there is no way to initiate remote execution, and hence the DVM resumes execution of the original code. If the method is available for offloading, the FREEME service checks whether the method has been profiled earlier. The profiling process determines the energy cost of computing the method call locally. The energy cost is estimated from the mobile phone's CPU frequency and the time the method call took to run on the phone at that frequency. This is done once for each method, during its first invocation. Note that input-dependent changes to the method execution may alter the energy cost of computing the method call locally. Hence, the profiling data may be regenerated at infrequent intervals to resample the energy consumption. In addition, the profiling phase determines the energy cost of transmitting the data necessary for remote execution. The transmission energy cost may vary with location and time of day. The profiler measures the energy cost per Maximum Transmission Unit (MTU) of transmission at each location during each time interval. The energy estimation process is described in the profiler section later. If it is determined that the method call should be offloaded, a remote procedure call (RPC) is invoked by the DVM in a manner similar to the execution of a native method.

Figure 4.7: FREEME Method Passing on Dalvik (flowchart of the modified method invocation inside the Dalvik VM, spanning ANSI C (Dalvik), ARM7 ASM (Dalvik), and Android Java (FREEME Service); decision nodes: Native Method?, FREEME Tagged?, Method Available?, Profile Exist?, Profiling?, Success?; actions: Get Arguments, Call JNI Method, Interpret method code, Call Method Status, Call Remote Function, Call Start Profiler, Call Stop Profiler, Make Decision)

4.1.5 FREEME Service

The FREEME service module handles all the code offloading functionality at runtime. It has three main elements, as shown in Figure 4.2: the energy profiler, the decision maker, and the RPC client.

Energy Profiler

The purpose of the energy profiler is to estimate the energy cost of executing a method locally on the phone as well as the energy cost of transmitting the necessary data to a remote server for offloading, including the cost of data marshalling and unmarshalling on the phone. We designed a simple software energy profiler inspired by PowerTutor [62].

To estimate CPU energy consumption, we use a first-order approximation. It assumes that the CPU energy cost is simply a function of computation time, operating frequency, and CPU utilization. More complex estimation functions can easily be integrated into FREEME, but those approaches are outside the scope of this thesis.

$$E_{cpu} = n E_{cpu\_on} + \sum_{t=0}^{n} U_t \left( E_l + (E_h - E_l) \frac{f_t - f_l}{f_h - f_l} \right)$$

where $E_{cpu}$ represents the energy consumption of a method running on a CPU for $n$ seconds, $U_t$ is the CPU's utilization at time $t$, $E_l$ and $E_h$ are the CPU's energy consumptions at the lowest and highest frequencies, $f_t$ is the CPU's frequency while running the method call at time $t$, $f_l$ and $f_h$ are the CPU's lowest and highest frequencies, and $E_{cpu\_on}$ is the CPU's base energy consumption in awake mode. We use a simple test app that does CPU-intensive work at the highest and lowest CPU frequencies at peak utilization to measure $E_l$, $E_h$, and $E_{cpu\_on}$ once for each phone model.
To reduce run-to-run variation, the energy consumption of each unit is measured multiple times, and the average energy consumption for each resource is used.

We then use the above equation to estimate the energy consumption of a method execution at runtime. At runtime we only need the time a method call took to run on the mobile phone and the CPU operating frequency while running the method call. These values are obtained from the decision maker described in the next section.

Figure 4.8: Energy Profiler. (a) Estimate Energy (Application, Dalvik, Kernel, Activity Data, Profiler, Power Model, Profile Database). (b) Power Modeling (Sample Data, Test Server, Power Source, Power Monitor, Energy Consumption Data).

To estimate the data transmission cost over Wi-Fi, we use the following equation. For Wi-Fi power modeling, we measured the uplink and downlink energy costs with sample data of several lengths. Network transmission cost usually consists of three phases: connection, data transmission, and tail energy [63].

$$E_{wifi} = \alpha \left( E_{conn} + E_{up} P_{send} + E_{down} P_{recv} \right) + E_{tail}$$

where $E_{conn}$ is the energy cost of establishing a connection to a server, $E_{tail}$ is the energy consumed after disconnection, during the period in which power eventually drops to zero, $P_{send}$ is the number of packets uploaded, and $P_{recv}$ is the number of packets downloaded. $E_{up}$ and $E_{down}$ are the uplink and downlink transmission energy costs per packet, and $\alpha$ is an environmental factor that accounts for dynamic variations such as TCP slow start and link quality. It is set to one in our evaluations, since the link quality was excellent during all our experiments. This value can easily be changed based on real-time measurement of network quality, which can be obtained from existing Android APIs or from proxy measurements such as round trip time. Similar equations can be used to measure the energy consumption for all cellular networks, such as 3G and LTE.
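The two energy models can be put together in a short numeric sketch. It is a simplified reading of the equations, not the profiler's actual code: the class `EnergyModelSketch` and its method names are hypothetical, one utilization and frequency sample per second is assumed, the environmental factor is passed in as `alpha` and applied to the connection and transfer terms, and the final comparison of local CPU energy against Wi-Fi transmission energy is an assumed decision rule. All numeric values in `main` are made up for illustration and are not measurements.

```java
// Hypothetical sketch of the first-order energy models; units are arbitrary.
final class EnergyModelSketch {

    /**
     * CPU energy for a run sampled once per second:
     * n * eCpuOn + sum over samples of U_t * (eLow + (eHigh - eLow) * (f_t - fLow) / (fHigh - fLow)).
     */
    static double cpuEnergy(double eCpuOn, double eLow, double eHigh,
                            double fLow, double fHigh,
                            double[] utilization, double[] frequency) {
        if (utilization.length != frequency.length) {
            throw new IllegalArgumentException("one frequency sample is needed per utilization sample");
        }
        int n = utilization.length;
        double total = n * eCpuOn; // base cost of keeping the CPU awake
        for (int t = 0; t < n; t++) {
            // Linear interpolation between the lowest and highest frequency costs.
            double scale = (frequency[t] - fLow) / (fHigh - fLow);
            total += utilization[t] * (eLow + (eHigh - eLow) * scale);
        }
        return total;
    }

    /** Wi-Fi energy: alpha * (eConn + eUp * sent + eDown * received) + eTail. */
    static double wifiEnergy(double alpha, double eConn, double eUp, double eDown,
                             double eTail, long packetsSent, long packetsReceived) {
        return alpha * (eConn + eUp * packetsSent + eDown * packetsReceived) + eTail;
    }

    /** Assumed decision rule: offload only when transmission is cheaper than local execution. */
    static boolean shouldOffload(double localEnergy, double remoteEnergy) {
        return remoteEnergy < localEnergy;
    }

    public static void main(String[] args) {
        // Two seconds of execution: full load at the top frequency, then half load at the lowest.
        double local = cpuEnergy(1.0, 2.0, 4.0, 1.0, 2.0,
                new double[] {1.0, 0.5}, new double[] {2.0, 1.0});
        // Four packets up and eight packets down over an excellent link (alpha = 1).
        double remote = wifiEnergy(1.0, 0.5, 0.25, 0.125, 1.0, 4, 8);
        System.out.println("local=" + local + " remote=" + remote
                + " offload=" + shouldOffload(local, remote));
    }
}
```

Because the transmission term grows with the packet counts, the comparison mainly depends on how much data a method needs, which is why reducing the transferred data directly widens the set of methods worth offloading.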
We use another test app that sends sample data to a test server to measure the energy consumption of each network interface.

To collect the parameters used in the power model, we run several CPU test apps and data transmission test apps that were developed primarily to measure the energy consumption of each component of interest. For instance, the CPU test apps turn off all wireless radios and use only the CPU to measure the power consumption parameters of the CPU, as shown in Figure 4.8(b). To measure the power consumption parameters while executing these apps, the mobile phone's power supply is routed through the Monsoon power monitor [64], which measures the actual energy consumption of the phone with a time stamp. The power monitor, whose readings are logged on a separate machine, yields the profile-time measurements such as $E_l$, $E_h$, and $E_{cpu\_on}$ for the CPU and $E_{conn}$, $E_{tail}$, $E_{scan}$, $E_{up}$, and $E_{down}$ for each radio. These computed values are then loaded into a profile database on the mobile phone, as shown in Figure 4.8(a).

Decision Maker

The Decision Maker module informs the DVM when it is more energy efficient to compute a given method call locally and when it is better to execute the target method on the remote server, taking into account the energy cost of communication. The decision maker estimates the energy consumption of the target method based on profiled data provided by the Energy Profiler. If the profiled data does not exist, it runs the Energy Profiler to collect the required energy measurement parameters described in the previous section during local execution of the target method. This process is shown in Figure 4.7. The profiled data is a simple metadata holder that stores the execution time and required data information of a method call. Note that the method call execution time is measured for the entire method call chain. The metadata is stored in permanent storage and is managed by the energy profiler.
For instance, the energy profiler may update this data on future app invocations that invoke the method calls. During the first invocation, the CPU time consumed by the method call is measured along with the CPU utilization. Then, using the CPU energy consumption equation shown earlier, the cost of local computation is estimated. Note that input-dependent changes to the method execution may alter the energy cost of computing the method call locally. Hence, the profiling data may be regenerated at infrequent intervals to resample the energy consumption. Furthermore, to reduce noise, the decision maker estimates the local computation cost over multiple invocations and uses the averaged value.

In addition, the profiling phase determines the energy cost of transmitting the data necessary for remote execution. During the first invocation of a method, the Energy Profiler measures the size of the data that would be sent to the remote server. Using this input data size and the transmission energy equation, it computes the energy cost of transmitting the data to the backend server on each of the available network interfaces, such as Wi-Fi, 3G, and LTE. Again, to account for noisy data, this process is repeated over multiple invocations, and the average energy consumption for each network interface is computed.

If remote computation is suggested, the decision maker identifies the most energy-efficient network interface for transmitting the data. Once profiling is complete, the decision maker first checks at run time whether the preferred network interface is available. For instance, if Wi-Fi is preferred but no Wi-Fi is available, it simply falls back to local computation. Note that the transmission energy cost may vary with location and time of day.
But in our experience the most significant determinant of energy cost was the size of the data that needs to be communicated; only rarely did network congestion play a major role in determining the overall energy. Hence, the decision maker uses only the data size and the availability of a network interface in making the decision, not the network condition. This assumption will be relaxed in our future work. If it is determined that the method call should be offloaded, computational offloading is invoked by the DVM.

RPC Client

Once a method is selected for offloading, an RPC (remote procedure call) client handles the offloading process. When the RPC client gets an offloading request from the DVM, it retrieves the method's information from the XML metadata map and then collects the data necessary for the method call, such as the class instance of the method, parameters, static fields, and native memory. It then marshals the data and sends it to the server using TCP. After receiving data from the server without error, it extracts the method result and parameters from the data. When remote execution fails, it returns false to the DVM so that the DVM executes the method locally.

4.1.6 Minimizing data transmission size

The RPC client handles the transmission and reception of the data required for remote execution. As mentioned earlier, one of the contributions of FREEME is its extensive focus on reducing the size of the data transmitted between the mobile device and the server. It achieves this goal using three unique approaches.

Java serialization challenge

Java provides a mechanism called serialization that converts a class instance into a sequence of bytes. To enable remote execution, every data field that is required on the server side must be serialized by the RPC client and sent to the server. However, a class must satisfy two stringent constraints before it can be deemed serializable.
The class must implement the java.io.Serializable interface, and all member fields within the class must also be serializable. Unfortunately, most classes in Android apps do not implement java.io.Serializable, and hence most class instances cannot be serialized or transmitted to the server. Even if we assume serialization will be supported in the future, most serialization approaches simply serialize the entire class instance without regard to which variables are needed for remote execution.

The challenges of data serialization are illustrated in Figure 4.5. In this figure, three lines of code are labeled ❶, ❷, and ❸. The code labeled ❷ and ❸ uses variables declared as static in the class. First, Java serialization does not keep static values within a class during serialization. Hence, these two variables will not be packaged for remote execution. Second, static String a has the final keyword, so it cannot be changed during program execution. Hence, there is no need to repeatedly send this data to the remote server, since the server already has an instance of this variable that it saved during the static analysis phase. In other words, the value can be reproduced on the server side without transmitting any data. Hence, only the value of static String b is required.

To get around the existing Java serialization constraints, FREEME builds customized serialization methods to enable serialization of the classes. The customized serialization works closely with the XML file created by the FREEME analyzer to serialize only the required fields for each method. To address the issue illustrated in Figure 4.5, the customized method does not drop static variables of a class instance during serialization. Our custom serialization code is built into a stub on the client and server sides, as shown in Figure 4.2. For example, the XML file in Figure 4.6 is the analysis result for the Hello class shown in Figure 4.5.
It contains information about all required fields for each method. In the case of method m1, it indicates that the first parameter, the class instance of the Hello class, is not required, whereas the second parameter, String s, and the static String b of the Hello class are required. With this information, the serialization methods can avoid serializing unnecessary data and avoid dropping any required static field. To the best of our knowledge, no prior work on computational offloading has developed a serialization framework that works in conjunction with static analysis outputs.

Avoiding serialization for reproducible data through remote instance

In addition to identifying the minimum input set for each method call, FREEME introduces the notion of remote instances: objects that can be reproduced on the server side without being explicitly sent to the server. If none of the user-defined methods modify member fields of a super class, then that super class can be reproduced on the server side using the information saved during static analysis. For instance, the Activity class has several system-related classes and numerous member fields. Thus, if a class inherits from Activity, it is hard to serialize, since the size of the serialized data will exceed the maximum heap and stack size due to the member fields of its super class. Fortunately, most classes that inherit from Activity only read these member fields and do not modify them. During static analysis, the FREEME analyzer tags any object that satisfies the above requirement as a remote instance.

Table 4.1: Remote instances in FREEME

    Class Name                         Remote Type
    android.app.Activity               Normal
    android.content.Context            Blank
    android.content.res.Resources      Blank
    android.content.res.AssetManager   Blank

When a class inherits from a class tagged as a remote instance class, FREEME separates out the remote instance class and treats only the user-defined fields of the class as required fields for remote computation.
The logic behind this approach is that any remote instance can be easily reproduced on the server side without the original object. Only the data of user-defined fields on top of this superclass are sent to the server, which can then integrate these inputs to create the complete class instance. All objects can potentially be analyzed to check whether they can be remote instances, but currently FREEME supports four remote classes: Activity, Context, Resources, and AssetManager, as shown in Table 4.1. These are the most common and also the largest objects that satisfy the read-only property. If these large objects had to be transmitted, that could significantly degrade the ability to remotely execute any class that inherits from them. By using the remote instances approach along with the minimum data set analysis, FREEME significantly reduces the data size that is needed for remote execution.

[Figure 4.9: Data flow for Remote Execution. Components shown: RPC Client and RPC Server stubs, Database, Package Manager, DexClassLoader, and APK File. Stages: Unmarshalling, Preparing Invocation, Invoking Method. Steps 1 to 7.]

4.1.7 FREEME Handler

The FREEME service runs as a separate process on the mobile phone. To facilitate access to this service across multiple apps, we created the FREEME handler, which provides static methods to access the FREEME service using Android Binder IPC (Inter Process Communication). Developers can also use it to apply FREEME to their apps manually.

4.1.8 Remote Procedure Call Server

Figure 4.9 describes the steps in invoking a method on the server side.
① The RPC client sends the package and method IDs, along with the essential data for a method invocation, to the RPC server using a stub.
② The RPC server checks the IDs and then unmarshals the serialized data.
③ A package name and method information, such as the required static fields and parameters, are retrieved from the database.
93 à Package info is created by Android Package Manager that handles an APK file. Ä RPC server sets up an environment for invocation of the method. For example, it rebuilds a class instance and parameter objects as well as puts values of the required static fields to related objects. Å RPC server invokes the method then gets a result. Æ A stub on RPC server marshals the result and all received data that were modified during the invocation. To ensure maximum compatibility, we used DVM compiled for Intel x86 Architec- ture on the server to execute offloaded computation under a sever version of Linux that does not provide several Android-specific hardware access functions. The lack of such hardware drivers does not create any problem because of the restriction that hardware dependent methods can not be migrated anyway within FREEME. For user- defined native methods, we use Intel’s Houdini which can execute ARM binaries on an x86 platform. When executing native methods, the memory accessed by native methods on the phone are properly dumped and correctly restored to the server memory before initiating remote execution. Native Memory Handling To execute a native method on the remote side, a memory space required by the method should be considered if the space is shared by another methods as we mentioned in the previous section. Figure 4.10 shows a process of memory save and restore by FREEME. 94 libc … libmyjni.so … … 0x80000 <memory name="myjni" size="2"> <address value="4bd4" size="0"/> <address value="4b14" size="0" > <offset value="4" size="0" /> </address> </memory> … 0xabcd … 0xbcd0 … 4bd4 4b14 Memory libmyjni.so libc … libmyjni.so … … 0xa0000 … 0xabab … 0xcdc8 … 4bd4 4b14 Memory libmyjni.so Phone Server ① ② ③ 0x84b14 0x84bd4 0xa4b14 0xa4bd4 Dump Restore 0xbcd4 0xcdcc Figure 4.10: Dump and Restore of Native Memory À is an XML meta data indicating required native memory space from Figure 4.6. Á is a memory space of a phone and is a memory space of a server. 
For example, if libmyjni.so is loaded at 0x80000, the required memory addresses will be relocated to 0x84b14 and 0x84bd4. Note that a size value of 0 indicates that the memory space is dynamically allocated. Thus, the FREEME service retrieves the actual size from the allocated memory spaces at runtime, 0xabcd and 0xbcd0. It then dumps the data with the allocated size. After the server receives the dumped data, the FREEME runtime restores the data into its local memory space:
1. Load libmyjni.so and retrieve the address of the library.
2. Calculate the relocated addresses, then allocate dynamic memory space at the target addresses.
3. Restore the data into the memory.
In the example, libmyjni.so is loaded at 0xa0000 and the allocated addresses are 0xabab and 0xcdc8.

Optimizing result transfer size

Recall that the FREEME analyzer statically identifies all the data needed for remote execution. In addition, by using the notion of remote instances, only the name of a remote instance is transmitted instead of the entire object. While these steps already reduce the size of the data, there is still an opportunity to further reduce the size of the data transferred at runtime. For instance, method calls may refer to data fields but perform only read accesses on that data. In this case, if the server can identify such unmodified data, the data does not need to be returned to the client, so the transferred data size can be further reduced. Note that, in all prior work, cloud servers send all of the received data back to the client, since they do not know which data were modified during remote execution.

When remote execution is completed, the server sends only the modified result data back to the client. The stub on the client receives the result from the server and then needs to update the results as well as any other variables that were modified by the server during remote execution. Figure 4.11 describes how the process works.
Differ & Updater

[Figure 4.11: Local & remote instances in Stub. Client stub: ObjectOutputStream, ObjectUpdater, Reference Table. Server stub: ObjectInputStream, ObjectDiffer, Reference Table, instances O and O′, Runtime. Data exchanged: Static Fields & Result, Serialized Data, Modified Data & Result. Steps 1 to 6.

Table sent by the phone:
Seq.#   Type             Value
1       array            A
2       object in (A)    B
3       int in (B)       10
4       object in (A)    C
5       int in (C)       1
6       remote           R

Table returned by the server:
Seq.#   Type             Value
1       local            1
2       local            2
3       local            4
4       int in (4)       10
5       local            6
6       object           result]

In Figure 4.11, the table marked ① shows that the mobile phone sends an array object, "A", for remote execution. This object contains two objects, "B" and "C", which are integers with values of 10 and 1, respectively. The first column in the table shows the serial number of each of these objects. The serial number is the order in which the data serializer converts the array object into bytes. ② The stub within the RPC server receives the serialized bytes from the mobile phone. The server then reconstructs the class instance twice on the server side from the serialized bytes. It saves the first instance, "O", to preserve the original values and forwards the second instance, "O′", to the runtime. ③ After completing the method call execution, the stub on the server side receives the results. ④ The ObjectDiffer gets a reference table from the ObjectInputStream and then compares "O" and "O′" to identify modified data. In this example only the value of C is modified, from 1 to 10. ⑤ The server stub then marshals "O′", where the objects that were not modified are simply noted as being the same as the local values on the mobile phone. It sends only the sequence numbers of unmodified data and skips sending the actual data for these objects. For modified data it sends both the sequence number and the actual data. ⑥ The ObjectUpdater on the mobile phone then updates the original object with the new value, int "10", using the sequence number "4".
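The diff-and-update steps can be sketched in a few lines. The Python below is a simplified illustration, not the ObjectDiffer or ObjectUpdater implementation: `flatten`, `diff`, and `apply_updates` are hypothetical names, the object is modeled as nested lists and dictionaries, and sequence numbers are assigned to leaf values only, so the numbering scheme differs from the reference tables in Figure 4.11. It assumes the remote method changes values but not the shape of the object.

```python
import copy


def flatten(obj, path=()):
    """Yield (path, leaf_value) pairs in a deterministic traversal order."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, path + (key,))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from flatten(value, path + (index,))
    else:
        yield path, obj


def diff(original, executed):
    """Server side: map sequence number -> new value for modified leaves."""
    updates = {}
    pairs = zip(flatten(original), flatten(executed))
    for seq, ((_, old), (_, new)) in enumerate(pairs, start=1):
        if old != new:
            updates[seq] = new
    return updates


def apply_updates(local, updates):
    """Client side: write modified values back by sequence number."""
    paths = [path for path, _ in flatten(local)]
    for seq, value in updates.items():
        *parents, last = paths[seq - 1]
        target = local
        for step in parents:
            target = target[step]
        target[last] = value


# Phone-side object: array A holding B (value 10) and C (value 1).
phone_a = [{"name": "B", "value": 10}, {"name": "C", "value": 1}]

# The server keeps O untouched and lets the method run on O'.
o = copy.deepcopy(phone_a)
o_prime = copy.deepcopy(phone_a)
o_prime[1]["value"] = 10  # the remote method modifies C

updates = diff(o, o_prime)       # only the changed leaf is returned
apply_updates(phone_a, updates)  # the phone patches its own copy
print(updates, phone_a[1]["value"])
```

Only one sequence number and one value cross the network in this example; the unmodified leaves are never sent back.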
This optimization dramatically reduces the data size that is transmitted from the server to the phone.

4.1.9 Putting it All Together

To summarize, to use FREEME an app goes through the following sequence of steps. (1) First, the app is registered with FREEME. (2) The app is then sent to the FREEME analyzer on the server, which performs static analysis and creates a FREEME map that includes all methods that satisfy the requirements for remote execution. (3) The DVM on the mobile phone receives the FREEME map and uses the map information to tag all remotely executable methods. (4) The energy profiler generates an estimate of the energy consumption of various components within the mobile phone. (5) During app execution, the DVM traps any tagged method call and invokes the FREEME service to determine whether to execute the call remotely or not. (6) If a decision is made to execute remotely, the FREEME service invokes the RPC client to serialize the necessary data from the related class and transmit it to the remote server. (7) The remote server stub reconstructs the data to invoke the method call. (8) Once the call completes, the results and the modified parts of the received data are sent back to the client.

4.1.10 Implementation

We implemented the full software stack described above on top of Android OS version 4.1.2. We use two models of the Samsung Galaxy S2 Skyrocket to run FREEME. On the server end we used an Ubuntu server with a quad-core Intel Xeon 2.4GHz CPU and 16GB of memory. To build a power model for the mobile phone CPU, we used the CPU-intensive test app described in Section 4.1.5 and collected E_l, E_h, and E_cpu_on for the phone model. We also collected the wireless transmission energy costs while running the data transmission test app. Table 4.2 shows the average energy costs for CPU operation and the data transmission energy for one MTU of data, averaged over three different data transmission sizes of 1M, 512K, and 100K.
Table 4.2: Profiled data for CPU and Wi-Fi Interface

CPU (Model: Galaxy S2)
            single      dual
E_l         29.52mA     53.02mA
E_h         221.52mA    366.52mA
f_l         384MHz
f_h         1512MHz
E_cpu_on    24.48mA

Wi-Fi (Model: Galaxy S2)
            802.11g     802.11n
E_scan      278mW
E_up        0.797mJ     0.731mJ
E_dn        0.522mJ     0.486mJ
MTU         1500

[Figure 4.12: Energy Consumption For Applications. Panels: (a) CPU Benchmarks (Integer Math, Floating Point Math, Find Prime Numbers, Random String Sorting, Data Encryption, Data Compression), (b) Chess Game (Level 4 to Level 7), (c) KNOWME Client (3mins, 6mins, 9mins). Y axis: Energy (J). Series: Local, 802.11g, 802.11n, 802.11n w/o Decision Maker.]

4.2 Evaluation

We evaluate FREEME using two existing commercial apps, PassMark PerformanceTest Mobile Version 1.0.4 [65] and Chess for Android [66], and a mobile health monitoring app called KNOWME [63]. KNOWME is a wireless body area measurement system deployed in Los Angeles that reads physiological sensor data and automatically characterizes user state for obesity monitoring. Table 4.3 shows the results of static analysis for each application. To compare with other offloading systems, we implemented a system using a thread migration method as proposed in CloneCloud [67], and used COMET [68] for Android 4.1.2 obtained from the authors' web site.

The APK size of PerformanceTest is 3.2 Mbytes. The package analyzer visited a total of 622 methods (the number of nodes in the call graph) and extracted a total of 120 methods.
However, many of them are just simple methods, such as a getter or setter that returns or assigns a private field. Thus, the analyzer eliminates such simple methods that have no further nested methods. After this step, a total of 21 methods are extracted, with all the necessary input data elements for executing each method call. This process took 2 seconds on the server side. The package analyzer generated XML metadata of about 5KB for this app and sent it back to the mobile phone for method call tagging. It took approximately 4 seconds to do all the data transfers associated with the app and map exchange between the server and the phone using Wi-Fi 802.11n. It took an additional 14 seconds on the phone for the modified Dalvik and FREEME manager modules to process the metadata. Hence, it took a total of 20 seconds after app registration before the app was ready for execution within the FREEME environment. The KNOWME app contains an NDK library that has a total of 193 native functions, along with 147 Java methods. The analyzer successfully extracted and analyzed all these methods and eventually selected 11 computationally demanding methods for remote execution. Not surprisingly, FREEME extracted 35 methods for remote execution from the Chess app, which has more computationally demanding tasks.

Table 4.3: Results of Static Analysis
Application          PassMark    Chess    KNOWME
File Size            3.2M        0.4M     16MB
Analysis Time        2sec        6sec     5sec
Analyzed Methods     622         434      340
(Native)             -           -        (193)
Remote Executable    120         106      86
Final Extracted      21          35       11
Registration Time    20sec       29sec    16sec

[Figure 4.13: Screenshots of Applications. Panels: (a) Local, (b) 802.11n, (c) Chess, (d) KNOWME.]

4.2.1 PassMark PerformanceTest Mobile

PerformanceTest is an Android benchmark app that measures a device's performance in five major categories: CPU Tests, Disk Tests, Memory Tests, 2D Graphics Tests, and 3D Graphics Tests. While all these tests run within FREEME, we focus on the CPU tests, which are computationally demanding.
The other tests are not computationally significant enough for offloading. CPU tests were performed under six subcategories, namely Integer Math, Floating Point Math, Find Prime Numbers, Random String Sorting, Data Encryption, and Data Compression. We ran several tests for each measurement, and the results are averaged across all runs. Run-to-run results varied by less than 5%. Screenshots of the phone running the CPU benchmark in the default setting without FREEME (labeled Local) and with FREEME (labeled 802.11n) are shown in Figure 4.13.

Table 4.4: Comparison of total transmitted data size for Remote Execution
Application              Integer     Floating    Prime       Random       Encryption   Compression   Chess(Lvl.7)   KNOWME
FREEME
  # of Offloaded Methods 15          10          15          15           10           45            3024           4
  Total Tx               1.2Mbytes   1.2Mbytes   1Kbytes     6.9Mbytes    1Kbytes      1Kbytes       4.5Mbytes      135Kbytes
  Total Rx               375bytes    375bytes    375bytes    375bytes     590bytes     375bytes      3.6Mbytes      8Kbytes
COMET
  Total Tx               6.7Mbytes   6.7Mbytes   6.7Mbytes   15.6Mbytes   6.8Mbytes    6.8Mbytes     x              x
  Total Rx               1.7Mbytes   1.7Mbytes   1.7Mbytes   3.2Mbytes    1.8Mbytes    1.9Mbytes     x              x
CloneCloud
  Total Tx               8Mbytes     8Mbytes     8Mbytes     18Mbytes     8Mbytes      8Mbytes       1.9Gbytes      x
  Total Rx               8Mbytes     8Mbytes     8Mbytes     18Mbytes     8Mbytes      8Mbytes       1.9Gbytes      x

We quantified the reduction in data size with FREEME in Table 4.4. For the CPU benchmarks, the number of methods that were executed remotely on the server ranged from 10 to 45, as shown in the second row of this table. The next two rows show the total size of the data transmitted and received with FREEME. The last four rows show the same data with COMET and CloneCloud. As expected, with the thread migration approach used in CloneCloud, the uplink and downlink data sizes are the same, since the entire class instance data is exchanged between the server and the phone in both directions.
Even though COMET uses DSM and automatically synchronizes memory between the phone and the server, it still transmitted and received significantly more data than FREEME in this case, because during remote execution the system needs to synchronize any memory that has been altered since the last invocation.

FREEME dramatically reduces the size of data transmission. On the uplink, the package analyzer created metadata identifying only the data elements needed for remote execution. The reduction in downlink data is attributed to the fact that most of the data sent for remote execution is used in a read-only fashion. Hence, the server stub analysis eliminates sending unmodified data back to the phone. Thus, the size of the data packets sent back to the phone dropped by nearly three orders of magnitude with FREEME, and the packet sizes for uploads and downloads both drop significantly. Note that in the Random String benchmark the package analyzer ended up tagging most class data as necessary for remote execution. Hence, the uplink size was not reduced significantly, although the downlink size was.

The data size reduction with FREEME also translates into significant energy savings for most benchmarks, except for the random string sorting benchmark, as shown in Figure 4.12(a). In this figure the first bar shows the energy consumption if the computation is always performed locally. The second and third bars show the energy consumption using FREEME, where the decision maker decides whether to run the computation locally or remotely, over an 802.11g or 802.11n network, respectively. The last bar shows the energy consumption if all the final extracted methods in Table 4.3 are executed remotely without involving the decision maker. In the absence of the decision maker, no trade-off analysis is done to decide whether remote computation is worthwhile given the data size that needs to be transmitted.
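To make the trade-off concrete, the sketch below estimates the Wi-Fi transfer energy of a candidate call from the per-MTU costs in Table 4.2 and compares it with an estimate of the local execution energy. This is a deliberately simplified illustration and not the decision maker's actual model, which is described in Section 4.1.5: the function names are hypothetical, the local energy of 1.0 J is a made-up value, and effects such as scanning, tail energy, and the phone's idle power while waiting are ignored.

```python
import math

MTU_BYTES = 1500  # MTU from Table 4.2


def transfer_energy_joules(tx_bytes, rx_bytes, e_up_mj, e_dn_mj, mtu=MTU_BYTES):
    """Estimate Wi-Fi energy as (packets sent or received) x (per-MTU cost)."""
    up_packets = math.ceil(tx_bytes / mtu)
    down_packets = math.ceil(rx_bytes / mtu)
    return (up_packets * e_up_mj + down_packets * e_dn_mj) / 1000.0


def should_offload(local_energy_j, tx_bytes, rx_bytes, e_up_mj, e_dn_mj):
    """Offload only if the estimated transfer energy is below the local cost."""
    remote = transfer_energy_joules(tx_bytes, rx_bytes, e_up_mj, e_dn_mj)
    return remote < local_energy_j


# Per-MTU costs for 802.11n from Table 4.2 (millijoules).
E_UP_N, E_DN_N = 0.731, 0.486

# Hypothetical call that costs 1.0 J locally, with a small and a large input.
small_input = should_offload(1.0, 1_000, 375, E_UP_N, E_DN_N)
large_input = should_offload(1.0, 6_900_000, 375, E_UP_N, E_DN_N)
print(small_input, large_input)
```

With these assumed numbers, the small input is worth offloading while the multi-megabyte input is not, which mirrors the qualitative behavior reported for the random string sorting benchmark.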
For random string sorting, the decision maker never had the opportunity to select remote computation due to the large data size needed for remote computation. Hence, even in the presence of FREEME, the decision maker always chose local computation. Without the decision maker (the last bar), it is clear that remote computation would consume significant energy in data transmission cost.

The reasons for the energy savings with FREEME are better visualized in Figure 4.14. In this figure the time-varying energy consumption of the Integer Math app is plotted for each of the three studied categories. The power consumption of Wi-Fi-based FREEME is significantly lower than that of local computation. In fact, the big spikes seen during the time interval were mostly due to local computations when offloading was not enabled because of poor network conditions.

Table 4.5: Execution Time For CPU Benchmarks
Application     Local      802.11g    802.11n
IntegerMath     6.30sec    5.10sec    4.90sec
FloatingMath    7.12sec    6.71sec    6.63sec
FindPrimeNo     7.58sec    1.48sec    1.61sec
RandomString    7.96sec    x          x
Encryption      8.50sec    3.3sec     3.26sec
Compression     7.60sec    8.36sec    8.48sec

[Figure 4.14: Energy Consumption Phases for Integer Math. Panels: (a) Local, (b) 802.11n, (c) 802.11g. X axis: Time (Seconds). Y axis: Power (W).]

Since FREEME offloads computation to the remote server under appropriate network conditions, the computation latency also decreases for most apps, even after accounting for the data transmission delays, as shown in Table 4.5. Note that FREEME always selects local computation for the RandomString application due to its large data transmission demands. Note also that the compression algorithm pays an additional 8% latency penalty with FREEME, although the overall energy consumption is reduced, since the mobile CPU is simply idle for longer periods of time while remote execution is being performed.
4.2.2 Chess for Android

Chess for Android is one of the most popular chess game apps on Google Play. For evaluation, the chess pieces were preset as shown in Figure 4.13(c). We derived the preset from the Chess World Championships. The test performed a single chess move, in which the Knight moves from h2 to f1, at several player difficulty levels chosen from the 20 available levels.

Playing at level 4, FREEME sent 108 remote execution requests, and each remote execution request had 12 methods on average in its method call chain. Thus, a total of 1296 methods (108*12) were remotely executed at level 4. Level 7 sent 252 remote execution requests, and thus a total of 3024 methods (252*12) were executed remotely (see the second row in Table 4.4). Table 4.4 also shows the total data size for FREEME and CloneCloud. At level 7, Chess requires an enormous 3.8 GB of data to be transmitted between the server and the mobile phone with CloneCloud [67]. FREEME reduces that data size to 8.1 MB, because the phone sends only the required data fields to the server and the server avoids sending unmodified data back to the client. Note that COMET [68] was unable to run the Chess game.

Figure 4.12(b) shows the energy consumption at levels 4 to 7. The energy consumption of local computation (the first bar in each group) increases linearly with the computation demand of each level, whereas the energy consumption of remote execution is affected only by the number of transmissions and the amount of data in each transmission. It is interesting to note that 802.11g has lower energy consumption than 802.11n. The reason is that the chess game sends several small chunks of data, which leads to higher connection and tail energy costs in 802.11n.

4.2.3 Mobile Health Monitoring App

As described in Chapter 2, KNOWME is a mobile-phone-centric wireless body area network that was deployed in Los Angeles for pediatric obesity prevention and treatment.
KNOWME collects data from Bluetooth-enabled body sensors, namely tri-axial accelerometers (ACC) and an electrocardiograph sensor (ECG), and then uses that data to classify the user's state into one of nine categories in real time, such as walking, running, fidgeting, standing, and sitting. We collected data for three, six, and nine minutes to perform this classification. Figure 4.13(d) shows a screenshot of the user-activity analysis, which uses a pie chart to classify user state into various categories. Table 4.6 shows execution times for local and remote execution. The data size transmitted and received is shown in Table 4.4. Figure 4.12(c) shows the energy consumption. The energy consumption of local computation alone is significantly worse than that of computational offloading with FREEME. Note that COMET and CloneCloud are not designed to handle user-defined native methods, so they are unable to offload computation for the KNOWME app.

Table 4.6: Execution Time for KNOWME App
Collection Time/Size    Local      802.11g    802.11n
3mins / 135Kbytes       24.6sec    9.1sec     9.1sec
6mins / 270Kbytes       28.6sec    11.2sec    10.8sec
9mins / 405Kbytes       33.1sec    14.5sec    14.3sec

4.3 Related Work

Some of the early works on remote execution were built on top of the RPC protocol proposed in [69]. Box et al. [3] proposed the Simple Object Access Protocol (SOAP), based on XML, which is used for invoking a remote web service by exchanging messages. Similar approaches were taken in Enterprise Java Beans [70] and Remote Method Invocation [71]. However, those methods are usually intended for wall-powered or large-battery computing devices, where the focus is on increasing performance and fault tolerance rather than on energy conservation.

Computational offloading in mobile devices is an area of growing interest.
In [26] the authors described MAUI, which relies on the flexibility of a managed code environment to create two versions of any computational task: one version that runs locally on a phone and a second that runs remotely on a server. The app developer is expected to mark different method calls explicitly for remote execution. MAUI estimates the amount of task state that needs to be transferred between the local and remote sites. It then invokes remote computation using RPC whenever remote computation is more energy efficient. CloneCloud [67] uses a static analyzer to partition unmodified Android apps and identify migration points. It then sends a data dump of all related objects in the heap of an app thread for remote execution whenever it meets a migration point, as determined by an optimization solver. Due to the large amount of data migration, this approach seems to work best when the app has a long execution time. For instance, CloneCloud uses a face recognition algorithm on 10MB of photo files, which took 33 minutes to complete on a phone. Recognizing this concern, in a follow-up work [72] the authors propose to transfer the Essential Heap Objects (EHO) associated with each method rather than the entire data dump within CloneCloud. This technique avoids sending data from a super class and its member objects that do not appear in a target method. However, this approach still sends each object touched in the target method in its entirety. COMET [68] uses distributed shared memory (DSM) to synchronize the memory of a target thread between a phone and a server in order to minimize data transmission. On the first remote execution the entire memory image is transferred, and for subsequent invocations only memory image differences are sent. But most of the memory image differences are actually unnecessary for remote execution. In the absence of a careful static analysis, it is not possible to identify which parts of the memory are really needed for remote execution.
Cuckoo [73] and ThinkAir [74] are similar to MAUI in terms of providing simple ways for Android app developers to utilize cloud systems, but they still need developer effort to determine a set of feasible candidate methods that can be offloaded. Since ThinkAir is based on CloneCloud, it also transfers a large memory dump for remote execution.

Apart from data transfer size, one of the critical limitations of these prior systems is that they do not handle user-defined native methods, where the method is compiled to binary code that runs on the underlying hardware. An increasing number of apps are written using the Android Native Development Kit (NDK) to extract the highest performance without paying the runtime overhead [75]. Indeed, 75% of the 300 top free apps on Google Play were developed using the NDK as of April 2014.

FREEME uses a unified static analysis framework to analyze Android apps that have both Java and native method calls. The analysis identifies methods, both Java and native, that can be remotely executed without requiring programmer input. FREEME identifies the sub-components of all parameters that are needed for each method execution. In addition, when the remote server sends the data back after the method invocation, it sends only the result of the execution and the parts of the received data that have been modified, which are then integrated into the app's execution context on the mobile phone. Thus, this fine-grained data analysis enables FREEME to exchange significantly smaller amounts of data. The techniques developed in FREEME can broaden the opportunities for all computational offloading systems and are thus orthogonal to many prior efforts on computational offloading.

FREEME is inspired by the RPC protocols proposed in [69].
Currently, Java provides RMI to allow remote invocation of a method with input parameters that must be serializable, and the methods are generally stateless, which implies that the remote method does not need any method state from the client other than the method's input parameters. RMI then uses the Java object serialization approach for data migration, where all related objects are converted to a byte stream. Since Android does not support RMI, FREEME implements its own RPC solution using TCP. FREEME also targets a broader set of methods for computational offloading, including stateful methods. Hence, FREEME develops Android-specific approaches to identify the minimum required state for each method call, and an offloading-friendly serialization approach that transfers the minimum data set between the device and the cloud.

Chapter 5

Conclusion & Summary

Mobile phone battery consumption is a significant impediment to future mobile applications that are likely to perform significant computations on the device. The demand for computation significantly outpaces improvements in device performance and battery life. One way to enhance a mobile application's compute power is to automatically offload computations to a remote server. However, relying on application developers to annotate the code for remote computation is likely to face challenges. Automatic computational offloading, in turn, can result in significant data transfer overhead if the method call state is not properly managed.

In this thesis we present a comprehensive quantification of a smartphone's power consumption with a case study of our Wireless Body Area Network system, KNOWME. In addition, we proposed the Active Energy Profiling (AEP) method, which reduces energy consumption by using RPC depending on the condition of the communication environment.
Inspired by the results from AEP, we designed FREEME, a runtime mobile application execution environment that automatically offloads computations to a remote server. The novelty of FREEME is its ability to automatically analyze Java class methods as well as user-defined native methods, determine which data elements are necessary for remote execution, and send only the required minimum data to the server. On the server side, the server performs a complex analysis of the class data structure, determines which values are modified, and sends only those values back to the phone after completing the remote execution request. Both of these optimizations significantly reduce the data size necessary for remote execution, which enables a broad range of applications to benefit from FREEME's code offloading capability.

We presented various challenges to implementing FREEME and provided approaches to tackle these challenges. We evaluated FREEME on Android phones and showed that significant energy and latency reductions can be achieved with FREEME. We proposed several novel extensions to FREEME to further increase the broad appeal of this approach.

In particular, we propose to tackle three challenges in the ongoing and future work of this thesis. First, the current FREEME implementation does not offload any user-defined native method calls, which we plan to overcome with binary translation approaches. Second, we propose to develop a new file system synchronization approach for when a remote method requires access to local files on the mobile phone that are generated during the method execution or passed as parameters to the method call. The third area of improvement for FREEME can come from offloading sub-trees within a method tree and considering speculative offloading of computation when a method tree includes a restricted method that is conditionally invoked.
We propose to develop new algorithms for branch prediction, dealing with speculative offloading and potential code rollback in the presence of mis-speculation.

Reference List

[1] Android Architecture. http://developer.android.com.
[2] Y. Wang, J. Lin, M. Annavaram, Q. A. Jacobson, J. Hong, B. Krishnamachari, and N. Sadeh, “A framework of energy efficient mobile sensing for automatic user state recognition,” in MobiSys ’09, pp. 179–192, 2009.
[3] D. Box, D. Ehnebuske, G. Kakivaya, A. Layman, N. Mendelsohn, H. F. Nielsen, S. Thatte, and D. Winer, “SOAP: Simple object access protocol,” HTTP Working Group Internet Draft, 1999.
[4] V. Matena and M. Hapner, “Enterprise JavaBeans, Specification Version 1.0, Sun Microsystems, Inc.” http://www.oracle.com/technetwork/java/javaee/ejb/, 1997.
[5] A. D. Birrell and B. J. Nelson, “Implementing remote procedure calls,” ACM Trans. Comput. Syst., vol. 2, no. 1, pp. 39–59, 1984.
[6] J. Waldo, “Remote procedure calls and Java remote method invocation,” Concurrency, IEEE, vol. 6, pp. 5–7, Jul 1998.
[7] R. M. Metcalfe and D. R. Boggs, “Ethernet: distributed packet switching for local computer networks,” Commun. ACM, vol. 19, pp. 395–404, July 1976.
[8] W. T. Sullivan, III, D. Werthimer, S. Bowyer, J. Cobb, D. Gedye, and D. Anderson, “A new major SETI project based on Project SERENDIP data and 100,000 personal computers,” in IAU Colloq. 161: Astronomical and Biochemical Origins and the Search for Life in the Universe (C. Batalli Cosmovici, S. Bowyer, and D. Werthimer, eds.), p. 729, Jan. 1997.
[9] M. S. Gordon, D. A. Jamshidi, S. Mahlke, Z. M. Mao, and X. Chen, “COMET: Code offload by migrating execution transparently,” in Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, pp. 93–106, 2012.
[10] B.-G. Chun, S. Ihm, P. Maniatis, M. Naik, and A. Patti, “CloneCloud: Elastic execution between mobile device and cloud,” in Proceedings of the Sixth Conference on Computer Systems, pp.
301–314, 2011. 112 [11] T. Wood, K. K. Ramakrishnan, P. Shenoy, and J. van der Merwe, “Cloudnet: Dynamic pooling of cloud resources by live wan migration of virtual machines,” in Proceedings of the 7th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environ- ments, pp. 121–132, 2011. [12] S. Yang, Y . Kwon, Y . Cho, H. Yi, D. Kwon, J. Youn, and Y . Paek, “Fast dynamic execution offloading for efficient mobile cloud computing,” in Pervasive Computing and Communi- cations (PerCom), 2013 IEEE International Conference on, pp. 20–28, March 2013. [13] E. Cuervo, A. Balasubramanian, D.-k. Cho, A. Wolman, S. Saroiu, R. Chandra, and P. Bahl, “Maui: Making smartphones last longer with code offload,” in Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services, pp. 49–62, 2010. [14] M. Li, V . Rozgica, G. Thatte, S. Lee, A. Emken, M. Annavaram, U. Mitra, D. Spruijt- Metz, and S. Narayanan, “Multimodal physical activity recognition by fusing temporal and cepstral information.,” IEEE Trans Neural Syst Rehabil Eng, vol. 18, no. 4, pp. 369– 80, 2010. [15] J. C. Sriram, M. Shin, T. Choudhury, and D. Kotz, “Activity-aware ecg-based patient authentication for remote health monitoring,” in ICMI-MLMI ’09: Proceedings of the 2009 international conference on Multimodal interfaces, pp. 297–304, 2009. [16] K. Patrick, W. G. Griswold, F. Raab, and S. S. Intill, “Health and the mobile phone,” American Journal of Preventive Medicine, vol. 35, no. 2, pp. 177–181, 2008. [17] J. Ryder, B. Longstaff, S. Reddy, and D. Estrin, “Ambulation: A tool for monitoring mobil- ity patterns over time using mobile phones,” in CSE ’09 - Volume 04, pp. 927–931, 2009. [18] J. Pan and W. J. Tompkins, “A real-time qrs detection algorithm,” Biomedical Engineering, IEEE Transactions on, vol. BME-32, pp. 230–236, March 1985. [19] The Open Source ECG Toolbox (OSET).http://ee.sharif.edu/ ˜ ecg. [20] N. Agrawal, V . Prabhakaran, T. Wobber, J. D. Davis, M. 
Manasse, and R. Panigrahy, “Design tradeoffs for ssd performance,” in ATC’08, pp. 57–70, 2008. [21] Y . Wang, B. Krishnamachari, Q. Zhao, and M. Annavaram, “Markov-optimal sensing pol- icy for user state estimation in mobile devices,” in IPSN ’10, pp. 268–278, 2010. [22] E. Jovanov, A. Milenkovic, C. Otto, and P. de Groen, “A wireless body area network of intelligent motion sensors for computer assisted physical rehabilitation,” Journal of Neu- roEngineering and Rehabilitation, vol. 2, no. 1, p. 6, 2005. [23] M. Annavaram, N. Medvidovic, U. Mitra, S. Narayanan, G. Sukhatme, Z. Meng, S. Qiu, R. Kumar, G. Thatte, and D. Spruijt-Metz, “Multimodal sensing for pediatric obesity appli- cations,” in UrbanSense08, November 2008. [24] S. Consolvo, D. W. McDonald, T. Toscos, M. Y . Chen, J. Froehlich, B. Harrison, P. Klasnja, A. LaMarca, L. LeGrand, R. Libby, I. Smith, and J. A. Landay, “Activity sensing in the wild: a field trial of ubifit garden,” in CHI ’08, pp. 1797–1806, 2008. 113 [25] N. Noury, “Ailisa : experimental platforms to evaluate remote care and assistive tech- nologies in gerontology,” in Enterprise networking and Computing in Healthcare Industry, 2005. HEALTHCOM 2005. Proceedings of 7th International Workshop on, pp. 67 – 72, June 2005. [26] E. Cuervo, A. Balasubramanian, D.-k. Cho, A. Wolman, S. Saroiu, R. Chandra, and P. Bahl, “Maui: making smartphones last longer with code offload,” in MobiSys ’10, pp. 49–62, 2010. [27] M. A. Viredaz, L. S. Brakmo, and W. R. Hamburgen, “Energy management on handheld devices,” Queue, vol. 1, no. 7, pp. 44–52, 2003. [28] E. Shih, P. Bahl, and M. J. Sinclair, “Wake on wireless: an event driven energy saving strategy for battery operated devices,” in MobiCom ’02, pp. 160–171, 2002. [29] J. Sorber, N. Banerjee, M. D. Corner, and S. Rollins, “Turducken: hierarchical power management for mobile devices,” in MobiSys ’05, pp. 261–274, 2005. [30] L. Stabellini and J. 
Zander, “Energy efficient detection of intermittent interference in wire- less sensor networks,” In International Journal of Sensor Networks (IJSNet), vol. 8, no. 1, 2010. [31] M. Shin, P. Tsang, D. Kotz, and C. Cornelius, “Deamon: energy-efficient sensor monitor- ing,” in SECON’09, pp. 565–573, 2009. [32] A. Silberstein, R. Braynard, and J. Yang, “Constraint chaining: on energy-efficient contin- uous monitoring in sensor networks,” in SIGMOD ’06, pp. 157–168, 2006. [33] N. Balasubramanian, A. Balasubramanian, and A. Venkataramani, “Energy consumption in mobile phones: a measurement study and implications for network applications,” in IMC ’09, pp. 280–293, 2009. [34] A. Balasubramanian, R. Mahajan, and A. Venkataramani, “Augmenting mobile 3g using wifi,” in MobiSys ’10, pp. 209–222, 2010. [35] A. J. Nicholson and B. D. Noble, “Breadcrumbs: forecasting mobile connectivity,” in Pro- ceedings of the 14th ACM international conference on Mobile computing and networking, MobiCom ’08, pp. 46–57, 2008. [36] M.-R. Ra, J. Paek, A. B. Sharma, R. Govindan, M. H. Krieger, and M. J. Neely, “Energy- delay tradeoffs in smartphone applications,” in Proceedings of the 8th international con- ference on Mobile systems, applications, and services, pp. 255–270, 2010. [37] Google.http://www.google.com. [38] R. Vall´ ee-Rai, P. Co, E. Gagnon, L. Hendren, P. Lam, and V . Sundaresan, “Soot - a java bytecode optimization framework,” in Proceedings of the 1999 conference of the Centre for Advanced Studies on Collaborative research, CASCON ’99, pp. 13–, IBM Press, 1999. 114 [39] V . Sundaresan, L. Hendren, C. Razafimahefa, R. Vall´ ee-Rai, P. Lam, E. Gagnon, and C. Godin, “Practical virtual method call resolution for java,” in Proceedings of the 15th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA ’00, (New York, NY , USA), pp. 264–280, ACM, 2000. [40] R. Vallee-Rai and L. J. 
Hendren, “Jimple: Simplifying java bytecode for analyses and transformations,” 1998. [41] D. Brumley, I. Jager, T. Avgerinos, and E. J. Schwartz, “Bap: A binary analysis platform,” in CAV’11, pp. 463–469, 2011. [42] D. Song, D. Brumley, H. Yin, J. Caballero, I. Jager, M. G. Kang, Z. Liang, J. Newsome, P. Poosankam, and P. Saxena, “Bitblaze: A new approach to computer security via binary analysis,” in ICISS ’08, pp. 1–25, 2008. [43] S. Arzt, S. Rasthofer, C. Fritz, E. Bodden, A. Bartel, J. Klein, Y . Le Traon, D. Octeau, and P. McDaniel, “Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps,” in Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, (New York, NY , USA), pp. 259–269, ACM, 2014. [44] A.-D. Schmidt, R. Bye, H.-G. Schmidt, J. Clausen, O. Kiraz, K. Yuksel, S. Camtepe, and S. Albayrak, “Static analysis of executables for collaborative malware detection on android,” in Communications, 2009. ICC ’09. IEEE International Conference on, pp. 1–5, June 2009. [45] A. Shabtai, Y . Fledel, and Y . Elovici, “Automated static code analysis for classifying android applications using machine learning,” in Computational Intelligence and Security (CIS), 2010 International Conference on, pp. 329–333, Dec 2010. [46] Hex-Rays, “Ida pro disassembler.” https://www.hex-rays.com/products/ ida. [47] J.-Y . Chen, B.-Y . Shen, Q.-H. Ou, W. Yang, and W.-C. Hsu, “Effective code discovery for arm/thumb mixed isa binaries in a static binary translator,” in Compilers, Architecture and Synthesis for Embedded Systems (CASES), 2013 International Conference on, pp. 1–10, Sept 2013. [48] objdump (part of the GNU Binutils). http://www.gnu.org/software/ binutils/. [49] “Fast library identification and recognition technology.” http://www.hex-rays. com/products/ida/tech/flirt/in_depth.shtml. [50] J. Kinder and H. 
Veith, “Jakstab: A static analysis platform for binaries,” in Computer Aided Verification, pp. 423–427, Springer, 2008. [51] L. De Moura and N. Bjørner, “Z3: An efficient smt solver,” in Tools and Algorithms for the Construction and Analysis of Systems, pp. 337–340, Springer, 2008. 115 [52] C. W. Barrett, D. L. Dill, and J. R. Levitt, “A decision procedure for bit-vector arithmetic,” in Proceedings of the 35th Annual Design Automation Conference, DAC ’98, (New York, NY , USA), pp. 522–527, ACM, 1998. [53] Monkey.http://developer.android.com/tools/help/monkey.html. [54] A library call tracer.http://linux.die.net/man/1/ltrace. [55] Process Trace.http://linux.die.net/man/2/ptrace. [56] A.-D. Schmidt, R. Bye, H.-G. Schmidt, J. Clausen, O. Kiraz, K. A. Y¨ uksel, S. A. Camtepe, and S. Albayrak, “Static analysis of executables for collaborative malware detection on android,” in Proceedings of the 2009 IEEE International Conference on Communications, pp. 631–635, 2009. [57] P. P. Chan, L. C. Hui, and S. M. Yiu, “Droidchecker: Analyzing android applications for capability leak,” in Proceedings of the Fifth ACM Conference on Security and Privacy in Wireless and Mobile Networks, pp. 125–136, 2012. [58] L. K. Yan and H. Yin, “Droidscope: Seamlessly reconstructing the os and dalvik semantic views for dynamic android malware analysis,” in Proceedings of the 21st USENIX Confer- ence on Security Symposium, pp. 29–29, 2012. [59] M. Naik and A. Rabkin, “jchord: A static and dynamic program analysis platform for java,” URL http://code. google. com/p/jchord. [60] Brian Klug, “Lava Xolo X900 Review - The First Intel Medfield Phone.”http://www. anandtech.com/show/5770. [61] Google Play.http://play.google.com. [62] L. Zhang, B. Tiwana, Z. Qian, Z. Wang, R. P. Dick, Z. M. Mao, and L. Yang, “Accurate online power estimation and automatic battery behavior based power model generation for smartphones,” in CODES/ISSS ’10, pp. 105–114, 2010. [63] S. Lee and M. 
Annavaram, “Wireless body area networks: Where does energy go?,” in IISWC ’12, pp. 25–35, nov. 2012. [64] Monsoon Power Monitor.http://www.msoon.com. [65] PassMark PerformanceTest Mobile.http://www.passmark.com. [66] Chess for Android. https://play.google.com/store/apps/details?id= com.google.android.chess. [67] B.-G. Chun, S. Ihm, P. Maniatis, M. Naik, and A. Patti, “Clonecloud: elastic execution between mobile device and cloud,” in EuroSys ’11, pp. 301–314, 2011. 116 [68] M. S. Gordon, D. A. Jamshidi, S. Mahlke, Z. M. Mao, and X. Chen, “Comet: Code offload by migrating execution transparently,” in Presented as part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), (Hollywood, CA), pp. 93– 106, USENIX, 2012. [69] A. Wollrath, R. Riggs, and J. Waldo, “A distributed object model for the javatm system,” in COOTS’96, pp. 17–17, 1996. [70] N. Kassem and E. Team, Designing Enterprise Applications: Java 2 Platform. Addison- Wesley Longman Publishing Co., Inc., 2000. [71] T. B. Downing, Java RMI: remote method invocation. IDG Books Worldwide, Inc., 1998. [72] S. Yang, Y . Kwon, Y . Cho, H. Yi, D. Kwon, J. Youn, and Y . Paek, “Fast dynamic execution offloading for efficient mobile cloud computing,” in PerCom ’13, pp. 20–28, 2013. [73] R. Kemp, N. Palmer, T. Kielmann, and H. E. Bal, “Cuckoo: A computation offloading framework for smartphones.,” in MobiCASE (M. L. Gris and G. Y . 0001, eds.), vol. 76 of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommu- nications Engineering, pp. 59–79, Springer, 2010. [74] S. Kosta, A. Aucinas, P. Hui, R. Mortier, and X. Zhang, “Thinkair: Dynamic resource allocation and parallel execution in the cloud for mobile code offloading,” in INFOCOM ’12, pp. 945–953, 2012. [75] S. Lee and J. W. Jeon, “Evaluating performance of android platform using native c for embedded systems,” in Control Automation and Systems (ICCAS), 2010 International Con- ference on, pp. 
1160–1163, Oct 2010. 117
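The two data-reduction steps summarized in the conclusion (ship only the fields a remote method needs, and return only the values it modified) can be illustrated with a small sketch. This is not FREEME's implementation: the function names `select_inputs`, `run_remotely`, and `merge_delta` are hypothetical, the list of needed fields is supplied by hand rather than derived by program analysis, and modified values are found by a simple before/after snapshot comparison standing in for the server-side analysis the thesis describes.

```python
import copy


def select_inputs(state, needed_fields):
    """Phone side: keep only the fields the remote method reads.

    Assumption: needed_fields is given by hand here; FREEME derives it
    automatically from analysis of the method.
    """
    return {name: state[name] for name in needed_fields}


def run_remotely(method, inputs):
    """Server side: run the method, then return its result plus a delta
    holding only the fields whose values changed or were newly created.

    Assumption: a deep-copy snapshot comparison stands in for the
    server-side analysis of the class data structure.
    """
    before = copy.deepcopy(inputs)
    result = method(inputs)
    delta = {
        name: value
        for name, value in inputs.items()
        if name not in before or before[name] != value
    }
    return result, delta


def merge_delta(state, delta):
    """Phone side: apply only the modified values to the local state."""
    state.update(delta)


def brighten(fields):
    """Hypothetical offloaded method: reads 'pixels' and 'gain', writes 'pixels'."""
    fields["pixels"] = [min(255, p + fields["gain"]) for p in fields["pixels"]]
    return len(fields["pixels"])


# Hypothetical phone-side object state; most of it is irrelevant to brighten().
phone_state = {
    "pixels": [10, 250, 100],
    "gain": 10,
    "thumbnail_cache": [0] * 1000,
    "title": "photo",
}

inputs = select_inputs(phone_state, ["pixels", "gain"])  # upload: 2 of 4 fields
result, delta = run_remotely(brighten, inputs)           # download: 1 field
merge_delta(phone_state, delta)
```

In this sketch the large `thumbnail_cache` field never leaves the phone, and the unchanged `gain` value is not sent back, which is the kind of transfer reduction the conclusion credits for the energy and latency savings.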
Asset Metadata
Creator: Lee, Sangwon (author)
Core Title: A framework for runtime energy efficient mobile execution
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Publication Date: 02/19/2016
Defense Date: 12/11/2015
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: Android, Cloud, computation offloading, increasing battery life, Mobile, OAI-PMH Harvest
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Annavaram, Murali (committee chair), Krishnamachari, Bhaskar (committee chair), Nakano, Aiichiro (committee member)
Creator Email: itools@gmail.com, sangwonl@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c40-210295
Unique identifier: UC11277830
Identifier: etd-LeeSangwon-4118.pdf (filename), usctheses-c40-210295 (legacy record id)
Legacy Identifier: etd-LeeSangwon-4118.pdf
Dmrecord: 210295
Document Type: Dissertation
Rights: Lee, Sangwon
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA