CLOUD-ENABLED MOBILE SENSING SYSTEMS

by

Moo-Ryong Ra

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

August 2013

Copyright 2013 Moo-Ryong Ra

Acknowledgements

I would like to express my deepest appreciation to Professor Ramesh Govindan for being an excellent mentor and providing me with insightful guidance toward academic excellence. All of my work in this dissertation has been guided by him and, as a result, has been improved tremendously. I would also like to thank my dissertation committee, including Professor Fei Sha, Professor Antonio Ortega, and Professor Gerard Medioni, for their insightful feedback and guidance on improving the quality of this document.

Throughout this dissertation, I have been honored to have the opportunity to collaborate with many prominent researchers in both universities and industrial research labs: P3 (Chapter 3) is joint work with Professor Antonio Ortega; Medusa (Chapter 4) is joint work with Bin Liu and Professor Tom La Porta; Odessa (Chapter 2) is joint work with Anmol Sheth, Professor David Wetherall, Lily Mummert, and Padmanabhan Pillai; Urban Tomography and SALSA (Chapter 5) are joint work with Jeongyeup Paek, Abhishek Sharma, Professor Martin H. Krieger, and Professor Michael J. Neely. Although not included in this dissertation, there is also joint work with Bodhi Priyantha, Aman Kansal, and Jie Liu.

Finally, I would also like to thank my family for their sincere support; specifically my beloved wife Jean Choe, my parents Jung-Do Ra and Myung-Sook Hong, my parents-in-law Man-Kee Choe and Tae Choe, and lastly my lovely daughter, Serena S. Ra. Without their support, this dissertation would not exist.

Table of Contents

Acknowledgements
List of Tables
List of Figures
Abstract

Chapter 1: Introduction
  1.1 Dissertation Overview and Contributions
  1.2 Dissertation Outline

Chapter 2: Odessa: Enabling Interactive Perception Applications on Mobile Devices
  2.1 Introduction
  2.2 Background
    2.2.1 Metrics and Methods for Adaptation
    2.2.2 Interactive Perception Applications
    2.2.3 Sprout: A Parallel Processing Framework
  2.3 Factors Affecting Application Performance
    2.3.1 Experimental Setup
    2.3.2 Input Data Variability
    2.3.3 Variability Across Mobile Platforms
    2.3.4 Network Performance
    2.3.5 Effects of Data-Parallelism
    2.3.6 Effects of Pipeline Parallelism
  2.4 Design and Implementation of Odessa
    2.4.1 Lightweight Application Profiler
    2.4.2 Decision Engine
      2.4.2.1 Adapting Data Parallelism and Stage Offloading
      2.4.2.2 Adapting Pipeline Parallelism
  2.5 Evaluation
    2.5.1 Odessa's Performance and Overhead
    2.5.2 Comparison With Other Strategies
    2.5.3 Adapting to Varying Execution Contexts
      2.5.3.1 CPU Resources
      2.5.3.2 Network Performance
    2.5.4 Data-Parallelism and Application Fidelity
  2.6 Conclusion

Chapter 3: P3: Toward Privacy-Preserving Photo Sharing
  3.1 Introduction
  3.2 Background and Motivation
    3.2.1 Image Standards, Compression and Scalability
    3.2.2 Threat Model, Goals and Assumptions
  3.3 P3: The Algorithm
    3.3.1 Overview
    3.3.2 Sender-Side Encryption
    3.3.3 Recipient-Side Decryption and Reconstruction
    3.3.4 Algorithmic Properties of P3
  3.4 P3: System Design
    3.4.1 P3 Architecture and Operation
    3.4.2 Discussion
  3.5 Evaluation
    3.5.1 Methodology
    3.5.2 Evaluation Results
      3.5.2.1 The Threshold vs. Storage Tradeoff
      3.5.2.2 Privacy
    3.5.3 What is Lost?
  3.6 Conclusions

Chapter 4: Medusa: A Programming Framework for Crowd-Sensing Applications
  4.1 Introduction
  4.2 Crowd-Sensing: Motivation and Challenges
  4.3 Medusa System Architecture
    4.3.1 Architectural Principles
    4.3.2 How Medusa Works
  4.4 Medusa Design
    4.4.1 The MedScript Programming Language
    4.4.2 Medusa Cloud Runtime
    4.4.3 Medusa Runtime on the Smartphone
  4.5 Evaluation
    4.5.1 Language Expressivity
    4.5.2 Concurrent Task Instances
    4.5.3 Scalability and Overhead
    4.5.4 Robustness
  4.6 Conclusions

Chapter 5: SALSA: Energy-Delay Tradeoffs in Smartphone Applications
  5.1 Introduction
  5.2 Problem Statement, Model and Objective
  5.3 The Link Selection Algorithm
    5.3.1 SALSA
    5.3.2 Theoretical Properties of SALSA
    5.3.3 Practical Considerations for SALSA
    5.3.4 Extensions
  5.4 Evaluation
    5.4.1 Methodology
    5.4.2 Performance Results
  5.5 Experimental Results
  5.6 Conclusions and Future Work

Chapter 6: Literature Review
  6.1 Offloading Computation from Mobile Devices to the Cloud
  6.2 Preserving Privacy on Photo-Sharing Services
  6.3 Crowd-Sensing: Crowd-Sourcing with Wireless Remote Sensing
  6.4 Managing Heterogeneous Network Interfaces on Mobile Devices

Chapter 7: Conclusions

References

Appendix A: P3 Supplement
  A.1 The Examined Images on Edge Detection

Appendix B: Medusa Supplement
  B.1 MedScript Program Codes
    B.1.1 Auditioning App
    B.1.2 Citizen Journalist App
    B.1.3 Collaborative Learning
    B.1.4 Forensic Analysis
    B.1.5 Spot Reporter
    B.1.6 WiFi Scanner
    B.1.7 Bluetooth Scanner
    B.1.8 Road-Bump Monitoring
    B.1.9 Party Thermometer

Appendix C: SALSA Supplement
  C.1 Derivation of SALSA Control Decision
  C.2 SALSA: Proof of Theorem 1

List of Tables

2.1 Summary of the data flow graphs of the three computer vision applications, with the average makespan and frame rate measured when each application runs locally on the netbook platform.
2.2 The ten different network conditions emulated by Dummynet.
2.3 Median speedup in overall application performance across the two devices.
2.4 The accuracy and mean execution time of face detection with an increasing number of worker threads.
2.5 Average execution time in seconds and number of SIFT features detected by each thread for the object and pose recognition application.
2.6 The stages offloaded to the server and the number of instances of each stage offloaded on the server by Odessa; the average degree of pipeline parallelism is shown in boldface.
2.7 The instances of the offloaded stages along with the number of tokens (in boldface) in the pipeline for the Domain Specific partition of the application graph.
4.1 Stage Binaries in Stage Library.
4.2 Medusa System Components and Properties.
4.3 Implemented Crowd-Sensing Apps (Lines of Code).
4.4 Lines of Code Comparison.
4.5 Delay breakdown on the server (unit: msec).
4.6 Delay breakdown on the phone (unit: msec).
4.7 The defense capability of the static analyzer against arbitrary code modification.
5.1 Scan Cost Measurement.

List of Figures

1.1 My Contributions.
2.1 Overview of the Odessa runtime system.
2.2 Data flow graphs for the three computer vision applications.
2.3 Variation in the per-frame makespan.
2.4 Variation in the number of features extracted per frame.
2.5 Variability in the completion time of three example stages running on the laptop and netbook devices.
2.6 Impact of the network on the performance of the face recognition application.
2.7 Pipeline parallelism.
2.8 Algorithm for adaptive offloading and data-parallelism.
2.9 Decisions made by Odessa across the first 200 frames, along with the impact on makespan and frame rate for the three applications on the netbook.
2.10 Frame rate and makespan achieved by Odessa along with two statically partitioned application configurations across the two client devices.
2.11 Frame rate and makespan achieved by Odessa for pose detection, compared to the Offline-Optimizer.
2.12 The ability of Odessa to adapt to abrupt changes in the number of CPU cores available to the application; the number of cores is increased from 2 to 8 at frame number 250.
2.13 Odessa adapting to changes in network performance; the bandwidth is reduced from 100 Mbps to 5 Mbps at frame number 1237, and Odessa pulls offloaded stages back from the server to the local machine to reduce the data transmitted over the network.
2.14 The number of features detected across different numbers of detection worker threads.
3.1 Privacy-Preserving Image Encoding Algorithm.
3.2 P3 Overall Processing Chain.
3.3 P3 System Architecture.
3.4 Screenshot (Facebook) with/without decryption.
3.5 Threshold vs. Size (error bars = stdev).
3.6 PSNR results.
3.7 Baseline - Encryption Result (T: 1, 5, 10, 15, 20).
3.8 Privacy on Detection Algorithms.
3.9 Privacy on Feature Extraction and Face Recognition Algorithms.
3.10 Canny Edge Detection on Public Part.
3.11 Bandwidth Usage Cost (INRIA).
4.1 System Architecture.
4.2 Illustration of video documentation task.
4.3 Video Documentation.
4.4 Collaborative Learning (full listing in Appendix B.1).
4.5 Auditioning (full listing in Appendix B.1).
4.6 Forensic Analysis (full listing in Appendix B.1).
4.7 WiFi/Bluetooth Scanner (full listing in Appendix B.1).
4.8 Road-Bump Monitoring (full listing in Appendix B.1).
4.9 Citizen Journalist (full listing in Appendix B.1).
4.10 Party Thermometer (full listing in Appendix B.1).
4.11 Concurrent Execution of Multiple MedScript Programs.
4.12 Failure recovery: turning off the phone in the middle of a task.
4.13 Robustness: forcing a limit on computation.
4.14 Failure recovery: harnessing network data usage.
5.1 Urban Tomography System.
5.2 Example.
5.3 Arrival Patterns.
5.4 (CDF) Link availability with failure probability.
5.5 MINIMUM-DELAY vs. WIFI-ONLY vs. SALSA.
5.6 Practical implication of SALSA.
5.7 SALSA envelopes for different a.
5.8 Performance across different environments.
5.9 STATIC-DELAY vs. KNOW-WIFI vs. SALSA.
5.10 Energy Measurement Environment.
5.11 SALSA performance for different scanning intervals.
5.12 Experimental result at the USC campus compared to simulation results.
5.13 Experimental result at a shopping mall compared to simulation results.
5.14 Experimental Walk Route.
A.1 A boat image from USC-SIPI.
A.2 A tree image from USC-SIPI.
A.3 A vegetable image from USC-SIPI.
A.4 A baboon image from USC-SIPI.

Abstract

Smart mobile devices have increasingly become the computing platform of choice. A distinctive feature of these devices is their sensors, and many useful applications based on these sensors have been developed and are widely used. Not surprisingly, many of these applications rely on the cloud because of the resource constraints inherent to mobile devices. In this dissertation, we explore systems support for enabling the efficient processing and secure sharing of sensor data using the cloud. To achieve this goal, we adaptively use the cloud for mobile devices based on the availability of hardware components, networks, and the state of human workers. When appropriate, we exploit domain knowledge and use mathematical analysis to fulfill our goals and system requirements.

We first enable compute- and data-intensive applications on mobile devices.
Some emerging applications, such as interactive mobile perception applications, are too slow to run on mobile devices because of both their high-data-rate workloads (real-time video) and the compute-intensive algorithms they use (typically based on computer vision). Thus, we need to significantly improve performance by offloading some application components to the cloud and parallelizing them where relevant. To achieve this goal, we conducted a measurement study of the factors that can affect the performance of these applications and developed a novel, lightweight runtime that automatically and adaptively makes offloading and parallelism decisions for mobile interactive perception applications.

We next focus on serious privacy infringements, such as the leakage of photos and the use of algorithmic recognition technologies by providers, when using cloud-based photo sharing services. Because users currently must expose their data to providers without any controls, they cannot prevent the providers from such mining activities. On the other hand, the providers perform useful processing on our data to improve our experience with their systems, e.g., scaling images appropriately to support mobile devices with different form factors. To simultaneously achieve both privacy protection and the processing benefits provided by the cloud, we develop an image encryption/decryption algorithm and an associated system that can work transparently with existing cloud-based service providers.

Third, we focus on crowd-sensing, a novel capability that harnesses the power of crowds to collect sensor data from a large number of mobile phone users. In such activities, it is often challenging for users to efficiently deal with labor-intensive sub-tasks, such as recruiting workers, giving incentives, etc. However, existing programming systems do not handle these concerns appropriately. Our domain-specific language and runtime enables crowd-sensing and provides significant automation for such tasks: users only need to provide a high-level description of a crowd-sensing task, and the runtime automatically takes care of the rest.

Fourth, when sharing large volumes of sensor data using mobile devices, we always have energy concerns. Modern smart mobile devices have multiple wireless interfaces, such as 3G/EDGE and WiFi, for data transfer, but there is considerable variability in the availability and achievable data transfer rates of these networks. On the other hand, many mobile applications are often delay-tolerant, so it is possible to delay data transfers until a lower-energy network connection becomes available. We present a principled approach to trading off energy and delay on mobile devices using the Lyapunov optimization framework. The resulting algorithm can automatically adapt to channel conditions and requires only local information to decide whether and when to defer a transmission.

Chapter 1

Introduction

In recent years, smart mobile devices, such as smartphones and tablets, have increasingly become the computing platform of choice. Compared to desktop or laptop computers, one of the distinctive features of smart mobile devices is the presence of sensors. Modern smart mobile devices already have a variety of sensors, including cameras, GPS receivers, microphones, compasses, and motion sensors such as accelerometers and gyroscopes. Thus, smart mobile devices can now sense, recognize, and share users' contextual data. Based on these sensors, many useful mobile sensing applications have been developed. Video chatting, photo sharing, and location-based services are widely used and now have a large user base. One fact worth pointing out is that many of these useful applications rely on the cloud. This is mainly because smart mobile devices are still resource-constrained in many ways, e.g., battery, storage, network, programmability, etc. Cloud technology, driven by leading IT companies equipped with millions of compute cores and vast storage, presents a great opportunity to significantly augment the capabilities of mobile devices.

Along with these trends, this dissertation focuses on mobile systems and applications that rely on both sensors and cloud infrastructure, which we term "cloud-enabled mobile sensing systems and applications". As we mentioned, many useful cloud-enabled mobile sensing applications already exist that collect, process, and share sensor data. Nonetheless, since smart mobile devices are acquiring more capabilities and can be easily carried everywhere, what people desire to do with their smart mobile devices is also evolving. Users of smart mobile devices are eager to have faster applications with more features and a longer battery life, in order to make their lives more efficient and become better connected to one another.

[Figure 1.1: My Contributions — Odessa (performance), P3 (privacy), Medusa (programmability), SALSA (energy and delay), spanning the processing and sharing of sensor data.]

However, when processing and sharing sensor data on mobile devices, many unsolved problems currently exist. For instance, it is often hard to efficiently collect, process, or share sensor data. This inefficiency can be caused by high-data-rate sensors, such as cameras, due to their data-intensive workloads; by the use of smartphone crowds, because of human involvement; or by the aggressive use of low-quality networks. Thus, without relevant solutions, we will lose opportunities to enable novel applications and quickly be confined within the resource limits of mobile devices. Moreover, sharing sensor data with our friends and acquaintances, through social networking websites and other cloud-based services, often reveals users' personal contexts in unexpected ways. To address these concerns, solutions to four different challenges are presented in this dissertation, as shown in Figure 1.1.

Performance. Some emerging applications, such as interactive mobile perception applications, are too slow to run on mobile devices due to both their high-data-rate workloads (real-time video) and the compute-intensive algorithms they use (computer vision-based ones). Thus, we need to significantly improve performance by offloading some application components to the cloud and parallelizing them where relevant. What factors impact offloading and parallelism decisions, and how should we design the underlying systems to enable such workloads on mobile devices?

Privacy and Security. When sharing our sensor data, e.g., photos, with others using the cloud, we are currently required to expose our data to providers without any controls. Therefore, there is no way to prevent the providers from mining our data in order to infer our personal interests and other sensitive information. On the other hand, the providers perform useful processing on our data to improve our experience with their systems, e.g., scaling images appropriately to support mobile devices with different form factors.
Can we protect ourselves from the unintended use of our data, while still retaining the useful processing benefits provided by the service providers?

Programmability. When we want to collect and process large volumes of data from smartphone crowds, it is challenging for users to efficiently deal with labor-intensive sub-tasks, such as recruiting workers, giving incentives, curating intermediate results, etc. However, existing programming systems do not handle these concerns appropriately. What kind of programming support do we need for efficient sensor data collection and processing?

Energy and Delay. Modern smart mobile devices have multiple wireless interfaces, such as EDGE/3G/4G and WiFi, for data transfer, but there is considerable variability in the availability and achievable data transfer rates of these networks. On the other hand, many mobile applications that share sensor data using the cloud are often delay-tolerant, so it is possible to delay data transfers until a lower-energy network connection becomes available. How do we design a control algorithm that can efficiently trade off energy and delay?

Given these challenges, my research goal is to enable efficient processing and secure sharing of sensor data using the cloud. Throughout this dissertation, we advance the state of knowledge in the ways described below.

1.1 Dissertation Overview and Contributions

This dissertation makes four major contributions to the field of mobile and cloud computing.

Enabling mobile perception applications (Chapter 2). Mobile devices are not currently powerful enough to run responsive applications that recognize objects, people, or gestures from real-time video. Hence, they need to leverage computation on external resources. The two key questions that impact performance are what computation to offload and how to structure the parallelism across the mobile device and server. To answer these questions, we developed and evaluated three interactive perception applications. We found that offloading and parallelism choices should be dynamic, even for a given application, as performance depends on scene complexity as well as environmental factors such as the network and device capabilities. To this end, we developed Odessa, a novel, lightweight runtime that automatically and adaptively makes offloading and parallelism decisions for mobile interactive perception applications. The evaluation shows that Odessa's incremental greedy strategy converges to an operating point that is close to an ideal offline partitioning. It provides more than a 3x improvement in application performance over partitioning suggested by domain experts. Odessa works well across a variety of execution environments, and responds quickly to changes in the network, device, and application inputs.

Privacy-preserving photo sharing (Chapter 3). With the increasing use of mobile devices, photo sharing services are experiencing greater popularity. Aside from providing storage, photo sharing services enable bandwidth-efficient downloads to mobile devices by performing server-side image transformations (resizing, cropping). On the flip side, photo sharing services have raised privacy concerns, such as the leakage of photos to unauthorized viewers and the use of face recognition technologies by providers. To address these concerns, we proposed a privacy-preserving photo encoding algorithm that extracts and encrypts a small, but significant, component of the photo, while preserving the remainder in a standards-compatible form. These two components can be stored separately. This technique significantly reduces the signal-to-noise ratio and the accuracy of automated detection and recognition on the publicly available photo, while preserving the ability of the provider to perform server-side transformations to conserve download bandwidth. Our prototype privacy-preserving photo sharing system, P3, works with Facebook and Flickr, and can be extended to other services as well. P3 requires no changes to existing services or mobile application software, and adds minimal photo storage overhead.
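To make the threshold idea concrete, the sketch below splits the DCT coefficients of one 8x8 image block at a magnitude threshold T: the public part keeps every coefficient clipped to [-T, T], while the secret part keeps the residual, which carries most of the recognizable signal. This is a simplified illustration of the splitting step only, with an arbitrary threshold; the actual P3 encoder described in Chapter 3 operates on quantized JPEG coefficients and encrypts the secret part before upload.

import numpy as np
from scipy.fft import dctn, idctn

def p3_split(block: np.ndarray, T: float):
    """Split one 8x8 pixel block into a public part (coefficients
    clipped to [-T, T]) and a secret part (the clipped-off residual).
    Simplified sketch: real P3 works on quantized JPEG coefficients
    and encrypts the secret part before upload."""
    coeffs = dctn(block, norm="ortho")   # 2-D DCT of the block
    public = np.clip(coeffs, -T, T)      # small coefficients, capped
    secret = coeffs - public             # residual: the "significant" part
    return public, secret

def p3_reconstruct(public: np.ndarray, secret: np.ndarray) -> np.ndarray:
    """Recipient side: add the two coefficient sets and invert the DCT."""
    return idctn(public + secret, norm="ortho")

block = np.random.randint(0, 256, (8, 8)).astype(float)
pub, sec = p3_split(block, T=10.0)
assert np.allclose(p3_reconstruct(pub, sec), block)

Because natural images concentrate most of their energy in a few large DCT coefficients, the public part retains little recognizable content while still being a valid coefficient set that a provider can resize or crop.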
Programming crowd-sensing tasks (Chapter 4). The ubiquity of smartphones and their on-board sensing capabilities motivate crowd-sensing, a capability that harnesses the power of crowds to collect sensor data from a large number of mobile phone users. Unlike previous work on wireless sensing, crowd-sensing poses several novel requirements: support for humans-in-the-loop to trigger sensing actions or review results, the need for incentives, and privacy and security issues. Beyond existing crowd-sourcing systems, crowd-sensing exploits the sensing and processing capabilities of mobile devices. In this work, we designed and implemented Medusa, a novel programming framework for crowd-sensing that satisfies these requirements. Medusa provides high-level abstractions for specifying the steps required to complete a crowd-sensing task, and employs a distributed runtime system that coordinates the execution of these tasks between smartphones and a cluster on the cloud. We have implemented ten crowd-sensing tasks on a prototype of Medusa. We found that Medusa task descriptions are two orders of magnitude smaller than the standalone systems required to implement those crowd-sensing tasks, and that the runtime has low overhead and is robust to failures and resource attacks.

Trading off delay for reduced energy (Chapter 5). Many mobile applications are enabled by the ability to capture videos on a smartphone and to have these videos uploaded to an Internet-connected server. This capability requires the transfer of large volumes of data from the phone to the infrastructure. Smartphones have multiple wireless interfaces, e.g., 3G/EDGE and WiFi, for data transfer. There is, however, considerable variability in the availability and achievable data transfer rates of these networks. Moreover, the energy costs of transmitting a given amount of data on these wireless interfaces can differ by an order of magnitude. On the other hand, many of these applications are naturally delay-tolerant, so it is possible to delay data transfers until a lower-energy WiFi connection becomes available. In this work, we present a principled approach to designing an optimal online algorithm for this energy-delay tradeoff using the Lyapunov optimization framework. Our algorithm, called SALSA, can automatically adapt to channel conditions and requires only local information to decide whether and when to defer a transmission. We evaluated SALSA using real-world traces as well as experiments with a prototype implementation on a modern smartphone. The results show that SALSA can be tuned to achieve a broad spectrum of energy-delay tradeoffs, is closer to an empirically determined optimum than any of the alternatives we compared it to, and can save 10-40% of battery capacity for some workloads. This work was motivated by the Urban Tomography [124] system, which enables audiovisual documentation of urban environments. We developed this system, and it has been used for surveillance at LAX International Airport and deployed at other places for more than two years [72].
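The control decision sketched below illustrates the drift-plus-penalty structure that Lyapunov optimization yields (a minimal illustration only; the actual SALSA rule and its derivation appear in Chapter 5 and Appendix C, and the link rates and power numbers here are hypothetical). Each time slot, the sender compares V times the energy a link would spend against the queue backlog times the bytes the link would drain, and defers when no link wins:

# Drift-plus-penalty link selection (illustrative sketch, not the
# exact SALSA rule). V trades energy (penalty) against queue backlog
# (delay): larger V defers more aggressively, saving energy.
def choose_link(Q, links, V):
    """Q: queued bytes. links: dict name -> (rate_Bps, power_W).
    Returns the link minimizing V*energy - Q*service, or None to defer."""
    best, best_score = None, 0.0            # deferring scores 0
    for name, (rate, power) in links.items():
        served = min(Q, rate)               # bytes drained this slot
        energy = power * (served / rate) if rate > 0 else 0.0
        score = V * energy - Q * served     # drift-plus-penalty objective
        if score < best_score:
            best, best_score = name, score
    return best

links = {"wifi": (2_000_000, 0.7), "3g": (250_000, 1.8)}  # hypothetical
print(choose_link(Q=50_000, links=links, V=1e9))          # small Q may defer

Sweeping V traces out the energy-delay tradeoff curve: a larger V weighs energy more heavily and holds data longer while waiting for a cheaper link.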
1.2 Dissertation Outline

This dissertation is organized as follows. In Chapter 2, we present Odessa. We first introduce emerging mobile applications, characterize the workload, and discuss the design and implementation of the Odessa runtime engine. Extensive evaluation results on our prototype system are also presented.

In Chapter 3, we present P3. We first introduce some examples of serious privacy infringements when using cloud-based photo sharing services. We then describe our image encryption and decryption algorithm and propose a system architecture that can work transparently with existing services. We also show our prototype implementation with the Facebook system and present extensive evaluation results regarding automatic recognition technologies.

In Chapter 4, we discuss Medusa. We present several examples of crowd-sensing tasks and their capabilities. To enable such capabilities, we present the design and implementation of a complete programming system, which includes a domain-specific programming language and an associated runtime system. We discuss its effectiveness with ten prototype applications.

In Chapter 5, we discuss SALSA. We present how we may exploit application delay-tolerance to save energy, and define the link-selection problem. We then illustrate our modeling effort based on Lyapunov analysis. We show extensive evaluation results based on real-world traces and on an implementation on modern smartphones.

In Chapter 6, we present a comprehensive overview of related work in the literature. Finally, in Chapter 7, we summarize our work and conclude the dissertation.

Chapter 2

Odessa: Enabling Interactive Perception Applications on Mobile Devices

Resource-constrained mobile devices need to leverage computation on nearby servers to run responsive applications that recognize objects, people, or gestures from real-time video. The two key questions that impact performance are what computation to offload, and how to structure the parallelism across the mobile device and server. To answer these questions, we develop and evaluate three interactive perception applications. We find that offloading and parallelism choices should be dynamic, even for a given application, as performance depends on scene complexity as well as environmental factors such as the network and device capabilities. To this end, we develop Odessa, a novel, lightweight runtime that automatically and adaptively makes offloading and parallelism decisions for mobile interactive perception applications. Our evaluation shows that Odessa's incremental greedy strategy converges to an operating point that is close to an ideal offline partitioning. It provides more than a 3x improvement in application performance over partitioning suggested by domain experts. Odessa works well across a variety of execution environments, and is agile to changes in the network, device, and application inputs.

2.1 Introduction

As the processing, communication, and sensing capabilities of mobile devices increase, a new class of mobile interactive perception applications is emerging. These applications use cameras and other high-data-rate sensors to perform perception tasks, like face or object recognition, and enable natural human-machine interfaces and interactive augmented-reality experiences on mobile devices [70, 87].
For example, face recognition could be used by a social networking application that recognizes people as the user sweeps the camera across a crowded room; a natural user interface based on gesture recognition could be used to control a media application running on the mobile device; and object and pose recognition could be used by an augmented-reality shopping application that overlays information about an object in the user's hand.

Interactive perception applications have a unique set of requirements that stress the capabilities of mobile devices. First, interactive applications require crisp response. For example, to feel responsive, an augmented-reality application would need to display results well under a second. Second, these applications require continuous processing of high-data-rate sensors such as cameras to maintain accuracy. For example, a low frame rate may miss intermediate object poses or human gestures. Third, the computer vision and machine learning algorithms used to process this data are compute-intensive. For example, in one of the applications we study, extracting features from an image can take 7 seconds on a netbook. Finally, the performance of these algorithms is highly variable and depends on the content of the data, which can vary greatly.

These requirements cannot be satisfied on today's mobile devices alone. Even though the computing and communication capabilities of these platforms are improving, interactive perception applications will continue to push platform limits as new, more accurate but more compute-intensive algorithms are developed. However, two techniques can help make mobile interactive perception a reality: offloading one or more of the compute-intensive application components to an Internet-connected server, and using parallelism on multi-core systems to improve the responsiveness and accuracy of the applications. Fortunately, interactive perception applications can be structured for offloading, and provide considerable opportunities to exploit parallel processing. In this chapter, we describe a runtime called Odessa that automatically and adaptively determines how best to use these techniques.

[Figure 2.1: Overview of the Odessa runtime system. Stages of an interactive perception application run on Sprout on both the mobile device and a multi-core server, connected over the network; Odessa controls how the input video stream's processing is partitioned.]

This chapter makes three contributions. First, it provides an understanding of the factors that contribute to the offloading and parallelism decisions. We show through extensive experiments (Section 2.3) on three interactive perception applications that neither offloading decisions nor the level of data or pipeline parallelism can be determined statically, and must be adapted at runtime. This is because both responsiveness and accuracy change dramatically with input variability, network bandwidth, and device characteristics at runtime. Our second contribution is the design of Odessa (Figure 2.1, Section 2.4), a lightweight, adaptive runtime for mobile interactive perception applications. To our knowledge, Odessa is the first work to explore the simultaneous adaptation of offloading and level of parallelism with the goal of jointly improving responsiveness and accuracy.
The key insight of our work is that the dynamics and parallelism requirements of interactive perception applications preclude prior approaches that use offline profiling and optimization-based partitioning [33, 37]; instead, a simpler greedy and incremental approach delivers good performance. Finally, we provide experimental results (Section 2.5) on an implementation of Odessa that show more than a 3x improvement in performance compared to a configuration by a domain expert, and performance comparable to an idealized offline configuration that assumes infinite server resources and complete offline performance profiling. Our results also show that Odessa works well across a variety of execution environments, and is agile to changes in the network, device, and application inputs.

Odessa is qualitatively different from prior work that uses networked computing infrastructure to enhance the capabilities of mobile devices. It is complementary to work on using offloading to conserve energy on the mobile device (e.g., MAUI [37]). Moreover, it does not require prior information on application performance [37, 94] or a set of feasible candidate partitions [20, 33] to make offloading decisions.

2.2 Background

In this section, we describe the metrics and methods related to adaptive offloading and parallelism, describe the set of interactive perception applications studied, and then discuss Sprout, the distributed programming framework on which our system is based.

2.2.1 Metrics and Methods for Adaptation

Two measures of goodness characterize the responsiveness and accuracy requirements of interactive perception applications.

Makespan is the time taken to execute all stages of a data flow graph for a single frame. The makespan is a measure of the responsiveness of the application: a low makespan ensures fast completion of a recognition task and thereby improves user satisfaction.

Throughput is the rate at which frames are processed and is a measure of the accuracy of the application: a low frame rate may miss intermediate object poses or human gestures.

Any runtime system for interactive perception must simultaneously strive to minimize makespan and maximize throughput. In general, the lower the makespan and the higher the throughput the better, but the applications can become unusable at makespans over a second or throughput under 5 fps.

Application                   # of Stages   Avg. Makespan   Frame Rate
Face Recognition              9             2.09 s          2.50 fps
Object and Pose Recognition   15            15.8 s          0.09 fps
Gesture Recognition           17            2.54 s          0.42 fps

Table 2.1: Summary of the data flow graphs of the three computer vision applications, along with the average makespan and frame rate measured when each application runs locally on the netbook platform.

In adapting interactive perception applications on mobile devices, three techniques can help improve makespan and throughput. Offloading moves the most computationally intensive stages onto the server in order to reduce makespan. Pipelining allows different stages of the application (whether running on the mobile device or the server) to process different frames in parallel, thereby increasing throughput. Increasing data-parallelism, in which frames are split into multiple sub-frames that are then processed in parallel (on a multi-core mobile device, a server, or a cluster), can reduce the makespan by reducing the computation time of a stage. Data and pipeline parallelism provide great flexibility in the degree to which they are used. These techniques are not mutually exclusive: pipelining is possible even when some stages are offloaded, and data-parallel execution is possible on offloaded stages. The goal of Odessa is to decide when, and to what degree, to apply these techniques to improve both the makespan and throughput of interactive perception applications.
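A back-of-the-envelope model makes the coupling between the two metrics concrete (this is an illustrative sketch, not Odessa's estimator; the stage times, worker counts, and transfer time below are hypothetical): data parallelism divides a stage's time across its workers, offloading adds network transfer time, and with pipelining the frame rate is gated by the slowest stage while the makespan sums over all of them.

# Toy model of a linear pipeline (times in seconds). Hypothetical
# numbers loosely shaped like a face-recognition pipeline, not
# measured values from this chapter.
stages = [
    {"name": "source",   "time": 0.01, "workers": 1},
    {"name": "detect",   "time": 0.29, "workers": 2},  # data-parallel
    {"name": "classify", "time": 0.40, "workers": 1},
    {"name": "display",  "time": 0.01, "workers": 1},
]
transfer = 0.074  # upload time if detect/classify are offloaded

def makespan(stages, transfer):
    # Responsiveness: one frame traverses every stage plus the network.
    return transfer + sum(s["time"] / s["workers"] for s in stages)

def throughput(stages, transfer):
    # With pipeline parallelism, the slowest stage (or the network
    # transfer) gates the frame rate.
    bottleneck = max(transfer, max(s["time"] / s["workers"] for s in stages))
    return 1.0 / bottleneck

print(f"makespan {makespan(stages, transfer):.2f} s, "
      f"throughput {throughput(stages, transfer):.1f} fps")

The model shows why the knobs must be tuned jointly: adding workers to "detect" lowers the makespan but leaves throughput unchanged once "classify" is the bottleneck.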
2.2.2 Interactive Perception Applications

We use three qualitatively different interactive perception applications, described below, both to motivate the problem and to evaluate the efficacy of our solutions. Computer-vision-based applications are often naturally described using a data-flow model. Figure 2.2 shows the data-flow graphs for the three applications, as implemented on Odessa and Sprout.

[Figure 2.2: Data flow graphs for the three computer vision applications: (a) face recognition, (b) object and pose recognition, (c) gesture recognition.]

Face Recognition. Figure 2.2(a) shows the application graph for face recognition. The application consists of two main logical blocks: a face detector and a classifier. Face detection is done using the default OpenCV [9] Haar classifier. The face classifier takes as input the detected faces and runs an online semi-supervised learning algorithm [73] to recognize the faces from a data set of 10 people.

Object and Pose Recognition. Figure 2.2(b) shows the data-flow graph for the object instance and pose recognition application [112]. The application consists of four main logical blocks. As shown in the figure, each image first passes through a proportional down-scaler. SIFT features [80] are then extracted from the image and matched against a set of previously constructed 3D models for the objects of interest. The features for each object are then clustered by position to separate distinct instances. A random sample consensus (RANSAC) algorithm with a non-linear optimization is used to recognize each instance and estimate its 6D pose.

Gesture Recognition. Figure 2.2(c) shows the application graph for a gesture recognition application. Each video frame is sent to two separate tasks, face detection and motion extraction. The latter accumulates frame pairs, and then extracts SIFT-like features that encode optical flow in addition to appearance [30]. These features, filtered by the positions of detected faces, are aggregated over a window of frames using a previously generated codebook to create a histogram of occurrence frequencies. The histogram is treated as an input vector to a set of support vector machines trained for the control gestures.

2.2.3 Sprout: A Parallel Processing Framework

Odessa is built on Sprout [103], a distributed stream processing system designed to make developing and executing parallel applications as easy as possible by harnessing the compute power of commodity multi-core machines. Unlike programming frameworks for parallelizing offline analysis of large data sets (MapReduce [39] and Dryad [66]), Sprout is designed to support continuous, online processing of high-rate streaming data. Sprout's abstractions and runtime mechanisms, described below, are well suited to support the Odessa runtime system.

Programming Model. Applications in the Sprout framework are structured as data flow graphs; the data flow model is particularly well suited for media processing applications that perform a series of operations on an input video or audio stream. The vertices of the graph are processing steps called stages, and the edges are connectors, which represent the data dependencies between the stages. Stages within an application employ a shared-nothing model: they share no state, and interact only through connectors. This restriction keeps the programming complexity of individual stages comparable to that of sequential programming, and allows concurrency to be managed by the Sprout runtime. This programming model allows programmers to express coarse-grained application parallelism while hiding much of the complexity of parallel and distributed programming from the application developer.
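The shared-nothing stage/connector model is easy to sketch (illustrative Python only; it mimics the structure described above rather than Sprout's actual API): each stage runs in its own thread, reads items from an input connector, and writes results to an output connector, so swapping an in-memory queue for a TCP connection would not require changing the stage code.

import queue, threading

class Stage(threading.Thread):
    """Shared-nothing processing step: state stays inside the stage;
    all interaction happens through connector queues."""
    def __init__(self, fn, inbox, outbox):
        super().__init__(daemon=True)
        self.fn, self.inbox, self.outbox = fn, inbox, outbox
    def run(self):
        while True:
            item = self.inbox.get()
            if item is None:                 # end-of-stream marker
                self.outbox.put(None)
                return
            self.outbox.put(self.fn(item))

# Connectors: in-memory queues here; a real runtime could
# transparently substitute TCP connections for offloaded stages.
c0, c1, c2 = queue.Queue(), queue.Queue(), queue.Queue()
Stage(lambda f: f.lower(), c0, c1).start()   # stand-in for, e.g., downscale
Stage(lambda f: f + "!", c1, c2).start()     # stand-in for, e.g., features

for frame in ["Frame1", "Frame2", None]:
    c0.put(frame)
while (out := c2.get()) is not None:
    print(out)   # the two stages process different frames concurrently

Because each stage touches only its own queues, the runtime is free to replicate a stage for data parallelism or move it to another machine without the stage code knowing.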
Automated data transfer. Sprout connectors define data dependencies and perform data transfer between processing stages. The underlying implementation of a connector depends on the location of the stage endpoints: if the connected stages are running in the same process, the connector is implemented as an in-memory queue; otherwise, a TCP connection is used. The Sprout runtime determines the connector type and handles serialization and data transport through connectors transparently. This allows a processing stage to be written in a way that is agnostic to whether it or related processing steps have been offloaded.

Parallelism Support. The Sprout runtime supports coarse-grained data parallelism and pipeline parallelism. Data parallelism is supported by having multiple instances of a stage execute in parallel on separate processor cores. Pipeline parallelism is supported by having multiple frames processed simultaneously by the different processing stages of the application.

The relationship between Sprout and Odessa. Sprout provides programmers with mechanisms to dynamically adjust running applications, change the degree of parallelism, and migrate processing stages between machines. In Odessa, we develop adaptive techniques for determining when and which stages to offload, and for deciding how much pipelining and data-parallelism is necessary in order to achieve low makespan and high throughput; we then leverage the Sprout mechanisms to effect the changes.

2.3 Factors Affecting Application Performance

In this section we present experimental results that highlight how the performance of interactive perception applications is impacted by multiple factors. We find that:

- Input variability, network bandwidth, and device characteristics can impact offloading decisions significantly, so such decisions must be made adaptively and cannot be statically determined at compile time.
- Once some stages have been offloaded, different choices for data-parallelism can lead to significantly different application performance, and the data-parallelism levels cannot be determined a priori.
- Even with adaptive offloading and data-parallelism, a static choice of pipeline-parallelism can lead to suboptimal makespan or leave the pipeline underutilized.

Network       Configuration
LAN           100 Mbps
WAN 20 ms     30 Mbps, 20 ms RTT, loss rate 0%
WAN 40 ms     30 Mbps, 40 ms RTT, loss rate 0%
LAN 802.11g   25 Mbps, loss rate 0%, 1%, 2%
LAN 802.11b   5.5 Mbps, loss rate 0%, 1%, 2%
3G            0.5 Mbps, 500 ms RTT, loss rate 0%

Table 2.2: The ten different network conditions emulated by Dummynet. The WAN bandwidths are symmetric, and the RTT for the LAN configurations was under a millisecond.

2.3.1 Experimental Setup

Hardware. Our experimental setup consists of two different mobile devices and a server.
The two mobile devices are a netbook with a single-core Atom processor (Intel N270) running at 1.4 GHz, with hyper-threading turned off and 0.5 MB of cache, and a dual-core laptop (Intel T5500) with each core running at 1.8 GHz, no hyper-threading support, and 2 MB of cache. The netbook is a surrogate for a future-generation smartphone platform. To compare the two devices, we use the ratio of the sums of the frequencies of all available CPU cores, which we call the relative frequency ratio. The relative frequency ratio between the two devices is 2.3x. The server is an eight-core Intel Xeon with each core running at 2.13 GHz and 4 MB of cache. The relative frequency ratio between the server and the two mobile devices is 12.2x for the netbook and 4.7x for the laptop.

Input data set. To ensure repeatability across different experimental runs, the input data for each of the three applications is a sequence of frames captured offline in typical indoor lighting conditions at 30 fps, at a resolution of 640x480 pixels per frame. The data set used for face recognition consists of 3000 frames in which 10 people each walk up to the camera and make different facial expressions. The input data set for pose detection consists of 500 frames in which the user holds a book in one hand and manipulates its pose while pointing the camera at the book with the other hand. The data for gesture recognition consists of roughly 500 frames of a person sitting in front of the camera performing the different gestures.

[Figure 2.3: Variation in the per-frame makespan. (a) Face recognition; (b) object and pose recognition; (c) gesture recognition.]

Network configurations. We emulate different network conditions between the mobile device and the networked server by varying the delay and bandwidth of the link using Dummynet [29]. The ten emulated network conditions are summarized in Table 2.2.

Application profiler. A lightweight runtime profiler maintains the following application execution metrics: the execution time of each stage, the amount of data that flows on the connector links, and the network delay. The profiler also keeps track of the latency (or makespan) and the frame rate of the application. We describe the application profiler in more detail in Section 2.4.1.
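Such a profiler can be very small. The sketch below is illustrative (Odessa's actual profiler is described in Section 2.4.1): it records per-stage execution times and per-connector byte counts, and maintains rolling makespan and frame-rate estimates over a window of recent frames.

import time
from collections import defaultdict, deque

class Profiler:
    """Illustrative lightweight profiler: rolling statistics over the
    last `window` frames, updated as each stage/frame completes."""
    def __init__(self, window=32):
        self.stage_times = defaultdict(lambda: deque(maxlen=window))
        self.edge_bytes = defaultdict(lambda: deque(maxlen=window))
        self.frame_done = deque(maxlen=window)  # (timestamp, makespan)

    def record_stage(self, stage, seconds):
        self.stage_times[stage].append(seconds)

    def record_edge(self, edge, nbytes):
        self.edge_bytes[edge].append(nbytes)

    def record_frame(self, makespan_s):
        self.frame_done.append((time.time(), makespan_s))

    def avg_makespan(self):
        if not self.frame_done:
            return 0.0
        return sum(m for _, m in self.frame_done) / len(self.frame_done)

    def fps(self):
        # Frame rate estimated from completion timestamps in the window.
        if len(self.frame_done) < 2:
            return 0.0
        span = self.frame_done[-1][0] - self.frame_done[0][0]
        return (len(self.frame_done) - 1) / span if span > 0 else 0.0

A decision engine can then poll avg_makespan() and fps() cheaply on every frame, which is what makes per-frame adaptation affordable.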
2.3.2 Input Data Variability

We begin by characterizing the impact of scene content on the execution characteristics of the applications. We run all three applications on the netbook, and for each frame we plot the makespan (Figure 2.3) and the number of features generated (Figure 2.4). Our results show that input scene content can cause large and abrupt changes in the execution time of different stages, and that different applications respond differently to scene variability. This, in turn, can impact the decision of whether or not to offload a stage.

[Figure 2.4: Variation in the number of features extracted per frame. (a) Face recognition; (b) object and pose recognition; (c) gesture recognition.]

Face Recognition. The sources of input variability are the presence of a face in the scene and the similarity between the test face and the other faces in the training data set. Figures 2.3(a) and 2.4(a) show the makespan for the application and the number of faces detected per frame, respectively. We use a single detection and classification stage for this experiment. When no faces are detected in the scene, the makespan is primarily dominated by the face detection stage (0.29 s with little variability), with the face classifier stage incurring no overhead. However, when faces are detected, the makespan can increase by an order of magnitude and exhibit high variability, with no clear difference between the processing times of frames containing one, two, or three faces. For example, between frames 1000 and 1165 in Figure 2.3(a), the makespan is highly bursty and varies between 0.6 s and 1.4 s even when there is a single person in the scene.

Object and Pose Recognition. The processing time for this application is primarily dominated by feature extraction from the input scene. A complicated scene containing edges or texture will generate a large number of SIFT features, resulting in increased execution times for both the feature extraction and model matcher stages. Consequently, there is a strong correlation between the number of features detected and the makespan of the input frame, as shown in Figures 2.3(b) and 2.4(b). Beyond frame number 250, both figures show sharp peaks and valleys. This is caused by the user altering the pose of the notebook: a peak occurs when the feature-rich notebook occupies most of the input scene, and a sudden valley occurs when the notebook pose is changed to occupy a small fraction of the scene.

Gesture Recognition. The primary source of input scene variability is the extent of user motion in the scene. Figure 2.4(c) shows the number of motionSIFT features extracted per frame. The graph consists of a repeating pattern of two large peaks followed by multiple smaller peaks. The two large peaks occur because the attention gesture requires the user to raise both hands in front of the camera and then drop them down. This is followed by a one-handed gesture that typically generates a smaller number of features than the attention gesture. The sharp drops in the graph occur when the user is not moving between gestures. Unlike the previous application, the makespan shown in Figure 2.3(c) has only a weak correlation with the input features. The weak correlation is caused by high variability in the face detection algorithm, which is the next most compute-intensive stage in the application pipeline.

2.3.3 Variability Across Mobile Platforms

Another factor contributing to execution time variability, and hence to the offloading decision, is the contention between the different application threads for the compute resources (memory and CPU) available on the mobile platform. To explore the extent of the impact on performance, we first compare the distribution of execution times of scene-independent stages of the application graph, and then benchmark the aggregate performance of the applications across both platforms.

Figure 2.5 shows the distribution of execution time across the two mobile platforms for three scene-independent stages: the image source stage of the face recognition application, which reads an image from file into memory (Figure 2.5(a)); the frame copy stage of the object recognition application, which makes copies of the image for multiple downstream stages (Figure 2.5(b)); and the image scaling stage of the gesture recognition application, which scales the image in memory (Figure 2.5(c)). Ideally, the two devices should demonstrate the same distribution of completion times with a constant speedup factor of at least 2.3x.
However, contention between other memory- and compute-intensive threads of the application graph causes a large variation in execution time on the netbook platform, even though the median speedup ranges between 2.5-3.0x. The laptop, with a dual-core processor and additional memory, exhibits much smaller variability due to the extra memory and the isolated execution of threads on the multiple processors.

Figure 2.5: Variability in the completion time of three example stages running on the laptop and netbook devices. These stages perform a fixed operation on the input image that is independent of the scene content. (a) Face recognition, (b) Object and pose recognition, (c) Gesture recognition.

This effect, compounded across the different stages, leads to a significant difference in the aggregate performance of the application across the two platforms, as shown in Table 2.3. Instead of the expected 2.3x speedup, the speedup ranges between 2.9x for the face recognition application and 5.47x for the object and pose recognition application.

Application                    Makespan (s), Laptop    Makespan (s), Netbook    Speedup
Face Recognition               0.078                   0.20                     2.94
Object and Pose Recognition    1.67                    9.17                     5.47
Gesture Recognition            0.54                    2.34                     4.31

Table 2.3: Median speedup in the overall application performance across the two devices.

2.3.4 Network Performance

Changes in network bandwidth, delay, and packet loss rate between the mobile device and the server can each affect interactive performance and trigger stage offloading or parallelization decisions. We characterize the performance of the face recognition application for two different partitions across the different network settings described in Section 2.3.1. The first application partition runs only the source and display stages locally and offloads all other stages to the server, requiring the mobile device to upload the entire image to the server. The second application partition reduces the network overhead by additionally running the face detection stage locally, requiring the mobile device to upload only the pixels containing the detected faces to the server. The tradeoff between the two partitions depends on the network performance and the time taken to detect faces in the input frame on the mobile device. Transmitting the entire image incurs a fixed overhead of 921.6 KB of data per frame, taking 73.7 ms over the LAN network configuration, while transmitting only the detected faces requires sending between 31 bytes when no faces are detected and from 39.3 KB to 427.1 KB when one or more faces are detected in the frame.

Figure 2.6: Impact of the network on the performance of the face recognition application. (a) Makespan, (b) Frame rate.

Figure 2.6 shows the makespan and throughput of the application for the two application partitions. The large image transfer is very sensitive to even a small amount of delay and loss, which significantly degrades application performance. Transmitting the entire image to a server on a network with an RTT of 40 ms degrades the frame rate from about 8 fps to 1.8 fps. Even a lossless 802.11g link is bandwidth-bottlenecked and cannot support more than 3 fps. Transmitting only the detected faces over the network makes the face detection stage, which runs on the mobile device, the bottleneck; it requires an average of 189.2 ms to detect faces, limiting the maximum frame rate to 5.31 fps.
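The tradeoff between the two partitions can be estimated with a back-of-the-envelope model built from the numbers above. The helper below is a hypothetical sketch (the function and its parameters are ours, not part of Odessa), assuming a 100 Mbps LAN and a 40 ms RTT:

# Hypothetical back-of-the-envelope model of the two face-recognition
# partitions, using the measurements quoted above (not part of Odessa).
def per_frame_network_time(bytes_sent, bw_mbps, rtt_ms):
    """Seconds to ship one frame's data plus one round trip."""
    return bytes_sent * 8 / (bw_mbps * 1e6) + rtt_ms / 1000.0

# Partition 1: upload the whole 921.6 KB frame to the server.
full_frame = per_frame_network_time(921_600, bw_mbps=100, rtt_ms=40)

# Partition 2: detect faces locally (~189.2 ms on average) and upload
# only the detected-face pixels (tens to hundreds of KB when faces exist).
local_detect = 0.1892 + per_frame_network_time(40_000, bw_mbps=100, rtt_ms=40)

print(f"full-frame upload : {full_frame * 1000:.1f} ms of network time per frame")
print(f"local detection   : {local_detect * 1000:.1f} ms per frame before server work")

Under these LAN settings the full-frame upload is cheaper, but only the first term scales with the network, which is why the advantage reverses as bandwidth drops or delay and loss grow.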
Moreover, since the network is not heavily used, this application partition is robust to delay, bandwidth bottlenecks, and packet loss on the link. The performance of the application shows negligible degradation across the different network configurations, providing a frame rate of 5.3 fps and a makespan of 680 ms. In the case of the 3G network, the large network delay (500 ms RTT) significantly degrades the application performance.

2.3.5 Effects of Data-Parallelism

Offloading alone may not be sufficient to reduce makespan; in many mobile perception applications, it is possible to leverage multi-core technology to obtain makespan reductions through data parallelism (processing tiled sub-images in parallel). However, as we show in this section, the performance improvement achieved by data-parallelism can be sub-linear because of scene complexity. Additionally, for some computer vision algorithms, extensive use of data-parallelism can degrade application fidelity. This argues for an adaptive determination of the degree of data-parallelism.

# of Threads    % Frames with faces    Mean exec. time (ms)
1               61.66                  149.0
2               24.87                  15.6
3               38.11                  18.0

Table 2.4: The accuracy and mean execution time of face detection with increasing number of worker threads.

# of Threads    Thread 1 (s)       Thread 2 (s)       Thread 3 (s)
1               1.203 (3323.7)     -                  -
2               0.741 (2124.9)     0.465 (1132.6)     -
3               0.443 (1203.6)     0.505 (1543.4)     0.233 (473.0)

Table 2.5: Average execution time in seconds and number of SIFT features detected (in parentheses) by each thread for the object and pose recognition application.

Table 2.4 shows an experiment in which the face detection stage is offloaded, and explores the impact of increasing the number of face detection threads. Increasing the number of detection threads reduces the execution time of the stage at the cost of degrading the fidelity of the face detection algorithm, by 36.7% with two threads and 23.5% with three threads. This drop in fidelity is due to faces falling across image tile boundaries, which renders them undetectable by the Haar-classifier-based face detection algorithm. The reason accuracy increases by about 13% from two threads to three is that with three tiles the center tile is more likely to contain the whole face than when the image is split down the middle into two tiles. Such degradation in the fidelity of the algorithm could be avoided by using a tiler algorithm that tiles the image at different scales of the input image, or by using a face detection algorithm based on SIFT-like scale-invariant features that can be combined across the multiple tiles. However, both of these approaches come at the cost of either increased algorithmic complexity or higher computational overhead.

Table 2.5 shows the impact of input scene content on the average execution time of the different SIFT feature generator threads, along with the average number of features extracted by each thread for the object recognition application. The feature generator thread execution times vary across image tiles because SIFT features are not evenly distributed in the image; the slowest thread becomes the bottleneck and causes sub-linear speedup. From the table we observe that the average speedup is limited to 1.6x and 2.3x instead of the expected 2x and 3x, respectively.
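To make the tiling mechanism concrete, the following sketch (ours, not Odessa's Sprout code) splits a frame into vertical strips and hands each strip to a worker thread; the toy per-strip extractor stands in for SIFT or face detection:

# Illustrative sketch of tile-based data parallelism (not Odessa's code).
from concurrent.futures import ThreadPoolExecutor

def extract_features(tile):
    # Toy stand-in for a per-tile feature extractor such as SIFT: here we
    # simply report the coordinates of bright pixels within the tile.
    return [(x, y) for y, row in enumerate(tile) for x, v in enumerate(row) if v > 128]

def split_into_strips(frame, n):
    # Cut the frame (a list of pixel rows) into n vertical strips.
    width = len(frame[0])
    step = width // n
    return [[row[i * step:(i + 1) * step if i < n - 1 else width] for row in frame]
            for i in range(n)]

def parallel_extract(frame, n_threads):
    strips = split_into_strips(frame, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        per_strip = list(pool.map(extract_features, strips))
    # The stage finishes only when the slowest strip finishes, so unevenly
    # distributed features yield the sub-linear speedup seen in Table 2.5;
    # a detector that needs context across strip edges (e.g., a Haar face
    # detector) also misses objects straddling a boundary (Table 2.4).
    return [f for strip in per_strip for f in strip]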
2.3.6 Effects of Pipeline Parallelism

A key challenge in exploiting pipeline parallelism is to maintain balance between under-utilizing the pipeline, which lowers throughput, and over-utilizing it, which increases application latency due to excessive waiting time. In this section, we show that, for a fixed offloading strategy, different data-parallelism decisions can result in different optimal pipelining strategies. This argues that the degree of pipelining must track adaptations in offloading and data-parallelism; otherwise the pipeline can be significantly under-utilized or over-utilized.

In Sprout, the degree of pipelining is governed by a bound on the maximum number of tokens: each token corresponds to an image in some stage of the pipeline, so the maximum number of tokens governs the pipeline parallelism. Sprout users or application developers must specify this maximum number. To understand the impact of the degree of pipelining, we create two different configurations of the object and pose recognition application and measure the impact of increasing the number of tokens in the pipeline on the throughput (Figure 2.7(a)) and the makespan (Figure 2.7(b)) of the application. In both configurations, the SIFT feature generator and the model matcher have been offloaded, but the configurations differ in the degree of data-parallelism for the two stages. The configurations F2-M5 and F5-M2 denote the number of SIFT feature generator threads and the number of model matcher threads used by the application running on the eight-core server.

Figure 2.7: Pipeline parallelism. (a) Frame rate with increasing number of tokens, (b) Makespan with increasing number of tokens.

From Figure 2.7(a) we observe that for both configurations the throughput initially increases linearly until the pipeline is full, and then levels off beyond four tokens, at which point additional tokens end up waiting at the head of the pipeline. The throughput of the F2-M5 configuration is higher than that of the F5-M2 configuration because F2-M5 reduces the execution time of the model matcher stage, the primary bottleneck of the application. Figure 2.7(b) shows the makespan response to an increasing number of tokens in the pipeline. For both configurations we observe that over-utilizing the pipeline by increasing the number of tokens beyond four increases the time a frame waits at the head of the pipeline. Furthermore, comparing the rate of increase of the wait time across the two configurations, we find that the wait time for the F5-M2 configuration increases faster than for the F2-M5 configuration. This is because the F5-M2 configuration has a longer compute time for the bottleneck stage, though the total execution time is comparable. This causes the makespan to rise sooner, with fewer tokens, and more steeply compared to the other configuration.

2.4 Design and Implementation of Odessa

Motivated by the findings in the previous section, Odessa adaptively exploits pipelining, data-parallelism and offloading to improve the performance and accuracy of these applications. The Odessa runtime runs on the mobile device; this enables Odessa to transparently improve application performance across different mobile platforms, even when the mobile device is disconnected from the server.
The design of the Odessa runtime has three goals, in order of decreasing importance. First, it must simultaneously achieve low makespan and high throughput in order to meet the needs of mobile interactive perception applications. Second, it must react quickly to changes in input complexity, device capability, or network conditions; this ensures that transient changes in makespan or throughput are minimized or avoided. Third, it must have low computation and communication overhead.

Prior approaches for offloading frame the problem using a discrete or graph optimization formulation [37, 58, 75]. For this approach to be effective, accurate estimates of stage execution time are required on both the mobile device and the server, which are often obtained by offline profiling. However, the results in Section 2.3 show that the execution time can vary significantly and cannot easily be modeled offline. Odessa uses a greedy algorithm that periodically acquires information from a lightweight application profiler to estimate the bottleneck in the current configuration. Its decision engine then uses simple predictors, based on nominal processor frequencies and a recent history of network measurements, to estimate whether offloading or increasing the level of parallelism of the bottleneck stage would improve performance. This greedy and incremental approach works very well to improve makespan and throughput, and incurs negligible overhead (as discussed in Section 2.5.1). Rarely, Odessa's decisions may need to be reversed because its estimators may be off, but it has a built-in self-correcting mechanism to maintain stability.

2.4.1 Lightweight Application Profiler

The primary function of the application profiler is to maintain the complete application performance profile and make this information available to the decision engine running on the mobile device, without impacting application performance. Our profiler does not require any additional programmer input, and accounts for cycles (Figure 2.2(a)) and parallel sub-tasks (Figure 2.2(c)) in the application graph. For each frame processed by the application graph, the profiler collects the following information: the execution time of each stage in the graph, the wait time on every connector, the volume of data transferred on each connector, and the transfer time across the network connector edges.

Odessa implements this by having each stage piggyback its runtime profile information along with the data and forward it to the downstream stages along each of the output edges. A downstream stage receiving this profile information appends its own profile information only if there are no cycles detected, and continues forwarding the data. When a splitter is encountered, the profile history to that point is replicated on each output, and is pruned later at the joiner stage. The head of the pipeline, which receives the aggregated data, forwards it to the decision engine. This piggybacking approach simplifies the design of the decision engine, as it receives all the profile data for the most recently concluded frame execution in order and over a single RPC call. Checking for cycles and removing redundant data requires a simple linear scan of the profile data and incurs negligible overhead. Since each stage eliminates redundant profile information, the decision engine can easily compute the makespan and throughput of the application.

2.4.2 Decision Engine

The functionality of the decision engine is split across two threads running on the mobile device.
The first thread manages data parallelism and stage offloading. The second thread manages pipeline parallelism by dynamically controlling admission into the pipeline. Both threads make use of the application profile data to make their decisions. The data received from the profiler is maintained in a heap sorted by the slowest graph element (stage or connector), which facilitates efficient lookup.

2.4.2.1 Adapting Data Parallelism and Stage Offloading

The algorithm in Figure 2.8 describes how Odessa adapts data parallelism and stage offloading. The algorithm runs periodically (every 1 second in our implementation); in each iteration it greedily selects the current bottleneck stage of the application pipeline and decides to make an incremental improvement by either changing the placement of a stage, increasing its data parallelism, or doing nothing if the performance cannot be further improved.

begin
  bottleneck := pick the first entry from the priority heap.
  if bottleneck is a compute stage
    a. estimate the cost of offloading the stage
    b. estimate the cost of spawning more workers
  elsif bottleneck is a network edge
    a. estimate the cost of offloading the source stage.
    b. estimate the cost of offloading the destination stage.
  fi
  c. take the best choice among a., b., or do-nothing.
  d. sleep ( decision granularity );
end

Figure 2.8: Algorithm for adaptive offloading and data-parallelism.

If the bottleneck is a compute stage, the algorithm picks between offloading the stage or increasing the degree of data-parallelism for the stage. If the bottleneck is a network edge, the algorithm estimates the cost of moving either the source or destination stage of the network edge. If Odessa decides to offload or change data-parallelism, it signals the pipeline admission controller to stop issuing tokens. When the in-flight tokens have been drained, Odessa invokes Sprout's stage migration or thread spawning mechanisms (as appropriate), then resumes the pipeline once these have finished.

The effectiveness of the algorithm rests on the ability to estimate the impact of offloading a stage or increasing its data parallelism. While it is difficult to accurately estimate the impact of a decision on the performance of the application, Odessa uses simple cost estimation techniques that are guided by the runtime application profiler.

Estimating cost of data parallelism. Odessa uses a simple linear cost metric to estimate the execution time when increasing or decreasing data parallelism: E_{i+1} = (N / (N+1)) * E_i, where E_i is the current execution time on the i-th frame and N is the current degree of data parallelism. This assumes linear speedup and uses the runtime profiler data to estimate the gains. To avoid an unbounded increase in the level of data parallelism, Odessa dampens the performance improvement estimated by the linear equation after data parallelism has reached a preset threshold. This threshold is set to twice the number of CPU cores for hyper-threaded CPU architectures, and the scaling factor is (N+1)/Thresh for N > Thresh. As we have discussed in Section 2.3.5, data parallelism may not always give linear speedup because of image complexity; if that is the case, Odessa will re-examine the decision in the next interval (see below).
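This estimator can be written down directly. The sketch below is our reading of the rule above; in particular, the exact form of the damping beyond the threshold is our interpretation of the text, not Odessa's code:

# Sketch of the linear speedup estimate with threshold damping; the damping
# form below is one plausible reading of the rule described above.
def estimate_exec_time(current_exec_time, n_workers, threshold):
    """Predicted stage execution time if one more worker thread is spawned."""
    estimate = (n_workers / (n_workers + 1.0)) * current_exec_time
    if n_workers > threshold:
        # Beyond the threshold, scale the estimate up by (N+1)/Thresh so that
        # spawning yet another worker looks progressively less attractive.
        estimate *= (n_workers + 1.0) / threshold
    return estimate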
Estimating cost of offloading a stage. Moving a compute stage from the mobile device to the server, or vice versa, should account for the change in the execution time of the stage as well as the impact of the change on the network edges. Consider a stage X with a single input connector and a single output connector. Suppose X and its predecessor and successor stages are running on the mobile device. To estimate the latency that would result from offloading X to the server, we need to compute (a) the reduced processing time of X on the server, and (b) the increased latency induced by having its input and output connectors traverse the network between the mobile and the server. Odessa estimates the reduced processing time as T_m(X) * (F_m / F_s), where T_m(X) is the execution time for X on the mobile, obtained from the application profiler (averaged over the last 10 frames), and F_m and F_s are the nominal processor frequencies on the mobile device and the server, respectively. Since Odessa also keeps track of the amount of data transmitted on every connector and has an online estimate of the current network bandwidth, it can estimate the increased latency of data transfer on the new network connectors. Combining all of this information, Odessa can estimate the effective throughput and makespan that would result from the offloading. Our current implementation only acts when the resulting estimate would improve both makespan and throughput. We are conservative in this regard; a more aggressive choice, where an action is taken if either metric is improved, may be more susceptible to errors in the estimates. That said, because Odessa relies on estimates, an offloading or data-parallelism decision may need to be reversed at the next decision point and can result in a stage bouncing between the mobile and the server. If this occurs (and it has occurred only once in all our experiments), Odessa temporarily pins the stage in question until its compute time increases by a significant fraction (10%, in our implementation).

2.4.2.2 Adapting Pipeline Parallelism

A pipeline admission controller dynamically maintains the optimal number of frames in the pipeline to increase frame rate while adapting to the variability in the application performance. The admission controller is implemented as a simple token generator that issues tokens to the frame source stage; upon receiving a token, the frame source stage reads an input image. When an image is completely processed, its token is returned to the admission controller. The admission controller ensures that no more than T tokens are outstanding at any given instant, and T is determined by the simple equation T = ceil(M / B), where M is the makespan and B is the execution time of the slowest stage, both averaged over the 10 most recent frames. Note that the token generator trades off higher throughput for a slightly higher makespan by using the ceiling function, which may lead to one extra token in the pipeline.
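A minimal sketch of such a token-based admission controller, assuming the profiler supplies per-frame makespan and bottleneck-stage times (the class and method names are ours, not Odessa's):

import math

# Minimal sketch of the pipeline admission controller: at most T = ceil(M/B)
# tokens outstanding, with M and B averaged over the 10 most recent frames.
class AdmissionController:
    def __init__(self, window=10):
        self.makespans = []
        self.bottlenecks = []
        self.window = window
        self.outstanding = 0

    def record(self, makespan, slowest_stage_time):
        self.makespans = (self.makespans + [makespan])[-self.window:]
        self.bottlenecks = (self.bottlenecks + [slowest_stage_time])[-self.window:]

    def max_tokens(self):
        if not self.makespans:
            return 1
        m = sum(self.makespans) / len(self.makespans)
        b = sum(self.bottlenecks) / len(self.bottlenecks)
        return math.ceil(m / b)

    def try_admit(self):
        # Called by the frame source stage; issue a token if under the bound.
        if self.outstanding < self.max_tokens():
            self.outstanding += 1
            return True
        return False

    def release(self):
        # Called when a frame finishes and its token returns.
        self.outstanding -= 1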
2.5 Evaluation

In this section we present an experimental evaluation of Odessa. Our evaluation methodology uses the setup and data described in Section 2.3.1 and addresses the following questions. How does Odessa perform on our data sets, and what is the overhead of the runtime profiler and the decision engine? How does Odessa compare against other candidate strategies that either use domain knowledge for static stage placement and parallelism, or that use global profiling for optimal placement? How does Odessa adapt to changes in network bandwidth availability or the availability of computing resources?

2.5.1 Odessa's Performance and Overhead

We begin by examining the performance of Odessa's decisions for our applications, when run both on the netbook and the laptop. In these experiments (and those described in the next subsection), we used a 100 Mbps network between the mobile device and the server; in a subsequent subsection, we evaluate the impact of constraining the network. For the face recognition application, all stages except the classifier stage begin executing on the mobile device (the classifier stage has a large database for matching); for the other applications, all stages begin executing on the mobile device.

Application                    Netbook                              Laptop
Face Recognition               Face detection (2) - 3.39            Nothing - 3.99
Object and Pose Recognition    Object model matching (3),           Object model matching (3),
                               Feature generating (10) - 5.71       Feature generating (10) - 5.14
Gesture Recognition            Face detection (1), extracting       Face detection (1), extracting
                               Motion SIFT features (4) - 3.06      Motion SIFT features (9) - 5.14

Table 2.6: The stages offloaded to the server by Odessa, with the number of instances of each stage in parentheses. The number after the dash is the average degree of pipeline parallelism.

Figure 2.9: Decisions made by Odessa across the first 200 frames, along with the impact on makespan and frame rate, for the three applications on the netbook.

Figure 2.9 shows the timeline of the different decisions made by Odessa on the netbook (graphs for the laptop are qualitatively similar and are omitted for lack of space), with a circle indicating an offload decision and a cross indicating the spawning of a thread (increased data parallelism). Table 2.6 shows the final stage configurations of the applications on both the laptop and the netbook.

Face Recognition. Odessa first chooses to offload the detection thread at frame number 110 and spawns an additional detection thread immediately. After this change in configuration, the frame source stage that performs the JPEG decompression of the stored images is the bottleneck, which Odessa cannot offload: the application converges to a makespan of around 500 ms and a throughput of 5 fps on the netbook. Interestingly, Odessa demonstrates the ability to adapt to input scene content by offloading and spawning an additional detection thread only when faces were detected in the input scene. Until frame number 100, there were no faces detected in the input scene and the application sends only 31 bytes of data to the classifier running on the server. Beyond frame number 100, as the execution time of the detection thread and the network traffic increase, Odessa decides to co-locate the detection stage with the classifier stage on the server. Interestingly, on the laptop, Odessa does not offload the detection stage: in this configuration, the frame source stage is a more constraining bottleneck than the detection stage. Despite this, the laptop is able to support higher pipeline parallelism and a frame rate of 10 fps because of its faster multi-core processor. Thus, from this application, we see that Odessa adapts to input scene complexity and device characteristics.

Object and Pose Recognition. Odessa achieves a frame rate of 6.27 fps and 7.35 fps on the netbook and laptop respectively, and a makespan of under 900 ms on both platforms.
In this application, Odessa converges to the same final configuration on the netbook and laptop. Note that Odessa is able to differentiate between the impact of the two offloaded stages (feature generator and model matcher); as shown in Figure 2.9, Odessa continues to increase the number of feature generator threads beyond frame number 50, as doing so increases the frame rate of the application with a minor decrease in makespan. The final configuration results in 10 SIFT feature generator threads but only 3 for model matching. From this example, we see that despite device differences, Odessa ends up (correctly) with the same offloaded stages, and is able to assign the requisite level of data parallelism for these stages.

Gesture Recognition. In this example, differences in device capabilities result in different Odessa configurations. On the netbook, Odessa offloads the single detection thread and spawns only 4 additional motionSIFT threads, but has a higher average pipeline parallelism of 3.06 tokens. Beyond 4 motionSIFT threads, the frame source stage on the netbook is the bottleneck and Odessa cannot further improve the performance of the application. However, on the laptop platform the frame source stage is not the bottleneck, due to the dual-core architecture and faster processor speed. Consequently, Odessa spawns a total of 9 motionSIFT threads and more than doubles the average pipeline parallelism to 5.14. This leads to a 68% increase in frame rate on the laptop compared to the netbook platform.

Overhead. Odessa's overhead is negligible. The size per frame of measurement data collected by the lightweight profiler ranges from 4 KB for face recognition to 13 KB for pose detection, or from 0.1% to 0.4% of the total data transferred along the pipeline. Moreover, the profiler's compute cost is less than 1.7 ms on each device, a negligible fraction of the application's makespan. Finally, the decision engine takes less than 1.2 ms for an offload or a spawn decision.

2.5.2 Comparison With Other Strategies

We now compare Odessa's performance against three competing strategies. The first is the Offload-All partition, in which only the video source stage and display stage run locally and a single instance of all other stages is offloaded to the server. The second is the Domain-Specific partition, which makes use of domain-specific knowledge about the application and input from the application developer to identify the compute-intensive stages in the application graph. For this partition, only the compute-intensive stages are offloaded to the server and the CPU cores are equally distributed among the different parallelizable compute-intensive stages. These two strategies help us calibrate Odessa's performance: if Odessa's performance is not significantly better than these, an adaptive runtime may not be necessary.

We also consider another idealized strategy whose performance should, in theory, be better than Odessa's. This strategy, called the Offline-Optimizer, mimics what an optimized offloading algorithm would have achieved if it could perfectly profile execution times both on the server and the mobile device. The inputs to this optimizer are the profiled run times of each stage on both the mobile and the server; the most compute-intensive stages are profiled with maximum data parallelism.
Because execution times can vary with input complexity, for this experiment we use an input which consists of a single frame replicated 500 times. The output is an optimal configuration, obtained by exhaustively searching all configurations and picking those whose makespan and throughput dominate (i.e., no other configuration has a lower makespan and a higher throughput). The pipeline parallelism is optimally controlled and the number of tokens is adjusted to fully utilize the pipeline structure. We compare Odessa against Offline-Optimizer only for the pose detection algorithm: for the other algorithms, recognition is done over sequences of frames, so we cannot use frame replication to ensure consistent input complexity, and any performance comparison would be obscured by interframe variability.

Application                    Offloaded stages (instances)
Face Recognition               Face detection (4), Classifier (4); 2 tokens
Object and Pose Recognition    SIFT Feature Generator (3), Object model matching (3), Clustering (2); 3 tokens
Gesture Recognition            Face detection (1), MotionSIFT stage (8); 2 tokens

Table 2.7: Instances of the offloaded stages, along with the number of tokens in the pipeline, for the Domain-Specific partition of the application graph.

Table 2.7 shows the stages and their instances, along with the number of tokens, for the Domain-Specific static partition. Pipeline parallelism for the Domain-Specific and Offline-Optimizer partitions is set based on the number of compute-intensive stages in the application graph: the rationale is that, since most of the time will be spent in the bottleneck stages, adding additional tokens will cause increased wait times. Thus, the face recognition application has 2 tokens, the object and pose recognition application has 3, and the gesture recognition application has 2. Figure 2.10 shows the aggregate performance of Odessa along with the two static partitions.

Figure 2.10: Frame rate and makespan achieved by Odessa along with the two statically partitioned application configurations, on (a) the netbook and (b) the laptop.

Figure 2.11: Frame rate and makespan achieved by Odessa for pose detection, compared to the Offline-Optimizer.

Face Recognition. Odessa's performance is comparable to Offload-All and Domain-Specific on the netbook; in this case, the frame source stage is a bottleneck, so Odessa's choices approximate those of the two static strategies. However, on the laptop, Odessa's throughput is almost twice that of Domain-Specific. This performance difference is entirely due to Odessa's adaptive pipeline admission controller: although, intuitively, 2 tokens seem like a reasonable choice given that this application has two bottleneck stages, much higher pipelining is possible because data parallelism significantly reduces the impact of the bottleneck.

Object and Pose Recognition. Odessa significantly outperforms both static partitions; for example, the frame rate achieved by Odessa is 4x higher and the makespan 2.2x lower than Domain-Specific across both platforms. Unlike the latter, Odessa chooses to run the clustering algorithm on the local device for both the netbook and laptop platforms, which frees up extra CPU resources on the server. Odessa uses these extra CPU resources to spawn additional SIFT feature generator stages to reduce the pipeline bottleneck. Another reason for the performance difference is Odessa's adaptive pipeline parallelism, as discussed above.
Finally, Figure 2.11 shows that the Offline-Optimizer has a lower makespan (about 200-300 ms lower) than Odessa, both on the netbook and the laptop. This is to be expected, since the offline optimizer does not model two important contributors to makespan: thread contention when executing 10 threads for each of the compute-intensive stages, and token waiting time. Encouragingly, Odessa achieves comparable or better throughput across the pipeline. These results indicate that, even if it were practical to profile stage execution times offline and optimize offloading decisions using these profiled times, the benefit would not be significant.

Gesture Recognition. The frame rate achieved by Odessa is 3.4x and 4.6x higher on the netbook and laptop platforms respectively compared to the static partitioning approaches, and the makespan is under 350 ms for both platforms. Although Odessa achieves data-parallelism comparable to Domain-Specific on the laptop, it is again able to achieve much higher levels of pipelining, hence the performance difference.

2.5.3 Adapting to Varying Execution Contexts

The execution context of perception applications is primarily determined by the available computational resources and the performance of the network. These resources can vary significantly as the mobile device moves from one execution context to another. For example, the network performance could vary significantly as the mobile device moves from an office to a home environment. Furthermore, the CPU resources available to the application could also vary if the server is shared by other applications. To address such sources of variation in available resources, it is important for Odessa to be reactive and adapt the application partition quickly.

2.5.3.1 CPU Resources

We begin by evaluating the ability of Odessa to respond to events where additional CPU resources become available on the server. We emulate such an event by changing the number of cores available to the Odessa decision engine from two to eight during the execution of the application (at frame number 250).

Figure 2.12: The ability of Odessa to adapt to abrupt changes in the number of CPU cores available to the application. The number of cores is increased from 2 to 8 at frame number 250.

Figure 2.12 shows the application throughput and makespan along with the decisions made by Odessa. The application starts off running locally on the netbook. By frame number 31, Odessa utilizes the two CPU cores available on the server by offloading the SIFT feature generator stage and the model matcher stage and spawning an additional thread for each of the two stages. This increases the throughput from 0.1 fps to 2.5 fps. At frame number 250, 6 additional cores become available on the server. Odessa immediately reacts to this change in CPU resources and spawns 8 additional threads for the SIFT feature generator and 1 additional thread for the model matcher stage. Note that even though the makespan does not increase beyond frame number 260, Odessa continues to spawn additional stages to further improve the throughput. Odessa completes adapting the application partition by frame number 305, after which the average throughput is 8.4 fps and the makespan is reduced to 490 ms.

2.5.3.2 Network Performance

We next evaluate the ability of Odessa to adapt to changes in network performance as the mobile device moves from a high-bandwidth indoor 802.11n network to a low-bandwidth outdoor network.

Figure 2.13: Odessa adapting to changes in network performance. The network bandwidth is reduced from 100 Mbps to 5 Mbps at frame number 1237; Odessa pulls back the offloaded stages from the server to the local machine to reduce the data transmitted over the network.

The first two
graphs in Figure 2.13 show the throughput and makespan of the application, and the third graph shows the total amount of network data. The application starts off executing all stages locally on the netbook, which is connected to the eight-core server over a 100 Mbps network. At frame 122, Odessa decides to offload the detection stage to the server and spawn another instance of the detection stage on the server. After this, the image source stage that performs the JPEG decoding is the bottleneck and the performance cannot be further improved. At frame number 1237 the bandwidth of the network is dropped to 5 Mbps, which makes the network edge the bottleneck of the application. The network delay significantly increases the makespan of the application and the throughput drops significantly. Within 70 frames, Odessa decides to pull back the two detection threads from the remote server, reducing the amount of data being sent over the network, and the throughput increases to about 4 fps.

2.5.4 Data-Parallelism and Application Fidelity

Odessa increases data-parallelism until makespan and throughput changes are marginal. However, increasing data-parallelism may not always be desirable. For face recognition, or object and pose detection, increasing data parallelism by tiling images may decrease application fidelity, since, for example, a face may be split across tiles. Ideally, Odessa's decision engine should take application fidelity into account when deciding the level of data-parallelism; we have left this to future work.

Figure 2.14: The number of features detected across different numbers of detection worker threads, for (a) face recognition and (b) object and pose detection.

However, Figure 2.14 quantifies the impact of Odessa's current design on application fidelity. On each graph, the x-axis shows the number of threads used (the level of data-parallelism) and the y-axis shows the total number of faces or features detected.

Face Recognition. In our current implementation, the face detector uses a Haar classifier, which is unlikely to detect a face when the face is fragmented. Hence, the more an input image is divided, the fewer faces are detected (Figure 2.14(a)). In some cases, our image splitter adds redundant overlapping image tiles so that the loss of fidelity due to tessellation is mitigated. More generally, however, a robustly parallelizable face detection algorithm might avoid this degradation.

Object and Pose Recognition. This application demonstrates such robustness. Its parallelizable stage, SIFT feature extraction, is scale-invariant [80], so the total number of detected features is relatively unchanged as a function of the number of threads (Figure 2.14(b)). The small fluctuations in fidelity can be explained by the loss of features at the edges of tiles. It may be possible to mitigate these fluctuations by overlapping tiles, which we have left to future work.

2.6 Conclusion

In this chapter, we have explored the design of a runtime, called Odessa, that enables interactive perception applications on mobile devices. The unique characteristics of these applications drive many of the design decisions in Odessa, whose lightweight online profiler and simple execution time predictors help make robust and efficient offloading and parallelization decisions.
Our evaluation of Odessa shows that it can provide more than 3x improvement in performance compared to application configurations chosen by domain experts. Additionally, Odessa can adapt quickly to changes in scene complexity, compute resource availability, and network bandwidth. Much work remains, including exploring the performance of Odessa under a broader range of applications, extending it to take advantage of the public cloud, and exploring easy deployability on mobile devices by leveraging modern browser architectures.

Chapter 3

P3: Toward Privacy-Preserving Photo Sharing

With increasing use of mobile devices, photo sharing services are experiencing greater popularity. Aside from providing storage, photo sharing services enable bandwidth-efficient downloads to mobile devices by performing server-side image transformations (resizing, cropping). On the flip side, photo sharing services have raised privacy concerns, such as leakage of photos to unauthorized viewers and the use of algorithmic recognition technologies by providers. To address these concerns, we propose a privacy-preserving photo encoding algorithm that extracts and encrypts a small, but significant, component of the photo, while preserving the remainder in a public, standards-compatible part. These two components can be stored separately. This technique significantly reduces the signal-to-noise ratio and the accuracy of automated detection and recognition on the public part, while preserving the ability of the provider to perform server-side transformations that conserve download bandwidth. Our prototype privacy-preserving photo sharing system, P3, works with Facebook, and can be extended to other services as well. P3 requires no changes to existing services or mobile application software, and adds minimal photo storage overhead.

3.1 Introduction

With the advent of mobile devices with high-resolution on-board cameras, photo sharing has become highly popular. Users can share photos either through photo sharing services like Flickr or Picasa, or popular social networking services like Facebook or Google+. These photo sharing service providers (PSPs) now have a large user base, to the point where PSP photo storage subsystems have motivated interesting systems research [24].

However, this development has generated privacy concerns (Section 3.2). Private photos have been leaked from a prominent photo sharing site [34]. Furthermore, widespread concerns have been raised about the application of face recognition technologies in Facebook [7]. Despite these privacy threats, it is not clear that the usage of photo sharing services will diminish in the near future. This is because photo sharing services provide several useful functions that, together, make for a seamless photo browsing experience. In addition to providing photo storage, PSPs also perform several server-side image transformations (like cropping, resizing and color space conversions) designed to improve the user-perceived latency of photo downloads and, incidentally, bandwidth usage (an important consideration when browsing photos on a mobile device).

In this chapter, we explore the design of a privacy-preserving photo sharing algorithm (and an associated system) that ensures photo privacy without sacrificing the latency, storage, and bandwidth benefits provided by PSPs. This chapter makes two novel contributions that, to our knowledge, have not been reported in the literature (Section 6.2).
First, the design of the P3 algorithm (Section 3.3), which prevents leaked photos from revealing visual information, and reduces the efficacy of automated processing (e.g., face detection, feature extraction) on photos, while still permitting a PSP to apply image transformations. It does this by splitting a photo into a public part, which contains most of the volume (in bytes) of the original, and a secret part, which contains most of the original's information. Second, the design of the P3 system (Section 3.4), which requires no modification to the PSP infrastructure or software, and no modification to existing browsers or applications. P3 uses interposition to transparently encrypt images when they are uploaded from clients, and to transparently decrypt and reconstruct images on the recipient side.

Evaluations (Section 3.5) on four commonly used image data sets, as well as micro-benchmarks on an implementation of P3, reveal several interesting results. Across these data sets, there exists a "sweet spot" in the parameter space that provides good privacy while at the same time preserving the storage, latency, and bandwidth benefits offered by PSPs. At this sweet spot, the public part of the image has low PSNR and algorithms like edge detection, face detection, face recognition, and SIFT feature extraction are completely ineffective: no faces can be detected and correctly recognized from the public part, no correct features can be extracted, and only a very small fraction of pixels defining edges are correctly estimated. P3 image encryption and decryption are fast, and it is able to reconstruct images accurately even when the PSP's image transformations are not publicly known.

P3 is a proof-of-concept of, and a step towards, easily deployable privacy-preserving photo storage. Adoption of this technology will be dictated by economic incentives: for example, PSPs can offer privacy-preserving photo storage as a premium service for privacy-conscious customers.

3.2 Background and Motivation

The focus of this chapter is on PSPs like Facebook, Picasa, Flickr, and Imgur, which either offer direct photo sharing between users (e.g., Flickr, Picasa) or have integrated photo sharing into a social network platform (e.g., Facebook). In this section, we describe some background before motivating privacy-preserving photo sharing.

3.2.1 Image Standards, Compression and Scalability

Over the last two decades, several standard image formats have been developed that enable interoperability between producers and consumers of images. Perhaps not surprisingly, most of the existing PSPs like Facebook, Flickr, Picasa Web, and many websites [11, 12, 105] primarily use the most prevalent of these standards, the JPEG (Joint Photographic Experts Group) standard. In this chapter, we focus on methods to preserve the privacy of JPEG images; support for other standards such as GIF and PNG (usually used to represent computer-generated images like logos) is left to future work. Beyond standardizing an image file format, JPEG performs lossy compression of images. A JPEG encoder consists of the following sequence of steps:

Color Space Conversion and Downsampling. In this step, the raw RGB or color filter array (CFA) RGB image captured by a digital camera is mapped to a YUV color space. Typically, the two chrominance channels (U and V) are represented at lower resolution than the luminance (brightness) channel (Y), reducing the amount of pixel data to be encoded without significant impact on perceptual quality.
DCT Transformation. In the next step, the image is divided into an array of blocks, each with 8x8 pixels, and the Discrete Cosine Transform (DCT) is applied to each block, resulting in several DCT coefficients. The mean value of the pixels is called the DC coefficient; the remaining ones are called AC coefficients.

Quantization. In this step, these coefficients are quantized; this is the only step in the processing chain where information is lost. For typical natural images, information tends to be concentrated in the lower-frequency coefficients (which on average have larger magnitude than higher-frequency ones). For this reason, JPEG applies different quantization steps to different frequencies. The degree of quantization is user-controlled and can be varied in order to achieve the desired trade-off between the quality of the reconstructed image and the compression rate. We note that in practice, images shared through PSPs tend to be uploaded with high quality (and high rate) settings.

Entropy Coding. In the final step, redundancy in the quantized coefficients is removed using variable-length encoding of non-zero quantized coefficients and of runs of zeros in between non-zero coefficients.
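For concreteness, the following sketch walks one 8x8 block through the DCT and quantization steps just described. It uses SciPy's DCT-II and, as a simplification, a single uniform quantization step q in place of JPEG's per-frequency quantization tables:

# A minimal, illustrative walk-through of the DCT and quantization steps for
# one 8x8 block. Real JPEG uses per-frequency quantization tables; a single
# uniform step q is used here to keep the sketch short.
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

block = np.random.randint(0, 256, (8, 8)).astype(float) - 128  # level shift
coeffs = dct2(block)
q = 16                                # uniform quantization step
quantized = np.round(coeffs / q)      # the only lossy step
reconstructed = idct2(quantized * q) + 128

print("DC coefficient:", quantized[0, 0])
print("non-zero coefficients:", np.count_nonzero(quantized))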
Beyond storing JPEG images, PSPs perform several kinds of transformations on images, for various reasons. First, when a photo is uploaded, PSPs statically resize the image to several fixed resolutions. For example, Facebook transforms an uploaded photo into a thumbnail, a "small" image (130x130) and a "big" image (720x720). These transformations have multiple uses: they can reduce storage,^1 improve photo access latency for the common case when users download either the big or the small image, and also reduce bandwidth usage (an important consideration for mobile clients). In addition, PSPs perform dynamic (i.e., when the image is accessed) server-side transformations; they may resize the image to fit the screen resolution, and may also crop the image to match the view selected by the user. (We have verified, by analyzing the Facebook protocol, that it supports both of these dynamic operations.) These dynamic server-side transformations enable low-latency access to photos and reduce bandwidth usage. Finally, in order to reduce user-perceived latency further, Facebook also employs a special mode of the JPEG standard, called progressive mode. For photos stored in this mode, the server delivers the coefficients in increasing order (hence "progressive") so that clients can start rendering the photo on the screen as soon as the first few coefficients are received, without having to receive all coefficients. In general, these transformations scale images in one fashion or another, and are collectively called image scalability transformations. Image scalability is crucial for PSPs, since it helps them optimize several aspects of their operation: it reduces photo storage, which can be a significant issue for a popular social network platform [24]; it can reduce user-perceived latency; and it can reduce bandwidth usage, hence improving user satisfaction.

^1 We do not know if Facebook preserves the original image, but high-end mobile devices can generate photos with 4000x4000 resolution, and resizing these images to a few small fixed resolutions can save space.

3.2.2 Threat Model, Goals and Assumptions

In this chapter, we focus on two specific threats to privacy that result from uploading user images to PSPs. The first threat is unauthorized access to photos. A concrete instance of this threat is the practice of fusking, which attempts to reverse-engineer PSP photo URLs in order to access stored photos, bypassing PSP access controls. Fusking has been applied to at least one PSP (Photobucket), resulting in significant privacy leakage [34]. The second threat is posed by automatic recognition technologies, by which PSPs may be able to infer social contexts not explicitly specified by users. Facebook's deployment of face recognition technology has raised significant privacy concerns in many countries (e.g., [7]).

The goal of this chapter is to design and implement a system that enables users to ensure the privacy of their photos (with respect to the two threats listed above), while still benefiting from the image scalability optimizations provided by the PSP. Implicit in this statement are several constraints, which make the problem significantly challenging. The resulting system must not require any software changes at the PSP, since this is a significant barrier to deployment; an important implication of this constraint is that the image stored on the PSP must be JPEG-compliant. For a similar reason, the resulting system must also be transparent to the client. Finally, our solution must not significantly increase storage requirements at the PSP since, for large PSPs, photo storage is a concern.

We make the following assumptions about trust in the various components of the system. We assume that all local software/hardware components on clients (mobile devices, laptops, etc.) are completely trustworthy, including the operating system, applications and sensors. We assume that PSPs are completely untrusted and may, either by commission or omission, breach privacy in the two ways described above. Furthermore, we assume eavesdroppers may attempt to snoop on the communication between the PSP and a client.

3.3 P3: The Algorithm

In this section, we describe the P3 algorithm for ensuring the privacy of photos uploaded to PSPs. In the next section, we describe the design and implementation of a complete system for privacy-preserving photo sharing.

3.3.1 Overview

One possibility for preserving the privacy of photos is end-to-end encryption. Senders^2 may encrypt photos before uploading, and recipients use a shared secret key to decrypt photos on their devices. This approach cannot provide image scalability, since the photo representation is non-JPEG-compliant and opaque to the PSP, so the PSP cannot perform transformations like resizing and cropping. Indeed, PSPs like Facebook reject attempts to upload fully-encrypted images.

^2 We use "sender" to denote the user of a PSP who uploads images to the PSP.

Figure 3.1: Privacy-Preserving Image Encoding Algorithm. The original image passes through DCT transformation and quantization, and then threshold-based splitting of the DC and AC coefficients into a public part and a secret part.

A second approach is to leverage the JPEG image compression pipeline. Current image compression standards use a well-known DCT dictionary when computing the DCT coefficients. A private dictionary [15], known only to the sender and the authorized recipients, can be used to preserve privacy. Using the coefficients of this dictionary, it may be possible for PSPs to perform image scaling transformations. However, as currently defined, these coefficients result in a non-JPEG-compliant bit-stream, so PSP-side code changes would be required in order to make this approach work.
A third strawman approach might selectively hide faces by performing face detection on an image before uploading. This would leave a JPEG-compliant image in the clear, with the hidden faces stored in a separate encrypted part. At the recipient, the image can be reconstructed by combining the two parts. However, this approach does not address our privacy goals completely: if an image is leaked from the PSP, attackers can still obtain significant information from the non-obscured parts (e.g., torsos, other objects in the background, etc.).

Our approach to privacy-preserving photo sharing uses selective encryption of this kind, but has a different design. In this approach, called P3, a photo is divided into two parts, a public part and a secret part. The public part is exposed to the PSP, while the secret part is encrypted and shared between the sender and the recipients (in a manner discussed later). Given the constraints discussed in Section 3.2, the public and secret parts must satisfy the following requirements:

- It must be possible to represent the public part as a JPEG-compliant image. This will allow PSPs to perform image scaling. However, intuitively, most of the "important" information in the photo must be in the secret part. This would prevent attackers from making sense of the public part of the photos even if they were able to access these photos. It would also prevent PSPs from successfully applying recognition algorithms.

- Most of the volume (in bytes) of the image must reside in the public part. This permits PSP server-side image scaling to retain the bandwidth and latency benefits discussed above.

- The combined size of the public and secret parts of the image must not significantly exceed the size of the original image, as discussed above.

Our P3 algorithm, which satisfies these requirements, has two components: a sender-side encryption algorithm, and a recipient-side decryption algorithm.

3.3.2 Sender-Side Encryption

JPEG compression relies on the sparsity in the DCT domain of typical natural images: a few (large magnitude) coefficients provide most of the information needed to reconstruct the pixels. Moreover, as the quality of cameras on mobile devices increases, images uploaded to PSPs are typically encoded at high quality. P3 leverages both the sparsity and the high quality of these images. First, because of sparsity, most information is contained in a few coefficients, so it is sufficient to degrade a few such coefficients in order to achieve significant reductions in the quality of the public image. Second, because the quality is high, the quantization of each coefficient is very fine, and the least significant bits of each coefficient represent very small incremental gains in reconstruction quality. P3's encryption algorithm encodes the most significant bits of (the few) significant coefficients in the secret part, leaving everything else (less important coefficients, and the least significant bits of more important coefficients) in the public part.

We concretize this intuition in the following design for P3 sender-side encryption. The selective encryption algorithm is, conceptually, inserted into the JPEG compression pipeline after the quantization step. At this point, the image has been converted into frequency-domain quantized DCT coefficients. While there are many possible approaches to extracting the most significant information, P3 uses a relatively simple approach.
First, it extracts the DC coefficients from the image into the secret part, replacing them with zero values in the public part. The DC coefficients represent the average value of each 8x8 pixel block of the image; these coefficients usually contain enough information to represent thumbnail versions of the original image with reasonable visual clarity. Second, P3 uses a threshold-based splitting algorithm in which each AC coefficient y(i) is processed relative to a threshold T as follows:

- If |y(i)| <= T, the coefficient is represented in the public part as is, and in the secret part with a zero.

- If |y(i)| > T, the coefficient is replaced in the public part with T, and the secret part contains the magnitude of the difference as well as the sign.

Intuitively, this approach clips off the significant coefficients at T. T is a tunable parameter that represents the trade-off between storage/bandwidth overhead and privacy; a smaller T extracts more signal content into the secret part, but can potentially incur greater storage overhead. We explore this trade-off empirically in Section 3.5. Notice that both the public and secret parts are JPEG-compliant images and, after they have been generated, can be subjected to entropy coding. Once the public and secret parts are prepared, the secret part is encrypted and, conceptually, both parts can be uploaded to the PSP (in practice, our system is designed differently, for reasons discussed in Section 3.4). We also defer a discussion of the encryption scheme to Section 3.4.

Figure 3.2: P3 Overall Processing Chain. The sender runs P3 encryption to produce a public part, which the PSP processes as a black box, and a secret part; friends of the sender run P3 decryption to reconstruct the image.
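The split of a block of quantized coefficients can be stated compactly in code. The sketch below is an illustrative re-implementation of the rule just described, not P3's actual code; it operates on one block's worth of quantized DCT coefficients with the DC coefficient at index 0:

import numpy as np

# Illustrative sketch of P3's threshold-based split (not the actual P3 code).
# `coeffs` holds one block of quantized DCT coefficients, index 0 being the DC.
def p3_split(coeffs, T):
    public = np.empty_like(coeffs)
    secret = np.empty_like(coeffs)
    # The DC coefficient always goes to the secret part.
    public[0], secret[0] = 0, coeffs[0]
    for i, y in enumerate(coeffs[1:], start=1):
        if abs(y) <= T:
            public[i], secret[i] = y, 0            # small coefficient: public as-is
        else:
            public[i] = T                          # clipped magnitude, sign withheld
            secret[i] = np.sign(y) * (abs(y) - T)  # signed remainder
    return public, secret

Note that for an above-threshold coefficient the public part always stores +T, so the public part carries no sign information for exactly the coefficients that matter most; this is what the recipient-side correction term derived next must undo.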
The key observation is that x_p and x_s cannot be directly added to recover y, because the sign of a coefficient above threshold is encoded correctly only in the secret image. Thus, even though the public image conveys sign information for that coefficient, it might not be correct. As an example, let y(i) < -T; then we will have x_p(i) = T and x_s(i) = -(|y(i)| - T), and thus x_s(i) + x_p(i) ≠ y(i). For coefficients below threshold, y(i) can be recovered trivially, since x_s(i) = 0 and x_p(i) = y(i). Note that an incorrect sign in the public image occurs only for coefficients y(i) above threshold, and by definition, for all those coefficients the public value is x_p(i) = T. Note also that removing these signs significantly increases the distortion in the public image and makes it more challenging for an attacker to approximate the original image based only on the public part. In summary, the reconstruction can be written as a series of linear operations:

    y = S_p a_p + S_s a_s + (S_s - S_s^2) w    (3.1)

where the first two terms correspond to directly adding the corresponding blocks from the public and secret images, while the third term is a correction factor that accounts for the incorrect sign of some coefficients in the public image. This correction factor is based on the sign of the coefficients in the secret image and distinguishes three cases. If x_s(i) = 0 or x_s(i) > 0, then y(i) = x_s(i) + x_p(i) (no correction), while if x_s(i) < 0 we have y(i) = x_s(i) + x_p(i) - 2T = x_s(i) + T - 2T = x_s(i) - T.

Note that these operations can be very easily represented and implemented with if/then/else conditions, but the algebraic representation of (3.1) will be needed to determine how to operate when the public image has been subject to server-side processing. In particular, from (3.1), and given that the DCT is a linear operator, it becomes apparent that it is possible to reconstruct the images in the pixel domain. That is, we could convert S_p a_p, S_s a_s, and (S_s - S_s^2) w into the pixel domain and simply add these three images pixel by pixel. Further note that the third image, the correction factor, does not depend on the public image and can be completely derived from the secret image.

We now consider the case where the PSP applies a linear operator A to the public part. Many interesting image transformations, such as filtering, cropping, scaling (resizing), and overlapping, can be expressed by linear operators (cropping at 8x8 pixel boundaries is a linear operator; cropping at arbitrary boundaries can be approximated by cropping at the nearest 8x8 boundary). Thus, when the public part is requested from the PSP, A S_p a_p will be received. The goal is then for the recipient to reconstruct A y, given the processed public image A S_p a_p and the unprocessed secret information. Based on the reconstruction formula of (3.1), and the linearity of A, it is clear that the desired reconstruction can be obtained as follows:

    A y = A S_p a_p + A S_s a_s + A (S_s - S_s^2) w    (3.2)

Moreover, since the DCT transform is also linear, these operations can be applied directly in the pixel domain, without needing to find a transform-domain representation. As an example, if cropping is involved, it is enough to crop the secret image and the image obtained by applying an inverse DCT to (S_s - S_s^2) w.
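Continuing the sketch above, (3.1) translates into a few lines of code: the first two terms are simply the public and secret coefficients themselves, and the correction term subtracts 2T wherever the secret coefficient is negative.

```python
def reconstruct_block(x_p, x_s, w):
    """Recombine one block per (3.1): y = S_p a_p + S_s a_s + (S_s - S_s^2) w.
    (S_s - S_s^2) is 0 where x_s is zero or positive, and -2 where x_s is
    negative, i.e., exactly where the public sign is incorrect."""
    s_s = np.sign(x_s)
    return x_p + x_s + (s_s - s_s ** 2) * w

# Round trip: reconstruct_block(*split_block(y, T)) recovers y exactly.
```

When the PSP applies a linear operator A, per (3.2) the same recombination works after each of the three terms has been transformed by A (in the pixel domain, after an inverse DCT).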
We have left an exploration of nonlinear operators to future work. It may be possible to support certain types of non-linear operations, such as the pixel-wise color remapping found in popular apps (e.g., Instagram). If such an operation can be represented as a one-to-one mapping over all legitimate values (often the case for color remapping operations over 0-255 RGB values), we can achieve the same level of reconstruction quality as with linear operators: at the recipient, we can reverse the mapping on the public part, combine the result with the unprocessed secret part, and re-apply the color mapping to the resulting image. However, this approach can result in some loss, and we have left a quantitative exploration of this loss to future work.

3.3.4 Algorithmic Properties of P3

Privacy Properties. By encrypting significant signal information, P3 can preserve the privacy of images by distorting them and by foiling detection and recognition algorithms (Section 3.5). Given only the public part, the attacker can guess the threshold T by assuming it to be the most frequent non-zero value. If this guess is correct, the attacker knows the positions of the significant coefficients, but not the range of values of these coefficients. Crucially, the sign of each such coefficient is also not known. Sign information tends to be "random", in that positive and negative coefficients are almost equally likely and there is very limited correlation between the signs of different coefficients, both within a block and across blocks. It can be shown that if the sign is unknown, and no prior information exists that would bias our guess, it is actually best, in terms of mean-square error (MSE), to replace a coefficient with unknown sign in the public image by 0. (If an adversary sees T in the public part, replacing it with 0 incurs an MSE of T^2; any non-zero guess incurs an MSE of at least 0.5 (2T)^2 = 2T^2, because the guessed sign is wrong with probability 0.5 and the magnitude is at least T.)

Finally, we observe that proving the privacy properties of our approach is challenging. If the public part is leaked from the PSP, proving that no human can extract visual information from it would require an accurate understanding of visual perception. Instead, we rely on metrics commonly used in the signal processing community in our evaluation (Section 3.5). We note that the prevailing methodology in the signal processing community for evaluating the efficacy of image and video privacy is empirical: subjective evaluation using user studies, or objective evaluation using metrics [111]. In Section 3.5, we resort to an objective, metrics-based evaluation, showing the performance of P3 on several image corpora.

Other Properties. P3 satisfies the other requirements we have discussed above. It leaves in the clear a JPEG-compliant image (the public part), on which the PSP can perform transformations to save storage and bandwidth. The threshold T permits trading off increased storage for increased privacy: for images whose signal content is in the DC component and a few highly-valued coefficients, the secret part can encode most of this content, while the public part contains a significant fraction of the volume of the image in bytes. As we show in our evaluation later, most images are sparse and satisfy this property. Finally, our approach of encoding the large coefficients separately decreases the entropy of both the public and secret parts, resulting in better compressibility and only a slightly increased overall size relative to the unencrypted compressed image.
However, the P3 algorithm has an interesting consequence: since the secret part cannot be scaled (because, in general, the transformations that a PSP performs cannot be known a priori) and must be downloaded in its entirety, the bandwidth savings from P3 will always be lower than those from downloading a resized original image. The size of the secret part is determined by T: higher values of T result in smaller secret parts, but provide less privacy, a trade-off we quantify in Section 3.5.

3.4 P3: System Design

In this section, we describe the design of a system for privacy-preserving photo sharing. This system, also called P3, has the two desirable properties described earlier. First, it requires no software modifications at the PSP. Second, it requires no modifications to client-side browsers or image management applications, and only requires a small-footprint software installation on clients. These properties permit fairly easy deployment of privacy-preserving photo sharing.

3.4.1 P3 Architecture and Operation

Before designing our system, we explored the protocols used by PSPs for uploading and downloading photos. Most PSPs use HTTP or HTTPS to upload photos; we have verified this for Facebook, Picasa Web, Flickr, PhotoBucket, Smugmug, and Imageshack. This suggests a relatively simple interposition architecture, depicted in Figure 3.3.

[Figure 3.3: P3 System Architecture]

In this architecture, browsers and applications are configured to use a local HTTP/HTTPS proxy, and all accesses to PSPs go through the proxy. The proxy manipulates the data stream to achieve privacy-preserving photo storage, in a manner that is transparent both to the PSP and to the client. In the following paragraphs, we describe the actions performed by the proxy at the sender side and at one or more recipients.

Sender-side Operation. When a sender transmits a photo taken with the built-in camera, the local proxy acts as a middlebox and splits the uploaded image into a public and a secret part (as discussed in Section 3.3). Since the proxy resides on the client device (and hence is within the trust boundary, per our assumptions in Section 3.2), it is reasonable to assume that the proxy can decrypt and encrypt HTTPS sessions in order to encrypt the photo. We have not yet discussed how photos are encrypted; in our current implementation, we assume the existence of a symmetric shared key between a sender and one or more recipients. This symmetric key is assumed to be distributed out of band.

Ideally, it would have been preferable to store both the public and the secret parts on the PSP. Since the public part is a JPEG-compliant image, we explored methods to embed the secret part within the public part. The JPEG standard allows users to embed arbitrary application-specific markers with application-specific data in images; the standard defines 16 such markers. We attempted to use an application-specific marker to embed the secret part; unfortunately, at least two PSPs (Facebook and Flickr) strip all application-specific markers. Our current design therefore stores the secret part on a cloud storage provider (in our case, Dropbox).
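For concreteness, the encryption of the secret part under a pre-shared symmetric key might look as follows. This is our sketch, assuming AES in GCM mode via the Python cryptography library; the design assumes AES-based symmetric keys distributed out of band, but does not prescribe this particular mode or library.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_secret_part(secret_jpeg, key):
    """Encrypt the secret part (itself a JPEG image) with a pre-shared
    AES key before uploading it to the (untrusted) storage provider."""
    nonce = os.urandom(12)                      # fresh nonce per photo
    ciphertext = AESGCM(key).encrypt(nonce, secret_jpeg, None)
    return nonce + ciphertext                   # stored as one opaque blob

def decrypt_secret_part(blob, key):
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)
```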
Note that because the secret part is encrypted, we do not assume that the storage provider is trusted.

Finally, we discuss how photos are named. When a user uploads a photo to a PSP, the PSP may transform the photo in ways discussed below. Despite this, most photo-sharing services (Facebook, Picasa Web, Flickr, Smugmug, and Imageshack; PhotoBucket does not, which explains its vulnerability to fusking, as discussed earlier) assign a unique ID to all variants of the photo. This ID is returned to the client, as part of the API [47, 50], when the photo is uploaded.

P3's sender-side proxy performs the following operations on the public and secret parts. First, it uploads the public part to the PSP, using either HTTP or HTTPS (e.g., Facebook works only with HTTPS, while Flickr supports HTTP). This upload returns an ID, which is then used to name a file containing the secret part. This file is then uploaded to the storage provider.

Recipient-side Operation. Recipients are also configured to run a local web proxy. A client device downloads a photo from a PSP using an HTTP GET request; the URL of the request contains the ID of the photo being downloaded. When the proxy sees this request, it passes the request on to the PSP, but also initiates a concurrent download of the secret part from the storage provider, using the ID embedded in the URL. When both the public and secret parts have been received, the proxy performs the decryption and reconstruction procedure discussed in Section 3.3 and passes the resulting image to the application as the response to the HTTP GET request. Note that a secret part may be reused multiple times: for example, a user may first view a thumbnail image and then download a larger version. In such scenarios, it suffices to download the secret part once, so the proxy maintains a cache of downloaded secret parts in order to reduce bandwidth usage and improve latency.
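Putting the recipient-side steps together, the proxy logic can be sketched as below. This is our illustration: extract_photo_id, fetch_from_psp, fetch_from_storage, and reconstruct_image are hypothetical helpers standing in for the actual proxy, which is built on mitmproxy; decrypt_secret_part is the sketch above.

```python
from concurrent.futures import ThreadPoolExecutor

secret_cache = {}   # photo ID -> decrypted secret part (JPEG bytes)

def handle_photo_get(request, key):
    """Fetch the (possibly resized) public part from the PSP and the
    encrypted secret part from the storage provider concurrently, then
    reconstruct the photo before returning it to the application."""
    photo_id = extract_photo_id(request.url)    # ID embedded in the URL

    with ThreadPoolExecutor(max_workers=2) as pool:
        public_future = pool.submit(fetch_from_psp, request)
        if photo_id not in secret_cache:        # a cache hit avoids a download
            blob = pool.submit(fetch_from_storage, photo_id).result()
            secret_cache[photo_id] = decrypt_secret_part(blob, key)

    return reconstruct_image(public_future.result(), secret_cache[photo_id])
```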
There is an interesting subtlety in the photo reconstruction process. As discussed in Section 3.3, when the server-side transformations are known, nearly exact reconstruction is possible. (The only errors that can arise are due to storing the correction term of Section 3.3 in a lossy JPEG format that has to be decoded for processing in the pixel domain. Even if quantization is very fine, errors may be introduced because the DCT transform is real-valued while pixel values are integers, so the inverse transform of (S_s - S_s^2) w has to be rounded to the nearest integer pixel value.) In our case, the precise transformations are not, in general, known to the proxy, so the problem becomes more challenging.

By uploading photos and inspecting the results, we are able to tell, generally speaking, what kinds of transformations PSPs perform. For instance, Facebook transforms a baseline JPEG image into a progressive format and at the same time strips all irrelevant markers. Both Facebook and Flickr statically resize the uploaded image to several different sizes; for example, Facebook generates at least three files with different resolutions, while Flickr generates a series of fixed-resolution images whose number depends on the size of the uploaded image. We cannot tell whether these PSPs actually store the original images, and we conjecture that the resizing serves to limit storage and is also perhaps optimized for common-case devices. For example, the largest-resolution photo stored by Facebook is 720x720, regardless of the original resolution of the image. In addition, Facebook can dynamically resize and crop an image; the cropping geometry and the size specified for resizing are both encoded in the HTTP GET URL, so the proxy is able to determine those parameters. Furthermore, by inspecting the JPEG header, we can tell what other kinds of transformations may have been performed: e.g., whether a baseline image was converted to progressive or vice versa, and what sampling factors, cropping, scaling, etc. were applied.

However, some other critical image-processing parameters are not visible to the outside world. For example, resizing an image by downsampling is often accompanied by a filtering step for antialiasing and may be followed by a sharpening step, together with a color adjustment step on the downsampled image. Not knowing which of these steps have been performed, and not knowing the parameters used in these operations, the reconstruction procedure can produce lower-quality images. To understand what transformations have been performed, we are reduced to searching the space of possible transformations for an outcome that matches the output of the transformations performed by the PSP. (This approach is clearly fragile, since a PSP can change the kinds of transformations it performs on photos; see the discussion of PSP co-operation below. Note, however, that this reverse engineering need only be redone when a PSP re-jiggers its image transformation pipeline, so it should not be too onerous.) Fortunately, for Facebook and Flickr, we were able to obtain reasonable reconstruction results on both systems (Section 3.5). These results were obtained by exhaustively searching the parameter space over salient options based on commonly used resizing techniques [64]. More precisely, we select several candidate settings for colorspace conversion, filtering, sharpening, enhancement, and gamma correction, and then compare the output of these with that produced by the PSP. Our reconstruction results are presented in Section 3.5.
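A minimal version of this parameter search might look as follows (our sketch, using Pillow as a stand-in for the actual image pipeline; the candidate settings shown are illustrative and much smaller than the space we actually search):

```python
from itertools import product
import numpy as np
from PIL import Image, ImageEnhance

RESAMPLE_FILTERS = [Image.NEAREST, Image.BILINEAR, Image.BICUBIC, Image.LANCZOS]
SHARPNESS_FACTORS = [1.0, 1.2, 1.5]        # 1.0 means no sharpening

def mse(a, b):
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return ((a - b) ** 2).mean()

def guess_psp_pipeline(original, psp_output):
    """Search candidate (resampling filter, sharpening) settings and return
    the combination whose output best matches what the PSP produced."""
    best, best_err = None, float("inf")
    for resample, sharpness in product(RESAMPLE_FILTERS, SHARPNESS_FACTORS):
        candidate = original.resize(psp_output.size, resample=resample)
        candidate = ImageEnhance.Sharpness(candidate).enhance(sharpness)
        err = mse(candidate, psp_output)
        if err < best_err:
            best, best_err = (resample, sharpness), err
    return best
```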
3.4.2 Discussion

Privacy Properties. Beyond the privacy properties of the P3 algorithm, the P3 system achieves the privacy goals outlined in Section 3.2. Since the proxy runs on the client for both sender and receiver, the trusted computing base for P3 includes the software and hardware of the client device. It may be possible to reduce the footprint of the trusted computing base even further using a trusted platform module [122] and trusted sensors [78], but we have deferred that to future work.

P3's privacy depends upon the strength of the symmetric key used to encrypt the secret part. We assume the use of AES-based symmetric keys, distributed out of band. Furthermore, as discussed above, in P3 the storage provider cannot leak photo privacy, because the secret part is encrypted. The storage provider, or for that matter the PSP, can tamper with images and hinder reconstruction; protecting against such tampering is beyond the scope of this work. For the same reason, eavesdroppers can potentially tamper with the public or the secret part, but cannot leak photo privacy.

PSP Co-operation. The P3 design we have described assumes no co-operation from the PSP. As a result, this implementation is fragile, and a PSP can prevent users from using its infrastructure to store P3's public parts. For instance, it can introduce complex nonlinear transformations on images in order to foil reconstruction. It may also run simple algorithms to detect images whose coefficients might have been thresholded, and refuse to store such images. Our design is merely a proof of concept that the technology exists to transparently protect the privacy of photos, without requiring infrastructure changes or significant client-side modification.

Ultimately, PSPs will need to cooperate for photo privacy to be possible, and this cooperation depends upon the implications of photo sharing for their respective business models. At one extreme, if only a relatively small fraction of a PSP's user base uses P3, the PSP may choose to benevolently ignore this use (because preventing it would require committing resources to reprogram its infrastructure). At the other extreme, if PSPs see a potential loss in revenue from not being able to recognize objects and faces in photos, they may choose to react in one of two ways: shut down P3, or offer photo privacy to users for a fee. In the latter scenario, a significant number of users see value in photo privacy, so we believe that PSPs will be incentivized to offer privacy-preserving storage for a fee. In a competitive marketplace, even if only one PSP were to offer privacy-preserving storage as a service, others would likely follow suit. For example, Flickr already has a "freemium" business model and could simply offer privacy-preserving storage to its premium subscribers.

If a PSP were to offer privacy-preserving photo storage as a service, we believe it would have incentives to use a P3-like approach (which permits image scaling and transformations), rather than end-to-end encryption. With P3, a PSP can assure its users that it is only able to see the public part (reconstruction would still happen at the client), yet provide, as a service, the image transformations that reduce user-perceived latency (an important consideration for retaining users of online services [24]). Finally, with PSP co-operation, two aspects of our P3 design become simpler. First, the PSP's image transformation parameters would be known, so higher-quality reconstructed images would result. Second, the secret part of the image could be embedded within the public part, obviating the need for a separate online storage provider.

Extensions. Extending this idea to video is feasible, but left for future work. As an initial step, it is possible to apply the privacy-preserving techniques only to the I-frames, which are coded independently using tools similar to those used in JPEG. Because the other frames in a "group of pictures" are coded using an I-frame as a predictor, quality reductions in an I-frame propagate through the remaining frames. In future work, we plan to study video-specific aspects, such as how to process motion vectors and how to enable reconstruction from a processed version of a public video.

3.5 Evaluation

In this section, we report on an evaluation of P3. Our evaluation uses objective metrics to characterize the privacy-preservation capability of P3, and it also reports, using a full-fledged implementation, on the processing overhead induced by sender- and recipient-side encryption.

3.5.1 Methodology

Metrics. Our first metric of P3 performance is the storage overhead imposed by selective encryption. Photo storage space is an important consideration for PSPs, and a practical scheme for privacy-preserving photo storage must not incur large storage overheads.
We then measure the efficacy of privacy preservation using PSNR (peak signal-to-noise ratio), a metric commonly used in signal processing. While the shortcomings of this metric in quantifying perceptual quality are well known, it provides a simple, objective way of quantifying degradation; public images with the very low PSNR values we report would commonly be agreed to be of very poor quality. To complement PSNR, we also present visual representations of the public parts of images, to let the reader judge the efficacy of P3; lack of space prevents a more detailed exposition. We then evaluate the efficacy of privacy preservation by measuring the performance of state-of-the-art edge and face detection algorithms, the SIFT feature extraction algorithm, and a face recognition algorithm on P3. We conclude the evaluation of privacy by discussing the efficacy of guessing attacks. Finally, we quantify the reconstruction performance, bandwidth savings, and processing overhead of P3.

Datasets. We evaluate P3 using four image datasets. First, as a baseline, we use the "miscellaneous" volume of the USC-SIPI image dataset [13]. This volume has 44 color and black-and-white images containing various objects, people, and scenery, and includes many canonical images (including Lena) commonly used in the image processing community. Our second dataset is from INRIA [8], and contains 1491 full-color images of vacation scenes, including mountains, rivers, small towns, and other interesting topographies. This dataset has greater diversity than the USC-SIPI dataset in terms of both resolution and texture; its images range in size up to 5 MB, while the USC-SIPI dataset's images are all under 1 MB. We also use the Caltech face dataset [2] for our face detection experiment. This dataset has 450 frontal color face images of about 27 unique faces depicted under different circumstances (illumination, background, facial expression, etc.); all images contain at least one large dominant face, and zero or more additional faces. Finally, the Color FERET database [4] is used for our face recognition experiment. This dataset is specifically designed for developing, testing, and evaluating face recognition algorithms, and contains 11,338 facial images of 994 subjects at various angles.

Implementation. We also report results from an implementation for Facebook [46]. We chose the Android 4.x mobile operating system as our client platform, since the bandwidth limitations together with the availability of camera sensors on mobile devices motivate our work. The mitmproxy software tool [88] serves as the trusted man-in-the-middle proxy in the system; to run mitmproxy on Android, we used the kivy/python-for-android software [69]. The algorithm described in Section 3.3 is implemented on top of the code maintained by the Independent JPEG Group, version 8d [65]. We report on experiments conducted by running this prototype on Samsung Galaxy S3 smartphones.

[Figure 3.4: Screenshots (Facebook) with and without decryption]

Figure 3.4 shows two screenshots of a Facebook page, with two photos posted. The one on the left is the view seen on a mobile device that has our recipient-side decryption and reconstruction algorithm enabled. On the right is the same page without that algorithm, so only the public parts of the images are visible.
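For reference, the PSNR values reported throughout this section follow the standard definition; a minimal computation, assuming 8-bit images, is:

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio, in dB, between two images of the same
    size (higher means the test image is closer to the reference)."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = ((reference - test) ** 2).mean()
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```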
[Figure 3.5: Threshold vs. size (normalized file size vs. threshold for (a) USC-SIPI and (b) INRIA; error bars = stdev)]

[Figure 3.6: PSNR results (public and secret parts vs. threshold for (a) USC-SIPI and (b) INRIA)]

3.5.2 Evaluation Results

In this section, we first report on the trade-off between the threshold parameter and storage size in P3. We then evaluate various privacy metrics, and conclude with an evaluation of reconstruction performance, bandwidth, and processing overhead.

3.5.2.1 The Threshold vs. Storage Trade-off

In P3, the threshold T is a tunable parameter that trades off storage space for privacy: at higher thresholds, fewer coefficients are in the secret part, but more information is exposed in the public part. Figure 3.5 reports on the size of the public part (a JPEG image), the secret part (an encrypted JPEG image), and the combined size of the two parts, as a fraction of the size of the original image, for different threshold values T. One interesting feature of this figure is that, despite the differences in size and composition of the two datasets, their size distributions as a function of threshold are qualitatively similar. At low thresholds (near 1), the combined image size exceeds the original image size by about 20%, with the public and secret parts each being about 50% of the total. While this setting provides excellent privacy, the large size of the secret part can impact bandwidth savings; recall that, in P3, the secret part has to be downloaded in its entirety even when the public part has been resized significantly. Thus, it is important to select a better operating point, where the size of the secret part is smaller. Fortunately, the shape of the curves in Figure 3.5 for both datasets suggests operating at the knee of the "Secret" line (at a threshold in the range 15-20), where the secret part is about 20% of the original image and the total storage overhead is about 5-10%.

[Figure 3.7: Baseline encryption result: (a) public and (b) secret parts for T = 1, 5, 10, 15, 20]

Figure 3.7, which depicts the public and secret parts (recall that the secret part is also a JPEG image) of a canonical image from the USC-SIPI dataset, shows that for thresholds in this range minimal visual information is present in the public part, with all of it stored in the secret part. We include these images to give readers a visual sense of the efficacy of P3; we conduct more detailed privacy evaluations below. This suggests that a threshold between 10 and 20 might provide a good balance between privacy and storage; we solidify this finding below.

3.5.2.2 Privacy

PSNR. One of the earliest objective metrics used for evaluating the quality of image reconstruction is the peak signal-to-noise ratio (PSNR). In Figure 3.6, we present the average PSNRs and standard deviations of the public and secret parts of the USC-SIPI and INRIA datasets, as a function of the threshold, relative to the original image. The secret parts show high PSNRs, especially when we consider the fact that 35-40 dB is regarded as perceptually lossless in the image processing community.
Nonetheless, note that our encryption algorithm uses a single threshold across all blocks of an image and does not take block energy distributions into account. As a result, even though the secret part reaches about 40 dB, non-trivial blocking effects can be identified on close inspection of the image (Figure 3.7). It is encouraging that the PSNR values of the public part are all around 10-15 dB, and that they increase only slightly with the threshold. The extraction of the DC component into the secret part plays a major role in producing such low PSNR values. For the range of (low) PSNRs that we observe here (e.g., around 15 dB), it is widely accepted that quality is so degraded that the images are practically useless. However, this alone is not an indication that P3 preserves privacy: an examination of the public part at threshold 100 (not shown) reveals some of the features of the original image. At lower thresholds these features are no longer visible (Figure 3.7), yet the difference in PSNR between thresholds of 10 and 100 is negligible. For this reason, we consider several other metrics to quantify the privacy obtained with P3. These metrics quantify the efficacy of automated algorithms on the public part; each automated algorithm can be considered to be mounting a privacy attack on the public part.

[Figure 3.8: Privacy against detection algorithms ((a) edge detection: matching pixel ratio vs. threshold; (b) face detection: average number of detected faces vs. threshold)]

Edge Detection. Edge detection is an elemental processing step in many signal processing and machine vision applications; it attempts to discover discontinuities in various image characteristics. We apply the well-known Canny edge detector [28] (using the implementation of [59]) to the public parts of images in the USC-SIPI dataset, and present images with the recognized edges in Figure 3.10. For space reasons, we only show edges detected on the public parts of four canonical images, at thresholds of 1 and 20. The images at threshold 20 do reveal several "features", and signal processing researchers, when told that these are canonical images from a widely used dataset, can probably recognize them. However, a layperson who has not seen these images before will very likely not be able to recognize any of the objects in them (the interested reader can browse the USC-SIPI dataset online to find the originals). We include these images to point out that visual privacy is a highly subjective notion that depends upon the beholder's prior experience. If true privacy is desired, end-to-end encryption must be used; P3 provides "pretty good" privacy together with the convenience and performance offered by photo-sharing services.

It is also possible to quantify the privacy offered by P3 against edge detection attacks.
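One plausible way to compute such a matching-pixel measure is sketched below (ours, assuming OpenCV's Canny implementation; the Canny thresholds are illustrative, and the matching criterion here, agreement over pixels that either edge map marks as an edge, is one reasonable reading of the measure plotted in Figure 3.8(a)):

```python
import cv2
import numpy as np

def edge_match_ratio(original_gray, public_gray):
    """Fraction of edge pixels on which the Canny edge maps of the original
    image and of the P3 public part agree (inputs are 8-bit grayscale)."""
    edges_orig = cv2.Canny(original_gray, 100, 200)
    edges_pub = cv2.Canny(public_gray, 100, 200)
    mask = (edges_orig > 0) | (edges_pub > 0)
    if not mask.any():
        return 1.0   # no edges detected in either image
    return float(np.mean(edges_orig[mask] == edges_pub[mask]))
```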
Figure 3.8(a) plots the fraction of matching pixels in the image obtained by running edge detection on the public part, and that 64 0 20 40 60 80 100 0 0.5 1 1.5 Threshold # of Features (Normalized) Detected Features Matched Features (a) SIFT Feature 0 10 20 30 40 50 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 Rank Cumulative Recognition Rate Normal−Normal T100−Public−Public T100−Normal−Public T20−Public−Public T20−Normal−Public T10−Public−Public T10−Normal−Public T1−Public−Public T1−Normal−Public (b) Face Recognition Figure 3.9: Privacy on Feature Extraction and Face Recognition Algorithms obtained by running edge detection on the original image (the result of edge detection is an image with binary pixel values). At threshold values below 20, barely 20% of the pixels match; at very low thresholds, running edge detection on the public part results in a picture resembling white noise, so we believe the higher matching rate shown at low thresholds simply results from spurious matches. We conclude that, for the range of parameters we consider, P3 is very robust to edge detection. Face Detection. Face detection algorithms detect human faces in photos, and were available as part of Facebook’s face recognition API, until Facebook shut down the API [7]. To quantify the performance of face detection on P3, we use the Haar face detector from the OpenCV library [9], and apply it to the public part of images from Caltech’s face dataset [2]. The efficacy of face detection, as a function of different thresholds, is shown in Figure 3.8(b). The y-axis represents the average number of faces detected; it is higher than 1 for the original images, because some images have more than one face. P3 completely foils face detection for thresholds below 20; at thresholds higher than about 35, faces are occasionally detected in some images. SIFT feature extraction. SIFT [82] (or Scale-invariant Feature Transform) is a general method to detect features in images. It is used as a pre-processing step in many image detection and recognition applications 65 (a) T=1 (b) T=20 Figure 3.10: Canny Edge Detection on Public Part 1 5 10 15 20 0 50 100 150 200 Threshold Bandwidth (KByte) Uploaded Size−720x720 Overhead−720x720 Overhead−130x130 Overhead−75x75 Figure 3.11: Bandwidth Usage Cost (INRIA) from machine vision. The output of these algorithms is a set of feature vectors, each of which describes some statistically interesting aspect of the image. We evaluate the efficacy of attacking P3 by performing SIFT feature extraction on the public part. For this, we use the implementation [81] from the designer of SIFT together with the default parameters for feature extraction and feature comparison. Figure 3.9(a) reports the results of running feature extraction on the USC-SIPI dataset. 10 This figure shows two lines, one of which measures the total number of 10 The SIFT algorithm is computationally expensive, and the INRIA data set is large, so we do not have the results for the INRIA dataset. (Recall that we need to compute for a large number of threshold values). We expect the results to be qualitatively similar. 66 features detected on the public part as a function of threshold. This shows that as the threshold increases, predictably, the number of detected features increases to match the number of features detected in the original figure. More interesting is the fact that, below the threshold of 10, no SIFT features are detected, and below a threshold of 20, only about 25% of the features are detected. 
However, this latter number is a little misleading, because we found that, in general, SIFT detects different feature vectors in the public part and the original image. If we count the number of features detected in the public part that are less than a distance d (in feature space) from the nearest feature in the original image (indicating that, plausibly, SIFT may have found in the public part a feature of the original image), we find that this number is far smaller: up to a threshold of 35, only a very small fraction of the original features are discovered, and even at a threshold of 100, only about 4% of the original features have been discovered. We use the default distance parameter d of the SIFT implementation; changing the parameter does not change our conclusions. (Our results use a distance parameter of 0.6 from [81]; we also used 0.8, the highest distance parameter that seems to be meaningful ([82], Figure 11), and the results are similar.)

Face Recognition. Face recognition algorithms take an aligned and normalized face image as input and match it against a database of faces, returning the best possible answer, e.g., the closest match or an ordered list of matches. We use the Eigenface algorithm [123] and a well-known face recognition evaluation system [27] with the Color FERET database. With Eigenface, we apply two distance metrics, Euclidean and Mahalanobis Cosine [26], in our evaluation. We examine two settings: the Normal-Public setting, in which training is performed on normal training images in the database and testing is executed on public parts; and the Public-Public setting, which trains the database using the public parts of the training images, and is thus a stronger attack on P3 than Normal-Public.

Figure 3.9(b) shows a subset of our results, based on the Mahalanobis Cosine distance metric and the FAFB probing set of the FERET database. To quantify recognition performance, we follow the methodology proposed in [101, 102]. In this graph, a data point at (x, y) means that y% of the time, the correct answer is contained in the top x answers returned by the Eigenface algorithm. In the absence of P3 (the Normal-Normal line), the recognition accuracy is over 80%. For the proposed range of operating thresholds (T = 1-20), the recognition rate is below 20% at rank 1. Put another way, for these thresholds, more than 80% of the time the face recognition algorithm provides the wrong answer (a false positive). Moreover, our maximum threshold (T = 20) shows only about a 45% rate at rank 50, meaning that less than half the time does the correct answer lie in the top 50 matches returned by the algorithm. We also examined other settings (e.g., Euclidean distance and other probing sets), and the results were qualitatively similar. These recognition rates are so low that a face recognition attack on P3 is unlikely to succeed; even if an attacker were to apply face recognition to P3 public parts, and even if the algorithm happened to be correct 20% of the time, the attacker could not distinguish a true positive from a false positive, since the public image contains little visual information.

3.5.3 What is Lost?

P3 achieves privacy, but at some cost to reconstruction accuracy, as well as bandwidth and processing overhead.

Reconstruction Accuracy. As discussed in Section 3.3, the reconstruction of an image to which a linear transformation has been applied should, in theory, be perfect.
In practice, however, quantization effects in JPEG compression can introduce very small errors in reconstruction. Most images in the USC-SIPI dataset can be reconstructed, when the transformations are known a priori, with an average PSNR of 49.2 dB; in the signal processing community, this would be considered practically lossless. More interesting is the efficacy of our reconstruction under Facebook's and Flickr's transformations. In Section 3.4, we described an exhaustive parameter-space search methodology to approximately reverse-engineer these transformations. Our methodology is fairly successful, resulting in images with a PSNR of 34.4 dB for Facebook and 39.8 dB for Flickr. To an untrained eye, images with such PSNR values are generally blemish-free. Thus, using P3 does not significantly degrade the accuracy of reconstructed images.

Bandwidth Usage Cost. In P3, suppose a recipient downloads, from a PSP, a resized version of an uploaded image (in our experiments, we mimic PSP resizing using ImageMagick's convert program [63]). The total bandwidth usage for this download is the size of the resized public part plus the complete secret part. Without P3, the recipient only downloads the resized version of the original image. In general, the former is larger than the latter, and the difference between the two represents the bandwidth usage cost, an important consideration for usage-metered mobile data plans. This cost, as a function of the P3 threshold, is shown in Figure 3.11 for the INRIA dataset (the USC-SIPI results are similar). For thresholds in the 10-20 range, this cost is modest: 20 KB or less across different resolutions (these are the resolutions to which Facebook statically resizes an uploaded image). As an aside, the variability in bandwidth usage cost represents an opportunity: users who are more privacy-conscious can choose lower thresholds at the expense of slightly higher bandwidth usage. Finally, we observe that this additional bandwidth usage can be reduced by trading off storage: a sender can upload multiple encrypted secret parts, one for each known static transformation that a PSP performs. We have not implemented this optimization.

Processing Costs. On a Galaxy S3 smartphone, for a 720x720 image (the largest resolution served by Facebook), it takes on average 152 ms to extract the public and secret parts, about 55 ms to encrypt or decrypt the secret part, and 191 ms to reconstruct the image. These costs are modest and unlikely to impact the user experience.

3.6 Conclusions

P3 is a privacy-preserving photo sharing scheme that leverages the sparsity and quality of images to store most of the information of an image in a secret part, leaving most of the volume of the image in a JPEG-compliant public part, which is uploaded to PSPs. P3's public parts have very low PSNRs and are robust to edge detection, face detection, and SIFT feature extraction attacks. These benefits come at minimal cost to reconstruction accuracy, bandwidth usage, and processing overhead.

Acknowledgements. This research was sponsored in part under U.S. National Science Foundation grant CNS-1048824. Portions of the research in this chapter use the FERET database of facial images collected under the FERET program, sponsored by the DOD Counterdrug Technology Development Program Office [102, 101].
Chapter 4

Medusa: A Programming Framework for Crowd-Sensing Applications

The ubiquity of smartphones and their on-board sensing capabilities motivates crowd-sensing, a capability that harnesses the power of crowds to collect sensor data from a large number of mobile phone users. Unlike previous work on wireless sensing, crowd-sensing poses several novel requirements: support for humans-in-the-loop to trigger sensing actions or review results, the need for incentives, as well as privacy and security. Beyond existing crowd-sourcing systems, crowd-sensing exploits the sensing and processing capabilities of mobile devices. In this chapter, we design and implement Medusa, a novel programming framework for crowd-sensing that satisfies these requirements. Medusa provides high-level abstractions for specifying the steps required to complete a crowd-sensing task, and employs a distributed runtime system that coordinates the execution of these tasks between smartphones and a cluster on the cloud. We have implemented ten crowd-sensing tasks on a prototype of Medusa. We find that Medusa task descriptions are two orders of magnitude smaller than the standalone systems required to implement those crowd-sensing tasks, and that the runtime has low overhead and is robust to dynamics and resource attacks.

4.1 Introduction

The ubiquity of smartphones and other mobile devices, and the plethora of sensors available on them, have inspired innovative research that will, over time, lead to sophisticated context-aware applications and systems. Some of this research has explored ways in which mobile phone users can contribute sensor data towards enabling self-reflection, environmental awareness, or other social causes.

In this chapter, we consider a specific form of acquiring sensor data from multiple smartphones that we call crowd-sensing. Crowd-sensing is a capability by which a requestor can recruit smartphone users (or workers) to provide sensor data to be used towards a specific goal or as part of a social or technical experiment (e.g., tasks such as forensic analysis, documenting public spaces, or collaboratively constructing statistical models). Workers' smartphones collect sensor data, and may process the data in arbitrary ways before sending the data to the requestor.

Crowd-sensing is a form of networked wireless sensing, but is different from prior research on networked sensing for two reasons: each smartphone is owned by an individual, and some sensing actions on the smartphone may require human intervention (e.g., pointing a camera at a specific target, or initiating audio capture at the appropriate moment). Crowd-sensing is different from crowd-sourcing in one important respect: a crowd-sourcing system like the Amazon Mechanical Turk (AMT [1]) permits requestors to task participants with human intelligence tasks like recognition or translation, while a crowd-sensing system enables requestors to task participants with acquiring processed sensor data.

For these reasons, any software system designed to support crowd-sensing is qualitatively different from systems for wireless sensing or crowd-sourcing. Crowd-sensing systems must support ways in which workers can be incentivized to contribute sensor data, since participating in crowd-sensing may incur real monetary cost (e.g., bandwidth usage). In some cases, it may be necessary to support "reverse" incentives, where workers pay for the privilege of participating in the requestor's task.
They must also support worker-mediation, by which (human) workers can mediate in the sensor data collection workflows; specifically, humans may trigger sensing actions and annotate collected sensor data. In addition, a crowd-sensing system must support privacy of sensor data contributions, and worker anonymity (in cases where workers desire anonymity). Identifying the requirements for crowd-sensing is our first important contribution (Section 4.2).

In this chapter, we tackle the challenge of programming crowd-sensing tasks. A high-level programming language can simplify the burden of initiating and managing crowd-sensing tasks, especially for the non-technical requestors who, we assume, will constitute the majority of crowd-sensing users. Our second contribution is the design of a high-level programming language called MedScript and an associated runtime system called Medusa for crowd-sensing (Section 4.4). MedScript provides abstractions for the intermediate steps in a crowd-sensing task (called stages) and for the control flow between them (called connectors). Unlike most programming languages, it provides the programmer with language-level constructs for incorporating workers into the sensing workflow. The Medusa runtime is architected (Section 4.3) as a partitioned system with a cloud component and a smartphone component. For robustness, it minimizes the task execution state maintained on the smartphone, and for privacy, it ensures that any data that leaves the phone must be approved by the (human) worker. Finally, it uses AMT to recruit workers and manage monetary incentives.

Our third contribution is an evaluation of Medusa on a complete prototype (Section 4.5). We use our prototype to illustrate that Medusa satisfies many of the requirements discussed above. We demonstrate that Medusa can be used to compactly express several novel crowd-sensing tasks, as well as sensing tasks previously proposed in the literature. For three different tasks, Medusa's task descriptions are almost two orders of magnitude more compact than stand-alone implementations of those tasks. Medusa's overhead is small, and it provides effective sandboxing against unauthorized access to smartphone resources as well as against excessive resource usage.

Medusa represents a convergence of two strands of research (Section 6.3): participatory sensing systems like [91] and crowd-sourcing frameworks like [1]. Unlike the former, Medusa explicitly supports incentives and worker-mediation; unlike the latter, Medusa enables requestors to gather raw or processed sensor data from mobile devices in a substantially automated way.

4.2 Crowd-Sensing: Motivation and Challenges

Motivation. Consider the following scenario. A social science researcher, Alice, is interested in studying the dynamics of "Occupy" movements across the globe. Her methodology is to obtain ground-truth videos of life in "Occupy" encampments across the United States and overseas. This methodology of obtaining observations of human life in urban spaces was pioneered by William Whyte [125] and is accepted practice in public planning. However, it is logistically difficult for Alice to visit each of these locations to conduct her research, so she resorts to an idea that we call crowd-sensing, which leverages the prevalence of sensor-equipped mobile devices and the power of crowds. She recruits volunteers at each of the encampments by offering them small monetary incentives to provide her with videos of daily life at the camps.
Some volunteers take one or more videos on a mobile device and upload summaries of each video (e.g., a few frames or a short clip) to an Internet-connected service. Other volunteers upload summaries of videos they may have taken before being recruited. Alice obtains a large corpus of summaries, which she makes available to other volunteers to curate, selecting only the relevant subset that may be useful for her research. She then asks the volunteers whose summaries have been chosen to upload the full videos, and pays them for their participation. She analyzes the curated set of videos, and draws conclusions about similarities and differences between "Occupy" communities across the globe.

In the scenario above, we say that Alice is interested in conducting a task, that of collecting relevant videos for her research. A task is initiated by a requestor. Each task consists of several stages. In our example, the stages were, in order: recruiting volunteers, volunteers taking videos on their smartphones, uploading summaries, curating summaries using another set of volunteers, and uploading the full videos. Volunteers who participate in tasks are called workers; workers have smartphones (henceforth we use the term smartphone, although our designs apply equally well to other mobile devices such as tablets) that are used to perform one or more stage actions.

In this chapter, we consider the architectural requirements, the design of a high-level programming framework, and an associated runtime system for crowd-sensing. In the absence of such a programming framework, Alice may need to manually recruit volunteers (e.g., by advertising in mass media or through her social network), follow up with the volunteers to upload summaries and videos, manually recruit volunteers for curation (e.g., students in her class), implement software that simplifies the curation process, and ensure that payments are made to all volunteers. Ideally, our programming framework should abstract all these details, and allow Alice to compactly express the stages in her task. The associated runtime should automate stage execution, relieving the requestor of the burden of manually managing stages.

There are many other examples of crowd-sensing beyond video documentation:

Spot Reporter. A journalist is under deadline pressure to write an article about a developing forest fire and its impact on nearby communities. He recruits volunteers with smartphones to send pictures of the extent of the fire damage, interview local residents, and report on the efficacy of the first responders.

Auditioning. A television station invites budding actors to submit videos of their acting skills, in order to select actors for a second stage of in-person auditions for an upcoming television show.

Collaborative Learning. A software company is developing a lifestyle-monitoring mobile app that lets users self-reflect on their activities. It would like to recruit volunteers to submit labeled samples of accelerometer readings; these are used to build a machine-learning classifier of human activity, which will be embedded in the mobile app.

Forensic Analysis. Security officials at a stadium are investigating an outbreak of violence and would like to determine who was at the scene of the outbreak when the violence occurred. They ask volunteers who may have taken pictures during the event to send snapshots of faces in the stadium.

We describe these tasks in detail later, in Section 4.5.

Requirements.
A practical programming framework that supports crowd-sensing tasks must satisfy several challenging requirements. Some of these requirements govern the expressivity of the programming language, while others constrain the underlying runtime system. We discuss these in order.

Expressivity Requirements. Requestors must be able to specify worker-mediation, i.e., a worker may need to perform an action to complete a stage, such as initiating or stopping the recording of a video or audio clip, labeling data, or approving data for upload (see below). However, not all stages will require workers to initiate sensor data collection; requestors may want to design stages that access stored sensor data, as in the forensic analysis task.

Stages must support not just sensing, but also in-network processing; this helps conserve network bandwidth for users with data usage limits, and reduces upload latency by reducing the volume of information that needs to be transmitted over a constrained connection. For example, in our video documentation example, video summaries are first extracted from the video in order to weed out irrelevant videos. Similarly, feature vectors may be extracted in the collaborative learning task, or faces in the forensic analysis task. In addition, the programming language must be extensible, in order to support new sensors or new in-network processing algorithms.

Tasks may have timeliness requirements, and any contributions received after the deadline are discarded; in general, we anticipate task deadlines to be on the order of hours or days after task initiation, so traditional quality-of-service concerns may not apply. Tasks may require crowd-curation; in our scenario, Alice relies on this to pre-select the videos likely to be most relevant to her research objectives.

Requestors must be able to specify monetary incentives for workers. In our scenario, Alice promises to pay participants for uploading videos. Some tasks may also require reverse-incentives, where the workers pay the requestors for the privilege of participating in the task. In our auditioning example, a requestor may charge a small fee in order to disincentivize frivolous participants.

Runtime Requirements. Workers must be able to sign up for multiple concurrent tasks, and requestors should be able to initiate multiple tasks concurrently. Thus, a smartphone user may wish to participate in collaborative learning, but may also sign up to be a spot reporter. Stages in a task may be executed at different times by different workers; thus, stage execution is not necessarily synchronized across workers. In the video documentation task, one worker may upload summaries several hours after another worker has uploaded his or her video. Task execution must be robust to intermittent disconnections or longer-term outages on smartphones.

The runtime should preserve subject anonymity with respect to requestors, and should contain mechanisms for ensuring data privacy. It should use best practices in secure communication, and ensure the safety of execution on worker smartphones. We discuss later what kinds of anonymity, privacy, security, and safety are achievable in our setting. Finally, the runtime must respect user-specified resource usage policies for crowd-sensing on the smartphone; for example, if generating a video summary is likely to consume significant battery power, the framework must terminate the stage or take other actions discussed later.

Summary.
This is a fairly complex web of requirements, but we believe they are necessary for crowd-sensing. Worker-mediation and crowd-curation leverage human intelligence to ensure high-quality sensor data. Support for concurrent tasks and asynchrony enables high task throughput. Monetary incentives, together with anonymity, privacy, and security features and resource controls, incentivize worker participation. Ensuring robust task execution in the runtime relieves programmers of the burden of dealing with failures. In the next section, we describe an architecture that meets these requirements.

[Figure 4.1: System Architecture]

4.3 Medusa System Architecture

In this section, we first describe the architectural principles that guide the design of our crowd-sensing programming system, called Medusa. We then illustrate our design (Figure 4.1) by describing how the video documentation crowd-sensing task is executed by Medusa.

4.3.1 Architectural Principles

Medusa is a high-level language for crowd-sensing. In Medusa, programmers specify crowd-sensing tasks as a sequence of stages. Thus, Alice would program her video documentation task using a description that looks approximately like this:

Recruit -> TakeVideo -> ExtractSummary -> UploadSummary -> Curate -> UploadVideo

This describes the sequence of steps in the task; the Medusa runtime executes these tasks. The design of the Medusa runtime is guided by three architectural decisions that simplify overall system design.

Principle #1: Partitioned Services. Medusa should be implemented as a partitioned system that uses a collection of services both on the cloud and on worker smartphones. This constraint follows from the observation that some of the requirements are more easily and robustly accomplished on an always-Internet-connected cloud server or cluster (task initiation, volunteer recruitment, result storage, and monetary transactions), while others, such as sensing and in-network processing, are better suited for execution on the smartphone.

Principle #2: Dumb Smartphones. Medusa should minimize the amount of task execution state maintained on smartphones. This design principle precludes, for example, large segments of a task from being executed entirely on the smartphone without the cloud being aware of execution progress. We impose this constraint to enable robust task execution in the presence of intermittent connectivity failures, as well as longer-term phone outages caused, for example, by workers turning off their smartphones.

Principle #3: Opt-in Data Transfers. Medusa should always require a user's permission before transferring any data from a smartphone to the cloud. Data privacy is a significant concern in crowd-sensing applications, and this principle ensures that, at the very least, users have the option to opt out of data contributions. Before workers opt in, they may view a requestor's privacy policy.

Discussion. Other designs of the runtime are possible. For example, it might be possible to design a purely peer-to-peer crowd-sensing system, and it might also be possible to empower smartphones to exclusively execute crowd-sensing tasks. In our judgement, our architectural principles enable us to achieve all of the requirements outlined in the previous section without significantly complicating system design. Furthermore, these principles do not precisely determine what functionality to place on the cloud or on the phone: our design is one instantiation that adheres to these principles.

Some of these principles may appear to conflict with each other. For example, if the smartphone components are designed to be "dumb", then intermediate data products of task execution must be communicated
For example, it might be possible to design a purely peer-to-peer crowd-sensing system, and it might also be possible to empower smartphones to exclusively execute crowd-sensing tasks. In our judgement, our architectural principles enable us to achieve all of the requirements outlined in the previous section without significantly complicating system design. Furthermore, these principles do not precisely determine what functionality to place on the cloud or on the phone: our design is one instantiation that adheres to these principles.

Some of these principles may appear to conflict with each other. For example, if the smartphone components are designed to be "dumb", then intermediate data products during task execution must be communicated to the cloud, which may require significant user opt-in interactions and hamper usability. We describe later how we address this tension.

Figure 4.2: Illustration of video documentation task

Finally, our opt-in requirement can impact usability by requiring user intervention whenever data leaves the phone. In the future, we may consider relaxing this constraint in two ways. First, users may be willing to sacrifice privacy for convenience or monetary reward. In the future, we may consider policies that allow users to opt out of approving data transfers for tasks which promise significant monetary incentives, for example. An understanding of these kinds of exceptions can only come from experience using the system. Second, some of the in-network processing algorithms may be designed to be privacy-preserving (e.g., by using differential privacy [41]), so user assent may not be necessary for data generated by these algorithms.

4.3.2 How Medusa Works

In keeping with our partitioned services principle, Medusa is structured as a collection of services running on the cloud and on the phone (Figure 4.1). These services coordinate to perform crowd-sensing tasks. We give a high-level overview of Medusa by taking our running example of Alice's video documentation task.

In Medusa, Alice writes her video documentation task in MedScript, a high-level language designed for non-experts, which provides stages as abstractions and allows programmers to express control flow between stages. She submits the program to the MedScript interpreter, which is implemented as a cloud service. Figure 4.2 illustrates the sequence of actions that follows the submission of the task to Medusa.

The interpreter parses the program and creates an intermediate representation which is passed on to a Task Tracker. This latter component is the core of Medusa, coordinating task execution with other components as well as with workers' smartphones. For the Recruit stage, the Task Tracker contacts the Worker Manager (a back-end service), which initiates the recruitment of workers. As we shall describe later, the Worker Manager uses Amazon Mechanical Turk [1] for recruitment and a few other tasks. When workers agree to perform the task, these notifications eventually reach the Task Tracker through the Worker Manager; different workers may agree to perform the task at different times (e.g., Bob and Charlie in Figure 4.2). More generally, workers may perform different stages at different times.
Monetary incentives are specified at the time of recruitment. Once a worker has been recruited, the Task Tracker initiates the next stage, TakeVideo, on that worker's smartphone by sending a message to the Stage Tracker, one instance of which runs on every phone. The TakeVideo stage is a downloadable piece of code from an extensible Stage Library, a library of pre-designed stages that requestors can use to design crowd-sensing tasks. Each such stage executes on the phone in a sandboxed environment called MedBox. The TakeVideo stage requires human intervention – the worker needs to open the camera application and take a video. To remind the worker of this pending action, the stage implementation uses the underlying system's notification mechanism to alert the worker.

Once the video has been recorded, the Stage Tracker notifies the Task Tracker of the completion of that stage, and awaits instructions regarding the next stage. This is in keeping with our dumb smartphones principle. The video itself is stored on the phone. The Task Tracker then notifies the Stage Tracker that it should run the ExtractSummary stage. This stage extracts a small summary video comprised of a few sample frames from the original video, then uploads it to the Task Tracker. Before the upload, the user is required to preview and approve the contribution, in keeping with our opt-in data transfers principle.

Execution of subsequent stages follows the same pattern: the Task Tracker initiates the Curate stage, while a completely different set of volunteers rates the videos. Finally, the selected videos are uploaded to the cloud. Only when all stages have completed is Alice notified (Figure 4.2). Medusa maintains persistent state of data contributions in its Data Repository, as well as the state of stage execution for each worker in its Stage State Table. As we shall discuss later, this enables robust task execution. In the next section, we discuss each of these components in greater detail.

4.4 Medusa Design

Medusa achieves its design objectives using a cloud runtime system which consists of several components, together with a runtime system on the smartphone. Before we describe these two subsystems, we begin with a description of the MedScript programming language.

4.4.1 The MedScript Programming Language

The MedScript programming language is used to describe crowd-sensing tasks, so each task described in Section 4.2 would be associated with a corresponding MedScript. More precisely, a MedScript defines a task instance, specifying the sequence of actions performed by a single worker towards that crowd-sensing task. In general (as described in the next section), requestors may ask the Medusa runtime to run multiple instances of a MedScript task at different workers, and a single worker may also run multiple task instances. Moreover, a single worker may concurrently run instances of different tasks (possibly) initiated by different requestors.

Listing 4.1 shows the complete listing of the video documentation task. We use this program listing to illustrate various features of the MedScript language. MedScript consists of two high-level abstractions: stages and connectors. A stage (e.g., line 8) describes either a sensing or computation action, or one that requires human-mediation, and connectors (e.g., line 65) express control flow between stages. In our current realization, MedScript is an XML-based domain-specific language; we have left it to future work to explore a visual programming front-end to MedScript.
 1  <xml>
    <app>
 3  <name>VideoDocumentation</name>
    <rrid>[User's Requestor ID]</rrid>
 5  <rrkey>[User's Requestor Key]</rrkey>
    <deadline>21:00:00 12/16/2011</deadline>
 7
    <stage>
 9  <name>Recruit</name><type>HIT</type>
    <binary>recruit</binary>
11  <config>
    <stmt>Video Documentation App. Demonstration</stmt>
13  <expiration>21:00:00 12/16/2011</expiration>
    <reward>.05</reward>
15  <output>W_WID</output>
    </config>
17  </stage>
    <stage>
19  <name>TakeVideo</name><type>SPC</type>
    <binary>mediagen</binary>
21  <trigger>user_initiated</trigger><review>none</review>
    <config>
23  <params>t video</params>
    <output>VIDEO</output>
25  </config>
    </stage>
27  <stage>
    <name>ExtractSummary</name><type>SPC</type>
29  <binary>videosummary</binary>
    <trigger>immediate</trigger><review>none</review>
31  <config>
    <input>VIDEO</input>
33  <output>SUMMARY</output>
    </config>
35  </stage>
    <stage>
37  <name>UploadSummary</name><type>SPC</type>
    <binary>uploaddata</binary>
39  <trigger>immediate</trigger>
    <config>
41  <input>SUMMARY</input>
    </config>
43  </stage>
    <stage>
45  <name>Curate</name><type>HIT</type>
    <binary>vote</binary><wid>all</wid>
47  <config>
    <stmt>Judging from Video Summaries</stmt>
49  <expiration>21:00:00 12/16/2011</expiration>
    <reward>.01</reward>
51  <numusers>2</numusers>
    <input>SUMMARY</input>
53  <output>$SMASK</output>
    </config>
55  </stage>
    <stage>
57  <name>UploadVideo</name><type>SPC</type>
    <binary>uploaddata</binary>
59  <trigger>immediate</trigger>
    <config>
61  <input>$SMASK, VIDEO</input>
    </config>
63  </stage>

65  <connector>
    <src>Recruit</src>
67  <dst><success>TakeVideo</success><failure>Recruit</failure></dst>
    </connector>
69  <connector>
    <src>TakeVideo</src>
71  <dst><success>ExtractSummary</success><failure>Recruit</failure></dst>
    </connector>
73  <connector>
    <src>ExtractSummary</src>
75  <dst><success>UploadSummary</success><failure>Recruit</failure></dst>
    </connector>
77  <connector>
    <src>UploadSummary</src>
79  <dst><success>Curate</success><failure>Recruit</failure></dst>
    </connector>
81  <connector>
    <src>Curate</src>
83  <dst><success>UploadVideo</success><failure>Recruit</failure></dst>
    </connector>
85  </app>
    </xml>

Listing 4.1: Video Documentation

Stages. A stage is a labeled elemental instruction in MedScript. Thus, TakeVideo is a label for the stage described starting on line 18. Stage labels are used in connector descriptions (see below).

Medusa defines two types of stages: a sensing-processing-communication stage (or SPC stage), and a human-intelligence task stage (or HIT stage). An SPC stage extracts sensor data from either a physical sensor (such as a camera, GPS, or accelerometer) or a logical sensor (such as a system log or a trace of network measurements), processes sensor data to perform some kind of summarization or recognition, or communicates sensor data to the requestor.
A HIT stage exclusively requires human review and/or input, and does not involve sensing or autonomous data processing. Thus, for example, TakeVideo, ExtractSummary, and UploadVideo (lines 18, 27, and 56) are all examples of SPC stages, while Recruit and Curate (lines 8 and 44) are examples of HIT stages. Stages may take parameters; for example, the reward parameter for the Recruit stage specifies a monetary reward (line 14).

Each stage has an associated stage binary, referenced by the binary XML tag (e.g., line 10). Medusa maintains an extensible library of stage binaries in its Stage Library. Stage binaries may be re-used across tasks, and also within the same task: both the UploadSummary and UploadVideo stages use the uploaddata binary.

Connectors. Each stage can have one of two outcomes: success or failure. A stage may fail for many reasons; if a stage does not encounter failure, it is deemed to be successful. For each stage, connectors specify control flow from that stage to the next stage upon either success or failure. Thus, in our example, if TakeVideo is successful (lines 69-72), control flow transfers to ExtractSummary; otherwise, control flow reverts to the Recruit stage. In general, for each outcome (success or failure), there is at most one target next stage. If a target is not specified for an outcome, the Medusa runtime terminates the corresponding task instance.

Medusa allows a special control-flow construct called the fork-join. From one of the outcomes of a stage (usually the success outcome), a programmer may define multiple connectors to different target stages. This indicates that the target stages are executed concurrently. This construct is useful, for example, to concurrently read several sensors in order to correlate their readings. We give a detailed example of this construct later. Medusa enforces, through compile-time checks, that both the success and failure outcomes of these target stages are connected to (or join) a single stage, one which usually combines the results of the target stages in some way.

Data Model and Stage Inputs and Outputs. Each stage in Medusa produces exactly one output. This output is explicitly assigned by the programmer to a named variable. In our example, named variables include VIDEO and SUMMARY. The scope of a named variable is the task instance, so that any stage in an instance may access the named variable. If two instances execute on the same device, their variable namespaces are distinct.

Each stage can take zero or more inputs. Inputs to stages are defined by variables. The ExtractSummary stage takes VIDEO as input (line 32). Each named variable has an associated type (see below), and the MedScript compiler performs static type checking based on the known types of stage outputs. In general, a named variable's value is a bag of one or more homogeneously-typed items, and the named variable is said to have the corresponding type. Supported types in Medusa include integer, floating-point, and many predefined sensor data types. Each sensor data type represents a contiguous sequence of sensor data; examples include a video clip, an audio clip, a contiguous sequence of accelerometer readings, and so on.

Medusa does not define an assignment operator, so variables can only be instantiated from stage outputs. For example, VIDEO is instantiated as the output of TakeVideo (line 24) and SUMMARY as the output of ExtractSummary (line 33). Both of these variables are bags of video clips: the former contains videos taken by the user, and the latter the summary videos generated by ExtractSummary.
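To make the static type check concrete, the following is a minimal sketch of the rule it enforces: every input variable must have been bound earlier, by a stage whose declared output type matches. The signature table, function names, and type names here are hypothetical illustrations, not the actual interpreter code.

    # Minimal sketch of MedScript static type checking (hypothetical names).
    # Each stage binary declares its input and output types; a task
    # type-checks if every input variable was bound earlier with a
    # matching type.
    STAGE_SIGNATURES = {
        "mediagen":     ([], "video"),
        "videosummary": (["video"], "video"),
        "uploaddata":   (["any"], None),      # upload produces no variable
        "vote":         (["video"], "bitmask"),
    }

    def check_types(stages):
        """stages: list of (binary, input_vars, output_var) in task order."""
        bound = {}  # named variable -> type
        for binary, inputs, output in stages:
            in_types, out_type = STAGE_SIGNATURES[binary]
            for var, expected in zip(inputs, in_types):
                if var not in bound:
                    raise TypeError(f"{binary}: variable {var} is never bound")
                if expected != "any" and bound[var] != expected:
                    raise TypeError(f"{binary}: {var} is {bound[var]}, "
                                    f"expected {expected}")
            if output is not None:
                bound[output] = out_type  # variables bind only to stage outputs

    # Stages of the video documentation task (Listing 4.1), omitting Recruit
    # and the final upload:
    check_types([
        ("mediagen",     [],          "VIDEO"),
        ("videosummary", ["VIDEO"],   "SUMMARY"),
        ("uploaddata",   ["SUMMARY"], None),
        ("vote",         ["SUMMARY"], "SMASK"),
    ])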
Stages and User-Mediation. One of the distinguishing aspects of Medusa is its support for worker-mediation. Workers can be in the loop in four ways.

First, the HIT stages Recruit and Curate require user intervention. Workers have to agree to perform a task instance during the Recruit stage. Requestors can specify monetary incentives as well as a deadline for recruitment. The Curate stage permits humans to review sensor data submissions and select submissions relevant to the task at hand. It takes as input a named variable containing one or more data items (in our example SUMMARY, line 52), and produces as output a bit mask (SMASK, line 53) that indicates the selected submissions. This bitmask can be used in later stages to determine, for example, which data items to upload.

Second, only one stage binary, uploaddata, can transfer data from the phone to the cloud and, before data is uploaded, the worker is presented with a preview of the data and is given an opportunity to opt in to the submission. Optionally, at this stage, the requestor may specify a link to a privacy policy that describes how workers' submissions will be used. If a worker chooses to opt out of the upload, the uploaddata stage fails.

Third, some stages, such as the TakeVideo stage, may require explicit worker action in order to initiate sensing. This form of worker-mediation leverages worker intelligence to determine the appropriate time, location, and other environmental conditions to take sensor data samples. This is indicated by a trigger parameter to a stage (e.g., line 21).

Fourth, requestors may provide humans with the option to annotate the output of any stage. Once a stage has produced an output, users are presented with a prompt to label the output by either selecting from a list of choices or by adding freeform text annotations. These annotations are stored as metadata with the data items. This capability is useful in our collaborative learning application, where users submit labeled training samples that, for example, indicate activities (like walking, running, or sitting).

Task Parameters. In addition to stage parameters, programmers may specify task parameters, usually before the stage and connector definitions. In our example, the requestor specifies a deadline for the task. If a task instance is not completed before its deadline, it is deleted. Other parameters can include requestor credentials, described below.

Failure Semantics. As described above, each stage has two possible outcomes: success or failure. Stages can fail for several reasons. The smartphone may be turned off by its owner, or its battery may die. The deadline on a stage may expire because a worker failed either to initiate a sensing action or to annotate a sensor result within the deadline. The merge or join stage after a fork may fail because one of the forked stages failed. Finally, stages may fail because components of the runtime fail. Deadline and runtime failures result in task instances being aborted. However, other failures may result in one or more stages being re-tried; the requestor can specify, using appropriate connectors, which stages should be retried upon failure, as the sketch below illustrates.
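As a minimal illustration (a hypothetical structure, not the actual Task Tracker code), the connector graph of Listing 4.1 reduces to a small outcome-driven lookup:

    # Sketch of outcome-driven control flow. Connectors map each stage's
    # success/failure outcome to the next stage; a missing target
    # terminates the task instance.
    CONNECTORS = {  # from Listing 4.1
        "Recruit":        {"success": "TakeVideo",      "failure": "Recruit"},
        "TakeVideo":      {"success": "ExtractSummary", "failure": "Recruit"},
        "ExtractSummary": {"success": "UploadSummary",  "failure": "Recruit"},
        "UploadSummary":  {"success": "Curate",         "failure": "Recruit"},
        "Curate":         {"success": "UploadVideo",    "failure": "Recruit"},
    }

    def next_stage(current, outcome):
        """Return the next stage to run, or None to terminate the instance."""
        return CONNECTORS.get(current, {}).get(outcome)

    assert next_stage("TakeVideo", "success") == "ExtractSummary"
    assert next_stage("TakeVideo", "failure") == "Recruit"  # retry via connector
    assert next_stage("UploadVideo", "success") is None     # instance completes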
4.4.2 Medusa Cloud Runtime

The Medusa cloud runtime consists of several components, described below.

MedScript Interpreter. Requestors submit their XML task descriptions to the MedScript interpreter and indicate how many instances of the task should be run by Medusa. For example, if Alice wishes to receive about 50 videos in the hope that about 10-15 of them are of sufficient quality for her research, she can indicate to the interpreter that she wishes to spawn 50 instances. Requestors can add additional instances if their initial estimates prove to be incorrect.

The interpreter performs compile-time checks on the submitted task descriptions: static type checks, checks on restrictions for fork-join connectors, and sanity checks on named variables. It then converts the task description into an intermediate representation and stores this in the Stage State Table, together with the number of instances to be run. Finally, it notifies the Task Tracker, which is responsible for coordinating the stage execution for all instances of this task.

Task Tracker. In Medusa, each submitted task is associated with a fresh instance of a Task Tracker. The Task Tracker spawns multiple worker instances and keeps track of their execution. Recall that different stages may execute at different times at different workers. The Task Tracker creates instance state entries that store the current state of execution of each instance in the Stage State Table, a persistent store.

For HIT stages, the Task Tracker uses the Worker Manager to initiate those stages. Thus, if Alice wishes to invoke 50 instances of her video documentation task, the Task Tracker first instantiates 50 instances of the Recruit stage using the Worker Manager. When a worker signs up for one of the task instances, the Worker Manager notifies the Task Tracker, which then proceeds to instantiate the TakeVideo stage on the worker's smartphone. To do this, the Task Tracker notifies the Stage Tracker on the worker's smartphone to begin executing the corresponding stage. We describe the notification mechanism later in this section. When that stage is completed on the smartphone, the Stage Tracker returns a list of references (pointers) to data items in that stage's output variable; the data items themselves are stored on the smartphone. These references are stored as part of the instance state; if the smartphone is turned off immediately after the TakeVideo stage, the Task Tracker has all the information necessary to execute the subsequent stage when the smartphone restarts.

More generally, the Task Tracker notifies the Stage Tracker, instructing it to execute an SPC stage on the phone. It passes the references corresponding to input variables for that stage. At the end of the execution, the Stage Tracker sends the Task Tracker all references for data items in the output of that stage. Thus, in Medusa, the Task Tracker coordinates the execution of every stage and maintains instance state information in persistent storage. A single worker may concurrently sign up for multiple instances of the same task, and/or instances of many different tasks. If a Task Tracker fails, it can be transparently restarted and can resume stage execution at each worker. Between the time when the Task Tracker fails and when it resumes, some stages may fail because the Stage Tracker may be unable to contact the Task Tracker; these stages will usually be re-executed, depending upon the control flow specified by the requestor. The Task Tracker keeps track of the task deadline; when this expires, the Task Tracker terminates the corresponding worker instance by updating the program state appropriately.
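A minimal sketch of what such an instance state entry might look like (the schema and helper names here are hypothetical; the prototype keeps this state in a MySQL-backed Stage State Table):

    import sqlite3

    # Sketch of the per-instance state a Task Tracker might persist.
    # Only references to data items are kept; the data itself stays on
    # the phone.
    db = sqlite3.connect("stage_state.db")
    db.execute("""CREATE TABLE IF NOT EXISTS instance_state (
        instance_id   TEXT PRIMARY KEY,
        worker_id     TEXT,            -- opaque machine-generated ID
        current_stage TEXT,
        var_refs      TEXT,            -- variable -> on-phone data references
        deadline      TEXT)""")

    def record_stage_completion(instance_id, stage, refs):
        # Called when the Stage Tracker reports completion; once this
        # commits, the subsequent stage can be re-initiated even if the
        # phone (or the Task Tracker) restarts.
        db.execute("UPDATE instance_state SET current_stage=?, var_refs=? "
                   "WHERE instance_id=?", (stage, refs, instance_id))
        db.commit()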
The Task Tracker also keeps internal timeouts for stage execution, and may decide to retry stages whose timeouts expire. This ensures robustness to smartphone outages and connectivity failures, as well as to humans who may not have initiated sensing actions on time or completed an annotation of sensor data.

The Task Tracker does not know the identities of the requestor or the workers, instead referring to them in its internal data structures using opaque machine-generated IDs. Below, we describe how these IDs are mapped to identities of individual workers or of the requestor. This design ensures that knowledge about identities of participants is localized to a very small part of the system (described later), thereby reducing the vulnerability footprint. The Task Tracker thus supports concurrent execution of worker instances, tracks unsynchronized stage execution across different workers, enforces timeliness of stage execution, and ensures robustness.

Worker Manager. The Worker Manager supports HIT stage execution, monetary transactions, crowd-curation, and worker smartphone notifications. To achieve these objectives, the Worker Manager uses the Amazon Mechanical Turk (AMT [1]) system as a back-end. AMT was designed to support crowd-sourcing of human intelligence tasks, such as translation, behavioral surveys, and so forth.

When the Task Tracker encounters a Recruit stage, it invokes the Worker Manager and passes to it the monetary incentive specified in the stage description, together with the number of desired instances. The Worker Manager posts AMT tasks (using the AMT API) seeking workers who are interested in signing up for the crowd-sensing task. Usually, requestors will include an optional URL as a parameter to the Recruit stage so that potential workers can see a description of the work involved. Workers may use a browser on their smartphone to sign up on AMT for a task instance. When a worker signs up, AMT notifies the Worker Manager, which in turn notifies the Task Tracker.

In order to post a task on AMT, the Worker Manager needs to be entrusted with an Amazon-issued ID and key for the requestor. These requestor credentials are specified as task parameters. However, the ID and key do not reveal the identity of the requestor. Thus, while requestor identity is not exposed, limited trust must be placed in the Worker Manager not to misuse IDs and keys.

When a task instance completes (e.g., when a worker uploads a selected video in our video documentation example), the Task Tracker contacts the Worker Manager, which, in turn, presents the uploaded data to the requestor and seeks approval. Once the requestor approves the submission, money is transferred from the requestor's AMT account to the worker's AMT account.

Crowd-curation works in a similar manner. The Worker Manager spawns an AMT "voting" task inviting volunteers to vote on submissions in return for a monetary incentive. When the voting is complete, AMT notifies the Worker Manager, which in turn notifies the Task Tracker.

The Worker Manager also supports reverse-incentives, so workers can pay requestors for the privilege of contributing sensor data. Our auditioning example requires this, where the reverse incentive is designed to dissuade frivolous participation. The Worker Manager achieves this by posting a task requiring a null contribution on AMT with the roles reversed: the worker as the "requestor" and the requestor as the only permitted "worker". As soon as the requestor signs up and the worker "accepts" the contribution, the requisite payment is transferred from the worker's AMT account to that of the requestor.
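Concretely, most of the Worker Manager's AMT interactions reduce to posting HITs. The prototype uses AMT's Java SDK; a present-day equivalent of the Recruit posting, using the boto3 MTurk client against AMT's sandbox endpoint, might look like the sketch below (the sign-up URL and the numeric values other than the reward are illustrative, not part of Medusa):

    import boto3

    # Sketch of the Recruit step mapped onto the AMT API. Title and
    # reward come from the Recruit stage parameters in Listing 4.1.
    mturk = boto3.client(
        "mturk",
        endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com")

    question = """<ExternalQuestion
      xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
      <ExternalURL>https://example.com/medusa/signup</ExternalURL>
      <FrameHeight>400</FrameHeight>
    </ExternalQuestion>"""

    hit = mturk.create_hit(
        Title="Video Documentation App. Demonstration",
        Description="Sign up to record and contribute a short video.",
        Reward="0.05",                 # from the <reward> stage parameter
        MaxAssignments=50,             # one per desired task instance
        LifetimeInSeconds=24 * 3600,   # recruitment deadline
        AssignmentDurationInSeconds=3600,
        Question=question)
    print("Posted HIT", hit["HIT"]["HITId"])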
Our use of AMT requires workers and requestors to have AMT accounts and to manage their balances on these accounts. Both workers and requestors have an incentive to do this: requestors are able to complete tasks using the system that they might not otherwise have been able to, and workers receive micro-payments for completing sensing tasks that require little cognitive load (since many of the steps can be automated). Moreover, AMT preserves subject anonymity, so that worker identities are not revealed to the requestor (unless workers choose to reveal them). In Medusa, this subject anonymity is also preserved within the rest of the system (the runtimes on the smartphone and cloud) by identifying requestors and workers only using the opaque AMT IDs. However, since Medusa supports crowd-sensing, the sensor data submitted by a worker may reveal the identity of the worker. By opting in to an upload of the sensor data, workers may give up subject anonymity; they can always choose to opt out, or carefully initiate sensor data collection (e.g., taking videos whose content does not reveal the sender), should they wish to retain anonymity.

Finally, because workers are anonymous to the rest of the system, and AMT is the only component that knows the identity of the worker, the Task Tracker uses AMT (through the Worker Manager) to notify workers' smartphones of SPC stage initiation. This notification is done by using AMT to send an SMS message to the smartphone, which is intercepted by the Stage Tracker. That component initiates stage execution on the phone. When an SPC stage completes, the Stage Tracker directly transfers program state to the Task Tracker without AMT involvement. However, this transfer contains enough identifying information to uniquely identify the task instance.

Stage Library. The Stage Library is a reusable and extensible library of SPC stage binary implementations. For example, the binaries for the TakeVideo, ExtractSummary, and UploadVideo stages (namely mediagen, videosummary, and uploaddata) are stored here. These binaries are stored on the cloud, but when a stage is initiated on a smartphone, the Stage Tracker is responsible for downloading the binaries. Stage binaries may be cached on smartphones, and caches are periodically purged so that the latest versions of the stage binaries may be downloaded. Table 4.1 gives some examples of stages that we have implemented.

  Stage Name      Description
  probedata       Searches for data items matching specified predicates.
  uploaddata      Uploads data corresponding to the specified input variable.
  mediagen        Invokes an external program to sense a data item (e.g., a video or audio clip).
  videosummary    Generates a motion-JPEG summary from an original high-quality video.
  facedetect      Extracts faces from video or image inputs.
  netstats        Scans Bluetooth and WiFi interfaces. Users can set filters to get selective information.
  vcollect        Collects samples directly from a sensor (e.g., from the accelerometer or microphone).
  vfeature        Extracts feature vectors from input raw sensor data.
  gpsrawcollect   Extracts a sequence of location samples at a specified rate.
  combiner        Temporally merges two sensor data streams.
  cfeature        Extracts feature vectors from temporally merged streams.

Table 4.1: Stage Binaries in Stage Library.

The Stage Library ensures Medusa's extensibility.
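The binaries in Table 4.1 plausibly share a single, simple contract: map zero or more input bags to exactly one output bag. A hypothetical rendering of that contract (the actual stages are Android packages executed inside MedBox):

    # Sketch of the uniform contract a stage binary might implement
    # (hypothetical interface). Each stage maps named-variable input
    # bags to exactly one output bag.
    class Stage:
        def run(self, inputs, params):
            """inputs: dict of named-variable bags; returns one output bag."""
            raise NotImplementedError

    def make_summary(video_ref):
        # Placeholder for the real motion-JPEG summarization.
        return video_ref + ".summary"

    class VideoSummary(Stage):
        # Corresponds to the videosummary binary in Table 4.1: produce a
        # short summary clip for each input video reference.
        def run(self, inputs, params):
            return [make_summary(ref) for ref in inputs["VIDEO"]]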
When a requestor needs a specific functionality to be implemented as a stage binary, they may provide this implementation themselves or (more likely) make a feature request to the Medusa system developers. A larger-scale deployment, which is left to future work, will provide us with insights into how often a new stage needs to be added to the Stage Library. We anticipate that there will likely be many such additions to the Stage Library initially, but, with time, requestors will be able to reuse stages frequently. To support the Stage Library's extensibility, stages are stored on the cloud and downloaded on demand to the phone.

Before stage binaries are admitted into the Stage Library, they are vetted for safety. As we describe below, stages are only allowed to invoke a restricted set of functionality on the smartphone. For example, stages cannot access the underlying file system on the phone or transmit data to arbitrary websites. Medusa ensures this by using static program analysis to determine the set of APIs invoked by stage binaries; it conservatively rejects any stage binary that accesses disallowed APIs or uses dynamic API binding. In this way, Medusa achieves safety and extensibility of stages that implement sensing, in-network processing, and communication functions.

4.4.3 Medusa Runtime on the Smartphone

The Medusa runtime on the smartphone consists of the two components described below.

Stage Tracker. As described above, the Stage Tracker receives stage initiation notifications by intercepting SMS messages. Each SMS message contains an XML description of the stage; the Stage Tracker parses this description, downloads the stage binary if necessary, and initiates the execution of the stage. The SMS message also contains the values of named variables instantiated during previous stage computations; recall that these contain pointers to data items stored on the phone. The Stage Tracker is responsible for setting up the namespace (containing these named variables and their values) before the stage executes. It is also responsible for implementing the triggers necessary to initiate stage execution (e.g., to start recording a video clip), as well as the human-mediation required for annotations.

Once a stage's execution is complete, stage state is transferred back to the Stage State Table. As discussed before, this includes only references to data items, not actual data (the one exception is the upload stage, which may transfer data, but only after the worker explicitly opts in). The Task Tracker polls the Stage State Table periodically to determine stage completion. If a stage execution fails for any reason (e.g., resource constraints or implementation bugs in a stage), the Stage Tracker returns a failure notification to the Task Tracker. If the Task Tracker does not hear back from the Stage Tracker within a timeout period, it tries to contact the Stage Tracker before determining whether the stage has failed or not. By design, the Stage Tracker does not know (or care) to which instance of which task a given stage belongs.
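This command-handling sequence can be summarized in a short sketch (the structure and names are illustrative; the actual Stage Tracker is an Android service):

    import xml.etree.ElementTree as ET

    # Sketch of the Stage Tracker's handling of a stage-initiation
    # command: parse the reassembled SMS/MMS body, fetch the binary,
    # run it in the sandbox, and report only references back.
    def handle_command(sms_payload, binary_cache, medbox, report):
        cmd = ET.fromstring(sms_payload)
        binary_name = cmd.findtext("binary")
        inputs = {v.get("name"): v.text.split(",")   # references, not data
                  for v in cmd.findall("var")}
        stage = binary_cache.fetch(binary_name)      # download if not cached
        try:
            refs = medbox.execute(stage, inputs)     # run inside the sandbox
            report(status="success", output_refs=refs)
        except Exception:
            report(status="failure", output_refs=None)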
Furthermore, the Stage Tracker can initiate multiple concurrent stages. The Stage Tracker itself is stateless, and transfers all intermediate program state to the Task Tracker, which maintains all instance state. If a Stage Tracker fails, it can be transparently restarted: any ongoing stage executions can complete successfully, but a stage that completed before the Stage Tracker was restarted may be marked as having failed and may then be re-tried. In this manner, the Stage Tracker robustly supports multiple concurrent task instances.

MedBox. On the smartphone, stages execute in a sandboxed environment called MedBox. MedBox exports capabilities necessary for crowd-sensing, and monitors resource usage by stages.

MedBox provides libraries that allow stages to access sensors, that provide functions to manipulate sensitive data, and that transfer sensor data. Moreover, it also provides a library to access restricted storage on the phone where sensor data is stored. Stages can query this storage system using metadata attributes like location and time, allowing them to extract previously generated sensor data. Finally, MedBox supports triggers for stage execution, including triggers based on time or location. For example, a worker may be notified that she is near the location of an "Occupy" site. Notifications of these triggers may be delivered to users using any of the methods available in the underlying operating system. Stages are restricted by cloud-side static analysis to only access these libraries (as described above).

Finally, Medusa allows smartphone owners to specify limits on the usage of system components such as CPU, network, or memory on a stage or task instance. MedBox tracks the usage of each stage and aborts the stage if it exceeds its allocated resources. In the future, we envision extending the expressivity of these policies to permit privacy controls (e.g., opt-out for certain kinds of uploads) or reward-related stage-specific limits (e.g., tighter limits for stages where the monetary incentives are lower).
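A minimal sketch of the resource watchdog this implies (illustrative only; the real MedBox is a collection of Java threads and, as Section 4.5.4 shows, polls roughly every three seconds):

    import time

    # Sketch of MedBox's per-stage resource watchdog: periodically poll
    # each running stage and abort any that exceeds its owner-specified
    # limit. Stage objects are assumed to expose running, cpu_seconds(),
    # limit, and abort().
    def watchdog(stages, poll_interval=3.0):
        while any(s.running for s in stages):
            for s in stages:
                if s.running and s.cpu_seconds() > s.limit:
                    s.abort()   # stage fails; the Task Tracker is notified
            time.sleep(poll_interval)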
Summary. Medusa's various components together satisfy the complex web of crowd-sensing requirements discussed in Section 4.2. Table 4.2 summarizes which components satisfy which requirements; a requirement may be satisfied by multiple components.

  System Component       Requirements
  MedScript Interpreter  Timeliness enforcement, robustness
  Task Tracker           Supports multiple users, unsynchronized user operation, timeliness enforcement, robustness
  Worker Manager         Incentives, crowd-curation, anonymity, security, worker-mediation
  Stage Library          In-network processing, sandboxing, extensibility
  Stage Tracker          Multiple concurrent stages, robustness, worker-mediation, privacy
  MedBox                 Stateless execution, sandboxing, concurrent stages, resource monitoring, access to stored data

Table 4.2: Medusa System Components and Properties

4.5 Evaluation

In this section, we demonstrate four properties of Medusa: the expressivity of MedScript, the ability of the Medusa runtime to support concurrent task instance execution, the scalability of the runtime, and the robustness of Medusa.

Our experiments are conducted on a prototype of Medusa. The MedScript interpreter and Task Tracker are written in Python. Apache and PHP are used for accessing the MySQL-based Stage State Table, as well as the Stage Library. The Worker Manager runs Tomcat with a Java Servlet environment and accesses AMT through a Java SDK (http://mturksdk-java.sourceforge.net/). Finally, the Stage Library uses the dex2jar utility to convert Dalvik bytecode to Java, and the javap program from the Sun JDK for static program analysis.

Our smartphone software runs on Android v2.3.6. The Stage Tracker is implemented as an Android service; it intercepts SMS and MMS messages (our notifications may be longer than a single SMS allows: some carriers break up long initiation XML messages into multiple SMSs, others convert the SMS into an MMS), and uses secure-HTTP connections to the Stage Library and the Task Tracker. MedBox is implemented as a collection of Java threads.

To evaluate our system, we use commodity hardware on both the server and client side. In all our experiments, workers use either the Google Nexus One or the Nexus S. On the server side, our cloud-based service is emulated by a Dell PowerEdge 2950, which has a dual-core Intel Xeon E5405 2.00GHz processor and a 6MB built-in cache. Finally, we use data service from two GSM carriers in the United States to evaluate our push mechanism via SMS messaging.

  Application               LOC  Sensors               Properties
  Video Documentation       86   Camera                In-network processing, crowd-curation
  Collaborative Learning    62   Accelerometer, Audio  Different sensors
  Auditioning               86   Camera                Reverse incentive mechanism, crowd-curation
  Forensic Analysis         86   GPS, Audio            Access to stored data
  Spot Reporter             45   Camera, Audio         Text/voice tagging
  Road Monitoring           90   Accelerometer         Fork-join construct, app from PRISM [38]
  Citizen Journalist        45   GPS                   Multiple triggers, app from PRISM [38]
  Party Thermometer         62   GPS, Audio            Fork construct, app from PRISM [38]
  WiFi & Bluetooth Scanner  45   Network sensors       Apps from AnonySense [35]

Table 4.3: Implemented Crowd-Sensing Apps. (LOC: lines of code)

4.5.1 Language Expressivity

To demonstrate the expressivity of MedScript, we have implemented ten qualitatively different crowd-sensing tasks (Table 4.3). Each task demonstrates different facets of Medusa, ranging from the use of different sensors to the use of different facilities provided by MedScript. Some have been proposed for participatory sensing in related work; we have implemented these in Medusa to demonstrate the power of the system.

Novel Crowd-Sensing Tasks. Many of the tasks in Table 4.3 are novel and are enabled by Medusa.

Figure 4.3: Video Documentation

Video Documentation. Although we have explained the code for the video documentation task in Listing 4.1, we discuss a few of the subtleties of this task here. First, for this and for the other tasks discussed below, we illustrate the task program as a control flow diagram, shown in Figure 4.3 (lack of space precludes a full program listing for the other tasks discussed below). HIT stages are represented by squares and SPC stages by ovals. Only the success connectors are shown; the failure connectors are omitted for brevity.

This task demonstrates some of Medusa's key features: in-network processing for extracting video summaries, crowd-curation, and opt-in data transfers (both when summaries are uploaded and when full videos are uploaded). Some stages are worker-initiated (e.g., TakeVideo), while others are automatically triggered when the previous stage completes (e.g., ExtractSummary). Worker-initiated stages have an unfilled rectangle before them; automatically triggered stages do not. A user may take multiple videos, so that multiple summaries can be uploaded for curation. Each upload stage requires worker opt-in (denoted by the filled rectangle before the stage).
Only a subset of these summaries may be selected; the Curate stage returns a bitmask, which is used to determine the set of full videos that need to be uploaded. Finally, this task has multiple data upload stages: MedScript does not restrict uploads to the end of the control flow.

Figure 4.4: Collaborative Learning (Full listing in Appendix B.1)

Collaborative Learning. The collaborative learning task (Figure 4.4) allows, for example, software vendors to solicit training samples for constructing robust machine learning classifiers (Section 4.2). Compared with video documentation, this task uses a different set of sensors (either an accelerometer or a microphone) and a different in-network processing algorithm (feature extraction). This task requires annotation, denoted by an unfilled rectangle after the GetRawData stage: workers are required to label the collected samples (e.g., as a sample of "walking", "running", etc.). If the worker collects multiple samples, she has to label each sample.

Figure 4.5: Auditioning (Full listing in Appendix B.1)

Auditioning. Our auditioning task allows, for example, TV show producers to solicit videos from aspiring actors (Figure 4.5). The novel feature of this task is the support for reverse incentives: workers are required to pay the requestor for the privilege of submitting a video for auditioning. Reverse incentives are implemented by two stages. First, SendCredentials is an SPC stage that sends the worker's AMT credentials (stored on the phone) to the Medusa cloud runtime. This step is necessary to set up the payment, which is implemented as a HIT stage, FeePayment. This stage is conceptually similar to Recruit, except that the Medusa runtime sets up AMT as explained in Section 4.4. Once the requestor accepts and completes this task, payment is transferred; the worker can subsequently take a video and upload it. Uploaded videos are sent to the requestor (the TV producers in our example); we assume that selected participants are notified out-of-band (e.g., invited to an on-site audition).

Figure 4.6: Forensic Analysis (Full listing in Appendix B.1)

Forensic Analysis. Our forensic analysis task permits security officials to obtain images of people who were at a specified location at a specified time when an incident occurred (Figure 4.6). The novel aspect of this task is access to historical sensor data stored on the smartphone. This access is accomplished through the GetImages stage, which, given a spatio-temporal predicate, obtains images whose metadata matches the specified predicate. The subsequent GetFaces stage extracts faces from each image. Only the extracted faces are uploaded for curation. A security officer might visually inspect the uploaded faces to select faces corresponding to potential suspects. The images corresponding to the selected faces are then uploaded.
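A sketch of the kind of spatio-temporal predicate behind GetImages (and the probedata binary of Table 4.1), assuming a hypothetical metadata schema over MedBox's restricted storage:

    from math import radians, sin, cos, asin, sqrt

    def haversine_km(lat1, lon1, lat2, lon2):
        # Great-circle distance between two points, in kilometers.
        dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
        a = sin(dlat/2)**2 + cos(radians(lat1))*cos(radians(lat2))*sin(dlon/2)**2
        return 2 * 6371 * asin(sqrt(a))

    def probe_data(store, t0, t1, lat, lon, radius_km):
        """Return references to items captured in [t0, t1] near (lat, lon).

        store is assumed to be an iterable of metadata records with
        .time, .lat, .lon, and .ref attributes; only references leave
        the restricted storage.
        """
        return [item.ref for item in store
                if t0 <= item.time <= t1
                and haversine_km(item.lat, item.lon, lat, lon) <= radius_km]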
Previously-Proposed Sensing Applications. Prior work has explored several participatory sensing applications, and Medusa can be used to program these. We have implemented these applications as crowd-sensing tasks, each of which begins with a Recruit stage using which workers can sign up for the task.

Figure 4.7: WiFi/Bluetooth Scanner (Full listing in Appendix B.1)

WiFi/Bluetooth Scanner [35] enables requestors to collect WiFi and Bluetooth neighborhoods from workers. This is implemented using a netstats stage, which models this network neighborhood information as the output of a sensor. Requestors can use other stages to preprocess (e.g., filter) the data before upload.

Figure 4.8: Road-Bump Monitoring (Full listing in Appendix B.1)

Road-Bump Monitoring [89] can be implemented using the task shown in Figure 4.8. This task collects GPS and accelerometer data, correlates data from the two sensors, and extracts features that are used to determine the location of bumps on roads. Its novelty is the use of the fork-join construct, which allows independent stages to concurrently sample the GPS and the accelerometer.

Figure 4.9: Citizen Journalist (Full listing in Appendix B.1)

Citizen Journalist, discussed in [53], asks workers to take an image at a specified location and label the image before uploading. The corresponding crowd-sensing task, shown in Figure 4.9, is unique in that one of the stages supports two triggers for initiating the stage: a location-based trigger notifies the worker (shown as a hashed rectangle), who is then expected to initiate the taking of a photo.

Figure 4.10: Party Thermometer (Full listing in Appendix B.1)

Finally, we have also implemented a crowd-sensing task for the Party Thermometer application [38]. This task uses a location trigger to autonomously collect ambient sound, generate feature vectors, and upload them (Figure 4.10).

Quantifying Expressivity. Our Medusa task descriptions are very compact (Table 4.3): no task exceeds 100 lines. We also use code size to compare the complexity of three crowd-sensing tasks with standalone implementations of these tasks. Our comparison is only approximate, because the implementations being compared are not precisely functionally equivalent.

  Task                    Medusa  Standalone
  Video Documentation     86      8,763
  Collaborative Learning  62      7,076
  Spot Reporter           45      17,238

Table 4.4: Lines of Code Comparison

Our work on Medusa has been inspired by our earlier stand-alone prototypes for video documentation and collaborative learning. The earlier prototypes did not include support for worker-mediation or incentives, but had additional user interface elements for software configuration and management. In addition, a research group with whom we collaborate has implemented a spot reporter application, which tasks human agents to send audio, video, or text reports from the field.
Our implementation of this task requires three stages: one to recruit agents, another to take the video (together with annotation), and a third to upload the result. Table 4.4 quantifies the difference in code complexity using lines of code as a metric. Even given the differences in implementations, the Medusa versions are about two orders of magnitude more compact than the standalone versions! Medusa's runtime implements many functions common to these standalone applications, thereby reducing the programmer's burden.

4.5.2 Concurrent Task Instances

The dynamics of task execution in Medusa can be quite complex, since computational and sensing stages can be interleaved with worker mediation. Thus, execution dynamics can be strongly influenced by human actions. At the same time, Medusa task throughput can be enhanced by its support for concurrent task instance execution.

To illustrate some of these properties of Medusa, we conducted an experiment where we instantiated one task instance for each of the ten prototype applications discussed above. We then asked four volunteers to sign up for a subset of the task instances. Each volunteer was given a phone and assigned an AMT worker account, but was otherwise unconstrained in how and when to complete the tasks.

Figure 4.11: Concurrent Execution of Multiple MedScript Programs

Figure 4.11 depicts a timeline of the progress of the experiment. All task instances were completed within about 30 minutes by the four workers. The upper panel shows the completion times of the task instances: these range from 6 minutes to 30 minutes, with most instances completing within about 10 minutes. The lone outlier is Citizen Journalist, which was location-triggered and could only be initiated when the worker had reached the specified location.

The timeline also depicts several interesting features of Medusa. Task instances can have differing numbers of SPC stages, indicated by the vertical bars along each timeline. Workers can concurrently sign up for multiple task instances; for example, worker I executes three task instances simultaneously. Workers sign up for tasks at different times (between 1 minute and 6 minutes into the experiment). Moreover, stage execution times for different tasks vary widely. For example, Video Documentation and Bluetooth Scanner have much longer stages than Spot Reporter. For some stages, stage execution commences immediately after the Stage Tracker receives a command from the Task Tracker; for others, waiting for worker initiation delays stage execution. Medusa seamlessly handles all these execution dynamics in a manner transparent to the task requestor.

4.5.3 Scalability and Overhead

To quantify scalability and overhead, we instrumented our prototype to measure the time taken to perform several individual steps involved in task instance execution, both on the cloud runtime and on the phone runtime. We then executed a single task instance on a Nexus One phone, where the task consisted of two stages: a Recruit (which requires human-mediation) followed by a null SPC stage (i.e., one that returns immediately and requires no worker-mediation). We present results from 10 trials.
  Component                                                  Avg       Max       Min
  Task interpretation and type checking                      26.64     26.89     26.45
  Task Tracker initiation delay                              0.84      0.90      0.77
  Task Tracker latency among stages                          0.87      0.94      0.83
  [HIT] Delay upon the task registration request             4.83      12.03     3.61
  [HIT] Waiting time for the registration response           2177.99   2824.52   1858.45
  [HIT] Task execution delay                                 31039.83  50060.72  20028.91
  [SPC] Delay on the request to AMT for commanding workers   1.30      2.56      0.77
  [SPC] Messaging request confirmation from AMT              732.5     833.54    700.61
  [SPC] Delay on SMS/MMS command message delivery            27000     78000     17000
  [SPC] Task execution delay                                 3038.54   7039.54   1029.20
  Total processing time                                      34.47     43.31     32.42
  Total waiting time                                         63988.86  138758.3  40617.16
  Total execution time                                       64023.33  138801.6  40649.58

Table 4.5: Delay breakdown on the server (unit: msec)

  Component                                   Avg    Max  Min
  Retrieve and parse SMS/MMS command message  38.2   58   26
  Stage binary initiation time                402.7  506  374
  Stage runner latency                        6.2    14   3
  Stage termination latency                   12.7   14   3
  Total overhead imposed                      459.8  598  409

Table 4.6: Delay breakdown on the phone (unit: msec)

Medusa Server Overhead. Table 4.5 shows the server-side delays. Task interpretation and type checking together take 26.64ms on average; Task Tracker initialization takes much less, 0.84ms. For the Recruit stage, a HIT request requires only 4.83ms; however, the time to register the task with AMT and receive a response requires 2.18s.

To run an SPC stage, the Task Tracker must prepare an XML command and send it to the Worker Manager so that it can send a request for AMT to send SMS/MMS messages to the phone. Those two steps together incur 732.5ms of delay. The largest component of delay in our system is the time that it takes to deliver an SMS message to the phone: between 20 and 50 seconds. This is because our implementation uses email-to-SMS gatewaying; a more optimized implementation might use direct SMS/MMS delivery, reducing latency. That said, latency of task execution is not a significant concern for the kinds of tasks we consider: with humans in the loop, we anticipate crowd-sensing tasks to have deadlines on the order of hours from initiation. Indeed, the second largest component of delay in our system is the time that it takes for a human to sign up for the task.

The SPC stage execution time – measured, from the perspective of the cloud runtime, as the time from when the Task Tracker received confirmation of message delivery from the Worker Manager to the time stage completion is detected – is about 3 seconds. A large component of this latency is network delay, but another contributor is the polling period in the Task Tracker. Recall that the Task Tracker polls the Stage State Table to determine if a stage has completed. This polling time adds to the latency; in the future, we plan to alter our system so that the Stage Tracker directly notifies the Task Tracker of stage completion, which then saves state in the table, eliminating this latency.

Overall, we are encouraged by these results. The dominant delays in the system arise from notification using SMS and from waiting for human input. Actual processing overheads are only 34.47ms on average, which permits a throughput of about 1,740 task instances per minute on a single server (60,000ms / 34.47ms ≈ 1,740). Moreover, this component is highly parallelizable, since each task instance is relatively independent and can be assigned a separate Task Tracker. Thus, if task throughput ever becomes an issue, it would be possible to leverage the cloud's elastic computing resources to scale the system.
Medusa Runtime Overhead on the Phone. Table 4.6 presents the breakdown of delay in the Medusa runtime on a Nexus One phone. Retrieving a stage initiation message from the Task Tracker involves merging multiple SMS messages and parsing the resulting XML data object. This takes 38.2ms on average. The major component of delay on the phone is the time to instantiate the stage implementation binary (in our experiments, the stage implementation was cached on the phone), a step which takes 402.7ms. This overhead is imposed by Android; our stage implementations are Android package files, and the Android runtime must unpack these and load the class files into the virtual machine. The Stage Tracker then performs some bookkeeping functions before invoking the stage, which takes 6.2ms. When the stage finishes its execution, the Stage Tracker performs additional bookkeeping before notifying the cloud runtime, and this takes 12.7ms. The total delay imposed by the Medusa platform is only 459.8ms per stage, of which the dominant component is the stage binary initialization. On the timescale of expected task completion (e.g., several tens of minutes in our experiments above), this represents negligible overhead. Furthermore, we are aware of methods such as binary alignment which would allow us to optimize stage loading, but we have left this for future work.

4.5.4 Robustness

Finally, we briefly demonstrate Medusa's robustness.

Figure 4.12: Failure recovery: turning off the phone in the middle of a task

Failure Recovery. The Medusa runtime maintains internal timeouts to recover from transient errors caused by messaging failure, battery exhaustion, or user actions (e.g., accidentally turning off the phone). To demonstrate failure recovery, we conducted an experiment where we turned off the phone during the scanning stage of the WiFi scanner task. Then we turned on the phone again after a couple of minutes. Figure 4.12 shows the sequence of events. Medusa waits for a pre-defined timeout (10 minutes in our implementation) before restarting the scanner stage. This stage then completes execution, eventually resulting in task completion. This simple retry mechanism works well across transient failures mainly because stage execution state is maintained on the cloud.

Static Analysis on Stage Binaries. To demonstrate the efficacy of using static analysis for sandboxing, we recruited three volunteers (none of whom are involved in this work) to add malicious code to existing stage implementations. We then tested whether our analysis was able to deter such attacks, which included opening HTTP connections, invoking sleep operations, accessing the SD card, and so forth. Of the nine modifications made by the volunteers, our static analyzer caught 7 correctly (Table 4.7).

  Code Modification                         Result
  Make one function sleep for a long time   REJECT
  Write a file to SD card                   REJECT
  Time string format translation            REJECT
  Open HTTP connection                      REJECT
  Delete all files in SD card               REJECT
  Vector operation                          ACCEPT
  Throws exceptions                         REJECT
  System.exit(-1)                           REJECT
  Recursive calls                           ACCEPT
  Fills the heap until memory is exhausted  ACCEPT

Table 4.7: The defense capability of the static analyzer against arbitrary code modification.

The two attacks that it did not catch targeted resource usage directly: infinite recursion, and attempts to exhaust the heap.
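The vetting step itself is conceptually simple: disassemble each class with javap and conservatively reject references to disallowed APIs or to reflection (dynamic binding). The sketch below illustrates the idea; the prefix list is illustrative, not Medusa's actual policy, and, as the two misses above show, purely static scanning cannot catch resource-exhaustion behavior.

    import subprocess

    # Sketch of the static vetting step: scan javap's disassembly for
    # references to disallowed APIs or reflection, and reject on any hit.
    DISALLOWED = ("java/io/File", "java/net/", "java/lang/reflect/",
                  "java/lang/Runtime", "java/lang/Thread.sleep")

    def vet_class(class_file):
        asm = subprocess.run(["javap", "-c", "-p", class_file],
                             capture_output=True, text=True).stdout
        for line in asm.splitlines():
            if any(api in line for api in DISALLOWED):
                return f"REJECT: {line.strip()}"
        return "ACCEPT"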
These attacks require dynamic tracking: Medusa also has a dynamic resource tracking component, which we demonstrate next.

Figure 4.13: Robustness: forcing a limit on computation

Limiting Resource Usage. Medusa allows smartphone owners to place limits on the computation used by a single stage (on a per-stage basis) and the amount of data transferred per task instance. Using these controls, users can implement policies which, for example, apportion more resources to tasks with higher monetary rewards.

To illustrate how these limits are enforced, we created a single long-running stage, and concurrently ran two instances of that stage with limits of 5 and 10 seconds respectively. MedBox maintains a watchdog timer, which, in our implementation, checks stage resource usage every three seconds. As shown in Figure 4.13, it successfully terminated our stages at 5.36 seconds and 12.85 seconds; given the granularity of our timer, stages may get bounded additional resources beyond their limits. This kind of mechanism can protect against CPU resource exhaustion attacks.

Medusa can also protect against excessive network usage per task instance. Since all data transfers must use MedBox's data transfer library, that library can account for data transfers on a per-task basis. We conducted an experiment in which a task instance attempted to upload 60 image files and was allocated a 3MB limit: MedBox successfully terminated the task (by notifying the Task Tracker) just before it reached the 3MB limit (the transfer library transfers data in 200KB chunks, and checks whether the limit would be exceeded before transferring data), as depicted in Figure 4.14.

Figure 4.14: Failure recovery: harnessing network data usage
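A minimal sketch of this budget check, following the 200KB chunking just described (names are illustrative):

    # Sketch of MedBox's per-instance network budget enforcement:
    # transfer in 200KB chunks and stop before the budget would be
    # exceeded.
    CHUNK = 200 * 1024

    def budgeted_upload(chunks_of_data, budget_bytes, send):
        used = 0
        for chunk in chunks_of_data:       # each chunk is at most 200KB
            if used + len(chunk) > budget_bytes:
                return "aborted"           # the Task Tracker is notified
            send(chunk)
            used += len(chunk)
        return "completed"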
Smartphones have multiple wireless interfaces – 3G/EDGE and WiFi – for data transfer, but there is considerable variability in the availability and achievable data transfer rate of these networks. Moreover, the energy costs for transmitting a given amount of data on these wireless interfaces can differ by an order of magnitude. On the other hand, many of these applications are naturally delay-tolerant, so it is possible to delay data transfers until a lower-energy WiFi connection becomes available. In this chapter, we present a principled approach for designing an optimal online algorithm for this energy-delay tradeoff using the Lyapunov optimization framework. Our algorithm, called SALSA, can automatically adapt to channel conditions and requires only local information to decide whether and when to defer a transmission. We evaluate SALSA using real-world traces as well as experiments with a prototype implementation on a modern smartphone. Our results show that SALSA can be tuned to achieve a broad spectrum of energy-delay tradeoffs, is closer to an empirically-determined optimal than any of the alternatives we compare it to, and can save 10-40% of battery capacity for some workloads.

5.1 Introduction

As video-enabled smartphones become more prevalent, many new and interesting applications will be enabled. Our Urban Tomography system [124, 71] is a good example. It allows a user to capture video clips, and then automatically uploads them in the background to a server. The system has been operational for over a year and has found several, qualitatively different, uses. A team of security officials, equipped with smartphones, has been using it for surveillance at a large transportation hub in Los Angeles. The team is able to visually document parts of the facility not covered by fixed cameras and to provide in situ views of developing situations, and, because the videos are automatically uploaded to a server, the team's supervisors can accurately assess a developing situation. A company that specializes in behavior analysis of developmental disabilities in children has also been piloting the system. Their mobile childcare specialists visit area schools, and record the behavior of children for analysis by parents and medical experts. A professor of public planning and her students have used our system to document construction in post-Katrina Mississippi, with the goal of evaluating zoning regulations and revising existing ordinances.

These, and other, users have generated a corpus of over 5000 videos. Figure 5.1 presents a screenshot of the system's Web interface, showing some video clips captured by our users. Our users report that battery lifetime is a critical usability issue, and video uploads use a significant fraction of the energy in our system. This chapter explores robust methods for reducing this cost.

Figure 5.1: Urban Tomography System.

Recent smartphones have multiple wireless interfaces – 3G/EDGE (Enhanced GPRS) and WiFi – that can be used for data transfer. These two radios have widely different characteristics. First, their nominal data rates differ significantly (from hundreds of Kbps for EDGE, to a few Mbps for 3G, to ten or more Mbps for WiFi). The achievable data rates for these radios depend upon the environment, can vary widely, and are sometimes far less than the nominal values. Second, their energy-efficiency also differs by more than an order of magnitude [14, 17].
While the power consumption of the two kinds of radios can be comparable, the energy usage for transmitting a fixed amount of data can differ by an order of magnitude or more, because the achievable data rates on these interfaces differ significantly. Finally, the availability characteristics of these two kinds of networks also differ widely. At least as of this writing, the penetration of some form of cellular availability (EDGE or 3G) is, on average, significantly higher than that of WiFi. A similar observation has been made in [109], where the authors report 99% and 46% experienced availability for EDGE and WiFi, respectively, in their traces. Thus, uploading or downloading large data items using WiFi can be more energy-efficient than using the cellular radio, but WiFi may not always be available.

Fortunately, many uses of video capture are naturally delay-tolerant, to differing degrees, so that it is possible to delay data transfers until a lower-energy WiFi connection becomes available. In general, our users would like captured videos to appear on the server "as quickly as possible" (so that they, or their colleagues or supervisors, can quickly review the captured video), and are willing to tolerate some delay in upload in exchange for high-quality video capture and extended phone lifetime. However, different users have different delay tolerances: surveillance experts can be, depending on the situation being monitored, less tolerant of delay than behavioral analysts or public policy experts.

This chapter explores this energy-delay trade-off in delay-tolerant, but data-intensive, smartphone applications. The example in Figure 5.2 illustrates this trade-off. The topmost plot in the figure depicts the time-varying availability and achievable data transfer rate of each wireless network.

Figure 5.2: Example scenario in an urban environment where the availability and the achievable data transfer rate over three different wireless networks – EDGE, 3G, and WiFi – vary with time (each tick on the x-axis marks a 30-second interval). In this example, EDGE is always available but can only support a 10 KB/s data rate. WiFi APs are available over three short time periods and provide a 200 KB/s data transfer rate in two of those periods, but only 50 KB/s in the other. Finally, 3G is available for a similar duration of time as WiFi, but at different times, and provides a lower data rate (40 KB/s). Application data arrives at times t = 0 and t = 300s as video files of 5 MB each.

Suppose that the power consumption of the 3G/EDGE and the WiFi interface on the smartphone is 1W (this roughly matches our measurements on the Nokia N95 smartphone). In Figure 5.2, we depict the data transmission decisions of three different data upload strategies, and their performance in terms of total energy consumption on the smartphone and the delay in uploading the data. Whenever data is available for upload, the Minimum-delay strategy selects the link with the fastest data transfer rate, whereas the Always-use-WiFi strategy uploads data using only WiFi APs. For comparison, we also show the Energy-Optimal decision strategy, which would result in minimum energy consumption in this scenario. We can see from Figure 5.2 that the Minimum-delay algorithm achieves the smallest delay but consumes (almost) 2.5 and 5 times more energy than the Always-use-WiFi and Energy-Optimal strategies, respectively. Hence, in this scenario, delaying data upload to avoid using 3G/EDGE networks leads to significant energy savings at the expense of 1-1.5 minutes of additional delay.
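The energy arithmetic behind this example is simple: at a fixed radio power, the energy to move a file is power times (file size / achievable rate). A back-of-the-envelope sketch in Python, using the rates and the 1W power figure from the example above (the helper name is ours):

```python
def upload_energy_j(size_kb, rate_kbps, radio_power_w=1.0):
    """Energy (Joules) to upload size_kb at rate_kbps, with the radio
    drawing radio_power_w for the duration of the transfer."""
    return radio_power_w * (size_kb / rate_kbps)

FILE_KB = 5 * 1024  # one 5 MB video from the example
for name, rate_kbps in [("EDGE", 10), ("3G", 40),
                        ("good WiFi", 200), ("poor WiFi", 50)]:
    print(f"{name:>9}: {upload_energy_j(FILE_KB, rate_kbps):7.1f} J")
# EDGE: 512.0 J   3G: 128.0 J   good WiFi: 25.6 J   poor WiFi: 102.4 J
```

Under these numbers, the 200 KB/s WiFi AP is roughly 20 times cheaper per file than EDGE, and 4 times cheaper than the 50 KB/s WiFi AP, which is exactly why the Energy-Optimal strategy skips the poor AP and waits.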
However, reducing the energy consumed in data transfer is not simply a matter of choosing WiFi over 3G/EDGE. The Energy-Optimal strategy consumes only half as much energy as the Always-use-WiFi strategy by not using the (poor-quality) WiFi AP with the 50 KB/s rate, at the expense of only slightly higher delay.

The previous example illustrates several decisions involved in managing data-intensive and delay-tolerant smartphone applications in an energy-efficient manner. How long should the system wait before using the energy-expensive but nearly ubiquitous cellular network? If several WiFi APs are available, which AP should it choose? How can the system estimate the quality of a new WiFi AP? At their core, all these decisions involve an energy-delay tradeoff.

The problem we consider in this chapter is the design of an algorithm for making this energy-delay tradeoff. More precisely, the problem can be formulated as a link selection problem (§5.2): given a set of available links (cellular, WiFi access points), determine whether to use any of the available links to transfer data (and, if so, which), or to defer a transmission in anticipation of a lower-energy link becoming available in the future, without increasing delay indefinitely. Because it trades off delay for energy, the link selection problem can be naturally formulated using an optimization framework.

Contributions. In this chapter, we present a principled approach for designing an online algorithm for this energy-delay tradeoff using the Lyapunov optimization framework [55, 93]. We formulate the link selection problem as an optimization which minimizes the total energy expenditure subject to keeping the average queue length finite. The Lyapunov optimization framework enables us to design a control algorithm, called SALSA (Stable and Adaptive Link Selection Algorithm), that is guaranteed to achieve near-optimal power consumption while keeping the average queue finite (§5.3). Specifically, we show that, in theory, SALSA can achieve power consumption arbitrarily close to the optimal. To our knowledge, prior work has not explored this link selection problem, and our use of the Lyapunov framework for solving this problem is also novel (§6.4).

Our second contribution is an exploration of two issues that arise in the practical implementation of SALSA. First, although control algorithms based on the Lyapunov framework have a single parameter V, the theory does not give any guidance on how to set it. We design a simple but effective heuristic for a time-varying V, which allows users to tune the energy-delay tradeoff across a broad spectrum. Second, SALSA requires an estimate of the potentially achievable transmission rate on each available link in order to make its control decision. We devise a hybrid online-offline estimation mechanism that learns link rates with use but, during the learning phase, relies on an empirically derived mapping between an RSSI reading and the average achieved transfer rate.

Our third contribution is an extensive trace-driven evaluation of SALSA using video arrivals from users of our Urban Tomography system and link availability traces obtained from three different locations in the Los Angeles area. Our trace-based simulations show that SALSA, which makes its transmission decisions based on three factors (transmission energy, the volume of backlogged data, and link quality), is significantly better than other alternatives that do not incorporate all of these factors in their decisions.
Moreover, SALSA's energy-delay tradeoff can be tuned across a wide spectrum using a single parameter α. Finally, SALSA can save between 10 and 40% of the total energy capacity of a smartphone battery, relative to a scheme that does not trade off increased delay, on many of our video traces.

Finally, we validate our trace-based simulations using extensive experiments with a SALSA implementation, deployed as part of a video transfer application on the Nokia N95 phones. Our experimental results are strikingly consistent with our trace-based results, suggesting that our conclusions are likely to hold in real-world settings.

5.2 Problem Statement, Model and Objective

To precisely describe the problem we consider in this chapter, let L[t] denote the set of links visible to a smartphone at time t. A link denotes a cellular radio connection (EDGE, 3G, or another standard, depending upon the carrier) or a connection to a visible WiFi access point (AP). In general, current smartphone software does not provide applications with the ability to select between different visible cellular radio networks, or to control which cell tower to associate with, so we do not assume this capability. However, it is possible, at least on certain smartphone operating systems, to select a WiFi AP for data transfer. L[t] is time-varying: as the user moves, the availability of cellular connectivity will vary, as will the set of visible WiFi APs.

The problem we consider in this chapter is the link selection problem: if, at time t, the smartphone has some data to upload, which link in L[t], if any, should it select for the data transfer so as to conserve energy? Our goal in this chapter is to design a link selection algorithm that solves the link selection problem. One important feature that distinguishes our work from prior work is that the link selection algorithm can choose to defer a transfer in anticipation of a future lower-energy transmission opportunity. Thus, our link selection algorithm trades off increased delay for reduced energy. Because different applications may have different delay tolerances, our link selection algorithm must provide the ability to control this trade-off.

The link selection problem can be naturally formulated using one of many optimization frameworks. The formulation we choose is based on the following intuition. Suppose that the application data generated on the smartphone is placed in a queue. For delay-tolerant applications, it might be acceptable to hold the data in the queue and defer transmission in anticipation of a lower-energy link becoming available in the future, but not indefinitely. In other words, as the queue becomes longer, it may reach a point where it is no longer appropriate to trade off additional delay for energy. One natural optimization formulation that arises from this intuition is to minimize the total energy expenditure subject to keeping the average queue length finite.

It is this formulation we adopt in this chapter, and we introduce a model and associated notation to formally state the optimization objective and constraint. Our model provides a framework for the design and analysis of our online interface selection algorithm, discussed in §5.3. For ease of exposition, we assume that time is slotted; our model and algorithm can easily be generalized to the continuous-time case (indeed, our implementation, described in §5.5, assumes continuous time). Let A[t] represent the size of the video data, in bits, generated during time slot t.
A[t] represents the arrival process, and we model it as a discrete random variable. We denote by P[t] the power consumption due to data transmission during the t-th time slot. P[t] is zero if the link selection algorithm chooses to defer transmissions during this time slot. If the algorithm chooses a cellular link, P[t] is P_C, and if it chooses a WiFi link, P[t] is P_W. More generally, the framework we discuss below is capable of incorporating transmit power control, but since smartphones do not support that capability, we have not incorporated it.

Let μ[t] denote the amount of data transferred during timeslot t. This value depends on several factors. First, μ[t] > 0 only if our interface selection algorithm decides to transmit data during slot t; it is zero otherwise. If μ[t] > 0, then it also depends on the following factors: (i) the quality of the link selected for data transfer, (ii) the transmit power, and (iii) the amount of data available for transmission. As we have discussed above, video data generated for upload is queued awaiting transmission. Let U[t] denote the queue backlog (the number of bits in the queue) at the beginning of timeslot t. For a link l ∈ L[t], let S_l[t] denote the quality of the wireless link. We model S_l[t] as a random variable that takes values from a finite set S according to a probability distribution π_s for all t. We model μ[t] as the random output of a function, defined as:

\mu[t] \triangleq C(I[t], l, S_l[t], U[t], P[t])     (5.1)

where I[t] is an indicator random variable that is equal to 1 if the smartphone decides to transmit data during slot t and 0 otherwise. If I[t] = 0, the smartphone does not transmit during slot t (regardless of the other inputs l, S_l[t], U[t], etc.). l denotes the link selected for transmission and S_l[t] denotes the quality of link l during slot t. Since U[t] denotes the queue backlog at the beginning of slot t, we have μ[t] ≤ U[t] always. P[t] denotes the transmit power. Over time, the queue backlog evolves as follows:

U[t+1] = U[t] - \mu[t] + A[t]     (5.2)

where μ[t] (defined in (5.1)) is the amount of data transferred during timeslot t, and A[t] is the application data added to the queue during slot t. Given this notation, we are now ready to formally state the queuing constraint we impose on our link selection algorithm, called stability. We define the queue U[t] to be stable if:

\bar{U} = \limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E\{U[\tau]\} < \infty     (5.3)

The stability constraint ensures that the average queue length is finite. Under this constraint, we seek to design a link selection algorithm that minimizes the time-average transmit power expenditure, defined as:

\bar{P} = \limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E\{P[\tau]\}     (5.4)

where P[τ] ∈ {0, P_C, P_W}, depending on the link selected for transmission during slot τ.

5.3 The Link Selection Algorithm

In this section, we describe our link selection algorithm. This algorithm is designed using the Lyapunov optimization framework [55, 93], and has the property that it is guaranteed to be stable and can provide near-optimal energy consumption even with varying channel conditions, under some idealized assumptions. Accordingly, we call our algorithm SALSA (Stable and Adaptive Link Selection Algorithm). We first present SALSA, briefly describe its design using the Lyapunov framework and state its performance properties, and finally discuss its practical application to a real-world system.

5.3.1 SALSA

SALSA decides, every timeslot t, whether to transmit data from its queue, and which (if any) of its available links to use.
To do this, it observes the amount of new application data A[t] and its current queue backlog U[t]. For a parameter V > 0 (we describe later how to select this parameter), it chooses a link \tilde{l}[t] for data transfer during timeslot t as follows:

\tilde{l}[t] = \arg\max_{l \in L[t] \cup \{\emptyset\}} \left( U[t] \, E\{\mu[t] \mid l, S_l[t], P_l[t]\} - V P_l[t] \right)     (5.5)

where \tilde{l}[t] = ∅ covers both cases in which no transmission occurs: when no link is available, and when the smartphone chooses not to use any of the available links. E{μ[t] | l, S_l[t], P_l[t]} is an estimate of the transfer rate that can be achieved on link l, given the current channel condition S_l[t] and the transmit power P_l[t]. In a later section, we discuss how to estimate this value.

To understand the intuition behind this control decision, consider a specific WiFi link l such that P_l[t] = P_W. If V is fixed, this control decision chooses link l only when either the queue backlog U[t] is high or the available rate on link l is high. Thus, the algorithm implicitly queues data for "long enough", or sends if it sees a good-quality link. When P_W is higher, the bar for transmission is automatically raised. Of course, the performance of this algorithm critically depends upon the choice of V, which we discuss later.

SALSA decides not to use any of the available links if and only if U[t] E{μ_l[t] | l, S_l[t], U[t], P_l[t]} − V P_l[t] < 0 for all l ∈ L[t]. Such a situation will typically arise if the data transfer rate on all the available links is small, either because the nominal rate of the link is small, or because the effective transfer rate is small as a result of poor channel conditions.
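To make the control decision concrete, here is a minimal Python sketch of (5.5). Only the scoring rule comes from the equation; the Link structure, the per-slot rate estimates, and the tie-breaking are our illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Link:
    name: str
    power_w: float    # P_l: radio power while transmitting on link l
    est_bits: float   # E{mu[t] | l, S_l[t], P_l[t]}: expected bits this slot

def salsa_decision(links, backlog_bits, V):
    """One SALSA control decision (Eq. 5.5): choose the link maximizing
    U[t] * E{mu[t] | l, ...} - V * P_l; return None (defer) when no link
    scores above zero."""
    best, best_score = None, 0.0
    for link in links:
        expected = min(link.est_bits, backlog_bits)   # mu[t] <= U[t]
        score = backlog_bits * expected - V * link.power_w
        if score > best_score:
            best, best_score = link, score
    return best

# With a small backlog and a large V, even a good WiFi link is deferred;
# as the backlog grows, the first term dominates and transmission triggers.
wifi = Link("wifi", power_w=1.1, est_bits=2e6)
assert salsa_decision([wifi], backlog_bits=1e3, V=1e10) is None
assert salsa_decision([wifi], backlog_bits=1e8, V=1e10) is wifi
```

Note that the decision needs only local information: the current backlog, the per-link rate estimates, and V.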
5.3.2 Theoretical Properties of SALSA

We have formally derived SALSA's control decision (5.5) using the Lyapunov optimization framework [55, 93]. This framework enables the inclusion of optimization objectives – energy expenditure, fairness, throughput maximization, etc. – while designing an algorithm that ensures queue stability using Lyapunov drift analysis. Lyapunov drift is a well-known technique for designing algorithms that ensure queue stability. The technique involves defining a non-negative scalar function, called a Lyapunov function, whose value during timeslot t depends on the queue backlog U[t]. The Lyapunov drift is defined as the expected change in the value of the Lyapunov function from one timeslot to the next. The Lyapunov optimization framework guarantees that control algorithms that minimize the Lyapunov drift over time will stabilize the queue(s) and achieve near-optimal performance for the chosen optimization objective – for SALSA, power consumption.

We discuss the derivation in Appendix C.1. Our derivation is similar to that of other optimization formulations that use the framework [93], but, to our knowledge, we are the first to apply this framework to the link selection problem defined in §5.2. It is possible to derive an analytical bound on the time-average power consumption achieved by SALSA compared to an optimum value. We state the following theorem, and prove it in Appendix C.2:

Theorem 1. Suppose the arrival process A[t] and the channel states are i.i.d. across timeslots with distributions π_A and π_s, respectively. We assume that the data arrival rate λ is strictly within the network capacity region. For any control parameter V > 0, SALSA achieves a time-average power consumption and queue backlog satisfying the following constraints:

\bar{P} = \limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E\{P[\tau]\} \le P^* + \frac{B}{V}     (5.6)

\bar{U} = \limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E\{U[\tau]\} \le \frac{B + V P^*}{\epsilon}     (5.7)

where ε > 0 is a constant representing the distance between the arrival rate and the boundary of the capacity region, P* is a theoretical lower bound on the time-average power consumption, and B is an upper bound on the sum of the variances of A[t] and μ[t] (each of which is assumed to have finite variance).

The theorem shows that SALSA can achieve an average power consumption \bar{P} arbitrarily close to P* (with a corresponding delay trade-off) while maintaining queue stability. However, this reduction in power consumption is achieved at the expense of a larger delay, because the average queue backlog \bar{U} grows linearly in V. This [O(1/V), O(V)] trade-off between power consumption and delay is a fundamental aspect of all control algorithms designed using Lyapunov optimization techniques [55]. Moreover, this trade-off does not assume prior knowledge of the distributions of the stochastic processes A[t] (data arrival) and S_l[t] (link quality), merely that the variances of the arrival process and the transfer rates are finite.

5.3.3 Practical Considerations for SALSA

The SALSA algorithm discussed above is idealized in several respects. It uses fixed timeslots, assumes that the available rate on a link, μ_l[t], is known a priori, and does not specify how to select the parameter V. When implementing it in practice, it is easy to relax the fixed-timeslot assumption and invoke the control decision whenever data is inserted into the queue or a new link becomes visible. In this section, we discuss how to deal with the other idealizations.

Choosing a "good" V. In general, the Lyapunov optimization framework is elegant because its control algorithms depend on a single parameter V. However, the framework itself does not give any guidance on parameter selection. Intuitively, V can be thought of as a threshold on the queue backlog beyond which the control algorithm decides to transmit (see (5.5)), so V controls the energy-delay tradeoff. Most existing work in this area chooses not to address the parameter selection issue explicitly, and simply explores the sensitivity of results to the choice of parameters.

However, since we are interested in implementing a system based on this framework, we need to explicitly address parameter selection. One obvious choice is to estimate the parameter V online: as the system runs, we can adapt V (e.g., using a binary search) to find a setting where the energy-delay trade-off is optimal. This can take a long time to converge, since at each step we would have to run the system long enough for the average queue length to converge.
Since P is a constant, P is a hyperbolic function that exhibits diminishing returns, beyond a point, in energy reduction with increasing V . Thus, a good operating point would be to pick a V value where a unit increase in V yields a very small reduction in P. At this point, the energy gains may not be worth the delay increase resulting from increasing V (since delay is proportional to V ). More formally, we can choose an a > 0 that satisfies the following equation (a is the slope of P curve): d(P + B=V) dV = B V 2 =a =) V = r B a (5.8) In setting V according to (5.8), we need to determine the value of the constant B, which involves estimating the variance of the arrival process A[t], and the transmission process m[t]. SALSA computes B based on all the A[t] and m[t] values observed over some large time window. It initializes V = 0 and then updates its value according to (5.8) whenever the estimate for B is updated. 121 To achieve our second goal, we adapt V to the instantaneous delay in data transfer using an application- specified parameter. Such a mechanism enables a smartphone application to express its delay-tolerance. Rather than use a fixed V during each timeslot, SALSA modifies (5.8) as follows: V[t]= s B[t] a(D[t]+ 1) a (5.9) where D[t] denotes the instantaneous delay in data transfer (i.e., the time that the bit at the head of the queue has been resident in the queue) measured at the beginning of timeslot t. Note that the upper bound B now becomes B[t]. However, the intuition is simple: as data stays longer in the queue, V(t) decays (at a rate determined by a, which can be controlled by the application) until it becomes low enough to trigger a transmission by (5.5). Hence, SALSA reacts to an increase in instantaneous delay by trying to transmit data whenever an access point is available. While this reduces the delay in data transfer, it can result in higher power consumption as SALSA may select an access point for data transfer that is not energy- efficient instead of delaying transmissions till an access point with high data transfer rate appears. Thus, applications that can tolerate delay and would prefer to maximize energy savings can set a close to zero, while less delay-tolerant applications can seta to be larger at the expense of energy usage (we explore the behavior of the algorithm to differenta values in S 5:4 ). Note that the parameter a appears twice in the denominator in (5.9) – as an multiplicative term and also as the exponent of (D[t]+1). We needa as a multiplicative term in order to get a “good” V value when D[t]= 0 (in which case (5.9) reduces to (5.8)). Instead of using a different parameterb as the exponent of (D[t]+ 1) in (5.9), we chose to use a in order to have only one free parameter in SALSA. As we show in S 5:4 , a single parameter is sufficient to explore a range of delay-tolerances. The bounds on average power consumption and average queue size in (5.6)-(5.7) hold when V[t]= V for all t. For the case of time varying V values, it is difficult to derive similar bounds. However, we can easily see that, compared to the case of a fixed V , SALSA with time-varying V values achieves smaller average queue backlog (hence, smaller average delay) at the expense of higher average power consumption. 122 That is because the instantaneous-delay based term triggers transmissions earlier in SALSA with time- varying V than in SALSA with fixed V , at the possible cost of increased energy incurred by transmitting on a less-than-optimal link. Rate estimation. 
Rate estimation. In practice, the transfer rate on link l during slot t, μ_l[t], may not be known. SALSA uses a combination of offline and online estimation. In online rate estimation, as the smartphone uses each link l, SALSA computes μ_l[t] as the average rate achieved over the last, say, 10 uses of link l for data transfer. This windowed average, because it is specific to a link, can be accurate, but it requires several uses of a link before a reliable estimate can be formed. Until a reliable estimate is available, SALSA uses results from an offline rate estimation technique that samples several access points to obtain a distribution of achievable rates. There are many ways of doing this, but the simplest (and the one we use) estimates the distribution of achievable transfer rates as a function of the Received Signal Strength Indicator (RSSI) for a given environment. SALSA simply derives a rate estimate from this distribution for each link, based on its RSSI. Admittedly, this is a very coarse characterization, since data rates are only partially dependent upon RSSI. However, as we show in this chapter, even this rough estimate results in excellent SALSA performance.

5.3.4 Extensions

SALSA is also flexible enough to accommodate extensions that may be desirable for smartphone applications. We now discuss two such extensions, but leave their evaluation to future work. In addition to these, SALSA can be extended to accommodate prioritized data transmissions, or bounds on average power consumption; we omit a discussion of these for lack of space.

SALSA for download. SALSA can also be extended to perform link selection for data downloads. Many applications can live with a delay-tolerant download capability. Such applications download, in the background, large volumes of data (e.g., videos, images, maps, other databases) from one or more Internet-connected servers in order to provide context for some computation performed on the phone. A good example is Skyhook's [119] WPS hybrid positioning service, which prefetches relevant portions of a precomputed hotspot location database.

To get SALSA to work for such applications, we need to change the definitions of A[t] and U[t]. Specifically, we define A[t] as the size of the request made by an application during timeslot t, and U[t] as the backlog of content that has not been downloaded yet. In applying SALSA to the download scenario, we assume that it is possible to know the size of the content requested by an application prior to downloading the content. This is certainly feasible for static content hosted by a server, and for dynamically generated content for which the server is able to estimate size.

SALSA for peer-assisted uploads. In a peer-assisted upload, data is opportunistically transferred to a peer smartphone with the expectation of reducing the latency of upload. In general, peer-assistance will require the right kinds of incentives for peers to participate. However, for certain cooperative participatory sensing campaigns, where a group of people with a common objective collectively set out to gather information in an area, peer-assisted uploads are a viable option for increasing the effective availability of network connectivity. For the peer-assisted upload case, we can model the connection to the peer as a link. The important change is that the achievable rate μ_l[t] of this link takes into account an estimate of the upload delay (the time when the peer expects to meet a usable link).
When a smartphone meets a peer, it queries the peer to get an estimate of μ_l[t] on the link l between them. The peer computes this quantity by estimating the time at which it is likely to meet the next AP (say t_m), and the achievable rate r to that AP. It advertises μ_l[t] as r·t_m, which is an estimate of the effective data rate that would be observed by a transfer handed off to this peer. Recent work [95] suggests that it might be possible to accurately forecast r, and t_m can be estimated using GPS and trajectory prediction.

5.4 Evaluation

In this section, we present our evaluation of SALSA using trace-driven simulations. We motivate and describe our methodology, then discuss our results. We have also implemented SALSA on the Nokia N95 smartphone as part of the Urban Tomography system (§5.1); in the next section, we use this implementation to validate our simulation results.

5.4.1 Methodology

Overview. In our evaluation of SALSA, we are interested in two questions: How does SALSA perform over a wide range of scenarios? How does it compare to other plausible link selection algorithms? The performance of SALSA (or any other algorithm) depends upon two characteristics: the arrival process A[t], and the time variation in link quality and availability as captured by μ_l[t]. To understand the performance of SALSA over a wide range of arrival processes and link availability and quality characteristics, we use trace-driven simulation, with arrival traces derived from users of our Urban Tomography system in real-world settings, and link availability traces generated empirically by carrying a smartphone on walks across different environments. We describe the methodology in detail below. We also compare SALSA against two baseline algorithms, one which attempts to minimize delay and another which always uses WiFi to conserve energy, as well as two other threshold-based algorithms. We begin with a detailed discussion of the simulator and the traces that we use. Then, we discuss the alternative strategies we use for comparison. We conclude this section with a discussion of our metrics.

Simulator Details. We wrote a custom simulator to explore the performance of SALSA and compare it with other algorithms. Our simulator allows us to explore the impact of different application data arrival patterns and link availability characteristics on an algorithm's performance. It also enables us to characterize the effect of our heuristic for determining the V value, and of our rate estimation scheme, on SALSA's performance.

Figure 5.3: Arrival patterns: distribution of the number of videos per arrival trace, and CDF of video sizes.

Our simulator takes three different inputs: (1) the power consumption of the different radio interfaces on the smartphone, (2) the application data arrival patterns, and (3) the link availability. All our simulation results are for a timeslotted system with 20-second time slots. Based on our measurements on the Nokia N95 smartphone, we set the transmit power consumption of the 3G/EDGE interface and the WiFi interface to 1.15 W and 1.1 W, respectively. We assume that interfaces are briefly turned on at the beginning of each timeslot to check for availability; only the radio selected for the transfer (if any) is kept on for the duration of the transfer. We ignore the energy cost of checking for availability (which may include the cost of scanning for access points): relative to the large volumes of data we transfer, this cost can be made negligible by tuning the frequency of scanning, as we show later in this section. Furthermore, all algorithms are more or less equally affected by this simplification, and since we compare algorithms, we do not expect their relative performance to change significantly if these costs were taken into account.
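For concreteness, the core of such a simulator is a per-slot loop like the following Python sketch. The structure and names are our own illustrative assumptions (the dissertation does not show its simulator code); the slot length and power constants match the text above.

```python
SLOT_S = 20                       # timeslot length, seconds
P_CELL_W, P_WIFI_W = 1.15, 1.1    # measured N95 transmit powers, Watts

def simulate(arrivals_bytes, links_at, policy):
    """Toy trace-driven run. arrivals_bytes[t] is A[t]; links_at(t) returns
    L[t], each link carrying .rate_Bps and .is_wifi; policy(links, backlog, t)
    returns a chosen link or None. Returns (energy in Joules, final backlog)."""
    backlog, energy_j = 0.0, 0.0
    for t, arrived in enumerate(arrivals_bytes):
        backlog += arrived                       # A[t] joins the queue
        chosen = policy(links_at(t), backlog, t)
        if chosen is not None and backlog > 0:
            sent = min(backlog, chosen.rate_Bps * SLOT_S)  # mu[t] <= U[t]
            backlog -= sent                      # U[t+1] = U[t] - mu[t] + A[t]
            power_w = P_WIFI_W if chosen.is_wifi else P_CELL_W
            energy_j += power_w * sent / chosen.rate_Bps   # radio-on time
    return energy_j, backlog
```

Transfer failures, AP blacklisting, and scanning costs, discussed below and in §5.5, would layer on top of this skeleton.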
Our simulator uses two kinds of traces: an arrival trace and a link trace. An arrival trace captures a data arrival pattern, and consists of a timestamped sequence of video arrivals. We use a total of 42 arrival patterns (comprising a total of 935 videos), derived from actual use of the Urban Tomography system. In that system, users can create "events" that mark a collection of related videos (usually representing the documentation of a real-world event, such as a commencement ceremony, or a business trip). Each arrival trace is generated from one event. Arrival traces have widely varying characteristics; for example, Figure 5.3 shows the distribution of the number and total size of videos across different traces.

Figure 5.4: Link availability (CDF): average WiFi and 3G/EDGE transfer rates (left) and failure rates (right) at USC, the Glendale Galleria (Mall), and LAX.

Each link trace is a timestamped sequence of available APs (3G/EDGE and WiFi), together with data transfer rates. We collected link traces while we were experimenting (at different times over several months) with our system at several different locations: the USC campus, a large shopping mall near Los Angeles (the Glendale Galleria), and Los Angeles International Airport (LAX). We collected 38 traces on the USC campus, 24 traces at the Glendale Galleria, and 4 traces at LAX. At these locations, WiFi (specifically 802.11b/g) is available to different extents. On the USC campus, WiFi is deployed across most of the campus and is freely available to registered clients. The Glendale Galleria has a few open WiFi hotspots. At LAX, we purchased four T-Mobile hotspot accounts and scripted the login procedure so that association with those hotspots does not require manual intervention.

Our link traces were collected by walking in the corresponding environment for an hour or more with a smartphone that periodically scans for APs, records the SSID (or cell tower ID) and the RSSI value of the available APs, and estimates the data transfer rate for these APs by uploading test data. The left column in Figure 5.4 is a CDF of the average transfer rate per 20-second window (the timescale at which our link selection algorithm works) observed at the different locations. For each trace, we divide the entire time duration into 20-second windows, associate all available APs with the corresponding windows, and then compute the time-average data transfer rate per trace. Thus, in close to 20% of the 38 traces collected on the USC campus, we encountered WiFi APs with an average rate better than 100 KB/s. From these figures, it is clear that the WiFi environment at our three locations varied widely: the USC campus has denser, and perhaps faster, WiFi than LAX and the Glendale Galleria.
On the other hand, the performance of the 3G/EDGE network is roughly the same at all three locations. The average transfer rate alone is somewhat misleading, because our TCP-based data transfers often fail. During our trace collection, we also recorded instances of upload failure for each AP. Specifically, for each trace, we compute the number of failed attempts and divide it by the total number of data transfer attempts to compute the failure rate associated with each AP. Figure 5.4 (right column) shows the CDF of failure rates for the traces collected at each of the three locations. Note that around 20% of the traces collected at the Glendale Galleria and LAX have a failure rate of more than 60% for the WiFi APs associated with them! Interestingly, the 3G/EDGE network traces collected at USC contained more instances of failure than the traces from the Glendale Galleria and LAX. Non-trivial failure rates in these environments imply that it is important to incorporate such failures (and the energy cost associated with them) in our simulations, and we do.

A single simulation run uses one arrival trace and one link trace as input. Each simulation run lasts until either all the data is uploaded from the smartphone or 50,000 slots (equivalent to about 12 days) have elapsed. Overall, we have 2,772 simulation runs for each algorithm we evaluate (see below). The failure rate associated with each trace is used to model data transfer failures in our simulations as follows: for a simulation run which uses a link trace with an associated failure rate p, we assume that data transfer failures are i.i.d. Bernoulli random variables with parameter p. Our failure model provides a simple abstraction that captures the variability in the instantaneous data transfer μ[t] that our analytical model allows for (refer to (5.1)).

Finally, depending on the number and quality of links available in the link trace, as well as the arrival patterns in the arrival trace, the time needed to upload all the data can be quite large; however, our longest link trace is close to 3 hours long. In order to complete a simulation run that is longer than its corresponding link availability trace, we repeat the trace continuously. In this way, an arrival pattern sees the variability associated with a particular environment, but repetitively: this methodology allows us to explore longer arrival patterns, while still subjecting uploads to link availabilities derived from real environments.

Comparison. We compare SALSA's performance against four different link selection algorithms: MINIMUM-DELAY, WIFI-ONLY, STATIC-DELAY, and KNOW-WIFI. The MINIMUM-DELAY algorithm always transfers data when an AP is available. It never considers the energy cost of using an AP, and is designed to minimize the amount of time application data is buffered on the smartphone awaiting transmission. The WIFI-ONLY algorithm uses only WiFi APs. This algorithm is motivated by the observation that data transfer using WiFi APs is much more energy-efficient than using the 3G/EDGE network. Hence, it aims to minimize energy consumption, and is oblivious to the delay in data transmission.

The STATIC-DELAY algorithm attempts to achieve an energy vs. delay trade-off using the following heuristic: it waits for a WiFi AP to become available for up to a (configurable) period of T timeslots from the creation time of the corresponding file, and if it encounters a WiFi AP within this period, it uses it.
In the event that it has not seen any WiFi AP in the past T timeslots, it uses the first link that becomes available (whether 3G/EDGE or WiFi). Thus, this algorithm behaves like WIFI-ONLY for up to T timeslots, and then starts behaving like MINIMUM-DELAY. The parameter T controls the energy vs. delay trade-off. Ideally, it should depend on an application's delay tolerance and on the availability of WiFi APs; without detailed information about WiFi availability, a big challenge in using this algorithm is determining the parameter T.

Finally, the KNOW-WIFI algorithm assumes information about the availability of WiFi APs in the future. It is therefore an idealized algorithm, although it may be possible to estimate this availability as described in [95], or to determine it in advance based on user input (for example, when a user knows when she is going to have access to a good WiFi AP, such as at home, at work, or at a coffee shop). It checks for the availability of a "good" WiFi AP within the next T timeslots. We define a good WiFi AP as one that has a data transfer rate at least twice the maximum achievable 3G/EDGE rate, obtained from the corresponding link trace. If such an AP exists (i.e., the user will encounter it within the next T timeslots), the KNOW-WIFI algorithm waits until it can use that AP, and then transfers as much data as possible using it. It then resets the maximum wait period for a good AP back to T timeslots. In situations where the KNOW-WIFI algorithm knows that no good AP will appear within the next T timeslots, it behaves like the MINIMUM-DELAY algorithm, and starts using any available link. Apart from the fact that this algorithm requires knowledge of the WiFi APs available in the future, another practical challenge is determining the right value for T.

Performance metrics. Finally, we analyze the performance of each algorithm using a novel approach that attempts to characterize the macroscopic performance of each algorithm across all our simulation runs. At the end of each simulation run, we first derive two metrics for each link selection algorithm: (1) the average energy consumed per byte, E, and (2) the average delay per byte, D. Consider a VCAPS-based application that generates N videos with sizes S_1, ..., S_N bytes. Let E_i and D_i denote the total energy consumed and the delay, respectively, in transmitting the video of size S_i. We define the average energy consumed per byte, E, and the (weighted) average delay per byte, D, as follows:

E = \frac{\sum_{i=1}^{N} E_i}{\sum_{i=1}^{N} S_i}, \qquad D = \frac{\sum_{i=1}^{N} D_i S_i}{\sum_{i=1}^{N} S_i}     (5.10)
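These two metrics are simple weighted averages; a Python sketch of (5.10), with names of our choosing:

```python
def per_run_metrics(sizes_bytes, energies_j, delays_s):
    """Eq. (5.10): E is total energy over total bytes; D is the
    size-weighted average delay per byte. One entry per video."""
    total_bytes = sum(sizes_bytes)
    E = sum(energies_j) / total_bytes
    D = sum(d * s for d, s in zip(delays_s, sizes_bytes)) / total_bytes
    return E, D

# Two videos: a 5 MB file uploaded quickly, a 10 MB file delayed an hour.
E, D = per_run_metrics([5e6, 10e6], [30.0, 120.0], [60.0, 3600.0])
```

Weighting D by file size means that large, long-delayed videos dominate the delay metric, matching the intuition that delaying a large upload matters more than delaying a small one.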
Each simulation run results in a point on the E-D plane. The convex hull of all the points for a given algorithm on the E-D plane represents its envelope of performance. We present examples of envelopes and discuss their desirable properties in §5.4.2. In that section, we also compare the envelopes of performance of different algorithms.

We also use another metric, called dispersion, to characterize how far off each algorithm is, on average, from an idealized optimal. For each pair of arrival trace and link trace, we can compute the minimum achievable energy per byte (E_m) and the minimum achievable delay per byte (D_m), if each were separately optimized (instead of jointly, as SALSA does). Specifically, E_m is the energy per byte used if all data were transmitted using the highest-rate link in a trace. Similarly, D_m is the delay per byte incurred using MINIMUM-DELAY, assuming no transmission failures. For each (E_m, D_m) pair, we can then obtain, for each algorithm, the Euclidean distance on the normalized E-D plane between the achieved (E, D) and (E_m, D_m). In general, the latter point may not be achievable by any algorithm that trades off delay for reduced energy, but it represents a lower bound. For a given algorithm, the average distance of each simulation run from the corresponding "optimal", across all simulation runs, is defined to be the dispersion of the algorithm.

5.4.2 Performance Results

Performance against Baseline Algorithms. We first compare SALSA against the two baseline algorithms, MINIMUM-DELAY and WIFI-ONLY. Figure 5.5 plots the performance of each of these algorithms on the E-D plane. For SALSA, we use an α of 0.2; we later explore the performance of SALSA across a range of α values.

Figure 5.5: MINIMUM-DELAY vs. WIFI-ONLY vs. SALSA.

As discussed in §5.4.1, each point on the E-D plane corresponds to one simulation run. One way of characterizing the overall performance of an algorithm is to understand the shape of its envelope: the convex hull of all its points on the E-D plane. For the class of algorithms that make an energy-delay trade-off, what characterizes a good envelope? Intuitively, a good algorithm should be capable of achieving a "good" balance between energy and delay: neither the delay per byte nor the energy per byte should be too large. In other words, the points on the E-D plane should be clustered around the origin, and the envelope should be compact. We use this intuition to compare different algorithms throughout this section.

In Figure 5.5, MINIMUM-DELAY exhibits low average delay, but its energy performance is spread out over a relatively wide range. This is expected: MINIMUM-DELAY does not attempt to explicitly trade off delay for energy. Moreover, MINIMUM-DELAY does not take channel capacity into account, so it can incur more transmission failures and, as a result, use more energy. At the other end of the spectrum, WIFI-ONLY exhibits an average delay spanning the whole range of D values. WIFI-ONLY's performance is highly dependent upon the availability of high-quality WiFi APs. In some of our traces (especially the Glendale ones), good WiFi APs are few and far between, so WIFI-ONLY can incur significantly high delay. In some traces where there is no usable WiFi, WIFI-ONLY has infinite average delay values: these are omitted from the figure. WIFI-ONLY's energy performance is also poor: in some of our link traces, the achievable rate with WiFi varies significantly (Figure 5.4), and is sometimes less than the rate of the 3G/EDGE network. By not discriminating based on channel conditions, WIFI-ONLY can also exhibit high energy usage.

By contrast, SALSA, which is explicitly designed for finite average delay (and, more than that, to keep instantaneous delay bounded) and which takes channel quality into account in its transmission decisions, achieves much better performance. Compared to WIFI-ONLY, its envelope is much more compact. Its envelope is also more compactly clustered around the origin than that of MINIMUM-DELAY. As we have discussed before, some applications may want to explicitly control the delay-energy trade-off: in particular, there may be applications that would like to emulate WIFI-ONLY or MINIMUM-DELAY. We discuss below how different parameter settings for SALSA can be used to mimic these algorithms. The rightmost sub-figure of Figure 5.5 depicts the dispersion of these three algorithms.
Recall that dispersion measures the average distance from an empirically determined optimal. Before we describe these results, we briefly describe how dispersion is calculated. For calculating distances on the E-D plane, we could use the absolute values of delay per byte and energy per byte, but the resulting distances would then be very sensitive to the choice of units for delay and energy. To avoid this, we normalize the delay and energy axes by assigning a unit of 1 to the 95th-percentile values from all the simulations on each axis. The (normalized) dispersion of MINIMUM-DELAY is about twice that of SALSA, and that of WIFI-ONLY (ignoring the runs where WIFI-ONLY had infinite delay) is about 3.4 times that of SALSA! WIFI-ONLY's performance is, of course, significantly affected by several large outliers from link traces which had very little WiFi availability. That SALSA is better than these two algorithms is not surprising, since they are relatively simple; later, we show that SALSA outperforms other, more sophisticated, algorithms as well. What is more interesting is that SALSA's absolute distance from the empirical optimal is low (0.343), leaving little room for improvement.

We now consider the following question: does being delay-tolerant actually save significant energy? From Figure 5.5, we can see, by considering the energy-per-byte values, that SALSA uses roughly half the energy per byte of MINIMUM-DELAY. Thus, relative to the most obvious implementation, SALSA is on average twice as energy-efficient. But does this improvement in energy-efficiency matter in the real world, i.e., are we solving a real problem? To understand this, we measured the total energy used, in Joules, for each of our arrival traces, both by SALSA and by MINIMUM-DELAY. Then, we computed the ratio of the difference in energy usage to the overall battery capacity of the Nokia N95. This gives us, for each arrival trace, the fraction of battery capacity that would have become available for use by other applications if SALSA were used instead of MINIMUM-DELAY.

Figure 5.6: Practical implications of SALSA (α = 0.2): (a) energy savings relative to MINIMUM-DELAY, and (b) the additional delay incurred, per arrival trace.

Figure 5.6(a) plots this fraction for each arrival trace. For most traces, this number is in the 5-15% range, but there exist some events where users could have extended their battery life by 20-40% by using SALSA instead of MINIMUM-DELAY for uploading their videos. In one extreme case, MINIMUM-DELAY would have required more than one complete charge of the battery to upload the corresponding videos, but SALSA could have completed the upload without recharging. This brings up another question: how much does SALSA pay in delay for these energy savings? Figure 5.6(b) plots the average additional delay incurred by a video when using SALSA instead of MINIMUM-DELAY, for each of our arrival traces, averaged over all link traces. For most videos, this additional delay is on the order of half an hour, and the worst-case average delay is about 1.5 hours.
This tradeoff is quite encouraging: assuming, for example, that a user's smartphone lasts 12 hours, she can get, in most cases, between 30 and 90 minutes (5-15% of battery capacity) of extra smartphone usage, while giving up an average delay of about 30 minutes in video upload.

SALSA Performance for different α. In §5.3.3, we described the design of a time-varying V parameter that allows users to explicitly control energy-delay trade-offs. In this section, we explore the efficacy of our design by varying α from 0.1 to 2.0 in steps of 0.1. Beyond α = 2.0, SALSA's behavior converges to that of MINIMUM-DELAY; in this range, V values approach zero, and with small values of V, SALSA never defers transmissions.

Figure 5.7: SALSA envelopes for different α.

Figure 5.7 depicts the results for a subset of the α values. By comparing with Figure 5.5, it is clear that SALSA can span a fairly broad range of the spectrum of energy-delay trade-offs. For very small α, SALSA's envelope is qualitatively similar to that of WIFI-ONLY: for small α tending toward zero, V is high, setting a high bar (e.g., a very good WiFi AP) for SALSA's transmission decision. As α increases, the envelope becomes more compact and also flattens out until, at α = 2.0, it starts to resemble MINIMUM-DELAY. Thus, by varying α we are able to mimic both ends of the energy-delay tradeoff spectrum, and points in between. However, it is harder to intuitively understand the direct relationship between α and an application's delay tolerance. In future work, we hope to develop rules of thumb, based on deployment experience in different environments, for suggesting α values to our users.

The rightmost sub-figure of Figure 5.7 reveals a more interesting behavior. It plots the variation in dispersion as a function of α. From this, it appears that a value of α ≈ 0.4 is a sweet spot in the parameter space, having low dispersion. Recall that α serves two functions: one is to choose a good point of diminishing returns in the energy-delay tradeoff, and the other is to control the delay tolerance. The sweet-spot value strikes the best balance between these objectives.

Figure 5.8: Performance (dispersion vs. α) across different environments.

Finally, Figure 5.8 depicts the differences in SALSA's performance across locations. While the sweet-spot behavior is consistent across locations, the absolute values of the dispersion are much higher in environments with sparse WiFi availability, such as the Glendale mall.

Comparison with threshold-based algorithms. We now compare SALSA's performance against that of STATIC-DELAY and KNOW-WIFI. This comparison presents a methodological difficulty: both STATIC-DELAY and KNOW-WIFI have a time threshold parameter T, but its relationship to α is not clear. Thus, it would be misleading to compare a version of SALSA with a specific α against STATIC-DELAY or KNOW-WIFI with a specific value of T. So, we adopted a slightly different methodology. Empirically, we found, for each algorithm, the most aggressive (in terms of transmission) and least aggressive parameters. We determined these ends of the parameter space by manually trying different large and small values. For SALSA, for example, α = 2 is the most aggressive value; as we have discussed above, the system is not sensitive to choices of α beyond this. Its least aggressive parameter is a value close to zero; we chose 0.1.
For STATIC-DELAY and KNOW-WIFI, the most aggressive parameter was 10 minutes and the least aggressive 16 hours. Between these extremes, we selected 10 other parameter values, and then executed each simulation run for these 12 parameters, for all three algorithms. For each algorithm, we plotted all simulation runs (across all parameter values) on the E-D plane. The resulting envelope captures the performance of the algorithm across a large range of parameter settings, and is likely a good indicator of the macroscopic performance of each algorithm.

Figure 5.9: STATIC-DELAY vs. KNOW-WIFI vs. SALSA.

Figure 5.9 plots these envelopes. SALSA's envelope is much more compact than those of the other two algorithms. It uses less energy and incurs less delay in general, and it has fewer and smaller outliers than the other two algorithms. STATIC-DELAY performs the worst, because it relies on the simple assumption that WiFi is more energy-efficient than 3G/EDGE. That is not always the case in our traces, and STATIC-DELAY sometimes pays a delay penalty waiting for WiFi, only to find that the quality of the WiFi link is not significantly better than the 3G/EDGE network.

Interestingly, SALSA is also able to outperform an algorithm that has knowledge of upcoming good WiFi links. Clearly, KNOW-WIFI is careful in that it waits only for good WiFi connections, unlike STATIC-DELAY, which indiscriminately uses the next WiFi link to come along. So why does KNOW-WIFI not perform as well as SALSA? The answer is that KNOW-WIFI does not take the queue backlog into account. Simply knowing that a good WiFi link will come along is not helpful without knowing whether that WiFi link will be available long enough to upload the queue backlog! The dispersion comparison in Figure 5.9 also bears this out: STATIC-DELAY has a dispersion 55% higher than SALSA's, while KNOW-WIFI's dispersion is about 27% higher.

Summary of Results. In summary, our comparison with a progression of heuristics suggests the following. The comparison with MINIMUM-DELAY suggests that significant energy benefits can be obtained by judiciously delaying transmissions. However, indiscriminately delaying a transmission until a WiFi link becomes available (as WIFI-ONLY does) does not work well, for two reasons: poor WiFi availability, and variable WiFi quality. Both these reasons are important: STATIC-DELAY is careful about waiting for a bounded amount of time for a WiFi link, but thereafter uses the first WiFi link that comes along. In our traces, there is significant WiFi variability, as a result of which STATIC-DELAY does not perform well. Finally, taking WiFi quality into account by looking ahead into the future, as KNOW-WIFI does, is also not sufficient for good performance: it fails to account for the duration of that link's availability, so in many cases the entire backlog cannot be uploaded, resulting in high delay.

SALSA, by explicitly or implicitly considering channel quality, backlog, and the effective transmission rate of the radio, performs the best. Of course, it may be possible to design other heuristics that take all of these factors into account, but, as we have shown, SALSA's absolute dispersion values are quite low, and leave little room for improvement.

Sensitivity to the Scanning Interval. In our trace-driven simulations, we have assumed that a WiFi scan is conducted at the beginning of every slot (i.e., every 20 seconds). Of course, scanning for WiFi every 20 seconds can incur significant energy.
Sensitivity to the Scanning Interval. In our trace-driven simulations, we have assumed that a WiFi scan is conducted at the beginning of every slot (i.e., every 20 seconds). Of course, WiFi scanning every 20 seconds can incur significant energy. To understand whether a larger scanning interval can be used, we explore the sensitivity of SALSA's performance to the choice of WiFi scanning frequency.

To quantify the cost of scanning, we first measured the cost of a single WiFi scan on two different platforms: the Nokia N95 and the Android G1. For measuring the energy consumption of a WiFi scan operation, we used both a dedicated hardware power monitor [90] and a software tool (the Nokia Energy Profiler v1.2 [96]). Our software and hardware setup is shown in Figure 5.10.

Figure 5.10: Energy Measurement Environment

Figure 5.11: SALSA performance for different scanning intervals: (a) scanning energy cost, (b) average delay, (c) average energy consumption.

Platform      WiFi-Scan Energy (J)   Scan Duration (s)
Nokia N95     1.18                   2.03
Android G1    0.63                   1.11

Table 5.1: Scan Cost Measurement

Table 5.1 shows the results of our measurements. The N95 consumes 1.18 J per scan, and each scan lasts 2.03 s; the G1 consumes less, about 0.63 J per scan, with each scan lasting 1.11 s. Thus, depending on how frequently scans are invoked, scanning costs can be quite significant and require careful attention in system design.

To understand SALSA's sensitivity to the scanning frequency, we ran our trace-driven simulations for four additional scanning intervals: 60s, 120s, 180s, and 240s. For each scanning interval, we counted the number of scans performed by the algorithm, and then computed the fraction of total battery capacity that can be attributed to scanning. To interpret these numbers, it is important to realize that SALSA (or any of our other algorithms) will only scan when the queue is non-empty. In general, one might expect SALSA to scan slightly more often than MINIMUM-DELAY, because it defers transmissions and builds up the queue.

Figure 5.11(a) plots the average scanning cost for each event in our trace, as a fraction of the total battery capacity. Clearly, at a 20s scan interval, SALSA's scanning cost is a significant 3% of the total battery capacity. For a 60s scan interval and beyond, the cost is much more reasonable and decreases quickly. When compared to the average energy consumption per event (Figure 5.11(c)), we see that SALSA's scanning costs become a relatively small fraction for any scanning interval of 60s or more.

Interestingly, it is not possible to increase the scanning interval without a penalty. As Figure 5.11(b) shows, the average delay increases fairly dramatically with the scan interval, going from about 30 minutes for a 20s interval to over an hour for a 240s interval. The reason is that a larger scan interval increases the burstiness of the departure process (relative to a smaller scan interval), which increases the SALSA threshold V, forcing the algorithm to wait longer for better-quality APs. Thus, the sweet spot for the scanning interval appears to be 60 seconds, where the cost of scanning is a small fraction of the total energy and the delay is comparable to that of a 20s scan interval.
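The scanning-cost fractions above are straightforward to reproduce from Table 5.1. The sketch below performs this back-of-the-envelope computation; the nominal battery capacity and the assumption that the queue stays non-empty (so scans keep firing) for three hours are illustrative assumptions, not measurements from this chapter.

BATTERY_J = 0.950 * 3.7 * 3600   # ~12.6 kJ: assumed 950 mAh, 3.7 V battery
SCAN_J = {"Nokia N95": 1.18, "Android G1": 0.63}   # per-scan energy, Table 5.1

def scan_cost_fraction(platform, interval_s, busy_hours=3.0):
    """Fraction of battery spent scanning if the backlog queue stays
    non-empty (and hence scans keep firing) for busy_hours hours."""
    n_scans = busy_hours * 3600.0 / interval_s
    return n_scans * SCAN_J[platform] / BATTERY_J

for interval in (20, 60, 120, 180, 240):
    print(interval, "s:", f"{scan_cost_fraction('Nokia N95', interval):.1%}")

Under these assumptions, a 60 s interval costs roughly a third of the 20 s cost, consistent with the sharp drop seen in Figure 5.11(a).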
5.5 Experimental Results

In this section, we describe the implementation of SALSA within a video transfer application developed in Symbian C++ for the Nokia N95 smartphone. We then discuss the results of an experiment designed to verify the performance of SALSA under real-world conditions, and to validate our simulation results.

Implementation Description. We have implemented the SALSA algorithm in our Urban Tomography system. The component of this system that runs on the smartphone and transmits videos to the backend server is called the Video CAPture System [124], or VCAPS. Our implementation runs on the Nokia N95 smartphone, which has an 802.11b/g WiFi interface as well as 3G/EDGE and a 2GB micro-SD card, and supports 640x480-resolution video recording at full frame rate.

In our implementation, VCAPS periodically scans the environment and determines the set of usable APs. These scans occur every 20 seconds (which constitutes a timeslot), a period empirically determined to work well while expending relatively little energy scanning for APs. At this time, VCAPS also updates all relevant statistics used in calculating V[t].

The videos captured by a user are placed in a designated video directory, which represents the backlog queue in our system. Whenever this queue is non-empty, VCAPS attempts to transfer data to an Internet-connected server using HTTP.¹ For each transfer attempt, VCAPS invokes SALSA's decision algorithm (Section 5.3) to determine which link to use among those available. The video upload process runs in the background and does not require any user intervention.

¹ Some of these videos can be viewed at http://tomography.usc.edu. A portion of the corpus is not publicly viewable for privacy reasons.

In practice, transfer attempts may fail for several reasons, and SALSA has built-in robustness mechanisms to deal with such failures. For example, a transfer attempt may fail because the achievable rate on the chosen link is low (either because the estimate was wrong, or because the user has moved away from the AP since the last scan). If a failure happens, VCAPS waits until the beginning of the next timeslot and retries. If more than 5 transfer attempts through a particular AP fail, VCAPS blacklists that AP and waits for 20 minutes before re-using it. Re-using a blacklisted AP allows us to use a different AP that may have the same SSID but provides good performance; we have observed several instances of different WiFi APs using the same SSID during our trace collection, especially on the USC campus network.

We implemented all the features of SALSA described in Section 5.3 on the smartphone, including the algorithm for time-varying V and the rate estimation scheme. However, there are a few minor differences between the SALSA implementation used for simulations and our smartphone implementation; these differences are driven by real-world considerations.

First, unlike the simulator, which treated the queue as a bag of bits, our implementation uses HTTP POST and attempts to transfer fixed-size video chunks. These chunk transfers may fail and need to be retried. Our simulator, on the other hand, determines how many bits could have been transferred given the rate estimate, and then determines, using a weighted coin toss, whether that transfer would have resulted in a failure. This behavior was designed to mimic the theory, more than the implementation.
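As a concrete rendering of this simulator transfer model, the following minimal sketch assumes a per-slot rate estimate rate_bps and an empirical failure probability p_fail derived from the traces (all names are illustrative):

import random

def simulate_slot_transfer(queue_bits, rate_bps, p_fail, slot_s=20):
    """Compute how many bits the slot could carry at the estimated rate,
    then use a weighted coin toss to decide whether the attempt failed
    (in which case nothing leaves the backlog)."""
    if random.random() < p_fail:   # weighted coin toss
        return queue_bits          # failed attempt: backlog unchanged
    sent = min(queue_bits, rate_bps * slot_s)
    return queue_bits - sent       # remaining backlog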
Second, the implementation uses online rate estimation, but the simulator does not. Rather, the simulator simply uses the rate estimates and the failure probabilities derived from the trace. The implementation is potentially more accurate in this regard, because it learns the actual achievable rate from successful transfers.

Finally, like the simulator, our implementation also uses fixed nominal values for the power expenditures P_C and P_W. An implementation could be more accurate if the OS were to provide fine-grained energy usage measurements, but the Symbian OS does not do this.

Figure 5.12: Experimental result at the USC campus compared to simulation results

Figure 5.13: Experimental result at the shopping mall compared to simulation results

Figure 5.14: Experiment walk routes

Results. We have conducted extensive experiments using our prototype implementation, evaluating SALSA with five different values of α. The goal of this experiment was twofold: first, to demonstrate that our Urban Tomography system with SALSA works robustly for several hours; and second, to validate that the performance of SALSA under real-world settings is consistent with our simulation results, despite the differences between the simulation and the implementation.

In our experiments, one volunteer carried five phones, each configured with a different value of α, and conducted 5 walks (each lasting approximately 3 hours) both on the USC campus and at the Glendale Galleria mall. The routes through the USC campus and the Glendale Galleria are shown in Figure 5.14. Each phone was programmed to use the same arrival trace, obtained from one of the events recorded by users of the Urban Tomography system (i.e., we replayed, on the phone, the arrival of videos in the event). Each walk completed the upload of all videos associated with that arrival trace. Thus, our real-world experiment corresponds, in the terminology of Section 5.4.1, to one arrival trace and 10 link traces.

On each phone, we recorded the transfer decisions made, the average delay from creation to transfer completion for each video, all WiFi scan results, the achieved rates, and the size and duration of each transfer. Using this, along with nominal values for the energy consumption of cellular and WiFi transfers, we were able to plot the performance of the SALSA algorithm on the E-D plane.

Figures 5.12 and 5.13 depict the results from the USC campus and the Glendale Galleria mall, respectively. On each figure, the 5 small black crosses each correspond to one walk. For comparison, the dots in the background of each graph depict results from our trace-driven simulations for that particular environment and α value. We say that an experiment is consistent with simulation if the experimental results fall within the envelope obtained by the simulation. Consistency implies that the differences between the simulation and implementation are not significant, and that the envelope obtained by simulation may be a reasonable indicator of performance observed in the real world.

As Figures 5.12 and 5.13 show, all experimental data points fell within the corresponding envelopes. (In the Glendale experiment, there are a few instances for α = 0.4 that were just on the border of the corresponding envelope; in all other cases, the experimental points were well within the envelope.) This result is encouraging: we believe that, over a wide range of parameters, SALSA will perform well in the real world. We intend to incorporate SALSA into our VCAPS software distribution, so that our user base can obtain the performance benefits it provides.
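To close this section, here is a minimal, schematic sketch of the per-slot client-side control loop described above. The real implementation is in Symbian C++; scan_wifi, salsa_decide, and try_transfer are illustrative placeholders, and only the scan period, retry, and blacklist constants come from the text above.

import time

FAILURES_TO_BLACKLIST = 5        # more than 5 failures blacklists an AP
BLACKLIST_S = 20 * 60            # blacklisted APs are avoided for 20 minutes
SLOT_S = 20                      # one timeslot: the scan period in seconds

failures = {}                    # AP -> consecutive failure count
blacklist_until = {}             # AP -> time at which it becomes usable again

def one_slot(queue, scan_wifi, salsa_decide, try_transfer):
    """One 20-second timeslot of the VCAPS upload loop (schematic)."""
    now = time.time()
    if not queue:                # scan only while the backlog is non-empty
        return
    candidates = scan_wifi() + ["3G/EDGE"]
    links = [l for l in candidates if blacklist_until.get(l, 0) <= now]
    choice = salsa_decide(links, queue)   # may return None: defer this slot
    if choice is None:
        return
    if try_transfer(choice, queue):
        failures[choice] = 0
    else:                        # failed: retry next slot; maybe blacklist
        failures[choice] = failures.get(choice, 0) + 1
        if failures[choice] > FAILURES_TO_BLACKLIST:
            blacklist_until[choice] = now + BLACKLIST_S
            failures[choice] = 0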
5.6 Conclusions and Future Work

SALSA is a near-optimal algorithm for navigating the energy-delay tradeoff in bandwidth-intensive, delay-tolerant smartphone applications. Its transmission decisions take several factors into account: data backlog, the power cost of the wireless interface, and channel quality. Algorithms that omit even one of these factors from their transmission decisions perform significantly worse. Finally, SALSA solves a real problem: many of the users of our system have collected videos for which the total transmission cost, as well as the savings obtained by SALSA, are a noticeable fraction of the overall battery capacity. In future work, we hope to gain more experience with SALSA deployments from our diverse user base.

Chapter 6

Literature Review

This dissertation covers several topics in the broad areas of mobile and cloud computing, signal and image processing, and stochastic network optimization. In this chapter, we present four sets of related work, organized by the problems we addressed. We first review literature related to offloading computation to cloud infrastructure, and then examine related work on privacy-preserving photo sharing and on programming frameworks for crowd-sensing. Finally, we discuss prior work on managing heterogeneous network interfaces on smart mobile devices.

6.1 Offloading Computation from Mobile Devices to the Cloud

To our knowledge, no prior work has explored the joint automatic adaptation of offloading, pipelining, and data parallelism. The idea of offloading computation to networked computing infrastructure to overcome the limited capabilities of a wireless mobile device was proposed a decade ago [18, 19, 31, 114]. Since then, a variety of approaches for offloading computation to improve application performance or reduce resource usage have been proposed.

Prior work makes use of three primary techniques to offload computation from the mobile device. Application partitioning is done either statically at compile time, as in Wishbone [94] and Coign [62]; dynamically, based on programmer-specified partitions, as in Spectra [51] and Tactics [20]; or dynamically, based on a run-time optimizer that uses integer linear programming (as in MAUI [37] and CloneCloud [33]) or graph-based partitioners [58, 75, 98]. These pieces of work occupy different points in the design space relative to Odessa, and some have significantly different goals (like conserving energy on the mobile device) from ours. Crucially, many of them use a history of performance measurements, collected before execution, to predict stage execution times for the offloading decision. Narayanan et al. [92] show that history-based online learning of resource utilization outperforms other static methods, and more recently CloneCloud [33] shows the effectiveness of static analysis of Java code for dynamically partitioning applications. In contrast, Odessa uses a greedy and incremental approach, guided by the application profiler and simple predictors, that works very well to improve makespan and throughput.

Odessa's partitioning for makespan is similar to multiprocessor scheduling with precedence constraints, which has been studied extensively (see [74] for a survey). The problem is NP-complete even in the case of two processors and non-uniform execution times [54], so heuristics are typically used.
Through an offline analysis, we show that the performance achieved by Odessa's greedy heuristic is comparable to an offline optimal decision obtained with complete profiling information. Yigitbasi et al. [129] demonstrate placement techniques to minimize the makespans of interactive perception applications on general sets of heterogeneous machines. Like Odessa, they use fast heuristics and online profiling, but they do not consider tuning of data parallelism or pipeline depth.

The System S distributed stream processing system provides operators capable of dynamically adjusting the level of data parallelism [116]. These elastic operators adapt to changes in workload and computational resources. Unlike in Sprout, however, the level of data parallelism does not extend beyond the boundaries of a single machine. Zhu et al. [131] describe an automatic tuner that operates on developer-specified application parameters, including the level of operator parallelism. The tuner learns application characteristics and the effects of tunable parameters online, to maximize application fidelity for a given latency constraint.

Also related to our work are parallel processing frameworks like MapReduce [39] and Dryad [66] for offline analysis of large datasets. While their runtimes schedule data-parallel tasks to optimize throughput or fairness to users, their setting is so different from ours (data centers vs. mobile devices, large stored datasets vs. streams) that the details of the solutions differ substantially. Finally, other work has looked at more general VM-based offloading mechanisms [115, 120, 57], while ours relies on the mechanisms provided by the Sprout framework.

6.2 Preserving Privacy on Photo-Sharing Services

We do not know of prior work that has attempted to address photo privacy for photo-sharing services. Our work is most closely related to work in the signal processing community on image and video privacy. Early efforts at image privacy introduced techniques like region-of-interest masking, blurring, or pixellation [40]. In these approaches, a face or a person in an image is typically represented by a blurred or pixelated version; as [40] shows, these approaches are not particularly effective against algorithmic attacks like face recognition. A subsequent generation of approaches attempted to ensure privacy for surveillance by scrambling coefficients in a manner qualitatively similar to P3's algorithm [40, 42]; e.g., some of them randomly flip sign information. However, this line of work has not explored designs under the constraints imposed by our problem, namely the need for JPEG-compliant images at PSPs to ensure storage and bandwidth benefits, and the associated requirement for relatively small secret parts.

This strand is part of a larger body of work on selective encryption in the image processing community. This research, much of it conducted in the 1990s and early 2000s, was motivated by ensuring image secrecy while reducing the computational cost of encryption [85, 79]. This line of work has explored some of the techniques we use, such as extracting the DC components [121] and encrypting the signs of the coefficients [118, 104], as well as techniques we do not, such as randomly permuting the coefficients [121, 107]. Relative to this body of work, P3 is novel in being a selective encryption scheme tailored to a novel set of requirements motivated by photo-sharing services.
In particular, to our knowledge, prior work has not explored selective encryption schemes that permit image reconstruction when the unencrypted part of the image has been subjected to transformations like resizing or cropping. Finally, a pending patent application by one of the co-authors of this work [97] includes the idea of separating an image into two parts, but does not propose the P3 algorithm, nor does it consider the reconstruction challenges described in Section 3.3.

Some recent papers have examined complementary image security and privacy problems. Johnson et al. discuss homomorphic-encryption-based methods for verifying image signatures when images have been subjected to transformations like cropping, scaling, and JPEG-like compression [68]. End-to-end image encryption has been explored for the JPEG 2000 image format [45], and has resulted in a standard for JPEG 2000 imaging (JPSEC) [67]. Tangentially related is a body of work in the computer systems community on ensuring other forms of privacy: secure distributed storage systems [49, 84, 25], and privacy and anonymity for mobile systems [44, 61, 35]. None of these techniques directly applies to our setting.

6.3 Crowd-Sensing: Crowd-Sourcing with Wireless Remote Sensing

To our knowledge, no other prior work describes a programming language and runtime system for crowd-sensing. Crowd-sensing, as we have defined it, combines two strands of research: participatory sensing and crowd-sourcing. We now describe how these strands relate to Medusa.

Participatory sensing systems like PEIR [91] and SoundSense [83] do not incorporate human mediation or incentives into sensing. Moreover, they do not support programmable data collection from multiple sensors or an in-network processing library. Some pieces of work in this area, however, share some goals with Medusa. Campaignr [3] is an early effort on programming sensor data collection on a single phone for participatory sensing; it uses an XML description language to specify data collection tasks and parameters. AnonySense [35] is a privacy-aware tasking system for sensor data collection and in-network processing, while PRISM [38] proposes a procedural programming language for collecting sensor data from a large number of mobile phones. More recently, Ravindranath et al. [110] have explored tasking smartphone crowds, and provide complex data processing primitives and profile-based compile-time partitioning. Unlike these systems, Medusa incorporates incentives and reverse incentives, human mediation, and support for curation into its programming framework.

Several systems have integrated worker mediation into their workflows. Most of these systems have been inspired by, or directly employ, Amazon Mechanical Turk (AMT [1]). CrowdDB [52] extends relational databases to support human mediation in SQL query workflows. CrowdSearch [128] exploits AMT to crowd-source video search. TurKit [76] extends the original programming model of AMT by allowing repeated execution of existing tasks, and enhances the robustness of the AMT programming model. Additionally, several studies have explored who uses AMT and in what ways [77, 32]. Medusa is qualitatively different from this body of work in that it enables users to contribute processed sensor data, a capability that enables many interesting applications, as described above.
Complementary to our work is PhoneGap [10], a system that enables developers to program mobile applications using HTML5 and JavaScript, and then automatically generates platform-specific native binaries. Thus, PhoneGap focuses on single-source multi-platform code development, not on crowd-sensing. Xiao et al. [127] consider large-scale mobile crowd-sensing and discuss potential challenges presented by virtualization, mobility, and heterogeneous sensing requirements. Although they neither propose a programming model nor take human factors into account, their discussions of heterogeneous sensing interfaces and a VM-based deployment model are relevant to our system.

Finally, many of our applications have been inspired by prior work. Conceptual descriptions were provided in [108] for video documentation and in [86, 22] for collaborative learning; however, these papers did not discuss incentives or worker mediation. Road-bump monitoring was described in [89], and citizen journalism in [53], but the focus of research in these papers (sensing algorithms, energy, location discovery) was very different from ours.

6.4 Managing Heterogeneous Network Interfaces on Mobile Devices

Two preliminary pieces of work have inspired our own. Zaharia et al. [130] consider the same problem, but assume that each network interface knows its future availability and has a fixed rate. Seth et al. [117] also consider supporting delay-tolerant applications, but focus on an approach to seamlessly manage multiple network interfaces of varying availability, relieving the programmer of this burden. In their approach, users or applications specify an overall objective, like a delay bound, and their runtime system attempts to achieve this objective by ensuring that data transfer progresses at a rate that will satisfy the application objective, while having the freedom to pick the appropriate link. Our problem statement is slightly different, since we do not attempt to guarantee a fixed delay bound, and instead focus on minimizing energy.

Next closest to our work is prior work on achieving energy efficiency in smartphone applications by exploiting multiple wireless interfaces. Context-for-Wireless [109] uses a history of context information to decide whether it is beneficial, in terms of energy, to use 3G/EDGE or WiFi for data transfer. The authors attempt to intelligently learn and estimate WiFi network conditions without powering up the WiFi interface, so as to save the energy cost of turning on the interface and re-scanning for available APs. Armstrong et al. [17] also discuss a similar problem. They report that there exists a threshold message size (30 KB for their application on the HP iPAQ 6325 platform) above which using WiFi is more energy-efficient than 3G/EDGE, due to the wake-up cost of WiFi interfaces. However, their focus is on designing a web proxy system that reduces the size of updated content for efficient data downloads. CoolSpots [100] aims to reduce the power consumption of wireless mobile devices with multiple radio interfaces by intelligently deciding whether and when to use WiFi and Bluetooth, based on an application's bandwidth requirement. None of these pieces of work trades off delay for reduced energy: rather, they are interested in determining the lowest-energy link among a set of available links at a given instant.
Moreover, our work focuses on larger data transfers than some of these applications (the videos in VCAPS range from a few hundred KBytes to a hundred MBytes), and their emphasis on WiFi wake-up costs does not apply in our case, since the wake-up cost can be amortized over these larger transfers.

Other pieces of work, in slightly different contexts, have attempted to exploit multiple radio interfaces to improve energy efficiency on smartphones. Coté et al. [36] consider the problem of scheduling packets across two networks, 3G and WiFi, in order to maximize energy efficiency. The authors conduct a competitive analysis and present a 3-competitive scheduling algorithm. Unlike our investigation, however, they did not consider delay bounds, and made several restrictive assumptions: (a) only two networks, (b) a fixed network capacity, (c) no transmission failures, (d) at most one packet per timeslot, (e) a uniform packet size, and (f) that WiFi is more energy-efficient than 3G (which is not always the case, as shown in [21]). Micro-Blog [53] is an application for sharing and querying content through mobile phones and social participation. Its localization component aims to save energy by adaptively switching among three different localization schemes (GPS, WiFi-based, GSM-based), considering energy cost and localization accuracy requirements. Cell2Notify [14] is an energy management architecture that leverages the cellular radio signal to wake up the high-energy-consumption WiFi radio for VoIP applications. Finally, COMBINE [16] leverages the 3G/EDGE links of wireless LAN peers to cooperatively download data. However, it focuses on throughput enhancement rather than energy, and does not consider an intermittently connected WiFi network as the download/upload link.

BreadCrumbs [95] examines WiFi connectivity changes over time and provides mobile connectivity forecasts by building a predictive mobility model. These forecasts can be used to schedule network usage more intelligently. This work can be complementary to ours: e.g., SALSA could benefit from this technique to determine relevant α settings. A similar benefit can be obtained from WiFi databases collected opportunistically [99, 109].

Some papers utilize multiple network interfaces to improve performance. Like SALSA, Wiffler [21] considers delay-tolerant mobile applications using multiple network interfaces on mobile devices. Its focus, however, is to augment 3G networks by opportunistically using WiFi networks. Its measurement results demonstrate that WiFi has poorer throughput and availability than 3G in moving vehicles; thus, one should use WiFi only when it helps to reduce 3G usage. Intentional networking [60] takes application-specific requirements (labels) on data transfers as inputs and chooses an appropriate network interface that can improve performance. Like SALSA, it can also defer transmission opportunities if no available network is suitable for the transfer.

In a different context, for a networked setting with multiple nodes transmitting data over wireless links, Neely [93] developed a joint transmit power and transmission scheduling algorithm (EECA) that minimizes the total system power consumption.
There are two key differences between EECA and SALSA: (i) EECA assumes that each node has a single wireless interface, whereas SALSA is designed for smartphones with multiple wireless interfaces; and (ii) EECA focuses on transmit power control, while in SALSA we assume that the transmit power on each wireless interface is fixed. Neely also discusses a variant of EECA that maximizes throughput given an average power constraint. EECA has not been evaluated in an implementation, and does not specify how to automatically determine the value of the control parameter V. In their book, Georgiadis et al. [55] discuss several stable control algorithms, derived using the Lyapunov optimization framework, for maximizing throughput or fair rate allocation in wired and wireless networks.

Finally, there is a large literature on smartphone applications [91, 89, 43, 53, 5, 6, 124, 23] that are data-intensive but delay-tolerant. Some of them (e.g., [43]) implement greedy delay-tolerant strategies, like handing off data to the first available peer or access point, and do not explicitly consider the energy/delay trade-off in their designs. In that sense, they are closer to the work on delay-tolerant networks [48]. Finally, research in sensor network energy management has explicitly considered the energy-delay trade-off to increase network lifetime [106, 56], but only in the context of a single wireless interface.

Chapter 7

Conclusions

In this dissertation, we have explored how the cloud can enable efficient processing and secure sharing of sensor data.

Odessa focuses on performance: it enables interactive perception applications on mobile devices by dynamically making offloading and parallelism decisions to improve throughput and makespan simultaneously. Odessa's decisions are incremental, so it can quickly adapt to input and platform variability.

P3 prevents the leakage of users' photos, and the application of automatic recognition technologies to them, when photos are shared, while maintaining cloud-side processing capabilities. The P3 image encryption/decryption algorithm ensures privacy, minimal storage overhead, and a standards-compliant image format, and allows cloud-side processing. The P3 system design also enables transparent deployment of the resulting implementation.

Medusa makes programming crowd-sensing tasks easy: users need only provide a high-level task description, and the runtime takes care of the rest with minimal user intervention.

SALSA can effectively trade off delay for reduced energy by intelligently deferring transmission opportunities. It determines whether, when, and on which network to transmit based purely on locally available information.

References

[1] Amazon Mechanical Turk. https://www.mturk.com/.
[2] Caltech Computational Vision Group. http://www.vision.caltech.edu/html-files/archive.html.
[3] Campaignr: configurable mobile sensing. http://urban.cens.ucla.edu/technology/campaignr/.
[4] The Color FERET database. http://www.nist.gov/itl/iad/ig/colorferet.cfm.
[5] Cyclesense. http://urban.cens.ucla.edu/projects/cyclesense/.
[6] Dietsense. http://urban.cens.ucla.edu/projects/dietsense/.
[7] Facebook Shuts Down Face Recognition APIs After All. http://www.theregister.co.uk/2012/07/09/facebook_face_apis_dead/.
[8] INRIA Holidays dataset. http://lear.inrialpes.fr/~jegou/data.php.
[9] Open Source Computer Vision. http://opencv.willowgarage.com/wiki/.
[10] PhoneGap. http://phonegap.com/.
[11] Usage of image file formats for websites. http://w3techs.com/technologies/overview/image_format/all.
[12] Usage of JPEG for websites. http://w3techs.com/technologies/details/im-jpeg/all/all.
[13] USC-SIPI image database. http://sipi.usc.edu/database/.
[14] Yuvraj Agarwal, Ranveer Chandra, Alec Wolman, Paramvir Bahl, Kevin Chin, and Rajesh Gupta. "Wireless Wakeups Revisited: Energy Management for VoIP over Wi-Fi Smartphones". In MobiSys'07, 2007.
[15] Michal Aharon, Michael Elad, and Alfred Bruckstein. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. Signal Processing, IEEE Transactions on, 54(11):4311–4322, Nov 2006.
[16] Ganesh Ananthanarayanan, Venkata N. Padmanabhan, Lenin Ravindranath, and Chandramohan A. Thekkath. "COMBINE: Leveraging the Power of Wireless Peers through Collaborative Downloading". In MobiSys'07, 2007.
[17] Trevor Armstrong, Olivier Trescases, Cristiana Amza, and Eyal de Lara. "Efficient and Transparent Dynamic Content Updates for Mobile Clients". In MobiSys'06, 2006.
[18] Rajesh Krishna Balan. "Simplifying Cyber Foraging". PhD thesis, 2006. (CMU-CS-06-120).
[19] Rajesh Krishna Balan, Jason Flinn, M. Satyanarayanan, Shafeeq Sinnamohideen, and Hen-I Yang. "The case for cyber foraging". In ACM SIGOPS European Workshop, 2002.
[20] Rajesh Krishna Balan, Mahadev Satyanarayanan, So-Young Park, and Tadashi Okoshi. "Tactics-Based Remote Execution for Mobile Computing". In International Conference on Mobile Systems, Applications, and Services (MobiSys), 2003.
[21] Aruna Balasubramanian, Ratul Mahajan, and Arun Venkataramani. Augmenting mobile 3G using WiFi. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services, pages 209–222. ACM, 2010.
[22] X. Bao and R. R. Choudhury. MoVi: mobile phone based video highlights via collaborative sensing. In Proc. ACM MOBISYS, 2010.
[23] Xuan Bao and Romit Roy Choudhury. "VUPoints: Collaborative Sensing and Video Recording through Mobile Phones". In MobiHeld'09, 2009.
[24] Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, and Peter Vajgel. Finding a needle in Haystack: Facebook's photo storage. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI'10, pages 1–8, Berkeley, CA, USA, 2010. USENIX Association.
[25] Alysson Bessani, Miguel Correia, Bruno Quaresma, Fernando André, and Paulo Sousa. DepSky: dependable and secure storage in a cloud-of-clouds. In Proceedings of the Sixth Conference on Computer Systems, EuroSys'11, pages 31–46, New York, NY, USA, 2011. ACM.
[26] R. Beveridge, D. Bolme, M. Teixeira, and B. Draper. The CSU face identification evaluation system user's guide: version 5.0. Technical Report, Computer Science Department, Colorado State University, 2(3), 2003.
[27] Ross Beveridge and Bruce Draper. Evaluation of Face Recognition Algorithms. http://www.cs.colostate.edu/evalfacerec/.
[28] J. Canny. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell., 8(6):679–698, June 1986.
[29] Marta Carbone and Luigi Rizzo. "Dummynet revisited". SIGCOMM Computer Communication Review, 40(2):12–20, 2010.
[30] Ming-yu Chen and Alex Hauptmann. "MoSIFT: Recognizing Human Actions in Surveillance Videos". Technical Report CMU-CS-09-161, Carnegie Mellon University, 2009.
[31] Jesse Cheng, Rajesh Krishna Balan, and Mahadev Satyanarayanan. "Exploiting Rich Mobile Environment". Technical Report CMU-CS-05-199, Carnegie Mellon University, 2005.
[32] L. Chilton, J. Horton, R. Miller, and S. Azenkot. Task search in a human computation market. In Proc. HCOMP, 2010.
[33] Byung-Gon Chun and Petros Maniatis. "CloneCloud: Elastic Execution between Mobile Device and Cloud". In Proceedings of the 6th European Conference on Computer Systems (EuroSys), 2011.
[34] CNN: Photobucket leaves users exposed. http://articles.cnn.com/2012-08-09/tech/tech_photobucket-privacy-breach.
[35] Cory Cornelius, Apu Kapadia, David Kotz, Dan Peebles, Minho Shin, and Nikos Triandopoulos. AnonySense: Privacy-Aware People-Centric Sensing. In Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services, MobiSys'08, pages 211–224, New York, NY, USA, 2008. ACM.
[36] Aaron Coté, Adam Meyerson, and Brian Tagiku. Energy-efficient mobile data transport via online multi-network packet scheduling. Sustainable Computing: Informatics and Systems, 1(3):196–212, 2011.
[37] Eduardo Cuervo, Aruna Balasubramanian, Dae-ki Cho, Alec Wolman, Stefan Saroiu, Ranveer Chandra, and Paramvir Bahl. "MAUI: Making Smartphones Last Longer with Code Offload". In Proc. ACM MOBISYS, 2010.
[38] Tathagata Das, Prashanth Mohan, Venkata N. Padmanabhan, Ramachandran Ramjee, and Asankhaya Sharma. PRISM: Platform for remote sensing using smartphones. In Proc. ACM MOBISYS, 2010.
[39] Jeffrey Dean and Sanjay Ghemawat. "MapReduce: simplified data processing on large clusters". Communications of the ACM (CACM), 51(1):107–113, 2008.
[40] F. Dufaux and T. Ebrahimi. A framework for the validation of privacy protection solutions in video surveillance. In Multimedia and Expo (ICME), 2010 IEEE International Conference on, pages 66–71. IEEE, 2010.
[41] Cynthia Dwork. Differential privacy. In ICALP, pages 1–12. Springer, 2006.
[42] Touradj Ebrahimi. Privacy Protection of Visual Information. Tutorial at MediaSense 2012, Dublin, Ireland, 21-22 May 2012.
[43] S. B. Eisenman, E. Miluzzo, N. D. Lane, R. A. Peterson, G-S. Ahn, and A. T. Campbell. "The BikeNet mobile sensing system for cyclist experience mapping". In SenSys'07, 2007.
[44] William Enck, Peter Gilbert, Byung-Gon Chun, Landon P. Cox, Jaeyeon Jung, Patrick McDaniel, and Anmol N. Sheth. TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI'10, pages 1–6, Berkeley, CA, USA, 2010. USENIX Association.
[45] Dominik Engel, Thomas Stütz, and Andreas Uhl. A survey on JPEG2000 encryption. Multimedia Systems, 15(4):243–270, 2009.
[46] Facebook. http://www.facebook.com.
[47] Facebook API: Photo. http://developers.facebook.com/docs/reference/api/photo/.
[48] Kevin Fall. "A Delay-Tolerant Network Architecture for Challenged Internets". In SIGCOMM'03, 2003.
[49] Ariel J. Feldman, William P. Zeller, Michael J. Freedman, and Edward W. Felten. SPORC: group collaboration using untrusted cloud resources. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI'10, Berkeley, CA, USA, 2010. USENIX Association.
[50] Flickr API. http://www.flickr.com/services/api/upload.api.html.
[51] Jason Flinn, SoYoung Park, and M. Satyanarayanan. "Balancing Performance, Energy, and Quality in Pervasive Computing". In International Conference on Distributed Computing Systems (ICDCS), 2002.
[52] Michael J. Franklin, Donald Kossmann, Tim Kraska, Sukriti Ramesh, and Reynold Xin. CrowdDB: answering queries with crowdsourcing. In Proc. ACM SIGMOD, 2011.
[53] Shravan Gaonkar, Jack Li, Romit Roy Choudhury, Landon Cox, and Al Schmidt. "Micro-Blog: Sharing and Querying Content Through Mobile Phones and Social Participation". In MobiSys'08, 2008.
[54] Michael R. Garey and David S. Johnson. "Computers and Intractability: A Guide to the Theory of NP-Completeness". W. H. Freeman and Company, New York, 1979.
[55] L. Georgiadis, M. J. Neely, and L. Tassiulas. "Resource Allocation and Cross-Layer Control in Wireless Networks". Foundations and Trends in Networking, 2006.
[56] Omprakash Gnawali, Jongkeun Na, and Ramesh Govindan. "Application-Informed Radio Duty-Cycling in a Re-Taskable Multi-User Sensing System". In IPSN'09, 2009.
[57] Mark S. Gordon, D. Anoushe Jamshidi, Scott Mahlke, Z. Morley Mao, and Xu Chen. COMET: code offload by migrating execution transparently. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, pages 93–106. USENIX Association, 2012.
[58] Xiaohui Gu, Alan Messer, Ira Greenberg, Dejan Milojicic, and Klara Nahrstedt. "Adaptive Offloading for Pervasive Computing". IEEE Pervasive Computing, 3(3):66–73, 2004.
[59] Benjamin C. Haynor. A fast edge detection implementation in C. http://code.google.com/p/fast-edge/.
[60] Brett D. Higgins, Azarias Reda, Timur Alperovich, Jason Flinn, Thomas J. Giuli, Brian Noble, and David Watson. Intentional networking: opportunistic exploitation of mobile network diversity. In Proceedings of the Sixteenth Annual International Conference on Mobile Computing and Networking, pages 73–84. ACM, 2010.
[61] Baik Hoh, Marco Gruteser, Ryan Herring, Jeff Ban, Daniel Work, Juan-Carlos Herrera, Alexandre M. Bayen, Murali Annavaram, and Quinn Jacobson. Virtual trip lines for distributed privacy-preserving traffic monitoring. In Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services, MobiSys'08, pages 15–28, New York, NY, USA, 2008. ACM.
[62] Galen C. Hunt and Michael L. Scott. "The Coign automatic distributed partitioning system". In Symposium on Operating Systems Design and Implementation (OSDI), 1999.
[63] ImageMagick: Convert, Edit, Or Compose Bitmap Images. http://www.imagemagick.org/.
[64] ImageMagick Resize or Scaling. http://www.imagemagick.org/Usage/resize/.
[65] Independent JPEG Group. http://www.ijg.org/.
[66] Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. "Dryad: distributed data-parallel programs from sequential building blocks". In European Conference on Computer Systems, 2007.
[67] ITU-T. T.807: Information technology – JPEG 2000 image coding system – Part 8: Secure JPEG 2000. 2006.
[68] Rob Johnson, Leif Walsh, and Michael Lamb. Homomorphic signatures for digital photographs. Financial Cryptography and Data Security, pages 141–157, 2012.
[69] Kivy. python-for-android. https://github.com/kivy/python-for-android.
[70] Mathias Kolsch. "Vision based hand gesture interfaces for wearable computing and virtual environments". PhD thesis, 2004. (0-496-01704-7).
[71] Martin H. Krieger, Ramesh Govindan, Moo-Ryong Ra, and Jeongyeup Paek. Commentary: Pervasive urban media documentation. Journal of Planning Education and Research (JPER), 29(1):114–116, 2009.
[72] Martin H. Krieger, Moo-Ryong Ra, Jeongyeup Paek, Ramesh Govindan, and Jeniffer Evans-Cowley. "Urban Tomography". Journal of Urban Technology, 17:21–36, 2010.
[73] Branislav Kveton, Michal Valko, Matthai Philipose, and Ling Huang. "Online Semi-Supervised Perception: Real-Time Learning without Explicit Feedback". In IEEE Online Learning for Computer Vision Workshop, 2010.
[74] Yu-Kwong Kwok and Ishfaq Ahmad. "Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors". ACM Computing Surveys, 31(4):406–471, 1999.
[75] Zhiyuan Li, Cheng Wang, and Rong Xu. "Task Allocation for Distributed Multimedia Processing on Wirelessly Networked Handheld Devices". In IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2002.
[76] G. Little, L. Chilton, M. Goldman, and R. Miller. TurKit: Human computation algorithms on Mechanical Turk. In Proc. ACM UIST, 2010.
[77] Greg Little, L. Chilton, M. Goldman, and R. Miller. Exploring iterative and parallel human computation processes. In Proc. HCOMP, 2010.
[78] He Liu, Stefan Saroiu, Alec Wolman, and Himanshu Raj. Software abstractions for trusted sensors. In Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services, MobiSys'12, pages 365–378, New York, NY, USA, 2012. ACM.
[79] Xiliang Liu and Ahmet M. Eskicioglu. Selective encryption of multimedia content in distribution networks: challenges and new directions. In Conf. Communications, Internet, and Information Technology, pages 527–533, 2003.
[80] D. Lowe. "Distinctive Image Features from Scale-Invariant Keypoints". International Journal on Computer Vision (IJCV), 60(2):91–110, 2004.
[81] David Lowe. SIFT keypoint detector. http://www.cs.ubc.ca/~lowe/keypoints/.
[82] David G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60(2):91–110, November 2004.
[83] Hong Lu, Wei Pan, Nicholas D. Lane, Tanzeem Choudhury, and Andrew T. Campbell. SoundSense: scalable sound sensing for people-centric applications on mobile phones. In MobiSys'09: Proceedings of the 7th International Conference on Mobile Systems, Applications, and Services, 2009.
[84] P. Mahajan, S. Setty, S. Lee, A. Clement, L. Alvisi, M. Dahlin, and M. Walfish. Depot: Cloud Storage with Minimal Trust. In OSDI 2010, October 2010.
[85] A. Massoudi, F. Lefebvre, C. De Vleeschouwer, B. Macq, and J.-J. Quisquater. Overview on selective encryption of image and video: challenges and perspectives. EURASIP J. Inf. Secur., 2008:5:1–5:18, January 2008.
[86] E. Miluzzo, C.T. Cornelius, A. Ramaswamy, T. Choudhury, Z. Liu, and A.T. Campbell. Darwin phones: the evolution of sensing and inference on mobile phones. In Proc. ACM MOBISYS, 2010.
[87] Emiliano Miluzzo, Tianyu Wang, and Andrew T. Campbell. "EyePhone: Activating Mobile Phones With Your Eyes". In Workshop on Networking, Systems, Applications on Mobile Handhelds (MobiHeld). ACM, 2010.
[88] mitmproxy. http://mitmproxy.org.
[89] Prashanth Mohan, Venkata N. Padmanabhan, and Ramachandran Ramjee. "Nericell: rich monitoring of road and traffic conditions using mobile smartphones". In SenSys'08, November 2008.
[90] Monsoon Solutions Inc. Power Monitor. http://www.msoon.com/LabEquipment/PowerMonitor/.
[91] M. Mun, S. Reddy, K. Shilton, N. Yau, J. Burke, D. Estrin, M. Hansen, E. Howard, R. West, and P. Boda. PEIR, the personal environmental impact report, as a platform for participatory sensing systems research. In Proc. ACM MOBISYS, 2009.
[92] Dushyanth Narayanan and M. Satyanarayanan. "Predictive Resource Management for Wearable Computing". In International Conference on Mobile Systems, Applications, and Services (MobiSys), 2003.
[93] M. J. Neely. "Energy Optimal Control for Time Varying Wireless Networks". IEEE Transactions on Information Theory, 52(7):2915–2934, 2006.
[94] Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, and Samuel Madden. "Wishbone: Profile-based Partitioning for Sensornet Applications". In Symposium on Networked Systems Design and Implementation (NSDI), 2009.
[95] Anthony J. Nicholson and Brian D. Noble. "BreadCrumbs: Forecasting Mobile Connectivity". In MobiCom'08, 2008.
[96] Nokia Corp. Nokia Energy Profiler. http://www.forum.nokia.com/Tools_Docs_and_Code/Tools/Plug-ins/Enablers/Nokia_Energy_Profiler/.
[97] A. Ortega, S.M. Darden, A. Vellaikal, Z. Miao, and J. Caldarola. Method and system for delivering media data, January 29, 2002. US Patent App. 2006/0031558.
[98] Shumao Ou, Kun Yang, and Jie Zhang. "An effective offloading middleware for pervasive services on mobile devices". Pervasive and Mobile Computing, 3(4):362–385, 2007.
[99] Jeffrey Pang, Ben Greenstein, Michael Kaminsky, Damon McCoy, and Srinivasan Seshan. "Wifi-Reports: Improving Wireless Network Selection with Collaboration". In MobiSys'09, 2009.
[100] Trevor Pering, Yuvraj Agarwal, Rajesh Gupta, and Roy Want. "CoolSpots: reducing the power consumption of wireless mobile devices with multiple radio interfaces". In MobiSys'06, 2006.
[101] P.J. Phillips, H. Moon, S.A. Rizvi, and P.J. Rauss. The FERET evaluation methodology for face-recognition algorithms. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(10):1090–1104, 2000.
[102] P.J. Phillips, H. Wechsler, J. Huang, and P.J. Rauss. The FERET database and evaluation procedure for face-recognition algorithms. Image and Vision Computing, 16(5):295–306, 1998.
[103] Padmanabhan S. Pillai, Lily B. Mummert, Steven W. Schlosser, Rahul Sukthankar, and Casey J. Helfrich. "SLIPstream: Scalable Low-latency Interactive Perception on Streaming Data". In ACM International Workshop on Network and Operating System Support for Digital Audio and Video, 2009.
[104] Chung-Ping Wu and C.-C. Jay Kuo. Fast Encryption Methods for Audiovisual Data Confidentiality. In Multimedia Systems and Applications III, ser. Proc. SPIE, pages 284–295, 2000.
[105] Royal Pingdom. New facts and figures about image format use on websites. http://royal.pingdom.com/.
[106] Joseph Polastre, Jason Hill, and David Culler. "Versatile Low Power Media Access for Wireless Sensor Networks". In SenSys'04, 2004.
[107] Lintian Qiao, K. Nahrstedt, and Ming-Chit Tam. Is MPEG encryption by using random list instead of zigzag order secure? In Consumer Electronics, 1997 (ISCE'97), Proceedings of 1997 IEEE International Symposium on, pages 226–229, Dec 1997.
[108] Moo-Ryong Ra, Jeongyeup Paek, Abhishek B. Sharma, Ramesh Govindan, Martin H. Krieger, and Michael J. Neely. Energy-delay tradeoffs in smartphone applications. In Proc. ACM MOBISYS, 2010.
[109] Ahmad Rahmati and Lin Zhong. "Context-for-Wireless: Context-Sensitive Energy-Efficient Wireless Data Transfer". In MobiSys'07, 2007.
[110] Lenin S. Ravindranath, Arvind Thiagarajan, Hari Balakrishnan, and Samuel Madden. Code In The Air: Simplifying Sensing and Coordination Tasks on Smartphones. In The 13th International Workshop on Mobile Computing Systems and Applications (HotMobile'12), 2012.
[111] I.E. Richardson. The H.264 Advanced Video Compression Standard. Wiley, 2011.
[112] Alvaro Collet Romea, Dmitry Berenson, Siddhartha Srinivasa, and David Ferguson. "Object Recognition and Full Pose Registration from a Single Image for Robotic Manipulation". In IEEE International Conference on Robotics and Automation, 2009.
[113] Ricarose V. Roque. OpenBlocks: an extendable framework for graphical block programming systems. Master's thesis, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.
[114] M. Satyanarayanan. "Pervasive Computing: Vision and Challenges". IEEE Personal Communications, 8(4):10–17, 2001.
[115] Mahadev Satyanarayanan, Paramvir Bahl, Ramón Caceres, and Nigel Davies. "The Case for VM-Based Cloudlets in Mobile Computing". IEEE Pervasive Computing, 8(4):14–23, 2009.
[116] Scott Schneider, Henrique Andrade, Bugra Gedik, Alain Biem, and Kun-Lung Wu. "Elastic Scaling of Data Parallel Operators in Stream Processing". In IEEE International Parallel and Distributed Processing Symposium, 2009.
[117] A. Seth, M. Zaharia, S. Keshav, and S. Bhattacharyya. A policy-oriented architecture for opportunistic communication on multiple wireless networks, 2006.
[118] Changgui Shi and Bharat K. Bhargava. A Fast MPEG Video Encryption Algorithm. In ACM Multimedia, pages 81–88, 1998.
[119] Skyhook Wireless. http://www.skyhookwireless.com/.
[120] Ya-Yunn Su and Jason Flinn. Slingshot: deploying stateful services in wireless hotspots. In MobiSys'05: Proceedings of the 3rd International Conference on Mobile Systems, Applications, and Services, pages 79–92, 2005.
[121] Lei Tang. Methods for encrypting and decrypting MPEG video data efficiently. In Proceedings of the Fourth ACM International Conference on Multimedia, MULTIMEDIA'96, pages 219–229, New York, NY, USA, 1996. ACM.
[122] Trusted Computing Group. TPM Main Specification. http://www.trustedcomputinggroup.org/resources/tpm_main_specification.
[123] Matthew Turk and Alex Pentland. Eigenfaces for recognition. J. Cognitive Neuroscience, 3(1):71–86, January 1991.
[124] USC/ENL. VCAPS: Urban Tomography Project. http://tomography.usc.edu/.
[125] William H. Whyte. The Social Life of Small Urban Spaces. Project for Urban Spaces, 1980.
[126] David Wolber. App Inventor and real-world motivation. In Proc. ACM SIGCSE, 2011.
[127] Yu Xiao, Pieter Simoens, Padmanabhan Pillai, Kiryong Ha, and Mahadev Satyanarayanan. Lowering the barriers to large-scale mobile crowdsensing. In The 14th International Workshop on Mobile Computing Systems and Applications (HotMobile'13), 2013.
[128] T. Yan, V. Kumar, and D. Ganesan. CrowdSearch: Exploiting crowds for accurate real-time image search on mobile phones. In Proc. ACM MOBISYS, 2010.
[129] Nezih Yigitbasi, Lily Mummert, Padmanabhan Pillai, and Dick Epema. "Incremental Placement of Interactive Perception Applications". In ACM Symposium on High Performance Parallel and Distributed Computing (HPDC), 2011.
[130] M.A. Zaharia and S. Keshav. Fast and optimal scheduling over multiple network interfaces. Technical Report CS-2007-36, University of Waterloo, October 2007.
[131] Qian Zhu, Branislav Kveton, Lily Mummert, and Padmanabhan Pillai. "Automatic Tuning of Interactive Perception Applications". In Conference on Uncertainty in Artificial Intelligence (UAI), 2010.

Appendix A

P3 Supplement

A.1 The Examined Images on Edge Detection

In this appendix, we provide the set of images used for the evaluation. Each raw image is presented with its Canny edge detection result.
Figure A.1: A boat image from USC-SIPI
Figure A.2: A tree image from USC-SIPI
Figure A.3: A vegetable image from USC-SIPI
Figure A.4: A baboon image from USC-SIPI

Appendix B

Medusa Supplement

B.1 MedScript Program Codes

B.1.1 Auditioning App

<xml>
<app>
  <name>Auditioning</name>
  <rrid>[User's Requestor ID]</rrid>
  <rrkey>[User's Requestor Key]</rrkey>
  <rwid>[User's Worker ID]</rwid>
  <deadline>21:00:00 12/16/2011</deadline>

  <stage>
    <name>Recruit</name> <type>HIT</type>
    <binary>recruit</binary>
    <config>
      <stmt>Recruiting for Audition App.</stmt>
      <expiration>21:00:00 12/16/2011</expiration>
      <reward>.05</reward>
      <output>W_WID</output>
    </config>
  </stage>
  <stage>
    <name>SyncENV</name> <type>SPC</type>
    <binary>helloworld</binary>
    <trigger>none</trigger>
  </stage>
  <stage>
    <name>FeePayment</name> <type>HIT</type>
    <rid>W_RID</rid> <rkey>W_RKEY</rkey> <wid>R_WID</wid>
    <binary>recruit</binary>
    <config>
      <stmt>Audition Fee Payment. (100 dollars)</stmt>
      <expiration>21:00:00 12/16/2011</expiration>
      <reward>100</reward>
    </config>
  </stage>
  <stage>
    <name>MakeVideo</name> <type>SPC</type>
    <binary>mediagen</binary>
    <trigger>user-initiated</trigger>
    <config>
      <params>-t video</params>
      <output>VIDEO</output>
    </config>
  </stage>
  <stage>
    <name>UploadVideo</name> <type>SPC</type>
    <binary>uploaddata</binary>
    <trigger>user-initiated</trigger> <review>yesno</review>
    <config>
      <input>VIDEO</input>
    </config>
  </stage>
  <stage>
    <name>Evaluation</name> <type>HIT</type>
    <rid>W_RID</rid> <rkey>W_RKEY</rkey> <wid>R_WID</wid>
    <binary>vote</binary>
    <config>
      <stmt>Evaluation Press. If you like the video, press Yes. Otherwise No.</stmt>
      <expiration>21:00:00 12/16/2011</expiration>
      <reward>.01</reward>
      <numusers>1</numusers>
      <input>VIDEO</input>
      <output>RESULT</output>
    </config>
  </stage>
  <connector>
    <src>Recruit</src>
    <dst>
      <success>SyncENV</success>
      <failure>Recruit</failure>
    </dst>
  </connector>
  <connector>
    <src>SyncENV</src>
    <dst>
      <success>FeePayment</success>
      <failure>Recruit</failure>
    </dst>
  </connector>
  <connector>
    <src>FeePayment</src>
    <dst>
      <success>MakeVideo</success>
      <failure>Recruit</failure>
    </dst>
  </connector>
  <connector>
    <src>MakeVideo</src>
    <dst>
      <success>UploadVideo</success>
      <failure>UploadVideo</failure>
    </dst>
  </connector>
  <connector>
    <src>UploadVideo</src>
    <dst>
      <success>Evaluation</success>
      <failure>Recruit</failure>
    </dst>
  </connector>
</app>
</xml>

Listing B.1: Auditioning

B.1.2 Citizen Journalist App

<xml>
<app>
  <name>Citizen-Journalist</name>
  <rrid>[User's Requestor ID]</rrid>
  <rrkey>[User's Requestor Key]</rrkey>

  <stage>
    <name>Hiring</name> <type>HIT</type>
    <binary>recruit</binary>
    <config>
      <stmt>Citizen Journalist Demonstration</stmt>
      <expiration>21:00:00 12/16/2011</expiration>
      <reward>.05</reward>
      <output>W_WID</output>
    </config>
  </stage>
  <stage>
    <name>TakePicture</name> <type>SPC</type>
    <binary>mediagen</binary>
    <trigger>location=34.252339|-118.277907|40, user-initiated</trigger>
    <review>textdesc</review>
    <config>
      <params>-t image</params>
      <output>IMAGE</output>
    </config>
  </stage>
  <stage>
    <name>UploadData</name> <type>SPC</type>
    <binary>uploaddata</binary>
    <trigger>none</trigger>
    <config>
      <input>IMAGE</input>
    </config>
  </stage>

  <connector>
    <src>Hiring</src>
    <dst>
      <success>TakePicture</success>
      <failure>Hiring</failure>
    </dst>
  </connector>
  <connector>
    <src>TakePicture</src>
    <dst>
      <success>UploadData</success>
      <failure>Hiring</failure>
    </dst>
  </connector>
</app>
</xml>

Listing B.2: Citizen Journalist

B.1.3 Collaborative Learning
B.1.3 Collaborative Learning

<xml>
<app>
  <name>Collaborative-Learning</name>
  <rrid>[User's Requestor ID]</rrid>
  <rrkey>[User's Requestor Key]</rrkey>
  <deadline>21:00:00 12/16/2011</deadline>

  <stage>
    <name>Recruit</name> <type>HIT</type>
    <binary>recruit</binary>
    <config>
      <stmt>Collaborative-Learning App. Demonstration</stmt>
      <expiration>21:00:00 12/16/2011</expiration>
      <reward>.05</reward>
      <output>W_WID</output>
    </config>
  </stage>
  <stage>
    <name>GetRawData</name> <type>SPC</type>
    <binary>vcollect</binary>
    <trigger>user-initiated</trigger>
    <review>labeling</review>
    <reviewopt>sitting|driving|walking|running</reviewopt>
    <config>
      <params>-t acc -r false -i 100 -f 50 -c 3 -l cl -s notification</params>
      <output>RAW</output>
    </config>
  </stage>
  <stage>
    <name>GetFeatures</name> <type>SPC</type>
    <binary>vfeature</binary>
    <trigger>none</trigger>
    <review>none</review>
    <config>
      <params>-t acc -v default</params>
      <input>RAW</input>
      <output>FEATURE</output>
    </config>
  </stage>
  <stage>
    <name>UploadFeatures</name> <type>SPC</type>
    <binary>uploaddata</binary>
    <trigger>none</trigger>
    <config>
      <input>FEATURE</input>
    </config>
  </stage>

  <connector>
    <src>Recruit</src>
    <dst> <success>GetRawData</success> <failure>Recruit</failure> </dst>
  </connector>
  <connector>
    <src>GetRawData</src>
    <dst> <success>GetFeatures</success> <failure>Recruit</failure> </dst>
  </connector>
  <connector>
    <src>GetFeatures</src>
    <dst> <success>UploadFeatures</success> <failure>Recruit</failure> </dst>
  </connector>
</app>
</xml>

Listing B.3: Collaborative Learning
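The GetFeatures stage of Listing B.3 runs the vfeature binary with -t acc -v default, but the default accelerometer feature set is not spelled out here. The sketch below only illustrates the general shape of such a stage: windowing a raw accelerometer trace and emitting simple per-window statistics that a classifier could consume. The 50-sample window (echoing the -f 50 collection parameter) and the particular features are assumptions.

import statistics

def acc_features(samples, window=50):
    """Turn raw accelerometer magnitudes into per-window feature vectors.
    'samples' is a list of floats; 50-sample windows are an assumption."""
    feats = []
    for i in range(0, len(samples) - window + 1, window):
        w = samples[i:i + window]
        feats.append({
            "mean": statistics.fmean(w),
            "stdev": statistics.pstdev(w),
            "peak_to_peak": max(w) - min(w),
        })
    return feats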
B.1.4 Forensic Analysis

<xml>
<app>
  <name>Face Detection for Surveillance</name>
  <rrid>[User's Requestor ID]</rrid>
  <rrkey>[User's Requestor Key]</rrkey>
  <deadline>21:00:00 12/16/2011</deadline>

  <stage>
    <name>Recruit</name> <type>HIT</type>
    <binary>recruit</binary>
    <config>
      <stmt>Face Detection for Surveillance.</stmt>
      <expiration>21:00:00 12/16/2011</expiration>
      <reward>.05</reward>
      <output>W_WID</output>
    </config>
  </stage>
  <stage>
    <name>GetImages</name> <type>SPC</type>
    <binary>probedata</binary>
    <trigger>none</trigger>
    <config>
      <params>-type image -from 20111208 -to 20111216T180000 -limit 10</params>
      <output>IMAGES</output>
    </config>
  </stage>
  <stage>
    <name>GetFaces</name> <type>SPC</type>
    <binary>facedetect</binary>
    <trigger>none</trigger>
    <config>
      <input>IMAGES</input>
      <output>FACES</output>
    </config>
  </stage>
  <stage>
    <name>UploadFaces</name> <type>SPC</type>
    <binary>uploaddata</binary>
    <trigger>none</trigger>
    <config>
      <input>FACES</input>
    </config>
  </stage>
  <stage>
    <name>Curate</name> <type>HIT</type>
    <binary>vote</binary>
    <config>
      <stmt>Judging based on Faces</stmt>
      <expiration>21:00:00 12/16/2011</expiration>
      <reward>.01</reward>
      <numusers>1</numusers>
      <input>FACES</input>
      <output>$BITMASK</output>
    </config>
  </stage>
  <stage>
    <name>UploadData</name> <type>SPC</type>
    <binary>uploaddata</binary>
    <trigger>none</trigger>
    <config>
      <input>$BITMASK, IMAGES</input>
    </config>
  </stage>

  <connector>
    <src>Recruit</src>
    <dst> <success>GetImages</success> <failure>Recruit</failure> </dst>
  </connector>
  <connector>
    <src>GetImages</src>
    <dst> <success>GetFaces</success> <failure>Recruit</failure> </dst>
  </connector>
  <connector>
    <src>GetFaces</src>
    <dst> <success>UploadFaces</success> <failure>Recruit</failure> </dst>
  </connector>
  <connector>
    <src>UploadFaces</src>
    <dst> <success>Curate</success> <failure>Recruit</failure> </dst>
  </connector>
  <connector>
    <src>Curate</src>
    <dst> <success>UploadData</success> <failure>Recruit</failure> </dst>
  </connector>
</app>
</xml>

Listing B.4: Forensic Analysis

B.1.5 Spot Reporter

<xml>
<app>
  <name>SPOT-Reporter</name>
  <rrid>[User's Requestor ID]</rrid>
  <rrkey>[User's Requestor Key]</rrkey>
  <deadline>21:00:00 12/16/2011</deadline>

  <stage>
    <name>Recruit</name> <type>HIT</type>
    <binary>recruit</binary>
    <config>
      <stmt>Spot Reporter App. Demonstration</stmt>
      <expiration>21:00:00 12/16/2011</expiration>
      <reward>.05</reward>
      <output>W_WID</output>
    </config>
  </stage>
  <stage>
    <name>MakeReport</name> <type>SPC</type>
    <binary>mediagen</binary>
    <trigger>user-initiated</trigger> <review>textdesc</review>
    <config>
      <params>-t video -s notification</params>
      <output>VIDEO</output>
    </config>
  </stage>
  <stage>
    <name>UploadData</name> <type>SPC</type>
    <binary>uploaddata</binary>
    <trigger>none</trigger>
    <config>
      <input>VIDEO</input>
    </config>
  </stage>

  <connector>
    <src>Recruit</src>
    <dst> <success>MakeReport</success> <failure>Recruit</failure> </dst>
  </connector>
  <connector>
    <src>MakeReport</src>
    <dst> <success>UploadData</success> <failure>Recruit</failure> </dst>
  </connector>
</app>
</xml>

Listing B.5: Spot Reporter
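In Listing B.4, the Curate stage emits $BITMASK, one vote per detected face, and the final UploadData stage consumes $BITMASK together with IMAGES. A natural reading, sketched below, is that the bitmask selects which of the original images are uploaded. The encoding of $BITMASK is not specified in the listing, so the list-of-0/1-votes form used here is an assumption.

def select_by_bitmask(images, bitmask):
    """Keep only the images whose corresponding vote bit is set.
    'bitmask' is assumed to be an iterable of 0/1 votes, one per image."""
    return [img for img, bit in zip(images, bitmask) if bit]

images = ["img_001.jpg", "img_002.jpg", "img_003.jpg"]   # hypothetical filenames
print(select_by_bitmask(images, [1, 0, 1]))   # -> ['img_001.jpg', 'img_003.jpg']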
B.1.6 WiFi Scanner

<xml>
<app>
  <name>ROGUEFINDER</name>
  <rrid>[User's Requestor ID]</rrid>
  <rrkey>[User's Requestor Key]</rrkey>
  <deadline>21:00:00 12/16/2011</deadline>

  <stage>
    <name>Hiring</name> <type>HIT</type>
    <binary>recruit</binary>
    <config>
      <stmt>WiFi Scanning App. Demonstration</stmt>
      <expiration>21:00:00 12/16/2011</expiration>
      <reward>.05</reward>
      <output>W_WID</output>
    </config>
  </stage>
  <stage>
    <name>ScanWiFi</name> <type>SPC</type>
    <binary>netstats</binary>
    <trigger>none</trigger> <review>none</review>
    <config>
      <params>-i wifi -f bssid,level -p 10 -c 3</params>
      <output>WIFISCAN</output>
    </config>
  </stage>
  <stage>
    <name>UploadData</name> <type>SPC</type>
    <binary>uploaddata</binary>
    <trigger>none</trigger> <review>yesno</review>
    <config>
      <input>WIFISCAN</input>
    </config>
  </stage>

  <connector>
    <src>Hiring</src>
    <dst> <success>ScanWiFi</success> <failure>Hiring</failure> </dst>
  </connector>
  <connector>
    <src>ScanWiFi</src>
    <dst> <success>UploadData</success> <failure>Hiring</failure> </dst>
  </connector>
</app>
</xml>

Listing B.6: WiFi Scanner

B.1.7 Bluetooth Scanner

<xml>
<app>
  <name>OBJECTFINDER</name>
  <rrid>[User's Requestor ID]</rrid>
  <rrkey>[User's Requestor Key]</rrkey>
  <deadline>21:00:00 12/16/2011</deadline>

  <stage>
    <name>Hiring</name> <type>HIT</type>
    <binary>recruit</binary>
    <config>
      <stmt>Bluetooth Scanning App. Demonstration</stmt>
      <expiration>21:00:00 12/16/2011</expiration>
      <reward>.05</reward>
      <output>W_WID</output>
    </config>
  </stage>
  <stage>
    <name>ScanBluetooth</name> <type>SPC</type>
    <binary>netstats</binary>
    <trigger>none</trigger> <review>none</review>
    <config>
      <params>-i bluetooth -p 30 -c 2</params>
      <output>BLUETOOTHSCAN</output>
    </config>
  </stage>
  <stage>
    <name>UploadData</name> <type>SPC</type>
    <binary>uploaddata</binary>
    <trigger>none</trigger>
    <config>
      <input>BLUETOOTHSCAN</input>
    </config>
  </stage>

  <connector>
    <src>Hiring</src>
    <dst> <success>ScanBluetooth</success> <failure>Hiring</failure> </dst>
  </connector>
  <connector>
    <src>ScanBluetooth</src>
    <dst> <success>UploadData</success> <failure>Hiring</failure> </dst>
  </connector>
</app>
</xml>

Listing B.7: Bluetooth Scanner
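Listings B.6 and B.7 differ only in the flags passed to the netstats binary, which suggests one scanning binary configured per stage. The sketch below shows how such a <params> string might be parsed into a configuration; the flag meanings (-i interface, -f fields, -p period, -c count) are inferred from context rather than documented here.

def parse_params(params):
    """Parse a MedScript <params> string like '-i wifi -f bssid,level -p 10 -c 3'
    into a dict. Assumes strictly alternating flag/value tokens, which holds
    for the listings above; flag semantics are inferred, not from Medusa docs."""
    tokens = params.split()
    return {flag.lstrip("-"): value
            for flag, value in zip(tokens[::2], tokens[1::2])}

cfg = parse_params("-i wifi -f bssid,level -p 10 -c 3")
# -> {'i': 'wifi', 'f': 'bssid,level', 'p': '10', 'c': '3'}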
B.1.8 Road-Bump Monitoring

<xml>
<app>
  <name>RoadBump</name>
  <rrid>[User's Requestor ID]</rrid>
  <rrkey>[User's Requestor Key]</rrkey>

  <stage>
    <name>Recruit</name> <type>HIT</type>
    <binary>recruit</binary>
    <config>
      <stmt>RoadBump Monitoring App. Demonstration</stmt>
      <expiration>21:00:00 12/16/2011</expiration>
      <reward>.05</reward>
      <output>W_WID</output>
    </config>
  </stage>
  <stage>
    <name>GetRawAccData</name> <type>SPC</type>
    <binary>vcollect</binary>
    <trigger>user-initiated</trigger>
    <review>none</review>
    <config>
      <params>-t acc -r true -i 10 -f 50 -c 10 -l rb -s notification</params>
      <output>ACC</output>
    </config>
  </stage>
  <stage>
    <name>GetRawGpsData</name> <type>SPC</type>
    <binary>gpsrawcollect</binary>
    <trigger>user-initiated</trigger>
    <review>none</review>
    <config>
      <params>-t gps -i 10 -c 10 -l rb -s notification</params>
      <output>GPS</output>
    </config>
  </stage>
  <stage>
    <name>RawDataCombine</name> <type>SPC</type>
    <binary>combiner</binary>
    <trigger>none</trigger>
    <review>none</review>
    <config>
      <params>-t acc,gps -l rb</params>
      <input>ACC, GPS</input>
      <output>ACC_GPS</output>
    </config>
  </stage>
  <stage>
    <name>GetFeatures</name> <type>SPC</type>
    <binary>roadfeature</binary>
    <trigger>none</trigger>
    <review>yesno</review>
    <config>
      <params>-t acc,gps -f 50 -v default</params>
      <input>ACC_GPS</input>
      <output>FEATURE</output>
    </config>
  </stage>
  <stage>
    <name>UploadData</name> <type>SPC</type>
    <binary>uploaddata</binary>
    <trigger>none</trigger>
    <config>
      <input>FEATURE</input>
    </config>
  </stage>

  <connector>
    <src>Recruit</src>
    <dst> <success>GetRawAccData</success> <failure>Recruit</failure> </dst>
    <dst> <success>GetRawGpsData</success> <failure>Recruit</failure> </dst>
  </connector>
  <connector>
    <src>GetRawAccData</src>
    <dst> <success>RawDataCombine</success> <failure>Recruit</failure> </dst>
  </connector>
  <connector>
    <src>GetRawGpsData</src>
    <dst> <success>RawDataCombine</success> <failure>Recruit</failure> </dst>
  </connector>
  <connector>
    <src>RawDataCombine</src>
    <dst> <success>GetFeatures</success> <failure>Recruit</failure> </dst>
  </connector>
  <connector>
    <src>GetFeatures</src>
    <dst> <success>UploadData</success> <failure>Recruit</failure> </dst>
  </connector>
</app>
</xml>

Listing B.8: Road-Bump Monitoring
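Listing B.8 is the only program here whose Recruit connector carries two <dst> blocks, fanning out to GetRawAccData and GetRawGpsData in parallel before RawDataCombine joins the two streams. The combiner binary itself is not shown in this dissertation, so the sketch below merely illustrates one plausible join: tagging each accelerometer sample with its nearest-in-time GPS fix.

def combine(acc, gps):
    """Pair each (t, value) accelerometer sample with the closest (t, lat, lon)
    GPS fix. Nearest-in-time matching is an assumption about 'combiner'."""
    out = []
    for t, a in acc:
        nearest = min(gps, key=lambda g: abs(g[0] - t))
        out.append((t, a, nearest[1], nearest[2]))
    return out

acc = [(0.0, 9.8), (0.5, 12.1), (1.0, 9.7)]          # invented samples
gps = [(0.0, 34.02, -118.29), (1.0, 34.03, -118.28)] # invented fixes
print(combine(acc, gps))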
B.1.9 Party Thermometer

<xml>
<app>
  <name>Party-Thermometer</name>
  <rrid>[User's Requestor ID]</rrid>
  <rrkey>[User's Requestor Key]</rrkey>
  <deadline>21:00:00 12/16/2011</deadline>

  <stage>
    <name>Recruit</name> <type>HIT</type>
    <binary>recruit</binary>
    <config>
      <stmt>Party Thermometer App. Demonstration</stmt>
      <expiration>21:00:00 12/16/2011</expiration>
      <reward>.05</reward>
      <output>W_WID</output>
    </config>
  </stage>
  <stage>
    <name>GetRawData</name> <type>SPC</type>
    <binary>vcollect</binary>
    <trigger>location=34.0220|-118.2880|40</trigger>
    <review>none</review>
    <config>
      <params>-t sound -f 500 -l pt -s notification</params>
      <output>RAW</output>
    </config>
  </stage>
  <stage>
    <name>GetFeatures</name> <type>SPC</type>
    <binary>vfeature</binary>
    <trigger>none</trigger>
    <review>none</review>
    <config>
      <params>-t sound -v default</params>
      <input>RAW</input>
      <output>FEATURE</output>
    </config>
  </stage>
  <stage>
    <name>UploadData</name> <type>SPC</type>
    <binary>uploaddata</binary>
    <trigger>none</trigger>
    <config>
      <input>FEATURE</input>
    </config>
  </stage>

  <connector>
    <src>Recruit</src>
    <dst> <success>GetRawData</success> <failure>Recruit</failure> </dst>
  </connector>
  <connector>
    <src>GetRawData</src>
    <dst> <success>GetFeatures</success> <failure>Recruit</failure> </dst>
  </connector>
  <connector>
    <src>GetFeatures</src>
    <dst> <success>UploadData</success> <failure>Recruit</failure> </dst>
  </connector>
</app>
</xml>

Listing B.9: Party Thermometer
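The Party Thermometer program reuses the vcollect/vfeature pipeline of Listing B.3 with -t sound, gated on a location trigger around the party venue. Since the default sound features are not listed here, the sketch below shows one representative choice, short-time energy over fixed-size frames, a common proxy for how loud the sensed environment is. The 500-sample frame (echoing -f 500) is an assumption.

def sound_energy(samples, frame=500):
    """Short-time energy per frame of audio samples. The 500-sample frame
    size is an assumption, echoing the '-f 500' parameter of GetRawData."""
    return [sum(s * s for s in samples[i:i + frame]) / frame
            for i in range(0, len(samples) - frame + 1, frame)]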
Appendix C

SALSA Supplement

C.1 Derivation of SALSA Control Decision

In this section, we describe the derivation of SALSA's control decision (5.5). Following standard practice, we define the Lyapunov function as
\[ L(U[t]) \triangleq \tfrac{1}{2}(U[t])^2 \tag{C.1} \]
and the one-step Lyapunov drift \(\Delta(U[t])\) as
\[ \Delta(U[t]) \triangleq \mathbb{E}\{L(U[t+1]) - L(U[t]) \mid U[t]\}. \tag{C.2} \]
The Lyapunov drift for the queue backlog \(U[t]\) is specified by the following lemma.

Lemma 1. Suppose the data arrivals \(A[t]\) and the wireless link quality \(\tilde{S}[t]\) are i.i.d. over timeslots. For the queue dynamics in (5.2) and the Lyapunov function in (C.1), the one-step Lyapunov drift satisfies the following constraint for all \(t\) and \(U[t]\):
\[ \Delta(U[t]) \le B[t] - U[t]\,\mathbb{E}\{\mu[t] \mid U[t]\} + U[t]\lambda \tag{C.3} \]
with \(B[t]\) equal to
\[ B[t] \triangleq \tfrac{1}{2}\left(\mathbb{E}\{A^2[t]\} + \mathbb{E}\{\mu^2[t] \mid U[t]\}\right). \tag{C.4} \]

Proof: From (5.2) we have
\[ \tfrac{1}{2}U[t+1]^2 = \tfrac{1}{2}(U[t] - \mu[t] + A[t])^2 \le \tfrac{1}{2}\left(U[t]^2 + \mu[t]^2 + A[t]^2\right) - U[t](\mu[t] - A[t]) \]
and we take conditional expectations given \(U[t]\):
\[ \Delta(U[t]) \le \tfrac{1}{2}\,\mathbb{E}\{\mu[t]^2 + A[t]^2 \mid U[t]\} - U[t]\,\mathbb{E}\{\mu[t] - A[t] \mid U[t]\}. \]
Now, using (C.4) and \(\mathbb{E}\{A[t] \mid U[t]\} = \mathbb{E}\{A[t]\} = \lambda\), we obtain (C.3).

Power Constraint. To include the minimum power consumption objective in the Lyapunov drift, following the Lyapunov optimization framework [55, 93], we add a weighted cost (the power consumption during slot \(t\)) to (C.3) to get:
\[ \Delta(U[t]) + V\,\mathbb{E}\{P[t] \mid U[t]\} \le B[t] - U[t]\,\mathbb{E}\{\mu[t] \mid U[t]\} + U[t]\lambda + V\,\mathbb{E}\{P[t] \mid U[t]\} \tag{C.5} \]
\[ = B[t] - U[t]\,\mathbb{E}\{\mathbb{E}\{\mu[t] \mid U[t], \lambda, S_l[t], P[t]\}\} + U[t]\lambda + V\,\mathbb{E}\{P[t] \mid U[t]\} \tag{C.6} \]
\[ = B[t] - \mathbb{E}\{\left(U[t]\,\mathbb{E}\{\mu[t] \mid \lambda, S_l[t], P[t]\} - V P[t]\right) \mid U[t]\} + U[t]\lambda \tag{C.7} \]
where (C.6) follows from (C.5) and (5.1) using iterated expectations, and (C.7) is derived by switching the order of expectations in (C.6).

We assume that the data arrival process \(A[t]\) and the transmission process \(\mu[t]\) have finite variance, implying that there exist constants \(\bar{A}^2\) and \(\bar{\mu}^2\) such that \(\mathbb{E}\{A^2[t]\} < \bar{A}^2\) and \(\mathbb{E}\{\mu^2[t] \mid U[t]\} < \bar{\mu}^2\). Hence, from (C.4), we have
\[ B[t] < B \quad \forall t, \qquad B = \bar{A}^2 + \bar{\mu}^2. \tag{C.8} \]
Minimizing the RHS of (C.7) will guarantee queue stability with minimal power consumption, as per the Lyapunov framework. However, since we have no control over the application data arrival process, we cannot do much about the term \(U[t]\lambda\); SALSA therefore does not directly minimize the entire expression on the RHS of (C.7). Rather, in order to minimize the RHS of (C.7), it maximizes the negative term on the RHS of (C.7); this explains the control decision in (5.5).

C.2 SALSA: Proof of Theorem 1

Proof: Our proof is similar to the proof of an analogous result proved by Neely in the context of energy-efficient transmission scheduling in wireless networks [93]. It builds upon the properties of stationary randomized policies for making control decisions. In our context, such a policy would make (randomized) link selection decisions based only on the current arrivals \(A[t]\) and link quality \(\tilde{S}[t]\), which are i.i.d. over slots and independent of the current queue backlog \(U[t]\). In practice, a stationary randomized link selection policy cannot be defined without prior knowledge of the probability distributions \(p_A\) and \(p_S\) for the arrival process \(A[t]\) and the wireless link quality \(S[t]\), respectively. Since we assume that the arrival process is strictly within the network capacity region, there exists at least one stationary randomized control policy that can stabilize the queue [93], with the following features:
\[ \mathbb{E}\{P[t]\} = P^* \tag{C.9} \]
\[ \mathbb{E}\{\mu[t] \mid U[t]\} \ge \lambda \;\Rightarrow\; \mathbb{E}\{\mu[t] \mid U[t]\} = \lambda + \varepsilon, \quad \varepsilon \ge 0 \tag{C.10} \]
where we define \(P^*\) as the minimum achievable power expenditure under any control policy that achieves queue stability. Then, by applying (C.9)-(C.10) to (C.5), we obtain:
\[ \Delta(U[t]) + V\,\mathbb{E}\{P[t] \mid U[t]\} \le B[t] - U[t](\lambda + \varepsilon) + U[t]\lambda + V P^* \tag{C.11} \]
(C.11) holds for all timeslots \(t\). Taking an expectation of (C.11) with respect to the distribution of \(U[t]\) and using the law of iterated expectations results in:
\[ \mathbb{E}\{L(U[t+1]) - L(U[t])\} + V\,\mathbb{E}\{P[t]\} \le B - \varepsilon\,\mathbb{E}\{U[t]\} + V P^* \]
Then, summing over all timeslots \(t \in \{0, 1, \ldots, T-1\}\) and dividing by \(T\) yields:
\[ \frac{\mathbb{E}\{L(U[T]) - L(U[0])\}}{T} + \frac{V}{T}\sum_{t=0}^{T-1}\mathbb{E}\{P[t]\} \le B - \frac{\varepsilon}{T}\sum_{t=0}^{T-1}\mathbb{E}\{U[t]\} + V P^* \tag{C.12} \]
Since the Lyapunov function is non-negative by definition, and so is \(P[t]\), a simple manipulation of (C.12) yields:
\[ \frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\{U[t]\} \le \frac{B + V P^* + \mathbb{E}\{L(U[0])\}/T}{\varepsilon} \]
Taking limits as \(T \to \infty\) results in the time-average backlog bound (5.6). By similarly manipulating (C.12) we obtain:
\[ \frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\{P[t]\} \le P^* + \frac{B}{V} + \frac{\mathbb{E}\{L(U[0])\}}{V T} \tag{C.13} \]
Again taking limits as \(T \to \infty\) yields equation (5.7).
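To make the control decision concrete, the sketch below (illustrative Python, not the SALSA implementation) evaluates, for each available link, the term maximized in (5.5), namely U[t]·E{μ[t]} − V·P[t], and picks the best option, modeling "defer transmission" as a zero-rate, zero-power link. All link parameters here are invented for illustration.

def salsa_decide(U, V, links):
    """Pick the option maximizing U*E[mu] - V*P, i.e. the negative term on the
    RHS of (C.7). 'links' maps name -> (expected_rate, power); deferring
    transmission is modeled as a zero-rate, zero-power option."""
    options = dict(links)
    options["defer"] = (0.0, 0.0)
    return max(options, key=lambda k: U * options[k][0] - V * options[k][1])

# Invented numbers: WiFi is fast and cheap, cellular slower and costlier.
links = {"wifi": (500.0, 0.7), "cellular": (100.0, 1.2)}
print(salsa_decide(U=10.0,  V=10000.0, links=links))   # small backlog -> 'defer'
print(salsa_decide(U=500.0, V=10000.0, links=links))   # large backlog -> 'wifi'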