Improving Efficiency, Privacy and Robustness for Crowd-Sensing Applications

by

Bin Liu

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

April 2014

Copyright 2014 Bin Liu

Dedication

To my parents, my wife, and my lovely son

Acknowledgements

The completion of this dissertation has been a long journey, and much has happened since I began the Ph.D. program. Looking back, I would like to express my deepest appreciation to my advisor, Professor Ramesh Govindan, for his continuous support of my Ph.D. research, and for his patience, motivation, and immense knowledge. He is a true gentleman, unfailingly gracious toward everything in his life, even my faults. I will never forget how Ramesh and I discussed our proposals, challenged each other, and worked collaboratively to produce fascinating research. I could not have imagined a better mentor for my Ph.D. study.

Besides my advisor, I would like to thank the rest of my dissertation committee, Professor Sandeep Gupta and Professor Leana Golubchik, for their encouragement and insightful comments.

I am glad to have collaborated with many prestigious researchers. Their scientific advice and insightful suggestions have helped me broaden my horizons and have exposed me to a wide range of exciting ideas. Specifically, I really value the experience of collaborating with Professor Fei Sha on exploring the privacy/accuracy tradeoffs in machine learning algorithms, with Professor Sandeep Gupta on understanding the impact of unreliable memories on protocol performance, and with Professor Brian Uzzi on studying emotional contagion in social networks. My sincere thanks also go to Dr. Anmol Sheth, Dr. Udi Weinsberg, Dr. Suman Nath, and Dr.
Jie Liu, for offering me summer internship opportunities in their groups and for guiding me through diverse, exciting projects. I thank my fellow labmates in the Networked Systems Laboratory for the sleepless nights we worked together before deadlines, and for all the fun we have had in the last five years.

Last but not least, I would like to thank my parents, Honlian Zhou and Xinhua Liu, for bringing me into this world and supporting me selflessly throughout my life.

Bin Liu
April 2014

Table of Contents

Dedication
Acknowledgements
List Of Tables
List Of Figures
Abstract
Chapter 1: Introduction
Chapter 2: Optimizing Information Credibility in Crowd-Sensing Applications
  2.1 Overview
  2.2 Terminology and Model
  2.3 The One-Shot Problem
    2.3.1 Problem Formulation and Complexity
      2.3.1.1 Problem Formulations
      2.3.1.2 On the Complexity of MinCost and MaxCred
    2.3.2 Optimization Algorithms
      2.3.2.1 Dynamic Programming
      2.3.2.2 Maximum Weighted Matching Algorithms
    2.3.3 Using the Structure of the Credibility Function
      2.3.3.1 An Efficient Greedy Algorithm for Two Formats
      2.3.3.2 An Efficient Approximation Algorithm
    2.3.4 Extensions
  2.4 The Renewals Problem: Randomly Arriving Events
    2.4.1 The General Stochastic Problem
    2.4.2 Corroboration Pull as a Stochastic Optimization Problem
    2.4.3 An Exact Distributed Algorithm for c_min = 0
    2.4.4 MinCost-Stochastic
  2.5 Performance Evaluation
    2.5.1 One-Shot Problems
    2.5.2 Evaluation of Renewals Problems
Chapter 3: Cloud-Enabled Privacy-Preserving Collaborative Learning for Mobile Sensing
  3.1 Overview
  3.2 Motivation and Challenges
  3.3 Privacy-Preserving Collaborative Learning
    3.3.1 Pickle Overview
    3.3.2 The Regression Phase
    3.3.3 Model Generation and Return
    3.3.4 Privacy Analysis
    3.3.5 Other Properties of Pickle
  3.4 Evaluation of Pickle
    3.4.1 Pickle Privacy: A User Study
    3.4.2 Pickle Resource Costs: Prototype Evaluation
    3.4.3 Accuracy/Privacy Tradeoffs and Comparisons: Dataset Evaluation
      3.4.3.1 Methodology
      3.4.3.2 Evaluation Metrics
      3.4.3.3 Attack Models and Privacy
      3.4.3.4 Classifier Accuracy
      3.4.3.5 Comparison
      3.4.3.6 Impact of Design Choices
      3.4.3.7 Illustrating Other Features of Pickle
Chapter 4: DECAF: Detecting and Characterizing Ad Fraud in Mobile Apps
  4.1 Overview
  4.2 Background, Motivation, Goals and Challenges
  4.3 DECAF Overview
    4.3.1 The Monkey
    4.3.2 Fraud Checker
  4.4 Optimizations for Coverage and Speed
    4.4.1 Detecting Equivalent States
    4.4.2 Path Prioritization
      4.4.2.1 State Equivalence Prediction
      4.4.2.2 State Importance Assessment
    4.4.3 Page Load Completion Detection
    4.4.4 Fraud Checker Optimizations
    4.4.5 Discussion
  4.5 Implementation
  4.6 Evaluation
    4.6.1 State Equivalence Prediction
    4.6.2 Assessing State Importance
    4.6.3 Overall Speed and Throughput
    4.6.4 False Positives and False Negatives in DECAF
  4.7 Characterizing Ad Fraud
Chapter 5: Related Work
  5.1 Information Credibility
  5.2 Privacy-Preserving SVM
    5.2.1 Feature Perturbation
    5.2.2 Differential Privacy
    5.2.3 Cryptographic Methods
    5.2.4 Other Related Work for Collaborative Learning and Privacy
  5.3 Automated Ad Fraud Detection
    5.3.1 App Automation
    5.3.2 Ad Fraud
Chapter 6: Conclusions and Future Work
References
Appendix A
  A.1 Problem Formulation
    A.1.1 Justification for Location Assumption
  A.2 Complexity Analysis
    A.2.1 Proof of Theorem 2.3.1
    A.2.2 A Much Stronger Result on the Complexity
  A.3 Optimal Solutions
    A.3.1 Proof of Theorem 2.3.4
  A.4 Renewals Problem
    A.4.1 Proof of Theorem 2.4.2

List Of Tables

1.1 Explored methods for improving efficiency, privacy and robustness in crowd-sensing
2.1 Notation
4.1 SVM classifier features
4.2 Occurrences of various fraud types among all fraudulent apps

List Of Figures

2.1 Minimal cost of 4 formats with increasing k
2.2 Minimal cost in random topologies with increasing k (error bars are very small and thus omitted in (b))
2.3 Maximal credibility of 4 formats with increasing B
2.4 Maximal credibility in random topologies with increasing B (error bars are very small and thus omitted in (b))
2.5 Evaluation of MaxCred-Stochastic with different V
2.6 Evaluation of MinCost-Stochastic with different V
3.1 Illustrating Pickle
3.2 User Study Results
3.3 Accuracy of accelerometer-based classifier construction
3.4 Architecture of the prototype system
3.5 Processing and Communication Cost
3.6 Effect of reconstruction attack on privacy
3.7 Accuracy-privacy tradeoff of SVM with RBF kernel
3.8 Comparison of Pickle to several alternatives
3.9 Effect of user diversity on accuracy
3.10 Outlier detection against model poisoning
4.1 Placement Fraud Examples
4.2 The architecture of DECAF includes a Monkey that controls the execution and an extensible set of fraud detection policies
4.3 CDF of evaluation result
4.4 Value coverage as a function of exploration time per app, with various prioritization algorithms
4.5 Distribution of fraudulent apps over various categories
4.6 Distribution of ratings for fraudulent and non-fraudulent phone and tablet apps
4.7 CDF of rating counts for phone apps
4.8 Compliance rate of publishers with multiple apps
4.9 Fraudulent app count per phone app publisher

Abstract

Every year, a wide variety of modern smart devices, such as smartphones and tablets, are released by major brands such as Apple, Samsung and HTC.
Compared to previous generations, these smart devices are more sophisticated in two ways: (a) they run advanced operating systems that allow developers to create a large collection of complicated apps, and (b) they carry more diverse sensors that can be used to perform various context-aware tasks. Together, these two attributes have given rise to a new class of applications: crowd-sensing. Crowd-sensing is a capability by which a task requestor can recruit smartphone users to provide sensor data to be used towards a specific goal or as part of a social or technical experiment. To support crowd-sensing tasks, professional apps are developed to provide specialized platforms, and high-quality sensors are used to generate semantically rich data.

In this dissertation, we focus on possible ways to improve efficiency, privacy and robustness for crowd-sensing applications.

First, targeting the general form of crowd-sensing, we design efficient algorithms to answer the following question: how should the selection of crowd-sensing participants be optimized to deliver credible information about a task? Based on a model of the credibility of information, we develop solutions for both the discrete version and the time-averaged version of this problem.

Second, we consider a special crowd-sensing case in which Internet-connected mobile users contribute sensor data as training samples, and collaborate on building a model for classification tasks such as activity or context recognition. Constructing the model can naturally be performed by a service running in the cloud, but users may be more inclined to contribute training samples if the privacy of these data could be ensured. For this, we develop algorithms and an associated system design that perform collaborative learning tasks in a way that preserves user data privacy without significant loss of accuracy.
Finally, the technique of dynamic analysis can be employed to test many aspects of crowd-sensing apps, such as performance, security, and correctness properties. As an initial attempt, we show how to use dynamic analysis to detect placement ad fraud, in which app developers manipulate the visual layouts of ads in ways that result in invisible ad impressions and accidental clicks from real users. We demonstrate that the detection can be performed using optimized automated navigation methods on a large set of 1,150 tablet apps and 50,000 phone apps.

Chapter 1
Introduction

The ubiquity of smartphones and other mobile devices, and the plethora of sensors available on them, have inspired innovative research that will, over time, lead to sophisticated context-aware applications and systems. Some of this research has explored ways in which mobile phone users can contribute sensor data towards enabling self-reflection, environmental awareness, or other social causes.

We consider a specific form of acquiring sensor data from multiple smartphones that we call crowd-sensing. Crowd-sensing is a capability by which a requestor (or a commander) can recruit reporters (or workers) to provide sensor data to be used towards a specific goal or as part of a social or technical experiment (e.g., tasks such as forensic analysis, documenting public spaces, or collaboratively constructing statistical models). Reporters' smartphones collect sensor data, and may process the data in arbitrary ways before sending the data to the requestor. The requestor pays the reporters for their participation.

As an example, consider the following scenario. A social science researcher is interested in collecting ground-truth video clips about life in several cities around the world. Instead of personally visiting all those cities, she can use the framework to recruit reporters in each of the cities and ask them to take video clips of daily life in those cities for her.
To incentivize these contributions, the researcher may offer micro-payments for submitting video clips.

Furthermore, the concept of crowd-sensing can be generalized in many ways. For example, the process of mobile ad delivery can be treated as a special case of crowd-sensing. Currently, although most mobile apps seem to be available at no charge (around 80% [15]), this is actually not the case: free apps rely on targeted advertising for their revenue. In this process, workers' smartphones collect and send context information, such as the location, IP address, OS version and phone model, to a requester (i.e., an ad network) to request appropriate ad delivery.

Crowd-sensing is a form of networked wireless sensing, but it differs from prior research on networked sensing for two reasons: each smartphone is owned by an individual, and some sensing actions on the smartphone may require human intervention (e.g., pointing a camera at a specific target, or initiating audio capture at the appropriate moment). Crowd-sensing differs from crowd-sourcing in one important respect: a crowd-sourcing system like the Amazon Mechanical Turk (AMT [2]) permits requestors to task participants with human intelligence tasks like recognition or translation, while a crowd-sensing system enables requestors to task participants with acquiring processed sensor data.

Given the unique combination of human factors and sensing requirements in crowd-sensing, our main goal is to design novel crowd-sensing algorithms and platforms that can improve the operational efficiency, data privacy and software robustness of crowd-sensing applications.

Operational Efficiency: Stochastic optimization for long-term efficiency
Data Privacy: Perturbation, together with regression, for privacy
Software Robustness: Scalable dynamic program analysis for ensuring robustness

Table 1.1: Explored methods for improving efficiency, privacy and robustness in crowd-sensing
As shown in Table 1.1, we have explored and developed corresponding methods to ensure these three properties.

Operational Efficiency in Crowd-Sensing: The first property we are interested in is operational efficiency. Specifically, one natural characteristic of crowd-sensing is that many participants may be able to respond with the requested data. Therefore, an immediate challenge is how to determine the best set of participants to respond to the requestor. Our prior work has investigated this problem. Given a scenario in which a requestor has sufficiently many available workers for his tasks, we focus on the following problem: how does the requestor optimize the selection of workers to deliver credible corroborating information about a task [81, 82]? We first propose a model, based on common notions of believability, of the credibility of information. We then cast the problem posed above as a discrete optimization problem, prove hardness results, introduce optimal centralized solutions, and design an approximate solution amenable to decentralized implementation whose performance is about 20% off the optimal, on average, while being three orders of magnitude more computationally efficient. More importantly and practically, a time-averaged version of the problem is amenable to a novel stochastic utility optimization formulation, and can be solved optimally, while in some cases yielding decentralized solutions. Finally, our approach is designed for the general case of crowd-sensing and does not make any assumptions about the types of applications.

Data Privacy in Crowd-Sensing: In addition to operational efficiency, another crucial concern is the possible privacy leakage of submitted sensor data. Data generated from sensors (such as audio and video clips, or accelerometer readings) may intentionally or unintentionally reveal personal information, such as location, chat content/history, life patterns, and so on.
Therefore, to better incentivize workers to participate in crowd-sensing, it is essential to ensure the privacy of sensor information. While privacy in crowd-sensing is a well-researched area, we have conducted our privacy research by considering a special case of crowd-sensing in which Internet-connected mobile workers contribute sensor data as training samples, and collaborate on building a model for classification tasks such as activity or context recognition [80]. This specific application type is also called collaborative learning. Constructing the model can naturally be performed by a service running in the cloud, but workers may be more inclined to contribute training samples if the privacy of these data could be ensured. Thus, in this work, we focus on privacy-preserving collaborative learning for the mobile setting, which addresses several competing challenges not previously considered in the literature: supporting complex classification methods like support vector machines, respecting mobile computing and communication constraints, and enabling user-determined privacy levels. Our approach, Pickle, uses regression to preserve classification accuracy even in the presence of significantly perturbed training samples, is robust to methods that attempt to infer the original data or poison the model, and imposes minimal costs. We validate these claims using a user study, many real-world datasets and two different implementations of Pickle.

Software Robustness in Crowd-Sensing: Software robustness is also important to crowd-sensing. People are very likely to stop using a crowd-sensing app if bugs or crashes happen repeatedly. When analyzing modern apps, traditional static analysis can fail to capture runtime contexts, such as data dynamically downloaded from the cloud, objects created during runtime, configuration variables, and so on.
Therefore, recent research has focused on dynamic analysis, which executes apps and examines their runtime properties. One popular way to scale dynamic analysis to a large number of apps is to use a software automation tool called a Monkey that can automatically launch and interact with an app in order to navigate to its various execution states. This technique can be applied to testing crowd-sensing apps, as well as to exploring performance, security, and correctness properties of apps. We have applied this technique to detect placement ad fraud in mobile apps. Ad networks for mobile apps require inspection of the visual layout of ads to detect certain types of placement fraud. Doing this manually is error-prone, and does not scale to the sizes of today's app stores. In this work, we design a system called DECAF to automatically discover various placement frauds scalably and effectively. DECAF uses automated app navigation, together with optimizations, to scan through a large number of visual elements within a limited time. It also includes a framework for efficiently detecting whether ads within an app violate an extensible set of rules that govern ad placement and display. We have implemented DECAF for Windows-based mobile platforms, and have applied it to 1,150 tablet apps and 50,000 phone apps in order to characterize the prevalence of ad fraud. DECAF has been used by the ad fraud team at Microsoft and has helped find many instances of ad fraud. Additionally, a variant of DECAF has been adopted by the Microsoft Bing team for extracting and indexing in-app data.

This dissertation is organized as follows. In Chapter 2, we discuss the information credibility problem in crowd-sensing. In Chapter 3, we present the design of Pickle, a system for performing privacy-preserving collaborative learning. In Chapter 4, we introduce the idea of using dynamic analysis to detect placement ad fraud.
In Chapter 5, we provide a comprehensive overview of related work in the literature. Finally, in Chapter 6, we summarize our work and conclude the dissertation.

Chapter 2
Optimizing Information Credibility in Crowd-Sensing Applications

2.1 Overview

With the advent of smartphone technology, it has become possible to conceive of entirely new classes of applications. Recent research has considered personal reflection [108], social sensing [104], lifestyle and activity detection [69], and advanced speech and image processing applications [29]. These applications are enabled by the programmability of smartphones, their considerable computing power, and the presence of a variety of on-board sensors.

In this chapter, we consider a complementary class of potential applications, enabled by the same capabilities, that we call crowd-sensing. In this class of applications, a collection of users, each armed with a smartphone, cooperatively and collaboratively engages in one or more tasks. These users often receive instructions from, or send reports (a video clip, an audio report, a text message, etc.) to, a director. Because directors have a global view of information from different users, they are able to manage the task efficiently to achieve its objectives. Beyond the obvious military applications, there are several civilian ones: search and rescue, coordinated fire-fighting, and the DARPA balloon hunt challenge (http://www.crn.com/networking/222000334).

In these applications, an important challenge is to obtain credible (or believable) information. In general, there are three ways in which believable information might be obtained [118]: homophily, by which people believe like-minded people; test-and-validate, by which the recipient of information tests the correctness of the information; and corroboration, where the belief in information is reinforced by several sources reporting the same (or similar) information.
The process by which humans believe information is exceedingly complex, and an extended discussion is beyond the scope of this chapter. Instead, our focus is on simple and tractable models for corroboration in crowd-sensing type applications.

Specifically, the scenario we consider is the following. Suppose that an event (say, a balloon sighting) is reported to a task director. The director would like to corroborate this report by obtaining reports from other participating members: which reporters should she select? We call this the corroboration pull problem. Clearly, asking every participating member to report is unnecessary, at best: crowd-sensing can have several hundred participants, and a video report from each of them can overwhelm the network. Thus, intuitively, the director would like to selectively request reports from a subset of participating members, while managing the network resources utilized. In this chapter, we formalize this intuition and study the space of corroboration pull formulations.

Our contributions are three-fold. 1) We introduce a model for the credibility of reports. This model quantifies common intuitions about the believability of information: for example, that video is more believable than text, and that a reporter closer to an event is more believable than one further away (Section 2.2). 2) We then cast the one-shot corroboration pull problem as a discrete optimization problem, prove that it is NP-hard, and show that it reduces to a multiple-choice knapsack problem with pseudo-polynomial time optimal solutions. We develop strongly polynomial, but inefficient, solutions for the case when the number of formats is fixed, and an optimal algorithm for the case of two formats. Finally, we derive an approximation algorithm for the general case that leverages the structure of our credibility model.
This algorithm is about 20% off the optimal, but its running time is 2-3 orders of magnitude faster than that of the optimal algorithm, a difference that can decide between winning and losing in, say, a balloon hunt. 3) We then show that, interestingly, the renewals version of the problem, where the goal is to optimize corroboration pull in a time-averaged sense, can be solved optimally, while in some cases admitting a completely decentralized solution.

2.2 Terminology and Model

In this chapter, we consider a constrained form of a crowd-sensing application in which N participants, whom we call reporters, collaboratively engage in a well-defined task. Each reporter is equipped with a smartphone and reports directly to a task director over the 3G/EDGE network. A reporter may be either a human being or a sensor (static, such as a fixed camera, or mobile, such as a robot). A director (either a human being or analytic software) assimilates these reports, and may perform some actions based on the content of the combined reports.

Each reporter reports on an event. The nature of the event depends upon the crowd-sensing application: for example, in a search and rescue operation, an event corresponds to the sighting of an individual who needs to be rescued; in the balloon hunt, an event is the sighting of a balloon. Events occur at a particular location, and multiple events may occur concurrently, either at the same location or at different locations.

Reporters can transmit reports of an event using one of several formats, such as a video clip, an audio clip, or a text message describing what the reporter sees. Each report is a form of evidence for the existence of the event. As we discuss below, different forms of evidence are "believed" to different extents. In general, we assume that each reporter is capable of generating R different report formats, denoted by f_j, for 1 ≤ j ≤ R.
However, different formats have different costs to the network: for example, video or audio could consume significantly more transmission resources than, say, text. We denote by e_j the cost of a report in format f_j; for ease of exposition, we assume that reports are of a fixed size, so that all reports of a given format have the same cost (our results can easily be generalized to the case where report costs are proportional to their length). Finally, reporters can be mobile, but we assume that the director is aware of the location of each reporter (see Appendix A.1.1 for a detailed justification).

Now, suppose that the director in a crowd-sensing application has heard, through out-of-band channels or from a single reporter, of the existence of an event E at location L. To verify this report, the director would like to request corroborating reports from other reporters in the vicinity of L. An immediate problem, however, is how the director should determine the credibility of a reporter. Our choice to model credibility is based on two notions. First, we assume that a video report is more credible than a text report. We extend this intuition in our model to incorporate other formats, like audio: audio is generally less credible than video (because, while it gives some context about an event, video contains more context), but more credible than text (for a similar reason). Second, the proximity of the reporter to an event increases the credibility of the report. More precisely, a report A generated by a reporter at distance d_a from an event has higher credibility than a report B generated by a reporter at distance d_b, if d_a < d_b. This is also a simplified model: the real world is more complex, since the complexity of the terrain, or line of sight, may matter more than geometric distance. While there are many different ways in which we could objectively quantify the credibility of a report, we picked the following formulation.
Let $S_i$ be the position of reporter $i$, $L$ be the position of event $E$, and $c_{i,j}(S_i, L)$ be the credibility of the report generated by reporter $i$ when report format $f_j$ is used. We define

$$c_{i,j}(S_i, L) = \begin{cases} \alpha_j / d(S_i, L)^{\beta_j}, & \text{if } h_0 < d(S_i, L) \\ \alpha_j / h_0^{\beta_j}, & \text{if } d(S_i, L) \le h_0 \end{cases} \tag{2.1}$$

with $1 \le j \le R$, $\beta_1 \le \beta_2 \le \cdots \le \beta_R$, and $\alpha_1 > \alpha_2 > \cdots > \alpha_R$. Here, $d(\cdot)$ is the Euclidean distance between points, $h_0$ is a certain minimum distance to avoid division by zero as well as to bound the maximum credibility to a certain level, and $\alpha_j$ and $\beta_j$ are constants associated with format $f_j$, implying that credibility decays according to a power-law with exponent $\beta_j$ when format $f_j$ is used. Our credibility model incorporates the two notions described above as follows. Credibility being dependent on proximity is captured by the power-law decay with distance. That video has higher credibility than text is captured by having a larger $\alpha_j$ and a smaller exponent $\beta_j$ for video. This model can be extended to incorporate noise or confusion. For example, poor visibility or audible noise near a reporter may, depending upon the format used, reduce the believability of a report. The intensity of a point source of noise can be modeled as a function that decays with distance:

$$G_1(S_i, O_1) = \gamma_1 \, [1 + d(S_i, O_1)]^{-1/\gamma_1} \tag{2.2}$$

where $S_i$ is the position of reporter $i$, $O_1$ is the position of noise source 1, and $\gamma_1$ represents the strength and effective range of noise source 1. Then, if for reporter $i$ and event $E$ the original credibility without noise is $c_{i,j}(S_i, L)$, the credibility with $X$ noise sources is

$$c'_{i,j}(S_i, L) = c_{i,j}(S_i, L) \prod_{p=1}^{X} \left[ 1 - G_p(S_i, O_p) \right] \tag{2.3}$$

Noise sources effectively increase the distance of the reporter from the event, reducing his or her credibility. As we show later, our solutions can incorporate this form of noise without any modification.
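To make the model concrete, here is a minimal Python sketch of the credibility computation of Eqs. (2.1) and (2.3). The function names are our own, and the noise sources are treated as precomputed attenuation levels $G_p \in [0, 1]$ rather than being derived from the distance form of Eq. (2.2):

```python
import math

def credibility(reporter_pos, event_pos, alpha, beta, h0=1.0):
    """Credibility of one report (Eq. 2.1): power-law decay with
    distance, capped once the reporter is within distance h0."""
    d = math.dist(reporter_pos, event_pos)
    return alpha / max(d, h0) ** beta

def noisy_credibility(base, noise_levels):
    """Discount a base credibility by independent noise sources
    (Eq. 2.3); each G_p in noise_levels lies in [0, 1]."""
    for g in noise_levels:
        base *= (1.0 - g)
    return base
```

A video format would use a large `alpha` and small `beta`, so its credibility stays high farther from the event; text would use the opposite.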
Although we have assigned objective quantitative values to credibility, belief or disbelief is often qualitative and subjective. Thus, we don't expect task directors to make decisions based on the exact values of the credibility of different reports, but rather to operate in one of two modes: a) ask the network to deliver corroborating reports whose total credibility is above a certain threshold, while minimizing cost, or b) obtain as much corroborating information as they can get from the network for a given cost. We study these two formulations, respectively called MinCost and MaxCred. Before doing so, there are two questions to be answered: What is the value of the credibility of a collection of corroborating reports? What is the physical/intuitive meaning of a threshold on the credibility? For the first question, there are many possible answers, and we consider two. With an additive corroboration function, the total credibility is simply the sum of the individual credibilities. More generally, with a monotonically-increasing corroboration function, the total credibility increases monotonically as a function of the sum of the individual credibilities. The second question is important because it can help directors set thresholds appropriately. The intuition for a particular threshold value $C$ can be explained as follows. Suppose a director would be subjectively satisfied with 3 corroborating video clips from someone within 10m of an event. One could translate this subjective specification into a threshold value by simply taking the sum of the credibilities of 3 video reports from a distance of 10m.
Table 2.1: Notation

  $N$         the total number of available reporters
  $c_{i,j}$   the short form of (2.1) for a given event
  $R$         the total number of report formats
  $e_j$       the cost when using report format $f_j$
  $C$         the target credibility in MinCost
  $A$         the dynamic programming process of MinCost
  $B$         the cost budget in MaxCred
  $D$         the dynamic programming process of MaxCred

In the next two sections, we formally define MinCost and MaxCred, and then consider two problem variants: a one-shot problem, which seeks to optimize reporting for individual events, and a renewals problem, which optimizes reporting over a sequence of event arrivals.

2.3 The One-Shot Problem

In this section, we formally state the MinCost and MaxCred formulations for the additive corroboration function and in the absence of noise, discuss their complexity, develop optimal solutions for them, and then explore an approximation algorithm that leverages the structure of the credibility function for efficiency. We conclude with a discussion of extensions to the formulations for incorporating the impact of noise sources, and for a monotonically-increasing corroboration function. Our exposition follows the notation developed in the previous section, summarized in Table 2.1.

2.3.1 Problem Formulation and Complexity

2.3.1.1 Problem Formulations

Recall that, in Section 2.2, we informally defined the MinCost problem to be: what is the minimum cost that guarantees total credibility $C > 0$? MinCost can be stated formally as an optimization problem:

$$\text{Minimize:} \quad \sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j} e_j \tag{2.4}$$

$$\text{Subject to:} \quad \sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j} c_{i,j} \ge C$$
$$x_{i,j} \in \{0, 1\}, \quad \forall i \in \{1, \ldots, N\}, \; \forall j \in \{1, \ldots, R\}$$
$$\sum_{j=1}^{R} x_{i,j} \le 1, \quad \forall i \in \{1, \ldots, N\}$$

where $x_{i,j}$ is a binary variable that is 1 if reporter $i$ uses format $f_j$, and 0 otherwise.
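For very small instances, the formulation in (2.4) can be solved by direct enumeration, which also makes the search space explicit: each reporter either stays idle or picks exactly one format. The sketch below (`min_cost_bruteforce` is a hypothetical helper name) illustrates the problem, not an efficient solver:

```python
from itertools import product

def min_cost_bruteforce(cred, costs, C):
    """Exhaustive MinCost (Eq. 2.4): cred[i][j] = c_{i,j},
    costs[j] = e_j. Each reporter picks one format or stays idle.
    Feasible only for tiny N and R: (R+1)^N assignments."""
    N, R = len(cred), len(costs)
    best = None
    for choice in product(range(R + 1), repeat=N):  # R encodes "idle"
        total_c = sum(cred[i][j] for i, j in enumerate(choice) if j < R)
        if total_c < C:
            continue  # credibility threshold not met
        total_e = sum(costs[j] for j in choice if j < R)
        if best is None or total_e < best:
            best = total_e
    return best  # None if the threshold C is unreachable
```

This brute force is what the dynamic programming and matching algorithms below replace with pseudo-polynomial and polynomial search, respectively.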
Analogously, we can formulate MaxCred (the maximum credibility that can be achieved for a cost budget of $B > 0$) as the following optimization problem:

$$\text{Maximize:} \quad \sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j} c_{i,j} \tag{2.5}$$

$$\text{Subject to:} \quad \sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j} e_j \le B$$
$$x_{i,j} \in \{0, 1\}, \quad \forall i \in \{1, \ldots, N\}, \; \forall j \in \{1, \ldots, R\}$$
$$\sum_{j=1}^{R} x_{i,j} \le 1, \quad \forall i \in \{1, \ldots, N\}$$

2.3.1.2 On the Complexity of MinCost and MaxCred

If, in the above formulation, the cost $e_j$ is also dependent on the identity of the reporter (and therefore denoted by $e_{i,j}$), the MaxCred problem generalizes to the Multiple-Choice Knapsack Problem (MCKP, [93]). Moreover, the special case of one format (and $e_{i,j} = e_i$) is the well-known Knapsack Problem (KP), which is NP-hard. However, even when the cost is dependent only on the format (i.e., $e_{i,j} = e_j$), we can state the following theorem, whose proof uses a reduction from the original Knapsack Problem.

Theorem 2.3.1 MinCost and MaxCred are NP-Hard.

The proof is given in Appendix A.2.1. A much stronger claim on the complexity can be found in Appendix A.2.2.

2.3.2 Optimization Algorithms

Despite Theorem 2.3.1, it is instructive to consider optimization algorithms for the two problems, for two reasons. First, for many crowd-sensing problem instances, the problem sizes may be small enough that optimization algorithms might apply. Second, optimal solutions can be used to calibrate an approximation algorithm that we discuss later. In this section, we discuss two classes of optimization algorithms for MinCost and MaxCred, with different tradeoffs: one based on dynamic programming, and another based on a min-cost flow formulation.

2.3.2.1 Dynamic Programming

Since there exist optimal, pseudo-polynomial time algorithms for MCKP, it is natural that similar algorithms exist for MinCost and MaxCred. We describe these algorithms for completeness, since we use them in a later evaluation.
For MinCost (2.4), we can write $y_{i,j} = 1 - x_{i,j}$, where $y_{i,j} \in \{0, 1\}$, and then we have:

$$N \sum_{j=1}^{R} e_j \; - \; \left[ \text{Maximize:} \quad \sum_{i=1}^{N} \sum_{j=1}^{R} y_{i,j} e_j \right] \tag{2.6}$$

$$\text{Subject to:} \quad \sum_{i=1}^{N} \sum_{j=1}^{R} y_{i,j} c_{i,j} \le \sum_{i=1}^{N} \sum_{j=1}^{R} c_{i,j} - C \triangleq W$$
$$y_{i,j} \in \{0, 1\}, \quad \forall i \in \{1, \ldots, N\}, \; \forall j \in \{1, \ldots, R\}$$
$$\sum_{j=1}^{R} y_{i,j} \ge R - 1, \quad \forall i \in \{1, \ldots, N\}$$

where the minimization problem (2.4) has been transformed into a maximization problem, and the notation in (2.6) emphasizes that the first term $N \sum_{j=1}^{R} e_j$ in the total cost does not depend on the $y_{i,j}$ variables to be optimized. For a given event, the sum of the $c_{i,j}$ values is a constant, and so $W$ is also a constant. This optimization problem can be solved by a dynamic programming approach if we assume all $c_{i,j}$ values are truncated to a certain decimal precision, so that $c_{i,j} \in \{0, \delta, 2\delta, \ldots\}$, where $\delta$ is a discretization unit. Then, for any binary $y_{i,j}$ values that meet the constraints of the above problem, the sum $\sum_{i=1}^{N} \sum_{j=1}^{R} y_{i,j} c_{i,j}$ takes values in a set $\mathcal{W} \triangleq \{0, \delta, 2\delta, \ldots, W\}$. Note that the cardinality $|\mathcal{W}|$ depends on $N$, $R$, the $c_{i,j}$ values, and the discretization unit. Now define $A(l, s)$ as the sub-problem of selecting reporters in the set $\{1, \ldots, l\}$ subject to a constraint $s$. Assuming the $A(l, s)$ values are known for a particular $l$, we recursively compute $A(l+1, s)$ for all $s \in \mathcal{W}$ by:

$$A(l+1, s) = \max\left[\phi^{(0)}(l, s), \; \phi^{(1)}(l, s), \; \ldots, \; \phi^{(R)}(l, s)\right] \tag{2.7}$$

where $\phi^{(k)}(l, s)$ is defined for $k \in \{0, 1, \ldots, R\}$:

$$\phi^{(k)}(l, s) \triangleq A\Big(l, \; s - \textstyle\sum_{j=1, j \neq k}^{R} c_{l,j}\Big) + \textstyle\sum_{j=1, j \neq k}^{R} e_j$$

This can be understood as follows: the value $\phi^{(k)}(l, s)$ is the cost associated with reporter $l+1$ using option $k \in \{0, 1, \ldots, R\}$ and then allocating reporters $\{1, \ldots, l\}$ according to the optimal solution $A(l, s - \sum_{j=1, j \neq k}^{R} c_{l,j})$ that corresponds to a smaller budget. Note that option $k \in \{1, \ldots, R\}$ corresponds to reporter $l+1$ using a particular format (so that $y_{l+1,k} = 0$ for option $k$ and $y_{l+1,m} = 1$ for all $m \neq k$), and option $k = 0$ corresponds to reporter $l+1$ remaining idle (so that $y_{l+1,m} = 1$ for all $m$).
The time complexity of this dynamic programming algorithm, called MinCost-DP, is $O(NR|\mathcal{W}|)$. Similarly, MaxCred can be solved using dynamic programming, yielding an algorithm we label MaxCred-DP:

$$D(l+1, s) = \max\left[D(l, s), \; \psi^{(1)}(l, s), \; \ldots, \; \psi^{(R)}(l, s)\right], \quad \psi^{(k)}(l, s) \triangleq D(l, s - e_k) + c_{l,k} \;\; \text{for } k \in \{1, \ldots, R\} \tag{2.8}$$

with complexity $O(NR|\mathcal{B}|)$. Here $\mathcal{B} \triangleq \{0, \delta, 2\delta, \ldots, B\}$, where $\delta$ is a discretization unit, and $D(l, s)$ is the maximum attainable credibility using the first $l$ reporters with cost constraint $s$, with the initial conditions $D(0, \cdot) = 0$, $D(\cdot, 0) = 0$.

2.3.2.2 Maximum Weighted Matching Algorithms

We derive algorithms for MinCost and MaxCred which are polynomial in the number of reporters $N$, but exponential in the number of formats $R$. Define $\lambda_j$ to be the number of reporters reporting with format $f_j$. Define a report vector to be $(\lambda_1, \lambda_2, \ldots, \lambda_R)$ and a $(\lambda_1, \lambda_2, \ldots, \lambda_R)$-assignment to be an assignment of formats to reporters with $\lambda_j$ reporters reporting with format $f_j$ for each $j \in \{1, \ldots, R\}$. We find a $(\lambda_1, \lambda_2, \ldots, \lambda_R)$-assignment of formats to reporters of maximum credibility as follows. Assign nodes for each of the $N$ reporters. Assign $\lambda_i$ nodes for format $f_i$, $i \in \{1, \ldots, R\}$. Form a complete bipartite graph between the reporter nodes and format nodes. Assign the edge connecting reporter $i$ to format $f_j$ weight $c_{i,j}$. This bipartite graph has $O(N)$ vertices and $O(N^2)$ edges. Find a maximum matching in this bipartite graph. This maximum matching corresponds to a $(\lambda_1, \lambda_2, \ldots, \lambda_R)$-assignment of formats to reporters of maximum credibility. Using this construction, it is fairly easy to define an optimal algorithm for MaxCred, which we call MaxCred-MWM: enumerate all possible $(\lambda_1, \ldots, \lambda_R)$-assignments. For each assignment, check whether $\sum_{i=1}^{R} \lambda_i \le N$ and whether the assignment falls within the budget $B$: $\sum_{i=1}^{R} \lambda_i e_i \le B$. If so, find a $(\lambda_1, \lambda_2, \ldots, \lambda_R)$-assignment of formats to reporters of maximum credibility. Choose the assignment of maximum credibility as the maximizer.
In a similar way, one can define MinCost-MWM: enumerate all possible $(\lambda_1, \ldots, \lambda_R)$-assignments. For each assignment, check whether $\sum_{i=1}^{R} \lambda_i \le N$. If so, find a $(\lambda_1, \lambda_2, \ldots, \lambda_R)$-assignment of formats to reporters of maximum credibility. If this credibility exceeds the credibility threshold $C$, take note of the cost. Choose the assignment of minimum cost as the minimizer. There are $O(N^R)$ possible report vectors. Maximum matching can be solved in $O(|V|^3)$ time [122], and hence in time $O(N^3)$ on the constructed bipartite graph. This leads to the following lemma:

Lemma 2.3.2 Both MinCost-MWM and MaxCred-MWM run in time $O(N^{R+3})$.

Note that when the number of formats $R$ is fixed, these algorithms are polynomial in $N$. In addition, when $|\mathcal{B}|, |\mathcal{W}| = \omega(N^{R+2}/R)$, these algorithms have lower asymptotic complexity than their dynamic programming equivalents.

2.3.3 Using the Structure of the Credibility Function

The solutions discussed so far do not leverage any structure in the problem. Given an event and reporter locations, the credibility associated with each report format is computed as a number and acts as an input to the algorithms discussed. However, there are two interesting structural properties in the problem formulation. First, for a given reporter at a given location, the credibility is higher for a format whose cost is also higher. Second, for reporters at different distances, the credibility decays as a function of distance. In this section, we ask the question: can we leverage this structure to devise efficient approximation algorithms, or optimal special-case solutions, for either MaxCred or MinCost?

2.3.3.1 An Efficient Greedy Algorithm for Two Formats

When a crowd-sensing application only uses two report formats (say, text and video) and credibility values decay with distance from an event, it is possible to devise optimal greedy algorithms for MaxCred and MinCost.
Assume each of the $N$ reporters can report with one of two formats, $f_1$ or $f_2$, that reporters are indexed so that reporter $i$ is closer to the event than reporter $k$, for $i < k$, and that credibility decays with distance, i.e., $c_{i,j} \ge c_{k,j}$ for $i < k$, $j \in \{1, 2\}$. Furthermore, we assume WLOG $e_1 \ge e_2$. Denote by MaxCred-D-2 and MinCost-D-2 the instances of MaxCred and MinCost, respectively, corresponding to these assumptions. The following algorithm, denoted MaxCred-2F, finds an assignment with maximum credibility for MaxCred-D-2 that falls within a budget $B$, and runs in time $O(N^2)$.

Algorithm 1 Algorithm MaxCred-2F
INPUT: $(c_{i,j})$: $i \in \{1, \ldots, N\}$, $j \in \{1, 2\}$; $(e_1, e_2)$; budget $B$.
Define $d_m \triangleq c_{m,1} - c_{m,2}$ for each $m \in \{1, \ldots, N\}$.
For $i \in \{0, \ldots, \min[\lfloor B/e_1 \rfloor, N]\}$, do:
1. Define $Y \triangleq \min[N - i, \; \lfloor (B - e_1 i)/e_2 \rfloor]$.
2. Define the active set $\mathcal{A}_i \triangleq \{1, \ldots, i+Y\}$, being the set of $i+Y$ reporters closest to the event.
3. Define $\mathcal{D}_i$ as the set of $i$ reporters in $\mathcal{A}_i$ with the largest $d_m$ values (breaking ties arbitrarily). Then choose format $f_1$ for all reporters $m \in \mathcal{D}_i$, choose $f_2$ for $m \in \mathcal{A}_i \setminus \mathcal{D}_i$, and choose "idle" for all $m \notin \mathcal{A}_i$.
4. Define $C^i_{MAX}$ as the total credibility of this assignment: $C^i_{MAX} \triangleq \sum_{m \in \mathcal{A}_i} c_{m,2} + \sum_{m \in \mathcal{D}_i} d_m$.
OUTPUT: $i^* \triangleq \arg\max_i C^i_{MAX}$, $\mathcal{D}_{i^*}$, $\mathcal{A}_{i^*} \setminus \mathcal{D}_{i^*}$.

Theorem 2.3.3 MaxCred-2F finds an optimal solution to MaxCred-D-2.

Proof. For each $i$, we first seek to find $C^i_{MAX}$, the maximum credibility subject to having exactly $i$ reporters use the expensive format $f_1$. Using a simple interchange argument together with the fact that the credibility of each format is non-negative and non-increasing in distance, we can show that there exists an optimal solution that activates the set $\mathcal{A}_i$ consisting of the $i+Y$ reporters closest to the event. Indeed, if an optimal solution does not use the set $\mathcal{A}_i$, we can swap an idle reporter closer to the event with an active reporter further from the event, without affecting cost or decreasing credibility.
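A direct transcription of MaxCred-2F into Python, assuming integer costs so that the budget arithmetic stays exact (the function name and return convention are our own):

```python
def maxcred_2f(cred, e1, e2, B):
    """MaxCred-2F sketch: cred[i] = (c_{i,1}, c_{i,2}) with reporters
    already sorted by distance to the event; format f1 is the expensive
    one (e1 >= e2 > 0, integers). Returns (credibility, (f1-set, f2-set))."""
    N = len(cred)
    d = [c1 - c2 for c1, c2 in cred]          # d_m = c_{m,1} - c_{m,2}
    best_c, best = -1.0, None
    for i in range(min(B // e1, N) + 1):      # i reporters use format f1
        Y = min(N - i, (B - e1 * i) // e2)    # Y more use the cheap f2
        active = list(range(i + Y))           # i+Y reporters closest to event
        use_f1 = set(sorted(active, key=lambda m: d[m], reverse=True)[:i])
        total = sum(cred[m][1] for m in active) + sum(d[m] for m in use_f1)
        if total > best_c:
            best_c, best = total, (use_f1, set(active) - use_f1)
    return best_c, best
```

The two nested passes (over `i` and the sort inside each pass) give the stated $O(N^2)$-style behavior, up to the sort's log factor in this sketch.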
For each subset $\mathcal{D}_i \subseteq \mathcal{A}_i$ that contains $i$ reporters, define $C(\mathcal{D}_i)$ as the credibility of the assignment that assigns the reporters $m \in \mathcal{D}_i$ the format $f_1$, assigns the remaining reporters in $\mathcal{A}_i$ the format $f_2$, and keeps all reporters $m \notin \mathcal{A}_i$ idle:

$$C(\mathcal{D}_i) = \sum_{m \in \mathcal{A}_i} c_{m,2} + \sum_{m \in \mathcal{D}_i} d_m$$

Then $C(\mathcal{D}_i)$ is maximized by the subset $\mathcal{D}_i$ containing the $i$ reporters in $\mathcal{A}_i$ with the largest $d_m$ values. This defines $C^i_{MAX}$, and $C_{MAX}$ is found by maximizing over all possible $i$.

We can analogously give an algorithm for MinCost-D-2, denoted by MinCost-2F, which runs in time $O(N^2)$.

Algorithm 2 Algorithm MinCost-2F
INPUT: $(c_{i,j})$: $i \in \{1, \ldots, N\}$, $j \in \{1, 2\}$; $(e_1, e_2)$; threshold $C$.
Define $d_m \triangleq c_{m,1} - c_{m,2}$ for each $m \in \{1, \ldots, N\}$.
For $i \in \{0, \ldots, N\}$, do:
  For $Y \in \{0, \ldots, N - i\}$, do:
  1. Define the active set $\mathcal{A}_i \triangleq \{1, \ldots, i+Y\}$, being the set of $i+Y$ reporters closest to the event.
  2. Define $\mathcal{D}_i$ as the set of $i$ reporters in $\mathcal{A}_i$ with the largest $d_m$ values (breaking ties arbitrarily). Then choose format $f_1$ for all reporters $m \in \mathcal{D}_i$, choose $f_2$ for $m \in \mathcal{A}_i \setminus \mathcal{D}_i$, and choose "idle" for all $m \notin \mathcal{A}_i$.
  3. Define $C^{i,Y}_{MAX}$ as the maximum credibility of this assignment: $C^{i,Y}_{MAX} \triangleq \sum_{m \in \mathcal{A}_i} c_{m,2} + \sum_{m \in \mathcal{D}_i} d_m$.
  4. If $C^{i,Y}_{MAX} \ge C$: $cost_{i,Y} \triangleq i e_1 + Y e_2$. Else: $cost_{i,Y} \triangleq \infty$.
OUTPUT: $(i^*, Y^*) \triangleq \arg\min_{i,Y} cost_{i,Y}$, $\mathcal{D}_{i^*}$, $\mathcal{A}_{i^*} \setminus \mathcal{D}_{i^*}$.

Theorem 2.3.4 MinCost-2F solves MinCost-D-2.

The proof is given in Appendix A.3.1.

2.3.3.2 An Efficient Approximation Algorithm

The structure of our credibility function can also be used to reduce computational complexity. To understand this, recall that the dynamic programming algorithms described above jointly optimized both reporter selection and format selection. In this section, we describe an approximation algorithm for MinCost, called MinCost-CC, where the structure of the credibility function is used to determine, for each reporter, the format that the reporter should use.
As we shall show, MinCost-CC has significantly lower run-times at the expense of slight non-optimality in its results. MinCost-CC is based on the following intuition. Close to the location of the event, even low-cost formats have reasonable credibility. However, beyond a certain distance, the credibility of low-cost formats like text degrades significantly, to the point where even the small cost of that format may not be justified. Put another way, it is beneficial for a reporter to use the format whose credibility per unit cost (hence MinCost-CC) is highest; this gives the most "bang for the buck". Thus, for a given reporter, its current distance $d$ from the location of the event may pre-determine the format it uses. Of course, this pre-determination can result in a non-optimal choice, which is why MinCost-CC is an approximation algorithm. Formally, in MinCost-CC, if, for a reporter $i$,

$$k^* = \arg\max_k \; c_{i,k}(S_i, L) / e_k$$

then reporter $i$ chooses format $f_{k^*}$. This choice can be pre-computed (since it depends only upon the credibility and cost models) with time complexity $O(NR)$, but each reporter needs to recalculate its choice of report format whenever its distance to the event in question changes. The event locations that determine the format $f_{k^*}$ chosen by a particular reporter $i$ form annular regions about the reporter. Once each reporter has made the format choice, it remains for the director to decide which reporter(s) to select. For MinCost-CC, the minimum cost formulation is identical to (2.6), and with comparable complexity, but with two crucial differences: both the constant $|\mathcal{W}|$ and the runtime now relate only to the number $N$ of reporters, not to $NR$. As we shall show below, this makes a significant practical difference in runtime, even for moderate-sized inputs.
In MinCost-CC, the dynamic programming process of (2.7) is replaced by

$$A(l+1, s) = \max\{A(l, s), \; e_l + A(l, s - c_l)\} \tag{2.9}$$

where $c_l$ replaces $c_{l,j}$ in (2.7), since each reporter precomputes its format of choice. Compared with (2.7), the time complexity of (2.9) is reduced to $O(N|\mathcal{W}|)$, with a much smaller $|\mathcal{W}|$ in general. Notice that this time complexity is independent of $R$, the number of report formats, greatly improving computational efficiency at the expense of some optimality. In addition, the overall runtime, including both the time for the precomputation and the time for the dynamic programming, is $O(N(R + |\mathcal{W}|))$. Using steps similar to those presented in Section 2.3.2, it is possible to define a MaxCred-CC approximation algorithm for maximizing credibility, where the dynamic programming process of (2.8) is replaced by

$$D(l+1, s) = \max\{D(l, s), \; c_l + D(l, s - e_l)\} \tag{2.10}$$

Compared with (2.8), the complexity of (2.10) is reduced to $O(N|\mathcal{B}|)$, with a much smaller $|\mathcal{B}|$ in general. Similarly, the overall runtime, covering both the precomputation and the dynamic programming, is $O(N(R + |\mathcal{B}|))$. We note that MinCost-CC and MaxCred-CC still have pseudo-polynomial running time complexity, but are computationally much more efficient than MinCost-DP and MaxCred-DP.

2.3.4 Extensions

Incorporating sources of noise into our algorithms is straightforward, so we mention this only briefly. Recall that the way we model a noise source increases a reporter's effective distance. Since our optimal algorithms, like MinCost-DP or MinCost-MWM, are agnostic to the structure of the credibility function, they are unaffected by noise. For an algorithm like MinCost-CC, which does take structure into account, recall that noise sources increase a reporter's effective distance. Since reporters can quantify ambient noise, they can each use the effective distance to calculate the report format to use.
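The two-stage structure of the CC algorithms — per-reporter format preselection followed by an ordinary 0/1 knapsack — can be sketched as follows for MaxCred-CC (the recursion in Eq. 2.10), assuming positive integer costs; the function names are illustrative:

```python
def choose_format(creds, costs):
    """CC preselection: keep the format maximizing credibility per unit
    cost for one reporter; creds[j] = c_{i,j}, costs[j] = e_j > 0."""
    return max(range(len(costs)), key=lambda j: creds[j] / costs[j])

def maxcred_cc(cred, costs, B):
    """MaxCred-CC sketch (Eq. 2.10): after each reporter fixes its
    format, the director solves a plain 0/1 knapsack over (c_l, e_l)."""
    items = []
    for row in cred:
        k = choose_format(row, costs)
        items.append((row[k], costs[k]))
    D = [0.0] * (B + 1)                  # D(0, s) = 0 for all s
    for c, e in items:
        for s in range(B, e - 1, -1):    # in-place 0/1 knapsack update
            D[s] = max(D[s], D[s - e] + c)
    return D[B]
```

Because format choice is fixed up front, the inner loop runs over `N` items and `B` budget values only, matching the $O(N|\mathcal{B}|)$ bound (plus $O(NR)$ for the preselection).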
Finally, our algorithms can, in general, deal with monotonically-increasing corroboration functions, where the total credibility of a collection of reporters may be a non-linear function of the individual credibilities. If $I(\cdot)$ represents a monotonically-increasing credibility function, we need only use $I(c)$ to replace $c$ in our dynamic programming formulations. For example, (2.9) would become

$$A(l+1, s) = \max\{A(l, s), \; e_l + A(l, I(s - c_l))\}$$

Similar changes can be applied to the other dynamic programming formulations.

2.4 The Renewals Problem: Randomly Arriving Events

In the previous section we discussed a one-shot problem: that of optimizing for a single event. We now consider a sequence of events with arrival times $\{t_1, t_2, t_3, \ldots\}$, where $t_k$ is the arrival time of event $k$. In this setting, we consider a stochastic variant of MaxCred, called MaxCred-Stochastic: instead of maximizing credibility for a single event subject to a cost constraint, we maximize the average credibility-per-event subject to an average cost constraint and a per-event credibility minimum. This couples the decisions needed for each event. However, we first show that this time-average problem can be solved by a reduction to individual knapsack problems of the type described in previous sections. We then show that if the per-event credibility minimum is removed, so that we only constrain the average cost (averaged over all events), then decisions can be made in a decentralized fashion. The solution technique, described below, is general, and in Section 2.4.4 we also show how it can be used to solve stochastic variants of MinCost.

2.4.1 The General Stochastic Problem

Let $\omega[k]$ represent a random vector of parameters associated with each event $k$, such as the location of the event and the corresponding costs and credibilities.
While $\omega[k]$ can include different parameters for different types of problems, we shall soon use $\omega[k] \triangleq [(c_{i,j}[k]), (e_j[k])]$, where $(c_{i,j}[k])$ is the matrix of event-$k$ credibility values for reporters $i \in \{1, \ldots, N\}$ and formats $f_j \in \{f_1, \ldots, f_R\}$, and $(e_j[k])$ is a vector of cost information. We assume the process $\omega[k]$ is ergodic with a well-defined steady-state distribution. The simplest example is when $\omega[k]$ is independent and identically distributed (i.i.d.) over events $k \in \{1, 2, 3, \ldots\}$. Let frame $k$ denote the period of time $[t_k, t_{k+1})$, which starts with the arrival of event $k$ and ends just before the next event. For every frame $k$, the director observes $\omega[k]$ and chooses a control action $\alpha[k]$ from a general set of feasible actions $\mathcal{A}_{\omega[k]}$ that possibly depends on $\omega[k]$. The values $\omega[k]$ and $\alpha[k]$ together determine an $(M+1)$-dimensional vector $y[k]$, representing network attributes for event $k$:

$$y[k] = (y_0[k], y_1[k], \ldots, y_M[k])$$

Specifically, each attribute $y_m[k]$ is given by a general function of $\alpha[k]$ and $\omega[k]$:

$$y_m[k] = \hat{y}_m(\alpha[k], \omega[k]) \quad \forall m \in \{0, 1, \ldots, M\}$$

The functions $\hat{y}_m(\alpha[k], \omega[k])$ are arbitrary and are only assumed to be deterministically bounded above and below by finite constants. Define $\bar{y}_m$ as the time average expectation of the attribute $y_m[k]$, averaged over all frames:²

$$\bar{y}_m \triangleq \lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} E\{y_m[k]\}$$

² We can generalize the definition of $\bar{y}_m$ using a lim sup (being the largest limiting value over any convergent subsequence) if there is concern about whether or not the regular limit exists.

The general problem is to find an algorithm for choosing control actions $\alpha[k]$ for each frame $k \in \{1, 2, 3, \ldots\}$ to solve:

$$\text{Minimize:} \quad \bar{y}_0 \tag{2.11}$$
$$\text{Subject to:} \quad \bar{y}_m \le 0 \quad \forall m \in \{1, 2, \ldots, M\} \tag{2.12}$$
$$\alpha[k] \in \mathcal{A}_{\omega[k]} \quad \forall \text{ frames } k \in \{1, 2, \ldots\} \tag{2.13}$$

Our approach to this general problem uses a parameter $V > 0$, which affects a performance tradeoff.
Specifically, for each of the $M$ time-average inequality constraints $\bar{y}_m \le 0$ (for $m \in \{1, \ldots, M\}$), define a virtual queue $Z_m[k]$ with $Z_m[0] = 0$, and with frame-update equation:

$$Z_m[k+1] = \max[Z_m[k] + y_m[k], \; 0] \tag{2.14}$$

Then, every frame $k$, observe the value of $\omega[k]$ and perform the following actions:

Choose $\alpha[k] \in \mathcal{A}_{\omega[k]}$ to minimize: $V \hat{y}_0(\alpha[k], \omega[k]) + \sum_{m=1}^{M} Z_m[k] \, \hat{y}_m(\alpha[k], \omega[k])$.

Update the virtual queues $Z_m[k]$ according to (2.14), using the values $y_m[k] = \hat{y}_m(\alpha[k], \omega[k])$ determined from the above minimization.

We call the above algorithm the drift-plus-penalty algorithm, because it is derived by minimizing a bound on a drift-plus-penalty expression for a quadratic Lyapunov function [56, 112]. Assuming the problem (2.11)-(2.13) is feasible (so that it is possible to meet the time-average inequality constraints), this algorithm will also meet all of these constraints, and will achieve a time-average value $\bar{y}_0$ that is within $O(1/V)$ of the optimum. Typically, the $V$ parameter also affects the average size of the virtual queues, which directly affects the convergence time needed for the time averages to be close to their limiting values. Specifically, below we provide a simple performance theorem for the case when $\omega[k]$ is i.i.d. over frames (more complex statements are given in [112] for the same algorithm under more general non-i.i.d. cases). Let $B$ be a finite constant that satisfies the following for all $k$:

$$B \ge \frac{1}{2} \sum_{m=1}^{M} E\left\{y_m[k]^2\right\} \tag{2.15}$$

Such a constant $B$ exists because the $y_m[k]$ processes are bounded above and below by finite constants. Assume the problem (2.11)-(2.13) is feasible, and define $y_0^{opt}$ as the infimum value of $\bar{y}_0$ subject to the constraints (2.12)-(2.13).

Theorem 2.4.1 (Performance of Stochastic Algorithm [112]) Suppose the drift-plus-penalty algorithm is used with any constant $V \ge 0$, and with initial queue values $Z_m[1] = 0$ for $m \in \{1, \ldots, M\}$.
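The drift-plus-penalty loop can be sketched generically in Python; `actions_for` and `y_hat` are placeholder names standing in for the problem-specific feasible set $\mathcal{A}_{\omega[k]}$ and attribute functions $\hat{y}_m$:

```python
def drift_plus_penalty(frames, actions_for, y_hat, M, V=10.0):
    """Generic drift-plus-penalty sketch.
    frames: iterable of omega[k] observations;
    actions_for(omega): finite feasible action set for this frame;
    y_hat(alpha, omega): attribute vector (y_0, y_1, ..., y_M)."""
    Z = [0.0] * M                       # virtual queues, Z_m[0] = 0
    history = []
    for omega in frames:
        # Per-frame minimization of V*y_0 + sum_m Z_m * y_m
        alpha = min(
            actions_for(omega),
            key=lambda a: V * y_hat(a, omega)[0]
            + sum(Z[m] * y_hat(a, omega)[m + 1] for m in range(M)),
        )
        y = y_hat(alpha, omega)
        # Queue update (Eq. 2.14)
        Z = [max(Z[m] + y[m + 1], 0.0) for m in range(M)]
        history.append((alpha, y))
    return Z, history
```

Larger `V` weights the penalty term more heavily, trading slower constraint convergence (larger queues) for a time-average objective closer to optimal, as quantified in Theorem 2.4.1.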
Then, under the above assumptions, all desired constraints are satisfied:

$$\limsup_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} E\{y_m[k]\} \le 0 \quad \forall m \in \{1, \ldots, M\} \tag{2.16}$$

Further, the time average expectation of $y_0[k]$ satisfies, for all $K \in \{1, 2, 3, \ldots\}$:

$$\frac{1}{K} \sum_{k=1}^{K} E\{y_0[k]\} \le y_0^{opt} + B/V \tag{2.17}$$

where $B$ is defined in (2.15). Finally, for all $K \in \{1, 2, 3, \ldots\}$ we have:

$$\frac{1}{K} \sum_{k=1}^{K} E\{y_m[k]\} \le E\{Z_m[K+1]/K\} \quad \forall m \in \{1, \ldots, M\} \tag{2.18}$$

$$\sum_{m=1}^{M} E\left\{Z_m[K+1]^2 / K^2\right\} \le \frac{1}{K}\left[2B + 2V(y_0^{opt} - \mu)\right] \tag{2.19}$$

where $\mu$ is a finite constant such that $E\{y_0[k]\} \ge \mu$ for all $k$. Such a value exists because $y_0[k]$ is deterministically lower bounded.

The above theorem says that all desired constraints are satisfied, while the time average expectation of $y_0[k]$ is within $B/V$ of optimality, where $B/V$ can be made arbitrarily small by increasing the $V$ parameter. However, increasing $V$ affects the convergence time of the desired time-average constraints, as shown by (2.18)-(2.19). In particular, the number of events $K$ must grow large so that the right-hand side of (2.19) is small, which ensures the right-hand side of (2.18) is small. The queues $Z_m[k]$ can be shown to have a finite time-average size of $O(V)$ whenever a mild Slater condition is satisfied [112]. A stronger deterministic bound on $Z_m[k]$ is shown in Section 2.4.3 for a special case of the MaxCred-Stochastic problem defined below.

2.4.2 Corroboration Pull as a Stochastic Optimization Problem

Here we formulate MaxCred-Stochastic. Define $\omega[k] \triangleq [(c_{i,j}[k]), (e_j[k])]$ and $\alpha[k] \triangleq (x_{i,j}[k])$, where $x_{i,j}[k]$ is a binary variable that is 1 if reporter $i \in \{1, \ldots, N\}$ uses format $f_j \in \{f_1, \ldots, f_R\}$ on frame $k$. The parameter $c_{i,j}[k]$ represents the credibility of reporter $i$ using format $j$ on frame $k$, and $e_j[k]$ represents the cost of format $j$ on frame $k$. We assume throughout that these values are non-negative.
The goal is to maximize the average credibility-per-frame subject to an average cost constraint and to a minimum credibility level required on each frame $k \in \{1, 2, \ldots\}$:

$$\text{Maximize:} \quad \bar{c} \tag{2.20}$$
$$\text{Subject to:} \quad \bar{e} \le e_{av} \tag{2.21}$$
$$\sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j}[k] \, c_{i,j}[k] \ge c_{min} \quad \forall \text{ frame } k \tag{2.22}$$
$$x_{i,j}[k] \in \{0, 1\}, \;\; \sum_{b=1}^{R} x_{i,b}[k] \le 1 \quad \forall i, j, \; \forall \text{ frame } k \tag{2.23}$$

where $e_{av}$ and $c_{min}$ are given constants, and $\bar{c}$ and $\bar{e}$ are defined:

$$\bar{c} \triangleq \lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{j=1}^{R} E\left\{x_{i,j}[k] \, c_{i,j}[k]\right\}$$
$$\bar{e} \triangleq \lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{j=1}^{R} E\left\{x_{i,j}[k] \, e_j[k]\right\}$$

This problem fits the general stochastic optimization framework of the previous subsection by defining $y_0[k]$ and $y_1[k]$ by:

$$y_0[k] = \hat{y}_0(\alpha[k], \omega[k]) \triangleq -\sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j}[k] \, c_{i,j}[k]$$
$$y_1[k] = \hat{y}_1(\alpha[k], \omega[k]) \triangleq -e_{av} + \sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j}[k] \, e_j[k]$$

and by defining $\mathcal{A}_{\omega[k]}$ as the set of all $(x_{i,j}[k])$ matrices that satisfy the constraints (2.22)-(2.23). The resulting stochastic algorithm thus defines a virtual queue $Z_1[k]$ with update:

$$Z_1[k+1] = \max\left[Z_1[k] - e_{av} + \sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j}[k] \, e_j[k], \;\; 0\right] \tag{2.24}$$

It then observes $Z_1[k]$ and the $\omega[k]$ parameters every frame $k$ and chooses $(x_{i,j}[k])$, subject to (2.22)-(2.23), to minimize $V \hat{y}_0(\alpha[k], \omega[k]) + Z_1[k] \hat{y}_1(\alpha[k], \omega[k])$. This amounts to observing the queue $Z_1[k]$ and the parameters $c_{i,j}[k]$, $e_j[k]$ every frame $k$ and solving:

$$\text{Maximize:} \quad \sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j}[k] \left[ V c_{i,j}[k] - Z_1[k] \, e_j[k] \right] \tag{2.25}$$
$$\text{Subject to:} \quad (x_{i,j}[k]) \text{ satisfy (2.22)-(2.23)} \tag{2.26}$$

The queue $Z_1[k]$ is then updated by (2.24). Thus, every frame $k$ we have a maximization problem (2.25)-(2.26) that is similar to the knapsack-like problems we have seen in previous sections. However, the knapsack weights for frame $k$ now depend on the current credibility values $c_{i,j}[k]$, the costs $e_j[k]$, and the virtual queue size $Z_1[k]$. Specifically, the weight for variable $x_{i,j}[k]$ is $V c_{i,j}[k] - Z_1[k] e_j[k]$, and this weight can be positive, negative, or zero.
2.4.3 An Exact Distributed Algorithm for $c_{min} = 0$

A simple and exact distributed implementation of (2.25)-(2.26) arises if the $c_{min}$ constraint (2.22) is removed (i.e., if $c_{min} \triangleq 0$). In this case, the frame-$k$ decisions are separable over reporters and reduce to having each reporter $i$ choose the single format $f_j \in \{f_1, \ldots, f_R\}$ with the largest (positive) value of $V c_{i,j}[k] - Z_1[k] e_j[k]$, breaking ties arbitrarily and choosing to be idle (with $x_{i,j}[k] = 0$ for all $j \in \{1, \ldots, R\}$) if none of the weights $V c_{i,j}[k] - Z_1[k] e_j[k]$ is positive. The task director observes the outcomes of the decisions on frame $k$ and iterates the $Z_1[k]$ update (2.24), passing $Z_1[k+1]$ to all reporters before the next event occurs. In this case, when $c_{min} = 0$, we can also show that the $Z_1[k]$ queue is deterministically bounded by a constant $Z_{max}$, provided that two additional boundedness assumptions hold. This is important because such a bound $Z_{max}$ can be viewed as the worst-case excess cost (over the desired average) that is spent on any successive set of frames. Specifically, suppose that there is a finite constant $\rho$ such that for all $k$ and all $i, j$ such that $e_j[k] > 0$ we have:

$$c_{i,j}[k] / e_j[k] \le \rho \tag{2.27}$$

That is, $\rho$ is an upper bound on the credibility-per-cost ratio for any choice $x_{i,j}[k]$ on any frame $k$. Further, suppose there is a finite constant $e_{max}$ such that for all $k$:

$$\sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j}[k] \, e_j[k] \le e_{max} \tag{2.28}$$

We have the following theorem.

Theorem 2.4.2 Suppose there are positive constants $\rho$, $e_{max}$ such that (2.27) and (2.28) hold for all $k$ and all $i, j$. Assume $Z_1[1] = 0$.
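The per-reporter decision rule and the director-side queue update for the $c_{min} = 0$ case can be sketched as follows (function names are illustrative):

```python
def reporter_decision(creds, costs, Z1, V):
    """One reporter's local rule for MaxCred-Stochastic with c_min = 0:
    pick the format j maximizing V*c_{i,j} - Z1*e_j, or stay idle
    (return None) if no weight is strictly positive."""
    best_j, best_w = None, 0.0
    for j, (c, e) in enumerate(zip(creds, costs)):
        w = V * c - Z1 * e
        if w > best_w:
            best_j, best_w = j, w
    return best_j

def update_queue(Z1, spent, e_av):
    """Director-side virtual-queue update (Eq. 2.24), where `spent`
    is the total cost of all reports delivered on this frame."""
    return max(Z1 - e_av + spent, 0.0)
```

Intuitively, a large queue `Z1` (recent overspending) raises the effective price of every format, pushing reporters toward cheap formats or idleness until the average cost drifts back under `e_av`.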
Then, under the above distributed algorithm for the special case $c_{min} = 0$, we have:

$$Z_1[k] \le \rho V + e_{max} \quad \forall k \in \{0, 1, 2, \ldots\} \tag{2.29}$$

Furthermore, for any integers $k_0 \ge 1$ and $P \ge 1$, we have:

$$\sum_{k=k_0}^{k_0 + P - 1} \left[ \sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j}[k] \, e_j[k] \right] \le e_{av} P + (\rho V + e_{max}) \tag{2.30}$$

The above theorem shows that the total cost expended over any $P$ successive frames (for any integer $P > 0$) is at most $e_{av} P$ (the average cost constraint multiplied by the number of frames) plus a constant that is independent of $P$. The constant, $\rho V + e_{max}$, represents the worst-case excess cost over any set of successive frames, and is linear in the $V$ parameter. The bound in Theorem 2.4.2 holds for arbitrary $\omega[k]$ sample paths. On the other hand, if $\omega[k]$ is i.i.d., then Theorem 2.4.1 ensures the time-average credibility value is within $B/V$ of optimality. The proof of Theorem 2.4.2 can be found in Appendix A.4.1.

2.4.4 MinCost-Stochastic

Again define $\omega[k] \triangleq [(c_{i,j}[k]), (e_j[k])]$. MinCost-Stochastic can be formulated as follows:

$$\text{Minimize:} \quad \bar{e} \tag{2.31}$$
$$\text{Subject to:} \quad \bar{c} \ge c_{av} \tag{2.32}$$
$$\sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j}[k] \, c_{i,j}[k] \ge c_{min} \quad \forall \text{ frame } k \tag{2.33}$$
$$x_{i,j}[k] \in \{0, 1\}, \;\; \sum_{b=1}^{R} x_{i,b}[k] \le 1 \quad \forall i, j, \; \forall \text{ frame } k \tag{2.34}$$

We can thus define $y_0[k]$ and $y_1[k]$ as:

$$y_0[k] \triangleq \sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j}[k] \, e_j[k]$$
$$y_1[k] \triangleq c_{av} - \sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j}[k] \, c_{i,j}[k]$$

We thus define $Z_1[k]$ by:

$$Z_1[k+1] = \max\left[Z_1[k] + c_{av} - \sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j}[k] \, c_{i,j}[k], \;\; 0\right] \tag{2.35}$$

Every frame $k$, the algorithm observes the $c_{i,j}[k]$ and $e_j[k]$ parameters and the $Z_1[k]$ queue value, and chooses the $x_{i,j}[k]$ variables to minimize $V y_0[k] + Z_1[k] y_1[k]$. This amounts to solving:

$$\text{Minimize:} \quad \sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j}[k] \left[ V e_j[k] - Z_1[k] \, c_{i,j}[k] \right] \tag{2.36}$$
$$\text{Subject to:} \quad (x_{i,j}[k]) \text{ satisfy (2.33)-(2.34)} \tag{2.37}$$

The virtual queue $Z_1[k]$ is then updated via (2.35). The problem (2.36)-(2.37) is again a knapsack-like problem for each frame $k$.
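When the per-frame credibility floor is absent ($c_{min} = 0$), the per-frame minimization (2.36) separates across reporters, and the rule together with the queue update (2.35) can be sketched as (hypothetical helper names):

```python
def mincost_frame_decision(cred, costs, Z1, V):
    """Per-frame MinCost-Stochastic rule with c_min = 0: each reporter i
    takes the format of most negative weight V*e_j - Z1*c_{i,j},
    staying idle (None) if no weight is negative."""
    choices = []
    for row in cred:
        w = [V * e - Z1 * c for c, e in zip(row, costs)]
        j = min(range(len(costs)), key=lambda t: w[t])
        choices.append(j if w[j] < 0 else None)
    return choices

def mincost_queue_update(Z1, earned_cred, c_av):
    """Virtual-queue update (Eq. 2.35): the queue grows whenever the
    credibility earned on this frame falls short of the target c_av."""
    return max(Z1 + c_av - earned_cred, 0.0)
```

Here the queue plays the mirrored role of the MaxCred case: a large `Z1` (a recent credibility deficit) subsidizes expensive, high-credibility formats until the running average catches up to `c_av`.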
It also yields an exact distributed implementation when c_min = 0 (so that constraint (2.33) is removed). In particular, if c_min = 0, then each reporter i selects the format j with the smallest (negative) weight V e_j[k] − Z_1[k]c_{i,j}[k], choosing to remain idle if no format has a negative weight.

2.5 Performance Evaluation

In this section, we conduct simulations to evaluate the performance of some of our solutions to the one-shot and renewals problems. For the one-shot problems, we compare the performance of the exact dynamic programming algorithms and the efficient approximation algorithms. We show that the approximation algorithms can speed up processing time by 2-3 orders of magnitude, while being less than 20% off the optimal values on average. For the renewals problems, we investigate how the parameter V affects the performance of the optimal distributed solutions: increasing V improves the quality of the solutions, but can adversely affect convergence.

2.5.1 One-Shot Problems

Evaluation of MinCost-DP and MinCost-CC. As described in previous sections, the approximation algorithm MinCost-CC trades off optimality for reduced computational complexity. As such, it is important to quantify this trade-off for practical crowd-sensing configurations. In this section, we compare MinCost-CC and MaxCred-CC with MinCost-DP and MaxCred-DP respectively [3].

Lacking data from crowd-sensing applications, we use two different data sets. First, we carefully and manually mine Google News for interesting events [4]. Searching for a specific set of keywords describing an event in Google News retrieves a list of news items related to that event within 24 hours of its occurrence. The event location is explicitly specified in the news items. Each news item has a location, which is assumed to be the location of a reporter. We use the event location and report locations as inputs to MinCost-CC and MinCost-DP.
In this chapter, we present results from three events: an event of regional scope, a basketball playoff game between the Lakers and the Jazz (31 reporters); an event of national scope, the passage of the healthcare reform bill (63 reporters); and an event of global scope, the opening of the Shanghai exposition (88 reporters). Of course, this choice of a surrogate for crowd sensing is far from perfect. However, this data set gives a varied, realistic reporter location distribution; since our algorithms depend heavily on location, we can draw some reasonable conclusions about their relative performance. That said, we also use a dataset generated from a random distribution of reporters, both to ensure that we are not misled by the Google News data set and to explore the impact of larger task sizes.

[3] In some of our evaluations, we use R = 4. In this regime, MinCost-DP is more efficient than MinCost-MWM, hence the choice.
[4] We re-scaled reporter distances and did several data cleaning operations: removing blog posts, handling duplicate reports, etc. We omit a discussion of these for brevity.

[Figure 2.1: Minimal cost of 4 formats with increasing k. (a) Optimality gap; (b) Runtime (in sec).]

We are interested in two metrics: the optimality gap, which is the ratio of the min-cost obtained by MinCost-CC to that obtained by MinCost-DP; and the runtime of the computation for each of these algorithms. Figure 2.1 plots these two metrics as a function of the credibility threshold, expressed as a number k.
A value k represents a credibility threshold corresponding to the total credibility of k reports of the highest-cost format from a distance h_0 (e.g., if k is 3 and the highest-cost format is video, then the director is interested in obtaining credibility equivalent to that from three video reports). In this graph, we use four data formats, with h_0, two per-format parameter vectors (indices 1-4), and the corresponding costs e_{1-4} set to 1, (1, 1, 1, 1), (2, 1.5, 1, 0.5) and (1, 2.2, 5.4, 13.7) respectively. Other experimental settings give qualitatively similar results.

From Figure 2.1(a), the optimality gap is, on average, 19.7% across different values of k. This is encouraging, since it suggests that MinCost-CC produces results that are not significantly far from the optimal. Interestingly, no optimal solution exists for k > 5 for the regional event: this credibility threshold experiences a "saturation", since there does not exist a set of reporters who can collectively satisfy that threshold. Other events saturate at different values of k. Finally, while this is not apparent from these graphs, the minimum-cost solution is approximately linear in k for both MinCost-CC and MinCost-DP.

More interestingly, from Figure 2.1(b), it is clear that the runtime of MinCost-CC is 2-3 orders of magnitude lower than that of MinCost-DP with the discretization setting |W| = 1000.
This difference is not just a matter of degree, but may make the difference between a useful application and one that is not: MinCost-DP can take several tens of seconds to complete while MinCost-CC takes at most a few hundred milliseconds, which might make the difference between victory and defeat in a balloon hunt, or life and death in a disaster-response task! The explanation for the performance difference is the lower asymptotic complexity of MinCost-CC. A subtle finding is that the running time of both MinCost-CC and MinCost-DP decreases, sometimes dramatically in the case of MinCost-CC, with increasing k. Intuitively, this is because there are fewer candidate sets of reporters who can satisfy a higher credibility threshold, resulting in a smaller search space.

[Figure 2.2: Minimal cost in random topologies with increasing k. (a) Optimality gap; (b) Runtime (in sec); error bars are very small and thus omitted in (b).]

For random topologies, Figure 2.2 plots the optimality gap and runtime, averaged over 50 simulations. MinCost-CC is, on average, 20.5% and 17.4% off the optimal for 100 and 200 reporters respectively, across different values of k, but is still 2-3 orders of magnitude more efficient than MinCost-DP. The runtimes for both algorithms are slightly higher, given the larger number of reporters. Moreover, with 100 or 200 reporters, the optimality gap has the same upper bound, about 35% for large k. This is also observed in other simulations with 50, 150 and 300 reporters (not shown). We have left an analytical exploration of this upper bound to future work. Finally, a comparison of these results with Figure 2.1(a) reveals an interesting result.
Although different types of reporter deployments can result in different optimality-gap curves (the curves for the three different Google News events in Figure 2.1(a) are not the same), the national event seems to have a qualitatively similar optimality-gap curve to the random topologies. Understanding this in greater depth is also left to future work.

[Figure 2.3: Maximal credibility of 4 formats with increasing B. (a) Optimality gap; (b) Runtime (in sec).]

Evaluation of MaxCred-DP and MaxCred-CC. With the same settings (except B), we also compare MaxCred-CC with MaxCred-DP (Figures 2.3(a) for the Google News datasets and 2.4(a) for random topologies). In these experiments, the optimality gap is, on average, 13.5% across different values of k for the Google News datasets, while for random topologies, MaxCred-CC is, on average, 19.8% and 18.9% off the optimal for 100 and 200 reporters respectively, across different values of k. In both cases, MaxCred-CC is still 2-3 orders of magnitude more efficient than MaxCred-DP. Once again, the optimality gap seems to have a lower bound, about 25% for large B in random topologies, and the national event again has a qualitatively similar optimality-gap curve to the random topologies. Finally, as an aside, in Figure 2.3(a), no data point is shown for B > 175 for the regional event: this cost budget is the largest necessary, since it is impossible to further increase the optimal credibility even with a higher budget.

2.5.2 Evaluation of Renewals Problems

As we have described, for the renewals problems, there exist optimal distributed algorithms that can constrain the average cost/credibility across all the sequentially arriving events.
To conduct the evaluation, we generate cost and credibility for each random renewal event (frame) according to an i.i.d. process. Assuming there are N users and F formats, we first draw a series of i.i.d. values C_k(1), …, C_k(N) from the standard uniform distribution for every new frame k. Then, we set c_{i,j}[k] = C_k(i)·j, which indicates that the credibility is linear in the format number (higher-numbered formats offer proportionally better credibility). We also set the cost of format j as quadratic in the format number, which yields e_j[k] = j².

[Figure 2.4: Maximal credibility in random topologies with increasing B. (a) Optimality gap; (b) Runtime (in sec); error bars are very small and thus omitted in (b).]

Then, for evaluating the distributed algorithms, we focus on two metrics related to V. The first metric is the convergence time, which is the number of frames needed for the time averages to be close to their target values. The second metric is the optimality gap, which is the difference between the achieved time-average objective function value (under a certain V) and the optimal time-average value that can be achieved subject to the desired constraints. Generally speaking, with larger V, convergence is slower but the resulting time-average objective function has a smaller gap to optimality (theoretically being within O(1/V) of the optimum, as described earlier). We simulate the distributed versions of MaxCred-Stochastic and MinCost-Stochastic, for c_min = 0 in (2.22) and (2.33). The results under N = 10, F = 3, e_av = 3.5 and c_av = 6 are plotted in Figures 2.5 and 2.6 for MaxCred-Stochastic and MinCost-Stochastic, respectively. Other parameter choices lead to quite similar conclusions.
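The frame-generation process described above can be sketched as follows (the function name and RNG seeding are ours; formats are indexed j = 1..F):

```python
import random

def generate_frame(N, F, rng=None):
    """Draw one i.i.d. renewal frame: C_k(i) ~ Uniform[0,1] per user i,
    credibility c[i][j-1] = C_k(i) * j (linear in the format number), and
    cost e[j-1] = j**2 (quadratic in the format number)."""
    rng = rng or random.Random(0)
    C = [rng.random() for _ in range(N)]
    c = [[C[i] * j for j in range(1, F + 1)] for i in range(N)]
    e = [j ** 2 for j in range(1, F + 1)]
    return c, e
```

With N = 10 and F = 3 as in the simulations, this yields per-frame costs e = [1, 4, 9] and credibilities that scale linearly across a user's formats.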
From Figures 2.5(a) and 2.6(a), we see how the achieved average value of the performance objective converges toward the optimal as V increases. Specifically, we compare different V settings against the result obtained by setting V to a relatively large value (400, in our simulations). Taking V = 10, 40, and 100 as examples, we find that, for the MaxCred-Stochastic problem, the achieved values are 4.53%, 0.78% and 0.36% off the value obtained when V = 400, respectively, while they are 2.38%, 1.65% and 0.84% for the MinCost-Stochastic problem.

[Figure 2.5: Evaluation of MaxCred-Stochastic with different V. (a) Achieved average credibility vs. V; (b) Average cost vs. time.]

[Figure 2.6: Evaluation of MinCost-Stochastic with different V. (a) Achieved average cost vs. V; (b) Average credibility vs. time.]

Figures 2.5(b) and 2.6(b) show how the V parameter affects a tradeoff in convergence time to the desired average requirements (ē ≤ 3.5 in Figure 2.5(b) and c̄ ≥ 6 in Figure 2.6(b)). The figures plot the first 2000 frames and show convergence times. Obviously, V = 10 has the fastest convergence, V = 40 is slower, and V = 100 is slowest. Looked at another way, the time taken when V = 10 to reach within 3% of its long-term average is 25 frames for MaxCred-Stochastic and 204 frames for MinCost-Stochastic; for the other two settings, the corresponding numbers (179/791 and 558/1958 respectively) are considerably higher. Convergence time for the MinCost-Stochastic problem can be improved by using the place-holder technique in [112], which in this case reduces to using a non-zero initial condition for Z_1[0].
Using Z_1[0] = V works well in this case, as this is the minimum backlog required for reporters to use non-idle formats. With this non-zero initial condition, for V = 10, 40, and 100 the convergence times for MinCost-Stochastic improved from 204, 791, and 1958 frames to 169, 570, and 1385 frames, respectively.

Chapter 3
Cloud-Enabled Privacy-Preserving Collaborative Learning for Mobile Sensing

3.1 Overview

As smartphones proliferate, their sensors are generating a deluge of data. One tool for handling this data deluge is statistical machine classification; classifiers can automatically distinguish activity, context, people, objects, and so forth. We envision classifiers being used extensively in future mobile computing applications; already, many pieces of research have used standard machine-learning classifiers (Section 5). One way to build robust classifiers is through collaborative learning [23,30,103], in which mobile users contribute sensor data as training samples. For example, mobile users may submit statistical summaries (features) extracted from audio recordings of their ambient environment, which can be used to train a model to robustly recognize the environment that a user is in: a mall, an office, riding public transit, and so forth. Collaborative learning can leverage user diversity for robustness, since multiple users can more easily cover a wider variety of scenarios than a single user. We envision that collaborative learning will be enabled by a software system that efficiently collects training samples from contributors, generates statistical classifiers, and makes these classifiers available to mobile users or software vendors. In this chapter, we address the challenges involved in designing a system for collaborative learning.
Such a system must support the popular classifiers (such as Support Vector Machines, or SVMs, and k-Nearest Neighbors, or kNN), must scale to a hundred or more contributors, and must incentivize user contribution (Section 3.2). To our knowledge, no prior work has discussed the design of a collaborative learning system with these capabilities.

An impediment to scaling collaborative learning is the computational cost of constructing the classifier from training data. With the advent of cloud computing, a natural way to address this cost is to let mobile users submit their sensor data to a cloud service which performs the classifier construction. In such a design, however, to incentivize users to participate in collaborative learning, it is essential to ensure the privacy of the submitted samples. Training samples might accidentally contain sensitive information; features extracted from audio clips can be used to approximately reconstruct the original sound [48,102], and may reveal over 70% of the words in spoken sentences (Section 3.4.1).

In this chapter, we present Pickle, a novel approach to privacy-preserving collaborative learning. Pickle is based on the following observation: the popular classifiers rely on computing mathematical relationships, such as inner products and Euclidean distances, between pairs of submitted training samples. Pickle perturbs the training data on the mobile device using lightweight transformations to preserve the privacy of the individual training samples, but regresses these mathematical relationships between training samples in a unique way, thereby preserving the accuracy of classification.
Beyond addressing the challenges discussed above, Pickle has many desirable properties: it requires no coordination among users, and all communication is between a user and the cloud; it allows users to independently tune the level of privacy for perturbing their submitted training samples; finally, it can be made robust to poisoning attacks and is collusion-resistant. Pickle's design is heavily influenced by the requirements of mobile sensing applications, and occupies a unique niche in the body of work on privacy-preserving methods for classifier construction (Section ??).

A user study demonstrates that Pickle preserves privacy effectively when building a speaker-recognition classifier (Section 3.4.1): less than 2% of words in sentences reconstructed by attacking Pickle transformations were recognizable, and most of these were stop words. Results from a prototype (Section 3.4.2) show that Pickle communication and storage costs are small, classification decisions can be made quickly on modern smartphones, and model training can be made to scale using parallelism. Finally, using several datasets, we demonstrate (Section 3.4.3) that Pickle's privacy-preserving perturbation is robust to regression attacks in which the cloud attempts to reconstruct the original training samples. The reconstructed samples are significantly distributionally different from the original samples. Despite this, Pickle achieves classification accuracy that is within 5%, in many cases, of the accuracy obtained without any privacy transformation.

3.2 Motivation and Challenges

Modern phones are equipped with a wide variety of sensors. An emerging use of this sensing capability is collaborative learning, where multiple users contribute individually collected training samples (usually extracted from raw sensor data) so as to collaboratively construct statistical models for tasks in pattern recognition. In this chapter, we explore the design of a system for collaborative learning.
What is Collaborative Learning? As an example of collaborative learning, consider individual users who collect audio clips from their ambient environments. These users may be part of a social network. Alternatively, they may have no knowledge of each other and may have volunteered to provide samples, in much the same way as volunteers sign up for distributed computing efforts like SETI@HOME; in this sense, we focus on open collaborative learning. Training samples extracted from these clips are collected and used to build a classifier that can determine characteristics of the environment: e.g., whether a user is at home, on a bus, in a mall, or dining at a restaurant. As another example, consider a mobile health application that collects patient vital signs to build a model for classifying diseases. Collaborative learning results in classifiers of activity, or of environmental or physiological conditions.

Many proposed systems in the mobile sensing literature (e.g., [22,30,33,86,104]) have used machine-learning classifiers, often generated using training samples from a single user. Due to the diversity of environments or human physiology, classifiers that use data from a single user may not be robust to a wide range of inputs. Collaborative learning overcomes this limitation by exploiting the diversity in training samples provided by multiple users. More generally, collaborative learning is applicable in cases, such as human activity recognition or SMS spam filtering, where a single user's data is far from representative.

Designing a system for collaborative learning sounds conceptually straightforward, but has many underlying challenges. Before we describe these challenges, we give the reader a brief introduction to machine-learning classifiers.

The Basics of Classification. The first step in many machine-learning algorithms is feature extraction.
In this step, the raw sensor data (an image, an audio clip, or other sensor data) are transformed into a vector of features that are most likely to be relevant to the classification task at hand. Examples of features in images include edges, contours, and blobs. For audio clips, the fundamental frequency, the spreads and peaks of the spectrum, and the number of harmonics are all examples of features.

Classifiers are first trained on a set of training samples denoted by D = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, where x_i ∈ R^P is the i-th training feature vector, and y_i is a categorical variable representing the class to which x_i belongs. For example, x_i may be a list of spectral features of an audio clip, and y_i may identify this audio clip as "bus" (indicating the clip was recorded on a bus). In what follows, we use X to denote the data matrix with the x_i as column vectors, and U or V to refer to users.

The goal of classification is to construct a classifier using D such that, when presented with a new test feature vector x, the classifier outputs a label y that approximates x's true class membership. One popular, yet simple and powerful, classifier is the k-Nearest-Neighbor (kNN) classifier. Given a feature vector x and a training set D, kNN finds the k training feature vectors which are the closest to x, in terms of the Euclidean distance between them:

‖x − x_i‖²₂ = xᵀx − 2xᵀx_i + x_iᵀx_i    (3.1)

The classifier then outputs the majority of all the nearest neighbors' labels as the label for x (ties broken arbitrarily).

Support Vector Machine (SVM) is another popular and more sophisticated classifier. It leverages a non-linear mapping to map x into a very high-dimensional feature space. In this feature space, it then seeks a linear decision boundary (i.e., a hyperplane) that partitions the feature space into different classes [43].
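The kNN rule described above can be sketched in a few lines; the squared Euclidean distance follows the expansion in (3.1) (the function names here are ours):

```python
from collections import Counter

def knn_classify(x, D, k):
    """k-Nearest-Neighbor classification over a training set D = [(x_i, y_i)].
    Finds the k training vectors closest to x under squared Euclidean distance
    ||x - x_i||^2 = x.x - 2 x.x_i + x_i.x_i, as in (3.1), and returns the
    majority label among them."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    neighbors = sorted(D, key=lambda sample: sqdist(x, sample[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]  # majority label (ties broken arbitrarily)
```

Note that both the distance computation here and SVM training rely on inner products between pairs of feature vectors; this shared dependence is exactly what Pickle's regression phase later exploits.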
For the purposes of this chapter, two computational aspects of this classifier are most relevant: the training process of SVMs relies on computing either the inner product x_iᵀx_j or the Euclidean distance ‖x_i − x_j‖²₂ between pairs of training feature vectors, and the resulting classifier is composed of one or more of the submitted training samples, the support vectors.

Design Goals and Challenges. A system for open collaborative learning must support three desirable goals.

First, it must support the most commonly used classifiers, such as the Support Vector Machine (SVM) classifier and the k-Nearest Neighbor (kNN) classifier described above. These popular classifiers are used often in the mobile sensing literature for logical localization [28], collaborative video sensing [30], behavioral detection of malware [33], device identification [94] and so on. Other pieces of work, such as CenceMe [104], EEMSS [22], and Nericell [106], could have used SVM to get better classification performance.

Second, the system must scale to classifiers constructed using training samples from 100 or more users. At this scale, it is possible to get significant diversity in the training samples in order to enable robust classifiers. A major hurdle for scaling is computational complexity. Especially for SVM, the complexity of constructing the classifier is the dominant computational task, and using the classifier against test feature vectors (i.e., x above) is much less expensive. As we discuss later, it takes a few hours on a modern PC to construct a classifier using data from over 100 users; as such, this is a task well beyond the capabilities of smartphones today. A less crucial, but nevertheless important, scaling concern is network bandwidth usage.
Third, the system must have the right adoption incentives to enable disparate users to contribute training samples: (1) the system must ensure the privacy of the submitted samples, as we discuss below; (2) it must be robust to data poisoning, a legitimate concern in open collaborative learning; (3) it must enable users who have not contributed to the model to use the classifier, but must dis-incentivize free-riders who use classifiers directly obtained from other users. Of these, addressing the privacy goal is the most intellectually challenging, since the construction of many popular classifiers, like SVM or kNN, requires calculations using accurate feature vectors, which may reveal private information (Section 3.3). Consideration of economic incentives for collaborative learning is beyond the scope of this chapter; we assume that crowd-sourcing frameworks like Amazon Mechanical Turk [1] can be adapted to provide appropriate economic incentives.

Cloud-Enabled, Privacy-Preserving Classification. We propose an approach in which mobile users submit training samples (with associated labels) to a cloud, possibly at different times over a period of hours or days; the cloud computes the classifier; the classifier is then sent to mobile phones and used for local classification tasks.

Using the cloud addresses the computational scaling challenge, since classifier construction can be parallelized to take advantage of the cloud's elastic computing capability. The cloud provides a rendezvous point for convenient training-data collection from Internet-connected smartphones. Finally, the cloud provides a platform on which it is possible to develop a service that provides collaborative learning of different kinds of models (activity/context recognition, image classification, etc.).
Indeed, we can make the following claim: to conveniently scale open collaborative learning, an Internet-connected cluster is necessary, and the cloud infrastructure has the right pay-as-you-go economic model, since different collaborative learning tasks will have different computational requirements.

[1] https://www.mturk.com/mturk/welcome

However, using the cloud makes it harder to achieve an important design goal discussed above: privacy.

Privacy and the Threat Model. In cloud-enabled open collaborative learning, users contribute several training samples to the cloud. Each sample consists of a feature vector x and the associated label y. Both of these may potentially leak private information to the cloud (as we discuss below, in our approach we assume the cloud is untrusted), and we consider each in turn. Before doing so, we note that using the classifier itself poses no privacy threat [2], since smartphones have enough compute power to perform the classification locally (Section 3.4.2).

Depending upon the kind of classifier that the user is contributing to, the label y may leak information. For example, if the model is being used for activity recognition, a label may indicate that the user was walking or running at a given time. In this chapter, we do not consider label privacy, because the user willingly contributes the labeled feature vectors and should have no expectation of privacy with regard to labels. However, users may (or should, for reasons discussed below) have an expectation of privacy with respect to information that may be leaked by the feature vectors.

Feature vectors x often consist of a collection of statistical or spectral features of a signal (e.g., the mean, standard deviation or the fundamental frequency). Some features can leak private information. Consider features commonly used to distinguish speech from music or noise [86]: the energy entropy, zero-crossing rate, spectral rolloff, spectral centroid, etc.
These statistics of speech signals may unintentionally reveal information that can be used to extract, for example, age, gender or speaker identity. In experiments we have conducted on audio clips from the TIMIT dataset [55] (details omitted for brevity), female voices tend to have higher average spectral rolloff and average spectral centroid than male voices, while voices of younger individuals have higher average energy entropy and lower average zero-crossing rate than voices of the aged. Similar age-related differences in measures of repeated activity have also been observed elsewhere [24].

[2] Assuming the user can trust the phone software; methods for ensuring this are beyond the scope of this chapter.

Worse yet, a relatively recent finding has shown that, in some cases, feature vectors can be used to reconstruct the original raw sensor data. Specifically, a commonly used feature vector in speech and music classification is the Mel-frequency cepstrum coefficients (MFCC), which are computed by a sequence of mathematical operations on the frequency spectrum of the audio signals. A couple of works [48,102] have shown that it is possible to approximately reconstruct the original audio clips given the MFCC feature vectors. In Section 3.4.1, we present the results of an experiment that quantifies information leakage by MFCC reconstruction.

In the context of collaborative learning, this is an alarming finding. When a user submits a set of feature vectors and labels them as being in a cafe (for example), the cloud may be able to infer far more information than the user intended to convey. When the original audio clips are reconstructed, they may reveal background conversations, the identity of patrons, and possibly even the location of the specific cafe. A recent, equally alarming, finding is that original images may be reconstructed from their feature vectors [129]. Given these findings, we believe it is prudent to ensure the privacy of feature vectors.
The alternative approach, avoiding feature vectors that are known or suspected to reveal private information, can affect classification accuracy and may therefore not be desirable.

One way to preserve the privacy of x is to generate x̃ from x and send only x̃ to the cloud, with the property that, with high likelihood, the cloud cannot reconstruct x from x̃. Our approach randomly perturbs the feature vectors to generate x̃, but is able to reconstruct some of the essential properties of these feature vectors that are required for classifier construction, without significantly sacrificing classifier accuracy. As we show later, approaches that use other methods, like homomorphic encryption, secure multi-party computation or differential privacy, make restrictive assumptions that do not apply to our setting.

We make the following assumptions about the threat model. The user trusts the software on the mobile device to compute and perturb feature vectors correctly, and to transmit only the perturbed feature vectors and the associated labels. The user does not trust other users who participate in the collaborative learning, nor does she trust any component of the cloud (e.g., the infrastructure, platforms or services). The cloud has probabilistic polynomial-time bounded computing resources and may attempt to reconstruct the original feature vectors. Servers in the cloud may collude with each other, if necessary, to reconstruct the original feature vectors. Moreover, the cloud may collude with user A to attempt to reconstruct user B's original feature vectors by directly sending B's perturbed feature vectors to A. Also, user B's perturbed feature vectors may be included in the classifiers sent to A, and A may try to reconstruct B's original feature vectors.

Given that the cloud is untrusted, what incentive is there for the cloud to build the classifier correctly (i.e., why should users trust the cloud to develop accurate classifiers)?
We believe market pressures will force providers of the collaborative learning "service" to provide accurate results, especially if there is a revenue opportunity in collaborative learning. Exploring these revenue opportunities is beyond the scope of this work, but we believe they exist, since a service provider can sell accurate classifiers (of, for example, context) to a large population (e.g., all Facebook users who may be interested in automatically updating their status based on context).

3.3 Privacy-Preserving Collaborative Learning

In this section, we discuss a novel approach to preserving the privacy of collaborative learning, called Pickle.

3.3.1 Pickle Overview

In Pickle (Figure 3.1), each user's mobile phone takes N training feature vectors, where each vector has P elements, and pre-multiplies the resulting P×N matrix by a private, random matrix R, whose dimensionality is Q×P. This multiplication randomly perturbs the training feature vectors. Moreover, we set Q < P, so this reduces the dimensionality of each feature vector. A dimensionality-reducing transformation is more resilient to reconstruction attacks than a dimensionality-preserving one [83]. In Pickle, R is private to a participant, so it is known neither to the cloud nor to other participants (each participant generates his/her own private random matrix). This multiplicative perturbation by a private, random matrix is the key to achieving privacy in Pickle.

[Figure 3.1: Illustrating Pickle.]

A dimensionality-reducing transformation does not preserve important relationships between the feature vectors, such as Euclidean distances and inner products. For instance, the inner product between two data points x_i and x_j now becomes x_iᵀRᵀRx_j.
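The perturbation step can be sketched as below. This is an illustrative sketch with our own helper names; in particular, the chapter does not specify the entry distribution of R at this point, so the Gaussian choice here is an assumption:

```python
import random

def random_matrix(Q, P, rng):
    """One plausible choice of private matrix R (Q x P, with Q < P):
    i.i.d. Gaussian entries. The exact distribution is an assumption here;
    each participant generates and keeps its own R."""
    return [[rng.gauss(0.0, 1.0) for _ in range(P)] for _ in range(Q)]

def perturb(R, x):
    """Compute the perturbed, dimensionality-reduced vector x~ = R x."""
    return [sum(r * xv for r, xv in zip(row, x)) for row in R]

def inner(a, b):
    """Plain inner product, for comparing original vs. perturbed geometry."""
    return sum(u * v for u, v in zip(a, b))
```

Because users U and V hold different private matrices R_u and R_v, the quantity inner(perturb(R_u, x), perturb(R_v, y)) generally differs from inner(x, y); this cross-user distortion is precisely what the regression phase described next must compensate for.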
This is not identical to x_i^T x_j unless R is an orthonormal matrix, which necessarily preserves dimensionality. A dimensionality-reducing transformation can approximately preserve Euclidean distances [67], but even this property is lost when different participants use different private random matrices; in this case, the Euclidean distances and inner products for perturbed feature vectors from different users are no longer approximately preserved. Distortion in these relationships can significantly degrade classification accuracy when the perturbed vectors are used directly as inputs to classifiers (Section 3.4.3).

In this chapter, our focus is on methods that maintain high classification accuracy while preserving privacy. The central contribution of this chapter is the design of a novel approach to approximately reconstruct those relationships using regression, without compromising the privacy of the original feature vectors, while still respecting the processing and communication constraints of mobile devices. To do this, Pickle learns a statistical model to compensate for distortions in those relationships, then approximately reconstructs distance or inner-product relationships between the users' perturbed feature vectors, before finally constructing the classifier.

Conceptually, here is how Pickle works.

1. Users generate labeled raw data at their convenience: for example, Pickle software on the phone may collect audio clips, then prompt the user to label the clips.

2. Once a piece of raw data is labeled, the software will extract feature vectors, perturb them using R, and upload them, along with the corresponding label, to the cloud; as an optimization, the software may batch the extraction and upload. (In what follows, we use the term user, for brevity, to mean the Pickle software running on a user's mobile device. Similarly, we use the term cloud to mean the instance of Pickle software running on one or more servers on a public cloud.)

3.
When the cloud receives a sufficient number of labeled perturbed feature vectors from contributors (the number may depend on the classification task), it constructs the classifier and sends a copy to each user.

Before the classifier is generated, the cloud learns a model to compensate for the distortion introduced by perturbation. Specifically, in this regression phase:

1. The cloud sends to each participating user a collection of public feature vectors.

2. The user perturbs the cloud-generated feature vectors using its private transformation matrix and returns the result to the cloud.

3. The cloud employs regression methods to learn approximate functions for computing the desired mathematical relationships between feature vectors.

The key intuition behind our approach is as follows. Pattern classifiers can effectively discriminate between different classes by leveraging the most important covariance structures in the underlying training data. Our regression phase learns these structures from the transformed representations on public data. However, our privacy transformation sufficiently masks the less important components that would be required to generate the original feature vectors. This is why we are able to build accurate classifiers even without being able to regenerate the original feature vectors.

3.3.2 The Regression Phase

Step 1: Randomly Generate Public Feature Vectors. In this step, the cloud randomly generates M (in our chapter, we set M to 3P) public feature vectors as a P×M matrix Z and sends this matrix to each user. The random public feature vectors have the same dimensionality as true feature vectors. In Pickle, the cloud synthesizes the random public feature vectors using summary statistics provided by the U users. In this method, each user sends to the cloud the mean and the covariance matrix of its private training data, derived from a fixed number (in our chapter, 4P) of its feature vectors.
The cloud generates a Z that approximates the statistical characteristics of the training feature vectors of all the U users; this matrix, generated using an equally-weighted Gaussian mixture model that simulates the true distribution of user data, is used in the next two steps to learn relationships between the feature vectors and is not used to build classifiers. This method never transmits actual private feature vectors to the cloud, so it preserves our privacy goal. Moreover, although the cloud knows the mean and the covariance, this information is far from sufficient to generate accurate individual samples, since two random draws from the same continuous distribution have zero probability of being identical. Despite this, it is possible that sample statistics of the feature vectors may leak some private information; to limit this, Pickle generates sample statistics from a very small number (4P) of data samples. Finally, in this step, the public feature vectors need not be labeled.

Step 2: Perturb the Public Feature Vectors. The high-level idea in this step is to perturb Z in exactly the same way as users would perturb the actual training feature vectors. Concretely, a user U generates a private random matrix R_u (footnote 3), computes the perturbed public feature vectors R_u Z, and sends R_u Z to the cloud. However, this approach has the following vulnerability. If Z is invertible, the private R_u can be recovered by the cloud when it receives the perturbed vectors R_u Z: the cloud simply computes R_u Z Z^{-1}. To raise the bar higher, Pickle computes and sends R_u(Z + ε_u) to the cloud, where ε_u is an additive noise matrix. The cloud would then need to know R_u ε_u in order to apply the inversion to obtain an accurate R_u.

(Footnote 3: The user can choose a task-specific R_u. However, once chosen, the matrix is fixed, though private to the user. A dynamically varying R_u would incur high computational cost, due to the Regress phase in the next step.)
Unlike the public feature vectors Z, however, ε_u is private to the user. The elements of R_u are drawn randomly from either a Gaussian or a uniform distribution. ε_u has the distribution N(ε_u; 0, σ_u Σ_Z), where Σ_Z is the (sample) covariance matrix of Z and σ_u is tunable, controlling the intensity of the additive noise [34]. As we show in Section 3.4.3, higher privacy can be obtained by using smaller values of Q (i.e., greater reductions in dimensionality) or bigger values of σ_u.

Step 3: Regress. The regression step is executed on the cloud. We describe it for two users; extending it to multiple users is straightforward. Assume users U and V have chosen random matrices R_u, ε_u and R_v, ε_v respectively. The cloud receives Z_u = R_u(Z + ε_u) and Z_v = R_v(Z + ε_v). The key idea is to use Z_u and Z_v to approximate quantities which are pertinent to classification. Concretely, let α and β be two indicator variables such that α, β ∈ {u, v}. Also, let z_i stand for the i-th public feature vector and z_i^α for the i-th feature vector transformed by R_α. In other words, z_i and z_i^α are the i-th columns of Z and Z_α.

Intuitively, we would like the cloud to be able to recover the original relationships from the perturbed feature vectors. For this, we learn four functions (f_uu, f_uv, f_vu and f_vv) of the form f_αβ(z_i^α, z_j^β; Θ_αβ) that approximate well a certain function f(z_i, z_j) of (in particular, the distances or the inner products between) z_i and z_j; Θ_αβ denotes the parameters of the function. Once these functions are learnt, they are applied to actual training data sent by users (Section 3.3.3). The parameters Θ_αβ are thus of critical importance. To identify the optimal set of parameters, we have used linear regression (LR). We now show how to approximately recover the concatenation of public feature vectors z_i and z_j (i.e., f(z_i, z_j) = [z_i^T, z_j^T]^T) using LR.
The models can then be used to compute inner products and distances approximately on transformed actual training feature vectors from users (footnote 4). The approximated quantities will then be supplied to learning algorithms to construct classifiers (Section 3.3.3). For each pair of α and β, let Q_αβ be the matrix whose columns are the concatenated z_i^α and z_j^β, with M² columns (the number of total possible concatenations is M², since there are M public feature vectors). Also, let Z_C denote the matrix whose columns are the concatenated z_i and z_j. Note that Q_αβ has 2Q rows, where Q is the row-dimensionality of each user's private transformation matrix R_α or R_β (for simplicity of description, we assume the dimensionality is the same for the two users; Pickle allows different users to choose different Q). Z_C has 2P rows, where P is the row-dimensionality of the public or original feature vectors. For linear regression, we obtain Θ_αβ for f_αβ from the equation

Z_C = Θ_αβ Q_αβ     (3.2)

where the parameter Θ_αβ is a matrix of size 2P × 2Q. The optimal parameter set is thus found in closed form as Θ_αβ = Z_C Q_αβ^+, where (·)^+ denotes the pseudo-inverse.

Our implementation of Pickle uses one optimization, called iPoD. In iPoD, the cloud can avoid calculating the regression functions f_uu and f_vv (i.e., when α = β) by asking users directly for the corresponding inner products calculated from their own feature vectors. These inner products do not reveal the individual feature vectors. This trades off a little communication overhead (quantified in Section 3.4.2) for improved accuracy.

Instead of linear regression, we could have used Gaussian Process Regression (GPR). We have found in preliminary experiments that GPR marginally improves accuracy over LR, but it is significantly more compute-intensive, so we omit a detailed description of GPR. Finally, all the schemes described above extend to multiple users naturally: Pickle simply computes 4 (or 2 when using iPoD) regression functions for every pair of users.
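The regression phase can be sketched end-to-end in NumPy as follows. This is a hypothetical illustration with arbitrary dimensions and synthetic data, using plain least squares and omitting the iPoD optimization; the `perturb` helper plays the role of the user-side Step 2:

```python
import numpy as np

rng = np.random.default_rng(1)
P, Q = 8, 4
M = 3 * P                              # number of public feature vectors

Z = rng.random((P, M))                 # public feature vectors (cloud-generated)
Sigma_Z = np.cov(Z)                    # sample covariance of Z (P x P)

def perturb(Z, R, sigma, rng):
    """User-side Step 2: R_u (Z + eps_u), with eps_u ~ N(0, sigma * Sigma_Z)."""
    eps = rng.multivariate_normal(np.zeros(P), sigma * Sigma_Z,
                                  size=Z.shape[1]).T
    return R @ (Z + eps)

R_u = rng.normal(size=(Q, P))          # private to user U
R_v = rng.normal(size=(Q, P))          # private to user V
Z_u = perturb(Z, R_u, 0.3, rng)
Z_v = perturb(Z, R_v, 0.3, rng)

# Cloud-side Step 3: regress concatenated public vectors on concatenated
# perturbed ones. Columns of Q_uv are [z_i^u; z_j^v]; columns of Z_C are
# [z_i; z_j], over all M^2 pairs (i, j).
i, j = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
i, j = i.ravel(), j.ravel()
Q_uv = np.vstack([Z_u[:, i], Z_v[:, j]])        # shape (2Q, M^2)
Z_C = np.vstack([Z[:, i], Z[:, j]])             # shape (2P, M^2)
Theta_uv = Z_C @ np.linalg.pinv(Q_uv)           # Eq. (3.2), shape (2P, 2Q)
```

The closed-form solution Θ_uv = Z_C Q_uv^+ corresponds to the pseudo-inverse expression in the text; f_vu, f_uu and f_vv would be learned the same way with the roles of the matrices exchanged.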
(Footnote 4: It is also possible to directly regress inner products and distances using the functions f_αβ, but we have experimentally found that directly regressing these quantities does not result in improved accuracy over the methods described.)

3.3.3 Model Generation and Return

Building the Classifier. After the cloud learns the functions f_αβ with the procedure in the previous section, it is ready to construct pattern classifiers using training samples contributed by users. In this step of Pickle, each user U collects its training feature vectors X_u (in which each column is one feature vector), then perturbs these feature vectors with its private R_u. Each perturbed feature vector, together with its label, is then sent to the cloud. Using the perturbed feature vectors from each user, the cloud generates the classification model.

Let x_ui denote the unperturbed i-th feature vector from user U, and likewise x_vj for user V. Moreover, let

x̃_ui = R_u x_ui,  x̃_vj = R_v x_vj     (3.3)

denote the perturbed feature vectors. Using the regression parameters obtained from Equation (3.2), the cloud first attempts to reconstruct the concatenation of x_ui and x_vj:

[x_ui; x_vj] ≈ f_uv(x̃_ui, x̃_vj; Θ_uv) = Θ_uv [x̃_ui; x̃_vj] ≜ [r_ui; r_vj]

where r_ui and r_vj are P-dimensional vectors. The cloud then approximates the inner product with the reconstructed feature vectors, x_ui^T x_vj ≈ r_ui^T r_vj. Similarly, to approximate the distance between two feature vectors, we use (footnote 5)

‖x_ui − x_vj‖₂² ≈ r_ui^T r_ui − 2 r_ui^T r_vj + r_vj^T r_vj     (3.4)

(Footnote 5: In the iPoD optimization, the first and last terms of the RHS in (3.4) can be obtained directly from the users.)
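The reconstruction-and-approximation computation above can be sketched as follows (self-contained, with hypothetical dimensions and synthetic data; the additive noise is omitted here for brevity, and Θ_uv is learned exactly as in Equation (3.2)):

```python
import numpy as np

rng = np.random.default_rng(2)
P, Q, M = 8, 4, 24

# --- regression phase on public data (as in Section 3.3.2) ---
Z = rng.random((P, M))
R_u = rng.normal(size=(Q, P))
R_v = rng.normal(size=(Q, P))
Z_u, Z_v = R_u @ Z, R_v @ Z            # additive noise omitted for brevity
i, j = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
i, j = i.ravel(), j.ravel()
Theta_uv = np.vstack([Z[:, i], Z[:, j]]) @ np.linalg.pinv(
    np.vstack([Z_u[:, i], Z_v[:, j]]))

# --- training phase: users perturb actual training vectors (Eq. 3.3) ---
x_u = rng.random(P)                    # one private vector of user U
x_v = rng.random(P)                    # one private vector of user V
xt_u, xt_v = R_u @ x_u, R_v @ x_v      # what the cloud actually receives

# --- cloud: reconstruct the concatenation, then approximate (Eq. 3.4) ---
r = Theta_uv @ np.concatenate([xt_u, xt_v])
r_u, r_v = r[:P], r[P:]
approx_inner = r_u @ r_v                              # ~ x_u^T x_v
approx_dist = r_u @ r_u - 2 * r_u @ r_v + r_v @ r_v   # ~ ||x_u - x_v||^2
```

Note that the right-hand side of (3.4) equals ‖r_ui − r_vj‖², so the approximated distance is always nonnegative, as a distance should be.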
Once inner products or distances are approximated, the cloud can build SVM or kNN classifiers using the following simple substitution: whenever these algorithms need the distances or inner products of two feature vectors, the approximated values are used.

Model Return. In this step, the cloud returns the model to users, so that classification may be performed on individual phones; for the details of the classification algorithms, we refer the interested reader to [43]. The information returned depends upon the specific classifier (e.g., when using SVM, the support vectors must be returned), but must include all functions f_αβ and associated parameters for every pair of users. These are required because the classification step in many classifiers also computes distances or inner products between the test feature vectors and the training feature vectors present in the model (e.g., the support vectors in SVM); all of these vectors are perturbed, so their distances and inner products must be estimated using the f_αβ functions.

3.3.4 Privacy Analysis

Recall that the privacy goal in Pickle is to ensure the computational hardness of de-noising user contributions by the cloud (either by itself, or in collaboration with other users) and thereby inferring X. We now show that a user U who follows the steps of the protocol does not leak vital information which can be used to de-noise user contributions. In the protocol, U sends data to the cloud in Steps 1, 2 and 4 only.

In Step 1, U sends the mean and covariance matrix of a small number of its private training samples. Using this, the cloud can construct synthetic vectors whose first- and second-order statistics match those of U's private data, but it clearly cannot reconstruct X_u. In Step 2, U sends R(Z + ε) to the cloud. One might assume that the cloud can filter out the additive noise Rε and then recover R by using the known Z^{-1}.
However, existing additive-noise filtering techniques (such as spectral analysis [71], principal component analysis, and Bayes estimation [64]) need to know at least the approximate mean and the approximate covariance of the additive noise. In Pickle, the cloud cannot know, or estimate with any accuracy, the covariance of Rε, since that depends upon R, a quantity private to the user.

Finally, in Step 4, U sends RX to the cloud. The privacy properties of this dimensionality-reducing transform are proven in [83], which shows that X cannot be recovered without knowing R; this is because there are infinitely many factorizations of X̃ in the form of RX. In fact, even if R is known, because the resulting system of equations is under-determined, we can only reconstruct X in the sense of minimum norm. Given this, using ε_u provides an additional layer of privacy. ε_u is a random matrix with real-valued elements, so it is highly infeasible for an adversary to guess its values successfully using brute force. The adversary may attempt to find approximate values for ε_u, but would still be faced with the challenge of determining whether the resulting approximate value for R_u is correct; the only way to do this is to attempt to reconstruct the original feature vectors and see if they reveal (say) meaningful human speech or other recognizable sounds, and this is also computationally hard, as described above. However, it may be possible for an attacker to approximate X using a reconstruction attack. In Section 3.4.3, we show that Pickle is robust to these attacks as well.

Pickle is also robust to collusion between the cloud and users. Since each user U independently selects a secret R, and since its feature vectors are encoded using this secret, another user cannot directly compute any of U's original feature vectors from the perturbed feature vectors it receives from the cloud (for the same reason that the cloud itself cannot compute these).
A similar robustness claim holds for collusion between cloud servers.

3.3.5 Other Properties of Pickle

Besides ensuring the privacy of its users' training feature vectors, Pickle has several other properties.

Pickle is computationally efficient on mobile devices, and incurs minimal communication cost. It requires two matrix multiplications (one for the regression stage and the other during training); classification steps require computing distances or inner products. It transmits a few matrices, and a classification model, over the network. All of these, as we shall validate, require minimal resources on modern phones, and modest resources on the cloud infrastructure.

Pickle requires no coordination among participants and provides flexible levels of privacy. Each user can independently choose the privacy transformation matrix R, and communicates only with the cloud. Users can also independently set the level of desired privacy by selecting the dimensions of R or the intensity of the additive noise matrix. In Section 3.4.3, we explore the implications of these choices.

Pickle disincentivizes free-riding. A user U who does not contribute training samples can get the model from other users but, to use it, must also participate in at least the regression phase, so that the cloud can compute f_uv and f_vu for all other users V whose vectors are included in the classifier.

Although we have discussed Pickle in the context of classification, it extends easily to other tasks like non-linear regression and estimating distributions; these tasks arise in the context of participatory sensing [19, 23, 36, 47, 65, 113]. Beyond SVM and kNN, Pickle can be applied to all kernel-based classification and regression methods that use distances or inner products to establish relationships between training samples. One can simply replace these distances or inner products with approximations derived by applying Pickle's regression functions.
Finally, Pickle can be made robust to poisoning attacks, in which a few malicious users attempt to inject bogus data in order to render the model unusable (footnote 6). For classification algorithms which build robust statistical models, attackers must inject distributionally different feature vectors in order to succeed. Prior work has examined these kinds of attacks and has proposed a distance-based approach, called Orca, to detecting poisoning attacks [31]. Because Pickle can approximately preserve distances, the cloud can run Orca even though it receives only perturbed data, as shown in Section 3.4.3.

(Footnote 6: Attacks in which a majority of contributors poison the model require other mechanisms. Such attacks can render a model completely useless for the corresponding classification task. In that case, a company that sells these collaboratively-designed models may offer monetary incentives to contributors, but only if the resulting model is shown to be accurate. Discussion of such mechanisms is beyond the scope of the chapter.)

3.4 Evaluation of Pickle

In this section, we perform three qualitatively different kinds of evaluation: a user study which brings out the benefits of Pickle, especially for applications like speaker recognition where the unperturbed feature vectors are known to leak privacy; measurements on a prototype that quantify the resource costs of Pickle; and an extensive characterization of the privacy-accuracy tradeoffs in Pickle, together with a comparison of alternatives, using an evaluation on public datasets.

3.4.1 Pickle Privacy: A User Study

In a previous section, we asserted that a commonly used feature vector for acoustic applications, MFCC, could be used to approximately reconstruct original audio clips.
In this section, we demonstrate this using a small-scale user study on acoustic data, and show that: a) a scheme like Pickle is necessary, since without it, almost the entire audio clip can be reverse-engineered from unperturbed MFCC feature vectors; and b) Pickle can mitigate this privacy leakage without significant loss in classification accuracy.

MFCC is widely used in acoustic mobile applications [30, 85, 86, 88, 103]. In particular, MFCC can be used to recognize speakers [85, 103] or their genders [88]; collaborative learning can be used to build models for both these applications. To quantify the efficacy of Pickle for MFCC-based speaker recognition, we used spoken sentences from four volunteers (two men and two women) in the TIMIT dataset [55], and trained SVM (with RBF kernel) models by extracting the standard 13-dimensional MFCC feature vectors from the audio clips, with and without Pickle. For Pickle feature vectors with a 50% dimensionality reduction and a 0.3 intensity of additive noise (denoted by (50%, 0.3)), recognition accuracy is degraded only by 4.32%! We leave a more detailed discussion of Pickle's impact on accuracy for a later section, but now demonstrate how, with minimal accuracy loss, Pickle can greatly reduce privacy leakage.

To this end, we conducted a user study which used eight test sentences (81 words) from the training set used to construct the classifier.
For each sentence, users were asked to listen to three versions of the sentence in the following order: (i) a Pickle-MFCC audio clip generated by first applying a reconstruction attack (Section 3.4.3.3) to (50%, 0.3) Pickle-transformed MFCC feature vectors (footnote 7), and then applying [48] to reverse-engineer the audio clip from the estimated feature vectors; (ii) an MFCC audio clip generated directly from the MFCC feature vectors using the method described in [48]; and (iii) the original audio clip. Users were asked to write down all the words they recognized in each of the clips. Seventeen users participated in the study, having varying levels of proficiency in English.

For each participant, we calculated two Recognition Ratios (RRs) for each sentence: the Pickle-MFCC RR is the ratio of the number of words recognized in the Pickle-MFCC clip to the number of words recognized in the original clip, and the MFCC RR is the ratio of the number of words recognized in the MFCC clip to that recognized in the original clip.

Figure 3.2: User Study Results

As Figure 3.2 shows, Pickle offers very good privacy protection; averaged across all sentences, Pickle has an RR of only 1.75%, while the MFCC-reconstructed clips have an RR of 73.88%. Of the words recognized in Pickle-ed clips, most were articles, prepositions or linking verbs, but three users recognized the phrase "below expectations" in one sentence, and one user recognized the words "informative prospective buyer" in another sentence. These words provide minimal information about the original sentences, since they lack vital context information. While a more extensive user study is the subject of future work, our study shows that, without Pickle, a collaborative learning task for speaker recognition can leak a majority of words in audio clips when MFCC is used as a feature vector; using Pickle, a negligible minority is leaked.
(Footnote 7: The MFCC feature vectors were generated using 25 ms overlapping frames with an inter-frame "hop" length of 10 ms.)

Figure 3.3: Accuracy of accelerometer-based classifier construction

3.4.2 Pickle Resource Costs: Prototype Evaluation

Prototype System. We have implemented a prototype of Pickle (Figure 3.4) which consists of two components: software for Android 2.3.4 (about 8K lines of Java code, about half of which is the Pickle-SVM engine) and for Windows Mobile 6.5 (about 11K lines of C# code, about 40% of which is the Pickle-SVM engine), and software for the cloud, written on the .NET 4.0 framework (about 8K lines of C# code, of which the Pickle-SVM engine is shared with the phone code). The prototype supports all functions required by Pickle, including regression parameter construction and interaction, raw data collection, feature vector generation, transformation, upload and classification, user labeling, outlier detection, model training, and model return. The phone software supports the collection and processing of acceleration and acoustic data, and the cloud component builds a Pickle-SVM classifier with four optional kernels. Support for other sensors and other classifiers is left to future work.

Experimental Methodology. Using this prototype, we have collected 16,000 accelerometer-based feature vectors, gathered using smartphones, for the purpose of evaluating the resource costs of collaborative learning. For evaluating collaborative training, we cluster these feature vectors among 160 users, each of whom submits, on average, 100 training samples. As shown in Figure 3.3, when each feature vector has 16 dimensions, the resulting classifier has an accuracy of over 97% even when feature vectors are perturbed by (50%, 0.3). A more detailed analysis of classifier accuracy is presented in Section 3.4.3.
Our data 55 Window based Framing Window based Framing Store and Processing Store and Processing Pickle Perturbation Pickle Perturbation Feature Extraction Feature Extraction Local Classification Local Classification Store Store Outlier Detection and Pre Ͳ processing Outlier Detection and Pre Ͳ processing Cross Validation and Parameter Search Cross Validation and Parameter Search Four Optional Kernels Four Optional Kernels Mobile Phone Processing Cloud Processing Acceleration/Acoustic Waveform Other Use Labels Provided Perturbed Data from other Users Feedback Pickle Regression Parameters Obtained from User Ͳ Cloud Interactions Model + Perturbed Data Figure 3.4: Architecture of the prototype system set is adversarially chosen; prior work on mobile-phone based sound classification [86] has used half the number of dimensions and an order of magnitude smaller training data set. Thus, from our experiments, we hope to understand how high Pickle’s resource costs can be in practice. We report on experiments conducted on both a Nexus One and an HTC Pure phone and our “cloud” is emulated by a cluster of 4 Intel(R) Core(TM)2 Duo 2.53 GHz PCs, with 3GB RAM, running Windows 7. Communication Overhead. In Pickle, each user needs to send the perturbed data, R u X u , and inner prod- ucts calculated from her own feature vectors, X T u X u to the cloud, which incurs communication overhead. (The overhead of sending the public feature vectors Z and having each user return R u (Z + u ) to the cloud is relatively small since the number of feature vectors is small (Section 3.3.2), so we do not report the cost of this operation). Since our privacy transformation reduces dimensionality, the communication cost of sending the perturbed data is actually lower than the cost of the original data. In our experiment, we use a privacy transformation, with relatively higher communication cost, which reduces dimensionality by only 25%, and adds 0.3 intensity additive noise. 
In our implementation, each user's perturbed training samples require only 15 KB for the transformed feature vectors with labels, and 94 KB for the inner products. For comparison, the original training samples without perturbation require 21 KB.

The final component of the communication cost is the size of the returned model. This cost has two components for Pickle-SVM: the set of model parameters and perturbed support vectors, and the collection of regression coefficients (each user needs to download only her own regression coefficients, not the entire set of coefficients). For 160 users, the former is 222 KB (Figure 3.5(a)), and the latter is 585 KB per user. For comparison, the model size for 160 users without Pickle is 264 KB; Pickle's dimensionality reduction results in a smaller model size. Overall, these numbers are well within the realm of practicality, especially because our evaluation setting is adversarial and our implementation is un-optimized. For example, simply by compressing the data, a factor of 2-3 reduction in transfer overhead can be obtained.

Figure 3.5: Processing and Communication Cost ((a) Communication Cost; (b) Processing Cost)

Computational Cost. On the mobile phone, it takes on average (over 50 runs) less than 1 ms on both the Nexus One and the HTC Pure to multiplicatively transform a feature vector. Classifying a test vector against a large model constructed from 160 users takes on average 266.7 ms and 477.6 ms on the two phones, respectively. For comparison, classifying on a model generated from pure SVM (without perturbation) takes on average 128.1 ms and 231.6 ms on the two phones.
The main overhead of Pickle comes from using the regression coefficients to estimate distances from the vector to be classified to the support vectors. Both of these numbers are reasonable, especially since our dataset is large.

On the cloud, outlier detection is very fast: only 10.55 ms on average (all numbers averaged over 10 runs). The cost of computing regression coefficients for pairs of users is shown in Figure 3.5(b); the average cost increases from 0.55 s to 723 s as the number of users increases from 5 to 160. However, the cost of model generation on the cloud is significantly higher, on average about 2.13 hours on a single core. Without Pickle, model generation is a factor of 2 faster (Figure 3.5(b)). Again, Pickle's overhead mainly comes from the regression calculations.

However, a major component of model generation, performing a grid search to estimate optimal parameters for Pickle-SVM, can be parallelized, and we have implemented this. As shown in Figure 3.5(b), as the number of cores increases, an almost linear speed-up can be obtained; with 8 cores, model generation time was reduced to 0.26 hours. Devising more efficient parallel algorithms for model generation is left to future work.

Finally, as discussed in Section 3.3, a user who has not contributed feature vectors can use the generated model, but the cloud needs to compute regression coefficients for this new user, relative to other users whose vectors are included in the classifier model. This computation can be performed incrementally, requiring only 8.71 s in our 160-user experiment, and adding 160 regression coefficient entries (222 KB) that need to be downloaded only by the new user.

3.4.3 Accuracy/Privacy Tradeoffs and Comparisons: Dataset Evaluation

In this section, we evaluate Pickle's ability to preserve privacy without sacrificing classification accuracy by analyzing public datasets. We also explore the sensitivity of our results to the different design choices presented in Section 3.3.
3.4.3.1 Methodology

Data Sets. We use four datasets to validate Pickle: Iris, Pima Indians Diabetes, Wine, and Vehicle Silhouettes. The datasets are from the UCI Machine Learning Repository (footnote 8), and are some of the most widely-used datasets in the machine-learning community. All the feature values in each dataset are scaled between 0 and 1.

(Footnote 8: http://archive.ics.uci.edu/ml)

Users. We partition each data set into several parts to simulate multiple users with private data. To do this, we clustered the feature vectors in each data set using the standard K-means clustering algorithm and assigned each cluster to one "user". (Random partitions would not have been adversarial enough, as our main goal is to collaboratively learn from data with disparate statistics.) Using this method, the number of users is 2, 5, 2, and 5 for the four datasets, respectively. Although these numbers of users are small relative to our targeted scale, we note that the level of privacy and the classification accuracy are not likely to become worse with more users. If anything, classification accuracy will improve with more users, since one has more, and more diverse, training data. In our experiments, we use all four datasets to evaluate performance with 2 users, and also use the Diabetes and Vehicle datasets to test performance with 5 users. After partitioning the data across users, we randomly select 80% of the labeled feature vectors from each user as the training data, and use the remainder for testing.

Classifiers. We evaluate the effectiveness of Pickle using two common pattern classifiers: Support Vector Machine (SVM) and k-Nearest Neighbor (kNN). We experiment with SVM using the most widely-used RBF kernel as well as the linear kernel, and tune SVM parameters using standard techniques like cross-validation and grid search. We use a more accurate variant of kNN, called LMNN [128], which uses Mahalanobis distances instead of Euclidean distances.
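The user-partitioning step described above can be sketched as follows (a plain Lloyd's-algorithm implementation in NumPy, a hypothetical stand-in for the standard K-means routine; the data here is synthetic, not one of the UCI datasets):

```python
import numpy as np

def kmeans_partition(X, k, iters=50, seed=0):
    """Partition the rows of X into k clusters, assigning each cluster
    to one simulated 'user'. A plain Lloyd's-algorithm sketch."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each feature vector to its nearest center.
        dists = ((X[:, None, :] - centers[None]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        # Recompute centers (keep the old center if a cluster empties).
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

X = np.random.default_rng(3).random((150, 4))   # stand-in for a UCI dataset
labels = kmeans_partition(X, k=2)               # labels[i] = simulated user id
```

Each resulting cluster then contributes its 80%/20% train/test split independently, mimicking users whose private data have disparate statistics.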
3.4.3.2 Evaluation Metrics
We use two metrics to evaluate the effectiveness of Pickle. The first assesses how much privacy is preserved and how likely users' private data are to be compromised. The second measures how much Pickle's accuracy is affected by its privacy transformations.
The Privacy Metric. Pickle distorts feature vectors to “hide” them. One way to measure privacy is to quantify the extent of this distortion. We use a slightly more adversarial privacy metric from prior work [20, 124], which measures the “distance” between the original feature vector and an estimate for that vector derived by mounting a reconstruction attack. Specifically, let x_u^d stand for the d-th dimension of the feature vector x_u, and h_u^d be the corresponding dimension in the reconstructed vector. Then, we can define ℓ_ud (0 ≤ ℓ_ud ≤ 1) to be the difference in the distributions of these two quantities, and the privacy metric ℓ (0 ≤ ℓ ≤ 1) as ℓ_ud averaged over all users and dimensions. Intuitively, the larger the ℓ, the more confident we are that privacy is preserved. When ℓ is zero, we are less confident. Note that we cannot infer directly that privacy is violated when ℓ = 0, as the metric only measures difference in expectation. Furthermore, the metric is not perfect, since if the original and reconstructed vectors are distributionally different then, regardless of the magnitude of this difference, ℓ is 1. Finally, we emphasize that ℓ is defined with respect to a specific attack on the perturbed feature vectors.
Classification Accuracy. A privacy transformation can adversely affect the classification accuracy, so we are interested in measuring classification accuracy under different privacy levels. We compute the accuracy in a standard way, as the percentage of correctly classified test feature vectors among all test feature vectors. All reported results are averaged over 20 random splits of the training, validation and testing data sets.
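The averaging structure of the privacy metric can be sketched as follows. The chapter does not give the exact formula for the distributional difference ℓ_ud, so this sketch substitutes total-variation distance between per-dimension histograms as a stand-in; feature values are assumed scaled to [0, 1] as in the methodology.

```python
def dim_privacy(orig, recon, bins=10):
    # Stand-in for l_ud: total-variation distance between the empirical
    # distributions of one dimension's original vs. reconstructed values.
    def hist(values):
        h = [0] * bins
        for v in values:
            h[min(int(v * bins), bins - 1)] += 1
        return [c / len(values) for c in h]
    return 0.5 * sum(abs(a - b) for a, b in zip(hist(orig), hist(recon)))

def privacy_level(originals, reconstructions):
    # l: average l_ud over all users and all feature dimensions.
    scores = []
    for X, H in zip(originals, reconstructions):   # per user
        for d in range(len(X[0])):                 # per dimension
            scores.append(dim_privacy([x[d] for x in X], [h[d] for h in H]))
    return sum(scores) / len(scores)
```

Identical distributions score 0 (no confidence that privacy is preserved), and completely disjoint distributions score 1, matching the saturation behavior noted in the text.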
3.4.3.3 Attack models and Privacy
In Section 3.3, we discussed a few attack strategies to which Pickle is resilient. We now discuss somewhat more sophisticated attacks that are based on an intimate knowledge of how Pickle works.
The Reconstruction Attack. Dimensionality-reduction techniques can be attacked by approximate reconstruction. By reconstructing original data to the extent possible, these attacks function as a preprocessing step for other types of attacks. In Pickle, the cloud sends the public data Z to a user U and receives transformed data Z_u = R_u(Z + μ_u). While the cloud cannot decipher R_u and μ_u, can the cloud use its knowledge to infer important statistical properties of these variables to approximately reconstruct the user's data when she sends actual training vectors for building classifiers? One possible approach is to build a regression model such that Z ≈ h_u(Z_u; θ). When the user sends R_u X_u, the cloud applies the regression model and tries to recover H_u = h_u(R_u X_u; θ) ≈ X_u.
[Figure 3.6: Effect of reconstruction attack on privacy]
Figure 3.6 shows that, even when this attack uses Gaussian Process Regression (to the best of our knowledge, one of the most accurate regression techniques for reconstruction), Pickle still provides significant privacy. To generate the plot, we compute ℓ for this attack, for various combinations of multiplicative and additive transformations: reducing the dimensionality for the multiplicative transform by 25%, 50% and 75% of the original dimensionality, and adding noise with intensities (Section 3.3.2) ranging from 0.1 to 0.5 in steps of 0.1. The figure shows the resulting privacy-level metric for each combination of additive and multiplicative transforms under the attack; the resulting privacy levels range from 0.1 to 0.7.
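The reconstruction attack can be illustrated concretely. The sketch below substitutes a simple per-dimension linear regression for the Gaussian Process Regression used in the chapter (an assumption made to keep the example self-contained); the secret projection R, the noise intensity, and the data are toy values.

```python
import random

def perturb(rows, R, noise, rng):
    # Z_u = R_u(Z + mu_u): additive noise, then the user's secret projection.
    out = []
    for z in rows:
        zn = [v + rng.gauss(0, noise) for v in z]
        out.append([sum(ri * v for ri, v in zip(r, zn)) for r in R])
    return out

def fit_linear(xs, ys):
    # Per-dimension least squares y ~ a*x + b: a linear stand-in for the
    # Gaussian Process Regression the cloud would use as h_u.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

rng = random.Random(1)
R = [[0.6, -0.8]]   # secret 1x2 projection: dimensionality halved
noise = 0.3         # additive noise intensity

Z = [[rng.random(), rng.random()] for _ in range(200)]  # public vectors
Zu = perturb(Z, R, noise, rng)                          # returned to the cloud

# Cloud fits one regression per original dimension: transformed value -> Z[d].
models = [fit_linear([row[0] for row in Zu], [z[d] for z in Z]) for d in range(2)]

# Later the user uploads transformed private vectors; the cloud reconstructs.
Xu = [[rng.random(), rng.random()] for _ in range(50)]
Hu = [[a * row[0] + b for (a, b) in models] for row in perturb(Xu, R, noise, rng)]

mae = sum(abs(h - x) for hv, xv in zip(Hu, Xu)
          for h, x in zip(hv, xv)) / (50 * 2)
```

Even this cooperative toy setup leaves substantial per-dimension reconstruction error, which is the intuition behind the non-trivial privacy levels in Figure 3.6: the projection and noise discard information that no regression on Z alone can recover.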
Thus, depending on the degree to which the training data have been transformed, Pickle can be significantly robust to this attack.
The intuition for why Pickle is robust to this reconstruction attack is as follows. Pickle's regression phase learns enough about the relationships between users to discriminate amongst them. However, the regression is not powerful enough to regenerate the original samples; intuitively, much more information is necessary for generation than for discrimination.
Followup ICA Attack. The cloud can also try to improve its estimate H_u with a followup strategy. For example, ICA can be used for this purpose [39, 83]. However, we have experimentally verified that this strategy is unsuccessful against Pickle – the ICA algorithm fails to converge to meaningful solutions.
3.4.3.4 Classifier Accuracy
In this section, we discuss results for the classification accuracy of SVM (with RBF and Linear kernels) and LMNN, using Pickle for 2 users from each dataset. Results for Diabetes and Vehicle with 5 users are omitted but are qualitatively similar, except that they have higher classification accuracy because they have a larger training set. These experiments on each of our four data sets use a baseline configuration which uses synthesized public feature vectors and iPoD. In subsequent sections, we deviate from this baseline configuration to examine different design choices.
Figure 3.7 plots the classification accuracy for each data set as a function of the privacy level, for the SVM classifier with the RBF kernel. In this plot, the horizontal line shows the classification accuracy without Pickle. For this classifier, across all four data sets, the loss in classification accuracy is less than 6% for privacy levels up to 0.5; in the worst case (Wine), classification accuracy drops by 15% for a privacy level of 0.65.
This is an important result of the chapter: even when Pickle transforms data so that reconstructed feature vectors are distributionally different from the original ones, classification accuracy is only modestly affected.
Other features are evident from this figure. In general, classification accuracy drops with the privacy level, but the relationship is non-monotonic: for example, for the Diabetes dataset, a 50% reduction with 0.1-intensity additive noise has higher privacy, but also higher accuracy, than a 25% reduction with 0.5-intensity noise. Second, the RBF kernel outperforms the Linear kernel (graphs are omitted to save space), for which a 0.5 privacy level results in a 10% reduction in classification accuracy over all datasets, and nearly 20% in the worst case.
Finally, Pickle performs well even for nearest-neighbor classification (figures omitted for space reasons). For LMNN with k = 5, Pickle is within 5% of the actual classification accuracy for each data set for privacy levels up to 0.5, and in the worst case incurs a degradation of about 15%. Moreover, for LMNN, in some cases Pickle is even more accurate than without any privacy transformations. This is likely due to the regularization effect caused by noise (either additive or as a result of regression), which prevents overfitting of the models.
[Figure 3.7: Accuracy-privacy tradeoff of SVM with the RBF kernel, for (a) Diabetes, (b) Iris, (c) Wine, and (d) Vehicle]
3.4.3.5 Comparison
In this section, we compare Pickle for SVM (with the RBF kernel; results for the Linear kernel and LMNN are omitted but are qualitatively similar), using the baseline configuration discussed above, against three previously-proposed approaches for preserving the privacy of feature vectors. As we show below, compared to Pickle, these approaches either do not preserve privacy adequately, or are significantly inaccurate.
The first algorithm only adds additive noise [34] and uses Bayes estimation [64] to attack the perturbed feature vectors. For this alternative, we compute the privacy level based on the Bayesian reconstruction. This alternative is chosen to understand the performance of a simpler additive perturbation. The second algorithm uses Random Projection (RP) [83], in which each user transforms feature vectors using the same multiplicative noise matrix R.
To be robust to inversion, the dimensionality of R is reduced by more than 50% relative to the dimensions of the original feature vectors. For this case, we derive the privacy levels by using a pseudo-inverse based attack [83]. Our third algorithm is a KDE-based method [124], in which users never send true data, but only send synthetic data drawn from the estimated feature vector distributions. For this case, we compute the privacy levels using the transformed feature vectors.
[Figure 3.8: Comparison of Pickle to several alternatives, on (a) the Diabetes dataset and (b) the Vehicle dataset]
As Figure 3.8 shows, on the Diabetes and Vehicle datasets with 5 users, Pickle outperforms all alternatives. The additive noise based approach (for this approach, and for each dimensionality setting of Pickle, we change the additive noise intensity from 0.1 to 0.5 in steps of 0.2) produces acceptable accuracy, but almost no privacy. The KDE-based method offers a little more privacy than the additive noise method, but with significantly degraded accuracy. Finally, the RP method provides, in general, lower privacy than Pickle, and also lower accuracy for data points with comparable privacy. The same results hold for all four datasets with 2 users, so we have omitted these for lack of space.
3.4.3.6 Impact of Design Choices
Is Regression Necessary? Our main contribution is the use of regression to learn function relationships.
For all our datasets and classifiers, turning off regression and using the transformed feature vectors directly for computing distances and inner products leads to 15-35% accuracy degradation compared to Pickle (graphs are omitted for space reasons). This drop is unacceptable for most applications, and motivates the importance of our approach.
Other Design Choices. Disabling the iPoD extension can reduce accuracy by up to 7% for SVM and up to 6% for LMNN, so it is beneficial for Pickle to use iPoD. As we have discussed, the bandwidth cost of transmitting this matrix is very small. We have also experimented with other public feature vector generation methods: a DIRECT method in which the cloud obtains a few unperturbed feature vectors from users; a NOISY method which adds additive noise to the vectors of the DIRECT method; and an ARBITRARY method in which the cloud arbitrarily generates public vectors. We find that our SYNTHESIS method occupies a sweet spot: it is significantly more accurate, but not much less private, than ARBITRARY, and provides higher privacy, without sacrificing accuracy, than the other two methods.
3.4.3.7 Illustrating Other Features of Pickle
User Diversity. Pickle allows users to independently tune their own privacy transformations. Using SVM with the RBF kernel (results for the Linear kernel and LMNN are omitted but are qualitatively similar), Figure 3.9 considers the case of two different privacy settings: a 25% dimensionality reduction with 0.1-intensity additive noise, and a 75% dimensionality reduction with 0.5-intensity additive noise. It plots the classification accuracy for three cases: when all users use the first setting, when all users use the second setting, and when users use a mixture of those two settings.
The resulting classification accuracy is intermediate for the mixture setting, relative to the other settings. This is encouraging: a less encouraging outcome would have been if the accuracy of the mixture setting was closer to that of the second setting, since this would mean that users with high privacy requirements could dictate the performance of Pickle.
Robustness to Poisoning. We have implemented the Orca [31] outlier (malicious user) detection algorithm as discussed in Section 3.3.5, and use our estimates of Euclidean distance in that algorithm. Orca essentially ranks suspicious feature vectors, so we conduct three experiments in which there are 5 users and 0, 1 and 2 of them (respectively) attempt to poison the model generation process by injecting completely random data.
[Figure 3.9: Effect of user diversity on accuracy, for (a) 2 users and (b) 5 users]
[Figure 3.10: Outlier detection against model poisoning, on (a) Diabetes and (b) Vehicle]
In Figure 3.10, we plot the fraction of the top-100 suspicious feature vectors that belong to each user. When there are no outliers, the distribution is uniform across all five users. However, in the presence of outliers, their feature vectors occupy a disproportionate number of the top hundred suspicious feature vectors. This experiment shows that Pickle can be easily retrofitted into an existing poisoning detector.
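An Orca-style ranking can be illustrated with a simple distance-based scorer. This is a stand-in rather than the Orca implementation: it scores each vector by the distance to its k-th nearest neighbour, and the poisoned vectors are fixed far-away points rather than freshly drawn random data, for reproducibility.

```python
import random

def knn_outlier_scores(points, k=3):
    # Score each point by its distance to its k-th nearest neighbour;
    # vectors far from every user's data distribution score high.
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
            for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

rng = random.Random(0)
# 40 vectors clustered like honest users' data, plus 5 injected junk
# vectors far from that cluster (fixed here so the example is deterministic).
honest = [(rng.gauss(0.5, 0.05), rng.gauss(0.5, 0.05)) for _ in range(40)]
poison = [(9.0, 1.0), (2.0, 8.0), (7.5, 7.5), (0.5, 9.5), (9.5, 9.5)]
scores = knn_outlier_scores(honest + poison)
top5 = sorted(range(45), key=lambda i: -scores[i])[:5]
```

The injected vectors dominate the top of the suspicion ranking, mirroring the disproportionate share of the top-100 list that the outlier users occupy in Figure 3.10.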
In our experiments, we simply discard all outliers before building the classifier. However, it is also possible that a small amount of noisy data (e.g., mislabeled training samples) is contained in a user's training data without affecting the data's overall distribution. In this case, the classifier construction process can still filter these non-representative samples by selecting only the most useful feature vectors for classification.
Chapter 4
DECAF: Detecting and Characterizing Ad Fraud in Mobile Apps
4.1 Overview
Several recent studies have pointed out that advertising in mobile (smartphone and tablet) apps is plagued by various types of fraud. Mobile app advertisers are estimated to lose nearly 1 billion dollars (12% of the mobile ad budget) in 2013 due to these frauds [5]. The frauds fall under two main categories: (1) bot-driven frauds employ bot networks or paid users to initiate fake ad impressions and clicks [5] (more than 18% of impressions/clicks come from bots [14]), and (2) placement frauds manipulate visual layouts of ads to trigger ad impressions and unintentional clicks from real users (47% of user clicks are reportedly accidental [13]). Mobile app publishers are incentivized to commit such frauds since ad networks pay them based on impression count [8, 9, 11], click count [8, 9], or, more commonly, combinations of both [8, 9]. Bot-driven ad frauds have been studied recently [5, 32, 101], but placement frauds in mobile apps have not received much attention from the academic community.
Contributions. In this chapter, we make two contributions. First, we present the design and implementation of a scalable system for automatically detecting ad placement fraud in mobile apps. Second, using a large collection of apps, we characterize the prevalence of ad placement fraud and how these frauds correlate with app ratings, app categories, and other factors.
[Figure 4.1: Placement Fraud Examples]
Detecting ad fraud.
In Web advertising, most fraud detection is centered around analyzing server-side logs [134] or network traffic [99, 100], which are mostly effective for detecting bot-driven ads. These can also reveal placement frauds to some degree (e.g., an ad not shown to users will never receive any clicks), but such detection is possible only after fraudulent impressions and clicks have been created. While this may be feasible for mobile apps, we explore a qualitatively different approach: to detect fraudulent behavior by analyzing the structure of the app, an approach that can detect placement frauds more effectively and before an app is used (e.g., before it is released to the app store). Our approach leverages the highly specific, and legally enforceable, terms and conditions that ad networks place on app developers (Section 4.2). For example, Microsoft Advertising says developers must not “edit, resize, modify, filter, obscure, hide, make transparent, or reorder any advertising” [12]. Despite these prohibitions, app developers continue to engage in fraud: Figure 4.1 shows (on the left) an app in which 3 ads are shown at the bottom of a page while ad networks restrict developers to 1 per page, and (on the right) an app in which an ad is hidden behind UI buttons.
The key insight in our work is that manipulation of the visual layout of ads in a mobile app can be programmatically detected by combining two key ideas: (a) a UI automation tool that permits automated traversal of all the “pages” of a mobile app, and (b) extensible fraud checkers that test the visual layout of each “page” for compliance with an ad network's terms and conditions. While we use the term ad fraud, we emphasize that our work deems as fraud any violation of published terms and conditions, and does not attempt to infer whether the violations are intentional or not.
We have designed a system called DECAF that leverages the insight discussed above (Section 4.3).
First, it employs an automation tool called a Monkey that, given a mobile app binary, can automatically execute it and navigate to various parts of the app by simulating user interaction (e.g., clicking a button, swiping a page, etc.). Abstractly, a Monkey traverses a state-transition graph, where pages are states, and a click or swipe triggers a transition from one state to the next. The idea of using a Monkey is not new [75, 87, 110]. The key optimization goals in designing a Monkey are good coverage and speed: the Monkey should be able to traverse a good fraction of target states in a limited time. This can be challenging since, for many apps with dynamically varying content (e.g., news apps), the state-transition graph can be infinitely large. Even for relatively simple apps, recent work has shown that naïve state traversal based on a UI automation framework can take several hours [75]. Combined with the size of app stores (over a million apps on Google Play alone), this clearly motivates the need for scalable traversal. (It is also possible to scale by parallelizing the exploration of apps on a large cluster, but these resources are not free; our efficiency improvements are orthogonal to parallelization and enable more efficient cluster usage, especially for apps that may have to be repeatedly scanned because they contain dynamic content.) Recent works therefore propose optimization techniques, many of which require instrumenting apps [110] or the OS [87]. A key feature of DECAF is that it treats apps and the underlying OS as black boxes and relies on a UI automation framework. The advantage of this approach is its flexibility: DECAF can scan apps written in multiple languages (e.g., Windows Store apps can be written in C#, HTML/JavaScript, and C++) and potentially from different platforms (e.g., Windows and Android). However, the flexibility comes at the cost of very limited information from the UI automation framework. Existing automation frameworks provide only information about the UI layout of the app's current page, and do not provide information such
as callback functions, system events, etc., that are required by the optimizations proposed in [110] and [87]. To cope with this, DECAF employs several novel techniques: a fuzzy matching-based technique to robustly identify structurally similar pages with similar ad placement (so that it suffices for the Monkey to visit only one of them), a machine learning-based predictor to avoid visiting equivalent pages, an app usage based technique to prioritize app navigation paths, and a resource usage based technique for fast and reliable detection of page-load completion. These techniques need no app or OS instrumentation and greatly improve the Monkey's coverage and speed.
The second component of DECAF efficiently identifies fraudulent behavior in a given app state. We find that, rather surprisingly, analyzing the visual layout of an app page to detect possible ad fraud is nontrivial. This is due to the complex UI layouts of app pages (especially in tablet apps), incomplete UI layout information from the UI automation framework (e.g., missing z-coordinates), mismatch between the device's screen size and the app's page size (e.g., panoramic pages), variable behavior of ad networks (e.g., occasionally not serving any ad due to unavailability of specific types of ads), etc. We develop novel techniques to reliably address these challenges.
We have implemented DECAF to run on Windows 8 (tablet) apps and Windows Phone 8 apps (Section 4.5). Experiments show that DECAF achieves a coverage of 94% (compared to humans) in 20 minutes of execution per app and is capable of detecting many types of ad fraud in existing apps (Section 4.6).
Characterizing Ad Fraud.
Using DECAF we have also analyzed 50,000 Windows Phone 8 apps and 1,150 Windows tablet apps, and discovered many occurrences of various types of fraud (Section 4.7). Many of these frauds were found in apps that have been in app stores for more than two years, yet the frauds had remained undetected. We have also correlated the fraud data with various app metadata crawled from the app store and observed interesting patterns. For example, we found that fraud incidence appears independent of app rating on both phone and tablet, and that some app categories exhibit a higher incidence of fraud than others, but the specific categories differ between phone and tablet apps. Finally, we find that a few publishers commit most of the frauds. These results suggest ways in which ad networks can selectively allocate resources for fraud checking.
DECAF has been used by the ad fraud team in Microsoft and has helped detect many fraudulent apps. Fraudulent publishers were contacted to fix the problems, and apps whose publishers did not cooperate with such notices have been blacklisted and denied ad delivery. To our knowledge, DECAF is the first tool to automatically detect ad fraud in mobile app stores.
4.2 Background, Motivation, Goals and Challenges
Background. Many mobile app publishers use in-app advertisements as their source of revenue; more than 50% of the apps in major app stores show ads [58]. To embed ads in an app, an app publisher registers with a mobile ad network such as AdMob [7], iAd [9], or Microsoft Mobile Advertising [10]. In turn, ad networks contract with advertisers to deliver ads to apps. Generally speaking, the ad network provides the publisher with an ad control (i.e., a library with some visual elements embedded within). The publisher includes this ad control in her app, and assigns it some screen real estate. When the app runs and the ad control is loaded, it fetches ads from the ad network and displays them to the user.
Ad networks pay publishers based on the number of times ads are seen (called impressions) or clicked by users, or some combination thereof. For example, Microsoft Mobile Advertising pays in proportion to the total impression count times the overall click probability.
Motivation. To be fair to advertisers, ad networks usually impose strict guidelines (called prohibitions) on how ad controls should be used in apps, documented in lengthy Publisher Terms and Conditions. We call all violations of these prohibitions frauds, regardless of whether they are committed intentionally or unintentionally. There are several kinds of frauds.
Placement Fraud. These frauds relate to how and where the ad control is placed. Ad networks impose placement restrictions to prevent impression or click inflation, while the advertiser may restrict what kinds of content (i.e., ad context) the ads are placed with. For instance, Microsoft Mobile Advertising stipulates that a publisher must not “edit, resize, modify, filter, obscure, hide, make transparent, or reorder any advertising” and must not “include any Ad Inventory or display any ads ... that includes materials or links to materials that are unlawful (including the sale of counterfeit goods or copyright piracy), obscene, ...” [12]. Similarly, Google AdMob's terms dictate that “Ads should not be placed very close to or underneath buttons or any other object which users may accidentally click while interacting with your application” and “Ads should not be placed in areas where users will randomly click or place their fingers on the screen” [1].
Interaction Fraud. Ad networks impose restrictions on fraudulent interactions with ad controls, such as using bots to increase clicks, repeatedly loading pages to generate frequent ad requests, or offering incentives to users. Publishers cannot cache, store, copy, distribute, or redirect any ads to undermine ad networks' business, nor can they launch denial of service attacks on the ad servers [12].
Content Fraud. This class of frauds refers to the actual contents within the ad control. Publishers should not modify ad contents, and must comply with content regulations on certain classes of apps (e.g., regulations preventing adult content in apps designed for children). Ad publishers are therefore required to disclose to the ad network what type of apps (or pages) the control is used in, so that the ad network can filter ads appropriately.
Detecting violations of these prohibitions manually in mobile apps can be extremely tedious and error-prone. This is because an app can have many pages (many content-rich tablet apps have upwards of hundreds of pages) and some violations (e.g., ads hidden behind UI controls) often cannot be detected by visual inspection. This, combined with the large number of apps in app stores (over a million for both Google Play and Apple's App Store), clearly suggests the need for automation in mobile app ad fraud detection.
Goals. In this chapter, we focus on automated detection of two categories of placement frauds in mobile apps.
Structural frauds: These frauds relate to how the ad controls are placed. Violators may manipulate the UI layout to inflate impressions, or to reduce an ad's footprint on screen. This can be done in multiple ways:
- An app page contains too many ads (Microsoft Advertising allows at most 1 ad per phone screen and 3 ads per tablet screen [12]).
- Ads are hidden behind other controls (e.g., buttons or images) or placed outside the screen. (This violates the terms and conditions in [1, 12].) Developers often use this trick to give users the feel of an “ad-free app”, or to accommodate many ads in a page while evading manual inspection.
- Ads are resized and made too small for users to read.
- Ads are overlapped with or placed next to actionable controls, such as buttons, to capture accidental clicks.
Contextual frauds: These frauds place ads in inappropriate contexts.
For example, a page context fraud places ads in pages containing inappropriate (e.g., adult) content. Many big advertisers, especially those who try to build brand image via display ads, do not want to show ads in such pages. Ad networks therefore prohibit displaying ads in pages containing “obscene, pornographic, gambling related or religious” content [12]. Publishers may violate these rules in an attempt to inflate impression counts.
Beyond detecting fraud, a second goal of this chapter is to characterize the prevalence of ad fraud by type, and to correlate ad fraud with app popularity, app type, and other measures. Such a characterization provides an initial glimpse into the incidence of ad fraud in today's apps, and, if tracked over time, can be used to assess the effectiveness of automated fraud detection tools.
Challenges. The basic approach to detecting placement fraud automatically is to programmatically inspect the visual elements and content in an app. But, because of the large number of apps and their visual complexity (especially on tablets), programmed visual inspection of apps requires searching a large, potentially unbounded space. In this setting, inspection of visual elements thus faces two competing challenges: accuracy and scalability. A more complete search of the visual elements can yield high accuracy at the expense of significant computation, thereby sacrificing scalability. A key research contribution of this chapter is to address the tension between these challenges.
Beyond searching the space of all visual elements, the second key challenge is to accurately identify ad fraud within a given visual element. Detecting structural frauds in an app page requires analyzing the structure of the page and the ads in it. This analysis is more challenging than it seems.
For example, checking whether a page shows more than one ad (or k ads in general) on a screen at any given time might seem straightforward, but can be hard on a panoramic page that is larger than the screen size and that the user can horizontally pan and/or vertically scroll. Such a page may contain multiple ads without violating the rule, as long as no more than one ad is visible at any scrolled/panned position of the screen (this is known as the “sliding screen” problem). Similarly, determining whether an ad is hidden behind other UI controls is not straightforward if the underlying framework does not provide the depths (i.e., z-coordinates) of UI controls. Finally, detecting contextual fraud is fundamentally more difficult, as it requires analyzing the content of the page (and hence is not feasible in-field when real users use the apps).
4.3 DECAF Overview
DECAF is designed to be used by app stores or ad networks. It takes a collection of apps and a set of fraud compliance rules as input, and outputs apps/pages that violate these rules. DECAF runs on app binaries and does not assume any developer input. One might consider using static analysis of an app's UI to detect structural fraud. However, a fraudulent app can dynamically create ad controls or change their properties at run time and bypass such static analysis. Static analysis also fails to detect contextual fraud. DECAF therefore performs dynamic checking (analogous to several recent works [75, 87, 109, 110]), in which it checks the implementation of an app by directly executing it in an emulator.
[Figure 4.2: The architecture of DECAF includes a Monkey that controls the execution and an extensible set of fraud detection policies.]
Unlike previous efforts [87, 110], DECAF uses a black-box approach in which it does not instrument the app binary or the OS.
This design choice is pragmatic: Windows 8 tablet apps are implemented in multiple languages (C#, HTML/Javascript, C++; in a sample of 1,150 tablet apps, we found that about 56.5% of the apps were written in C#, 38.3% in HTML/Javascript, and 5.2% in C++), and our design choice allows us to be language-agnostic. However, as we discuss later, this requires novel techniques to achieve high accuracy and high speed.
Figure 4.2 shows the architecture of DECAF. DECAF runs mobile apps in an emulator and interacts with each app through two channels: a UI Extraction channel for extracting the UI elements and their layout in the current page of an app (shown as a Document Object Model (DOM) tree in Figure 4.2), and a UI Action channel for triggering an action on a given UI element (such as clicking on a button). In Section 4.5, we describe how these channels are implemented. DECAF itself has two key components: (1) a Monkey that controls the app execution using these channels, and (2) a fraud checker that examines page contents and layout for ad fraud.
4.3.1 The Monkey
The execution of an app by a Monkey can be viewed as a traversal of a state-transition graph that makes transitions from one state to the next based on UI inputs, such as clicking, swiping, and scrolling. Each state corresponds to a page in the app, and the Monkey is the program that provides UI inputs (through the UI Action channel) for each visited state.
At each state that it visits, the Monkey uses the UI Extraction channel to extract page information, which includes (1) structural metadata such as the size, location, visibility, and layer information (z-index) of each ad and non-ad control in the current page, and (2) content, such as the text, images, and URLs in the page. The information is extracted from the DOM tree of the page; the DOM tree contains all UI elements on a given page along with the contents of the elements.
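The Monkey's depth-first exploration of this state-transition graph can be modeled abstractly as follows. This is a minimal sketch under stated assumptions: a back button is available (so returning to the parent state is free), and equivalent states have already been collapsed into a single signature; a real Monkey discovers the edges on the fly through the UI channels rather than from a precomputed graph.

```python
def crawl(graph, root):
    # Depth-first traversal of an app's state-transition graph.
    # `graph` maps a page signature to the pages reachable by one UI
    # action (click, swipe, scroll); unlisted pages have no actions.
    visited, stack, order = {root}, [root], []
    while stack:
        page = stack.pop()          # "back" to the most recent open page
        order.append(page)          # run fraud checkers on this page here
        for nxt in graph.get(page, []):
            if nxt not in visited:  # skip states judged equivalent to seen ones
                visited.add(nxt)
                stack.append(nxt)
    return order

# Hypothetical toy app: edges back to "home" and the shared "article"
# page are recognized as already-visited states and not re-explored.
app = {
    "home": ["news", "sports", "about"],
    "news": ["article", "home"],
    "sports": ["article"],
}
pages = crawl(app, "home")
```

The equivalence check in the inner loop is where DECAF's fuzzy matching and learned predictor plug in: collapsing structurally similar pages is what keeps the traversal tractable on apps whose raw state graphs are effectively unbounded.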
The Monkey also has a dictionary of actions associated with each UI type, such as clicking a button, swiping a multi-page, and scrolling a list, and uses this dictionary to generate UI inputs on the UI Action channel. Starting from an empty state and a freshly loaded app, the Monkey iterates through the UI controls on the page, transitioning from state to state, until it has no transitions left to make (either because all of a state's transitions have already been explored, or because the state does not contain any actionable UI control). Before making a transition, the Monkey must wait for the current page to be completely loaded; page load times can be variable due to network delays, for example.

After visiting a state, it uses one of two strategies. If a (hardware or software) back button is available, it backtracks to a previous (in depth-first order) state. If no back button is available (e.g., many tablets do not have a physical back button, and some apps do not provide a software back button), the Monkey restarts the app, navigates to the previous state through a shortest path from the first page, and resumes the exploration process.

In order to explore a large fraction of useful states within a limited time, the Monkey needs various optimizations. For example, it needs to determine whether two states are equivalent, so that it can avoid exploring states that have already been visited. It also needs to prioritize states, so that it can explore more important or useful states within the limited time budget. We discuss in Section 4.4 how DECAF addresses these. The Monkey also needs to address many other systems issues, such as dealing with non-deterministic transitions and transient crashes, detecting transitions to an external program (such as a browser), etc. DECAF incorporates solutions to these issues, but we omit the details for brevity.

4.3.2 Fraud Checker

At each quiescent state, DECAF invokes the fraud checker.
The checker has a set of detectors, each of which decides whether the layout or content of a page violates a particular rule. While DECAF's detectors are extensible, our current implementation includes the following detectors.

Small Ads: The detector returns true if any ad in the given page is smaller than the minimal valid size required by the ad network. The check is simple, as the automation framework provides the widths and heights of ads.

Hidden Ads: The detector returns true if any ad in the given page is (partially) hidden or unviewable. Conceptually, this check is not hard: for each ad, the detector first finds the non-ad GUI elements, then checks whether any of these non-ad elements is rendered above the ad. In practice, however, this is nontrivial because existing automation frameworks (e.g., for Windows and for Android) do not provide z-coordinates of GUI elements, complicating the determination of whether a non-ad element is rendered above an ad. We describe in Section 4.4.4 how DECAF deals with this.

Intrusive Ads: The detector returns true if the distance between an ad control and a clickable non-ad element is below a predefined threshold, or if an ad control partially covers a clickable non-ad control. Detecting the latter can also be challenging, since the automation framework does not provide z-coordinates of UI elements. We describe in Section 4.4.4 how DECAF deals with this.

Many Ads: The detector returns true if the number of viewable ads in a screen is more than k, the maximum allowed number of ads. This can be challenging due to the mismatch between an app's page size and the device's screen size. To address the sliding screen problem discussed before, a naïve solution would check every possible screen position in the page and test whether there is a violation at any position. We propose a more efficient solution in Section 4.4.4.
Inappropriate Context: The detector returns true if an ad-containing page has inappropriate content (e.g., adult content) or if the app category is inappropriate. Detecting whether or not page content is inappropriate is outside the scope of this chapter; DECAF uses an existing system (Microsoft's internal system, used by its online services) that employs a combination of machine classifiers and human inputs for content classification.

4.4 Optimizations for Coverage and Speed

The basic system described in Section 4.3 can explore most states of a given app [4]. However, this may take a long time: as [75] reports, it can take several hours even for apps designed with simple UIs for in-vehicle use, and our work considers content-rich tablet apps for which naïve exploration can take significantly longer. This is not practical when the goal is to scan thousands of apps. In such cases, the Monkey has a limited time budget, say a few tens of minutes, to scan each app; indeed, in DECAF, users specify a time budget for each app, and the Monkey explores as many states as it can within that time. With limited time, naïve exploration can result in poor coverage of the underlying state-transition graph, and consequent inaccuracy in ad fraud detection. In this section, we develop various techniques to address this problem. The techniques fall under three general categories, which we describe next.

4.4.1 Detecting Equivalent States

To optimize coverage, a commonly used idea is that once the Monkey detects that it has already explored a state equivalent to the current state, it can backtrack without further exploring the current state (and other states reachable from it). Thus, a key determinant of coverage is the definition of state equivalence. Prior work [75] points out that using a strict definition, where two states are equivalent if they have an
identical UI layout, may be too restrictive; it advocates a heuristic for UI lists that defines a weaker form of equivalence.

[4] Without any human involvement, however, the Monkey can fail to reach states that require human inputs such as a login and a password.

DECAF uses a different notion of state equivalence, dictated by the following requirements. First, equivalence should be decided based on fuzzy matching rather than exact matching. This is because, even within the same run of the Monkey, the structure and content of the “same” state can vary due to the dynamic nature of the corresponding page and variability in network conditions. For example, when the Monkey arrives at a state, a UI widget in the current page may or may not appear, depending on whether the widget has successfully downloaded data from the backend cloud service.

Second, the equivalence function should be tunable to accommodate a wide range of fraud detection scenarios. For detecting contextual fraud, the Monkey may want to explore all (or as many as possible within a given time budget) distinct pages of an app, so that it can check the appropriateness of all of the app's content. In such a case, two states are equivalent only if they have the same content. For detecting structural fraud, on the other hand, the Monkey may want to explore only the pages that have a unique structure (i.e., layout of UI elements). In such cases, two states with the same structure are equivalent even if their contents differ. How much fuzziness to tolerate for page structure and content should also be tunable: the ad network may decide to scan some “potentially bad” apps more thoroughly than others (e.g., because their publishers have bad histories), and hence tolerate less fuzziness on those potentially bad apps.

DECAF achieves the first requirement by using a flexible equivalence function based on cosine similarity of feature vectors of states.
Given a state, it extracts various features from the visible elements in the DOM tree of the page. More specifically, the name of a feature is the concatenation of a UI element type and its level in the DOM tree, while its value is the count and total size of element contents (if the element contains text or images). For example, the feature (TextControl@2, 100, 2000) implies that the page contains 100 Text UI elements with a total size of 2000 bytes at level 2 of the DOM tree of the page. By traversing the DOM tree, DECAF discovers such features for all UI element types and their DOM tree depths. This gives a feature vector for the page that looks like: [(Image@2, 10, 5000), (Text@1, 10, 400), (Panel@2, 100, null), ...].

To decide whether two states are equivalent, we compute the cosine similarity of their feature vectors and consider them equivalent if the cosine similarity is above a threshold. This configurable threshold achieves our second requirement; it acts as a tuning parameter that configures the strictness of equivalence. At one extreme, a threshold of 1 specifies content equivalence of two states [5]. A smaller threshold implies a more relaxed equivalence, fewer states to be explored by the Monkey, and faster exploration of states with less fidelity in fraud detection. To determine structural equivalence of two states, we ignore the size values in the feature vectors and use a smaller threshold to accommodate slight variations in page structure. Our extensive experiments indicate that a threshold of 0.92 strikes a good balance between thoroughness and speed when checking for structural fraud. We have also tried other distance-based and feature-based similarity measures, and did not see noticeable improvements over cosine similarity. In addition, another reason for choosing cosine similarity is its insensitivity to feature order, which fits our requirement of fuzzy matching well.
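As a sketch of this equivalence test, assume each page has already been flattened into (UI element type, DOM level, content size) tuples; the function and parameter names below are illustrative, not DECAF's actual interfaces:

```python
import math
from collections import defaultdict

def feature_vector(elements):
    """Aggregate per-page features: (ui_type@depth) -> [count, total_bytes].
    `elements` is a list of (ui_type, depth, content_size) tuples."""
    feats = defaultdict(lambda: [0, 0])
    for ui_type, depth, size in elements:
        key = f"{ui_type}@{depth}"
        feats[key][0] += 1
        feats[key][1] += size or 0
    return feats

def cosine_similarity(a, b, use_size=True):
    """Cosine similarity over the union of feature names. Dropping the
    size component gives the structural (layout-only) comparison."""
    keys = set(a) | set(b)
    def vec(f):
        v = []
        for k in keys:
            cnt, sz = f.get(k, (0, 0))
            v.append(cnt)
            if use_size:
                v.append(sz)
        return v
    va, vb = vec(a), vec(b)
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(x * x for x in vb))
    return dot / (na * nb) if na and nb else 0.0

def structurally_equivalent(a, b, threshold=0.92):
    """Structural equivalence: ignore sizes, compare against the
    0.92 threshold found to work well in our experiments."""
    return cosine_similarity(a, b, use_size=False) >= threshold
```

Two pages whose texts differ slightly in byte size but share the same layout thus compare as structurally equivalent, while a page with entirely different element types at different levels falls well below the threshold.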
However, insensitivity to feature order does not mean insensitivity to rendering order. When designing the format of the feature vector, we deliberately preserve the rendering order of UI elements, since the detection of, say, hidden ads may depend on the rendering order. For example, if one page contains one level-1 image and one level-1 ad, then we do not care about the order in which they appear in the UI tree: it is impossible for one element to be rendered above or below the other, or else their levels would be different. However, consider two pages, one with a level-1 image and a level-2 ad, and the other with a level-2 image and a level-1 ad. We do consider these two pages to have different feature vectors, and both pages are checked individually, since the image and the ad belong to different rendering levels and hidden ads may exist.

[5] On rare occasions, two pages with different content can be classified as equivalent if their text (or image) content has exactly the same count and total size. This is because we rely on the count and size, instead of the contents, of texts and images to determine equivalence of pages.

4.4.2 Path Prioritization

A Monkey is a general tool for traversing the UI state-transition graph, but many (especially tablet) apps contain too many states for a Monkey to explore within a limited time budget. Indeed, some apps may even contain a practically infinite number of pages to explore. Consider a cloud-based news app that dynamically updates its content once every few minutes. In this app, new news pages can keep appearing while the Monkey is exploring the app, making the exploration a never-ending process. Given that the Monkey can explore only a fraction of an app's pages, without careful design it can waste its time exploring states that do not add value to ad fraud detection, and thus may not have time to explore useful states.
This is a well-known problem with UI traversal using a Monkey, and all solutions to this problem leverage problem-specific optimizations to improve Monkey coverage. DECAF uses a novel state equivalence prediction method to prioritize which paths to traverse in the UI graph when detecting structural fraud, and a novel state importance assessment when detecting contextual fraud.

4.4.2.1 State Equivalence Prediction

To motivate state equivalence prediction, consider exploring all structurally distinct pages of a news-serving app. Assume that the Monkey is currently in state P0, which contains 100 news buttons (leading to structurally equivalent states P0,0 ... P0,99) and one video button (leading to P0,100). The Monkey could click the buttons in the same order as they appear on the page. It would first recursively explore state P0,0 and its descendant states, then visit each of the states P0,1 ... P0,99, realize that they are all equivalent to the already-visited state P0,0, and return to P0. Finally, it would explore P0,100 and its descendant states. This is clearly sub-optimal, since the time required to (1) go from P0 to each of the states P0,1 ... P0,99 (forward transition) and (2) then backtrack to P0 (backward transition) is wasted. The forward transition time includes the time for the equivalent page to completely load (we found this to be as large as 30 seconds in our experiments). Backward transitions can be expensive too. The naïve strategy above can also be pathologically sub-optimal in some cases. Most mobile devices do not have a physical back button, so apps typically include software back buttons, and our Monkey uses various heuristics based on their screen location and name to identify them. However, in many apps, the Monkey can fail to automatically identify the back button (e.g., if it is placed in an unusual location on the page or is named differently).
In such cases the Monkey has no obvious way to go directly back to the previous page, creating unidirectional edges in the state graph. In our example, if the transition between P0 and P0,1 is unidirectional, the backward transition would require the Monkey to restart the app and traverse through all states from the root to P0, waiting for each state to load completely before moving to the next state. Overall, the wasted time per button is as high as 3 minutes in some of our experiments, and this can add up to a huge overhead if there are many such pathological traversals. The net effect of the above overheads is that the Monkey can run out of time before it gets a chance to explore the distinct state P0,100. A better strategy would be to first explore pages with different UI layouts (P0,0 and P0,100 in the previous example), and then, if the time budget permits, to explore the remaining pages.

Minimizing state traversal overhead using prediction. These overheads could be minimized if there were a way to predict whether a unidirectional edge would take us to a state equivalent to an already visited state. Our state equivalence prediction leverages this intuition, but in a slightly different way. On a given page, it determines which buttons would likely lead to the same (or similar) states, and then explores more than one of these buttons only if the time budget permits. Thus, in our example, if the prediction were perfect, the Monkey would click on the button leading to the video page P0,100 before clicking on the second (and third, and so on) news button.

One might attempt to do such prediction based on the event handlers invoked by the various clickable controls, assuming that buttons leading to equivalent states will invoke the same event handler and those leading to different states will invoke different handlers. The event handler for a control can be found by static analysis of the code.
This, however, does not always work, as event handlers can be bound to controls at run time. Even if the handlers can be reliably identified, different controls may not be bound to different handlers; for example, we found a few popular apps that bind most of their clickable controls to a single event handler, which acts differently based on runtime arguments.

DECAF uses a language-agnostic approach that relies only on the run-time layout properties of the various UI elements. The approach is based on the intuition that UI controls that lead to equivalent states have similar “neighborhoods” in the DOM tree: often their parents and children in the UI layout hierarchy are of similar type or have similar names. This intuition, formed by examining a number of apps, suggests that it might be possible to use machine classification to determine whether two UI controls are likely to lead to the same state. Indeed, our approach uses supervised learning to construct a binary classifier over binary feature vectors. Each feature vector represents a pair of UI controls, and each element in the feature vector is a Boolean answer to one of the questions listed in Table 4.1. For any two UI controls, these questions can be answered from the DOM tree of the page(s) they are in. We construct a binary SVM classifier from a large labelled dataset; the classifier takes as input the feature vector corresponding to two UI controls, and determines whether they are likely to lead to equivalent states (if so, the UI controls are said to be equivalent).

Table 4.1: SVM classifier features

Control Features:
  Do they have the same name?
  Do they have the same ID?
  Are they of the same UI element type?
Parent Features:
  Do they have the same parent name path?
  Do they have the same parent ID path?
  Do they have the same parent UI element type path?
Child Features:
  Do their children share the same name set?
  Do their children share the same ID set?
  Do their children share the same UI element type set?
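Computing the per-pair feature vector is mechanical once each control's DOM neighborhood is available. The sketch below mirrors the nine questions of Table 4.1; the dictionary schema for a control is an assumption made purely for illustration:

```python
def pair_features(c1, c2):
    """Boolean feature vector (Table 4.1) for a pair of UI controls.
    Each control is assumed to be a dict with 'name', 'id', 'type',
    root-to-control paths 'parent_names'/'parent_ids'/'parent_types'
    (ordered lists), and child attribute collections
    'child_names'/'child_ids'/'child_types' (compared as sets)."""
    return [
        # Control features
        c1["name"] == c2["name"],
        c1["id"] == c2["id"],
        c1["type"] == c2["type"],
        # Parent features: the full path from the root must match
        c1["parent_names"] == c2["parent_names"],
        c1["parent_ids"] == c2["parent_ids"],
        c1["parent_types"] == c2["parent_types"],
        # Child features: order-insensitive set comparison
        set(c1["child_names"]) == set(c2["child_names"]),
        set(c1["child_ids"]) == set(c2["child_ids"]),
        set(c1["child_types"]) == set(c2["child_types"]),
    ]
```

The resulting 9-element Boolean vector is what the trained SVM consumes to predict whether the two controls lead to equivalent states.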
In constructing the classifier, we explored various feature definitions, and found the ones listed in Table 4.1 to be the most accurate. For instance, we found that features directly related to a control's appearance (e.g., color and size) are not useful for prediction, because they may differ even between controls leading to equivalent states.

Our Monkey uses the predictor as follows. For every pair of UI controls in a page, the Monkey determines whether that pair is likely to lead to the same state. If so, it clusters the UI controls together, resulting in a set of clusters, each of which contains equivalent controls. It then picks one control (called the representative control) from each cluster and explores these; the order in which they are explored is configurable (e.g., increasing or decreasing by cluster size, or random). The Monkey then continues its depth-first state exploration, selecting only representative controls in each state traversed. After all pages have been visited by exploring only representative controls, the Monkey visits the non-representative controls if the time budget permits. Algorithm 3 shows the overall algorithm. Note that the SVM-based clustering is also robust to dynamically changing pages: since the Monkey explores controls based on their clusters, it can simply choose whatever control is available during exploration and ignore controls that have disappeared between the time the clusters were computed and the time the Monkey is ready to click on a control.
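Algorithm 3 can also be rendered as a short executable sketch. Here the SVM predictor is stubbed out as a caller-supplied predicate `same_cluster`; names are illustrative:

```python
def cluster_controls(controls, same_cluster):
    """Greedy clustering of clickable controls (the procedure of
    Algorithm 3). `same_cluster(a, b)` stands in for the SVM predictor,
    returning True if the two controls are predicted to lead to
    equivalent states."""
    labels = [-1] * len(controls)   # -1 means "not yet labeled"
    next_label = 0
    for i in range(len(controls)):
        if labels[i] < 0:
            # Start a new cluster with this control as representative
            labels[i] = next_label
            next_label += 1
        for j in range(i + 1, len(controls)):
            if labels[j] < 0 and same_cluster(controls[i], controls[j]):
                labels[j] = labels[i]
    return labels
```

The Monkey then explores one representative control per distinct label first, deferring the rest until the time budget allows.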
Algorithm 3: Cluster clickable controls

  Input: set C of clickable controls of a page
  Output: clickable controls with cluster labels

  for i from 1 to C.Length do
      C[i].clusterLabel := -1
  end for
  currentClusterLabel := 1
  for i from 1 to C.Length do
      if C[i].clusterLabel < 0 then
          C[i].clusterLabel := currentClusterLabel
          currentClusterLabel := currentClusterLabel + 1
      end if
      for j from i + 1 to C.Length do
          if C[j].clusterLabel < 0 then
              {SVM_Predict outputs true if the two input controls are predicted to be in the same cluster}
              if SVM_Predict(C[i], C[j]) then
                  C[j].clusterLabel := C[i].clusterLabel
              end if
          end if
      end for
  end for

4.4.2.2 State Importance Assessment

State prediction and fuzzy state matching do not help when state equivalence is computed based on page content, as is required for contextual fraud detection. In such cases, the Monkey needs to visit all content-wise distinct pages in an app, and apps may contain too many pages to be explored within a practical time limit. DECAF exploits the observation that not all pages within an app are equally important: there are pages that users visit more often, and on which they spend more time, than others. From an ad fraud detection standpoint, it is more important to check those pages first, as they will show more ads to users. DECAF therefore prioritizes its exploration of app states based on their “importance”: more important pages are explored before less important ones.

Using app usage for estimating state importance. The importance of a state or page is an input to DECAF, and can be obtained from app usage statistics of real users, either by using data from app analytics libraries such as Flurry [6] and AppInsight [116] or by having users use instrumented versions of apps. From this kind of instrumentation, it is possible to obtain a collection of traces, where each trace is a path from the root state to a given state.
The importance of a state is determined by the number of traces that terminate at that state. Given these traces as input, DECAF combines them to generate a trace graph, which is a subgraph of the state-transition graph. Each node in the trace graph has a value and a cost, where the value is defined as the importance of the node (defined above) and the cost is the average time for the page to load.

Prioritizing state traversal using state importance. To prioritize Monkey traversal for contextual fraud, DECAF solves the following optimization problem: given a cost budget B (e.g., the total time to explore the app), it determines the set of paths that can be traversed within time B such that the total value of all nodes in the paths is maximized. The problem is NP-hard, by reduction from Knapsack, so we have evaluated two greedy heuristics for prioritizing paths to explore: (1) best node, which chooses the next unexplored node with the best value-to-cost ratio, and (2) best path, which chooses the next unexplored path with the highest ratio of total value to total cost. We evaluate these heuristics in Section 4.6.

Since app content can change dynamically, it is possible that a state in a trace disappears during the Monkey's exploration of an app. In that case, DECAF uses the trained SVM to choose another state similar to the original state. Finally, traces are useful not only for identifying important states to explore, but also for navigating to states that require human inputs. For example, if navigating to a state requires a username/password or special text inputs that the Monkey cannot produce, and if the traces incorporate such inputs, the Monkey can use them during its exploration to navigate to those states.

4.4.3 Page Load Completion Detection

Mobile apps are highly asynchronous, but UI extraction channels typically do not provide any callback to an external observer when the rendering of a page completes.
Therefore, DECAF has no way of knowing when a page has loaded in order to check state equivalence. Fixed timeouts, or timeouts based on a percentile of the distribution of page load times, can be too conservative, since these distributions are highly skewed. App instrumentation is an alternative, but has been shown to be complex even for managed code such as C# or Java [116]. DECAF uses a simpler technique that works for apps written in any language (including C++ and HTML/JavaScript). It uses a page load monitor that monitors all I/O (networking, disk, and memory) activities of the app process, and maintains their sum over a sliding window of time T. If this sum drops below a configurable threshold, the page is considered loaded; the intuition is that as long as the page is loading, the app should generate non-negligible I/O traffic. The method has the virtue of simplicity, but comes at a small cost in latency, given by the sliding window length, to detect the page load.

4.4.4 Fraud Checker Optimizations

DECAF incorporates several scalability optimizations as part of its fraud checkers.

Detecting too many ads. As mentioned in Section 4.3, detecting whether a page contains more than k ads at any given screen position can be tricky. DECAF uses a more efficient algorithm whose computational complexity depends only on the total number of ads in the page, not on the page or screen size. The algorithm uses a vertical moving window across the page whose width is equal to the screen width and whose height is equal to the page height; this window is positioned successively at the right edges of the rectangles representing ads. Within each such window, a horizontal strip of height equal to the screen height is moved from one ad rectangle's bottom edge to the next; at each position, the algorithm computes the number of ads visible inside the horizontal strip, and exits if this number exceeds a certain threshold.
The complexity of this algorithm, shown in Algorithm 4, is O(N^2 log N), where N is the total number of ads in the page.

Algorithm 4: Detect ad number violation

  Input: set D of ads in a page, where D[k].Rx and D[k].Ry are the x and y coordinates of the bottom-right corner of the k-th ad, and D[k].Lx and D[k].Ly are those of the top-left corner; ad number limit U
  Output: return true if a violation is detected

  Dx := {QuickSort D by the ads' D[k].Rx values}
  Dy := {QuickSort D by the ads' D[k].Ry values}
  for i from 1 to D.Length do
      Sx := {BinarySearch on Dx to get ads with D[k].Rx >= D[i].Rx and D[k].Lx <= D[i].Rx + screenWidth}
      Dy(Sx) := {the subset of Dy formed by elements in Sx}
      for j from 1 to Dy(Sx).Length do
          Sy := {BinarySearch on Dy(Sx) to get ads with D[k].Ry >= Dy(Sx)[j].Ry and D[k].Ly <= Dy(Sx)[j].Ry + screenHeight}
          if Sy.Length > U then
              return true
          end if
      end for
  end for
  return false

Detecting hidden and intrusive ads. As discussed in Section 4.3, determining whether an ad is completely hidden or partially overlapped by other GUI elements is challenging due to the missing z-coordinates of the elements. To deal with this, DECAF uses the two classifiers described below.

Exploiting DOM-tree structure. This classifier predicts the relative z-coordinates of GUI elements based on their rendering order. In Windows, the rendering order is the same as the depth-first traversal order of the DOM tree; i.e., if two elements have the same x- and y-coordinates, the one at the greater depth of the DOM tree will be rendered over the one at the smaller depth. The classifier uses this information, along with the x- and y-coordinates of GUI elements as reported by the automation framework, to decide whether an ad element is hidden or partially overlapped by a non-ad element. A possible concern here is that developers may change the rendering order of the DOM tree to bypass this check.
However, this change is not easy, requiring modifications at the level of OS APIs, so we ignore this possibility in our work. This classifier, however, is still not perfect: it can occasionally classify a visible ad as hidden (i.e., a false positive) when the GUI elements on top of the ad are invisible and their visibility status is not available from the DOM tree information.

Analyzing screenshots. This approach uses image processing to detect whether a target ad is visible in the app's screenshots. It requires addressing two challenges: (1) knowing what the ad looks like, so that the image processing algorithm can search for the target ad, and (2) refocusing, i.e., making sure that the screenshot captures the region of the page containing the ad. To address the first challenge, we use a proxy that serves the apps with fiducials: dummy ads with easily identifiable patterns such as a checker-board pattern. The proxy intercepts all requests to ad servers and replies with fiducials without affecting normal operations of the app. The image processing algorithm then looks for the specific pattern in screenshots. To address the refocusing challenge, the Monkey scrolls and pans app pages and analyzes screenshots only when the current screen of the page contains at least one ad. This classifier, like the previous one, is not perfect either: it can classify hidden ads as visible and vice versa, due to imperfections in the image processing algorithm (especially when the background of the app page is similar to the image pattern in the dummy ad) and to failures of the Monkey to refocus.

Combining the classifiers. The two classifiers described above can be combined. In our implementation, we take a conservative approach and declare an ad to be hidden only if it is classified as hidden by both classifiers.

4.4.5 Discussion

Parallel Execution. DECAF can scan multiple apps in parallel. We did not investigate scanning a single app with multiple Monkeys in parallel.
This is because scanning at the granularity of an app is sufficient in practice (see Section 4.6.3), and scanning at any finer granularity introduces significant design complexity (e.g., to coordinate Monkeys scanning the same app and to share states among them).

Bypassing DECAF. Suppose developers learn the principles behind how DECAF works; they may then try to bypass its checks by adding in-app guidance for users. For example, they may ask users to perform some complicated gesture that the Monkey cannot repeat in order to show multiple ads. While in practice this may work, developers are highly unlikely to take this annoying path, or else users may simply stop using their apps.

Smarter Ad Controls. DECAF's fraud checker optimizations could be implemented within the ad control itself. Such a “smart” ad control would perform all the DECAF checks while users use an app, and take necessary actions if ads are being shown in fraudulent ways (e.g., not serving ads to, or disregarding clicks from, the app or the page). This, however, introduces new challenges, such as permitting communication among ad controls on the same app page, preventing modification of the ad control library through binary code injection, and permitting safe access to the entire app page from within the ad control [6], so we have left an exploration of this to future work.

4.5 Implementation

Tablet/Phone differences. We have implemented DECAF for Windows Phone apps (hereafter referred to as phone apps) and Windows Store apps (referred to as tablet apps). One key difference between our prototypes for these two platforms is how the Monkey interacts with apps. Tablet apps run on Windows 8, which provides the Windows UI automation framework (similar to Android MonkeyRunner [3]); DECAF uses this framework directly.

[6] Especially for HTML apps, an ad control is not able to access content outside its frame due to the same-origin policy.
Tablet apps can be written in C#, HTML/JavaScript, or C++, and the UI framework allows interacting with them in a unified way. On the other hand, DECAF runs phone apps in the Windows Phone Emulator, which does not provide UI automation, so we use techniques from [110] to extract UI elements from the current page and manipulate mouse events on the host machine running the phone emulator to interact with apps.

Other implementation details. As mentioned earlier, tablets have no physical back buttons, and apps incorporate software back buttons. DECAF contains heuristics to identify software back buttons, but the limitations of these heuristics also motivate state equivalence prediction. Furthermore, we use Windows performance counters [18] to implement the page load monitor. To train the SVM classifier for state equivalence, we manually generated 1,000 feature vectors from a collection of training apps and used grid search with 10-fold cross-validation to set the model parameters. The chosen parameter set had the highest cross-validation accuracy of 98.8%. Finally, for our user study reported in the next section, we use the Windows Hook API [16] and Windows Input Simulator [17] to record and replay user interactions.

4.6 Evaluation

In this section, we evaluate the overall performance of DECAF's optimizations. For lack of space, we limit the results to tablet apps only, since they are more complex than phone apps. Some of our results are compared with ground truth, which we collect from human users; since that process does not scale, we limit our study to the 100 top free apps (29 HTML/JavaScript apps and 71 C# apps) from the Windows Store. In the next section, we run DECAF on a larger set of phone and tablet apps to characterize ad fraud.
[Figure 4.3: CDFs of evaluation results. (a) Structural coverage in 20 minutes of exploration per app; (b) median coverage as a function of exploration time; (c) page crawling speed; (d) app crawling speed.]

4.6.1 State Equivalence Prediction

To measure the accuracy of our SVM model and its effectiveness in exploring distinct states, we need ground truth about distinct pages in our app set. We found that static analysis is not very reliable in identifying distinct states, as an app can include third-party libraries containing many page classes but actually use only a handful of them. We therefore use humans to find the ground truth. We gave all 100 apps to real users and asked them to explore as many unique pages as possible (this also mimics manual app inspection for frauds). For each app, we combined all pages visited by users and counted the number of structurally distinct pages in the app. Since apps typically have a relatively small number of structurally distinct pages (mean 9.15, median 5), we found humans to be effective in discovering all of them.

Accuracy of SVM model. We evaluated our SVM model against the ground truth and found that it has a false positive rate of 8% and a false negative rate of 12%. (We emphasize that these rates are not directly for fraudulent ads but for predicting whether or not two clickable controls have the same click handler.) Note that false negatives do not affect the accuracy of the Monkey; they only affect its performance by unnecessarily sending it to equivalent states.
However, false positives imply that the Monkey ignores some distinct states by mistakenly assuming they are equivalent. To deal with this, we keep the Monkey running and let it explore the remaining states in random order until the time budget is exhausted. This way, the Monkey gets a chance to explore some of those missed states.

Benefits of using equivalence prediction. To evaluate the effectiveness of SVM-based state equivalence prediction, we compare an SVM Monkey, with prediction enabled, against a Basic Monkey, which does no prediction and hence realizes a state is equivalent only after visiting it. We run each app twice, once with each version of the Monkey, for 20 minutes. We measure the Monkey's state exploration performance using a structural coverage metric, defined as the fraction of structurally distinct states the Monkey visits, compared with the ground truth found from real users. A Monkey with a structural coverage value of 1 is able to explore all states required to find all structural frauds.

Figure 4.3(a) shows the structural coverage of the Basic and the SVM Monkey when they are both given 20 minutes to explore each app. In this graph, lower is better: the SVM Monkey achieves less than perfect coverage for only about 30% of the apps, while the Basic Monkey achieves less than perfect coverage for over 70% of the apps. Overall, the mean and median coverages of the SVM Monkey are 92% and 100% respectively, and its mean and median coverage improvements are 20.37% and 26.19%, respectively. The SVM Monkey achieves perfect coverage for 71 apps.

Figure 4.3(b) shows the median coverage of the SVM and the Basic Monkey as a function of exploration time per app (the graph for mean coverage looks similar, and hence is omitted). As shown, the SVM Monkey achieves better coverage for other time limits as well, so for a given target coverage, the SVM Monkey runs much faster than the Basic Monkey.
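The time savings from equivalence prediction can be illustrated with a toy exploration loop. This is only a sketch under heavy assumptions: the real Monkey drives live app UIs, whereas here states are nodes in a hypothetical graph, and `predict_equivalent` stands in for the trained SVM classifier (the one-character equivalence rule below is invented for the example).

```python
# Minimal sketch of SVM-guided exploration: skip states the classifier
# predicts are equivalent to one already visited.
def explore(start, neighbors, predict_equivalent, budget):
    """Visit up to `budget` states, skipping predicted-equivalent ones."""
    visited, frontier = [], [start]
    while frontier and len(visited) < budget:
        state = frontier.pop()
        # This check is where the SVM Monkey saves exploration time;
        # a false positive here would wrongly skip a distinct state.
        if any(predict_equivalent(state, v) for v in visited):
            continue
        visited.append(state)
        frontier.extend(neighbors.get(state, []))
    return visited

# Toy example: "a1" and "a2" are deemed equivalent, so one is skipped.
graph = {"root": ["a1", "a2", "b"], "b": ["c"]}
same_class = lambda s, t: s[0] == t[0]  # hypothetical equivalence rule
print(explore("root", graph, same_class, budget=10))
```

A Basic Monkey corresponds to `predict_equivalent` always returning `False`: it must visit every state (including equivalent ones) before it can recognize the redundancy.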
For example, the Basic Monkey achieves a median coverage of 66% in 20 minutes, while the SVM Monkey achieves a higher median coverage of 86% in only 5 minutes.

[Figure 4.4: Value coverage as a function of exploration time per app, with various prioritization algorithms (Random Node, Best Node, Random Path, Best Path).]

The SVM Monkey fails to achieve perfect coverage for 29 of the 100 apps we tried, for several reasons: the Windows Automation Framework occasionally fails to recognize a few controls; some states require app-specific text inputs (e.g., a zipcode for location-based search) that our Monkey cannot handle; and some apps simply have a large state transition graph. Addressing some of these limitations is left to future work, but overall we are encouraged by the coverage achieved by our optimizations. In the next section, we demonstrate that DECAF can scale to several thousand apps, primarily as a result of these optimizations.

4.6.2 Assessing State Importance

We now compare the best node and best path strategies (Section 4.4.2.2) against two baselines that do not exploit app usage statistics: random node, where the Monkey chooses a random unexplored next state, and random path, where the Monkey chooses a random path in the state graph. To assign values and costs to various app states, we conducted a user study to obtain app usage information. We picked five random apps from five different categories, and for each app we asked 16 users to use it for 5 minutes each. We instrumented the apps to automatically log which pages users visit, how much time they spend on each page, how long each page takes to load, etc. Based on this log, we assigned each state a value proportional to the total number of times the page is visited and a cost proportional to
To compare various strategies, we use the metric value coverage, computed as the ratio of the sum of values of visited states and that of all states. We ran the Monkey with all four path prioritization strategies for total exploration time limits of 5, 10, 20 and 40 minutes. As Figure 4.4 shows, value coverage increases monotonically with exploration time. More importantly, path-based algorithms outperform node-based algorithms, as they can better ex- ploit global knowledge of values of entire paths; the best-path algorithm significantly outperforms the random-path algorithm (average improvement of 27%), highlighting the value of exploiting app usage. Furthermore, exploring all (content-wise) valuable states of an app can take longer than exploring only structurally distinct states. For the set of apps we use in this experiment, achieving a near-complete cover- age takes the Monkey 40 minutes. 4.6.3 Overall Speed and Throughput Figure 4.3(c) shows the CDF of time the Monkey needs to explore one app state, measured across all 100 apps. This time includes all the operations required to process the corresponding page: waiting for the page to be completely loaded, extracting the DOM tree of the page, detecting structural fraud in the state, and deciding the next transition. The mean and median times to explore a page is 13.5 and 12.1 sec respectively; a significant component of this time is the additional 5-second delay in detecting page load completion as discussed in Section 4.4.3. We are currently exploring methods to reduce this delay. Figure 4.3(d) shows the CDF of time DECAF needs to explore one app. The CDF is computed over 71 apps that the Monkey could finish exploring within 20 minutes. The mean and median time for an app is 11.8 minutes and 11.25 minutes respectively; at this rate, DECAF can scan around 125 apps on a single machine per day. 
4.6.4 False Positives and False Negatives in DECAF

False positives are cases in which DECAF reports fraudulent behavior that is actually benign. Such cases do not arise in the sub-checkers for many ads, small ads, and inappropriate context, since fraudulent reports from these sub-checkers are deterministic and reliable. For example, the sub-checker for many ads directly inspects the layout of ad controls and other visual elements in the current UI page, and all the layout information is correctly provided by the OS; so as long as more than the allowed number of ads (whether hidden or not) are observed in one possible screen position, the many-ads violation is correctly reported. However, the hidden and intrusive checkers may report false positives, as discussed in Section 4.4.4. Essentially, the OS does not provide depth information for UI elements, which forces DECAF to use two heuristics to infer hidden-ad violations, and both heuristics can introduce false positives. Our first heuristic uses rendering order to infer depth information, but may produce false positives when the element covering the ad is actually transparent (whether an element is transparent is also hard to tell, especially for HTML apps using customized CSS styles). Our second heuristic replaces the actual ads with dummy ones and tries to identify whether ads of the dummy pattern are observed in screenshots. This method may also report false positives if pattern recognition fails because a dummy ad is split across two different screenshots, neither of which contains an intact dummy ad with the given pattern. Similar explanations hold for false positives with intrusive ads. To understand to what extent these false positives may affect our results, we performed manual verification of hidden/intrusive ads at the scale of several hundred apps.
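The rendering-order heuristic described above can be sketched as a simple overlap check. This is an illustration only: the rectangle format and field names are our own invention, and the real checker works on UI element trees extracted from the running app.

```python
# Sketch of the rendering-order heuristic for hidden-ad detection.
# Elements are listed in render order (later = drawn on top); rects are
# (left, top, right, bottom) in screen coordinates.
def overlaps(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def hidden_ad_suspect(elements, ad_index):
    """Flag the ad if any later-rendered element overlaps it.

    As noted in the text, this can be a false positive when the covering
    element is transparent, which the OS does not reliably report.
    """
    ad = elements[ad_index]["rect"]
    return any(overlaps(ad, e["rect"]) for e in elements[ad_index + 1:])

page = [
    {"name": "ad", "rect": (0, 0, 320, 50)},
    {"name": "panel", "rect": (0, 0, 480, 800)},  # drawn over the ad
]
print(hidden_ad_suspect(page, 0))  # True: the panel covers the ad
```

If the `panel` element were transparent, the ad would actually be visible, which is exactly the false-positive mode discussed in the text.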
Overall, our manual inspection indicates an 85% accuracy in detecting hidden/intrusive ads, which is sufficient for practical purposes. In addition, since the current DECAF Monkey is not perfect, some app pages may not be visited (for the reasons discussed in Section 4.6.1). This can result in false negatives if the missed UI pages contain fraudulent behavior (i.e., DECAF cannot find frauds on UI pages it does not visit). Possible ways to further increase the Monkey's coverage include running DECAF for a longer time on each app, automatically completing textboxes based on user hints, using a newer version of the Windows Automation Framework, and so on. We have left the exploration of advanced optimizations to increase the Monkey's coverage to future work.

Table 4.2: Occurrences of various fraud types among all fraudulent apps

    Fraud type                              Phone Apps   Tablet Apps
    Too many (Structural/impression)        13%          4%
    Too small (Structural/impression)       40%          54%
    Outside screen (Structural/impression)  19%          4%
    Hidden (Structural/impression)          39%          32%
    Structural/Click                        11%          18%
    Contextual                              2%           20%

4.7 Characterizing Ad Fraud

In this section, we characterize the prevalence of ad fraud, compare frauds by type across phone and tablet apps, and explore how ad fraud correlates with app rating, size, and several other factors. To obtain these results, we ran DECAF on 50,000 Windows Phone apps (phone apps) and 1,150 Windows 8 apps (tablet apps). The Windows 8 App Store prevents programmatic app downloads, so we had to manually download the apps before running them through DECAF, hence the limit of 1,150 tablet apps. Phone apps were randomly chosen from all SilverLight apps in the app store in April 2013. Microsoft Advertising used DECAF after April 2013 to detect violations in apps and force publishers into compliance, and our results include these apps. Tablet apps were downloaded in September 2013 and were taken from the top 100 free apps in 15 different categories.
We did not evaluate state equivalence prediction and state traversal prioritization with these apps: the sheer scale of these apps, and our lack of access to app analytics, made it infeasible to collect the ground truth and usage traces from users required for this evaluation.

Fraud by Type. Our DECAF-based analysis reveals that ad fraud is widespread both in phone and in tablet apps. In the samples we have, we discovered more than a thousand phone apps, and more than 50 tablet apps, with at least one instance of ad fraud; the precise numbers are proprietary, and hence omitted. Table 4.2 classifies the frauds by type (note that an app may include multiple types of fraud). Apps exhibit all of the fraud types that DECAF can detect, but to varying degrees; manipulating the sizes of ads and hiding ads under other controls seem to be the most prevalent, both on the phone and on the tablet. (For Windows Phone, we consider SilverLight apps only; some apps, especially games, are written in XNA, which we ignore. Also, the Monkey is not yet sophisticated enough to completely traverse games such as Angry Birds. For tablet apps, we manually sampled simple games from the Games category.)

[Figure 4.5: Distribution of fraudulent apps over various categories (phone and tablet).]

There are, however, interesting differences between the two platforms. Contextual fraud is significantly more prevalent on the tablet, because tablet apps are more content-rich (due to the larger form factor). Ad count violations are more prevalent on the phone, which has a stricter limit (1 ad per screen) compared to the tablet (3 ads per screen).

Fraud by App Category.
App stores classify apps by category, and Figure 4.5 depicts the distribution of ad fraud frequency across app categories for both phone and tablet apps. In some cases, fraud is equally prevalent across the two platforms, but there are several instances where fraud is more prevalent on one platform than the other. For instance, navigation and entertainment (movie reviews/timings) apps exhibit more fraud on the phone, likely because they are more frequently used on these devices and publishers focus their efforts on these categories. For a similar reason, tablets show a significantly higher proportion of fraud than phones in the Health, News and Weather, and Sports categories.

Fraud by Rating. We also explore the prevalence of fraud by two measures of the utility of an app. The first measure is its rating value, rounded to a number from 1 to 5, and we seek to understand whether fraud happens more often at one rating level than at another. Figure 4.6 plots the frequency of different rating values across both fraudulent and non-fraudulent apps, for both the phone and the tablet.

[Figure 4.6: Distribution of ratings for fraudulent and non-fraudulent phone and tablet apps.]

[Figure 4.7: CDF of rating counts for phone apps.]

One interesting result is that the distribution of rating values is about the same for fraudulent and non-fraudulent apps; i.e., for a given rating, the proportion of fraudulent and non-fraudulent apps is roughly the same. Fraudulent and non-fraudulent phone apps have average ratings of 1.8 and 1.98. For tablet apps, the average ratings are 3.79 and 3.8 for fraudulent and non-fraudulent apps, respectively. If rating is construed as a proxy for utility, this suggests that the prevalence of fraud is independent of the utility of an app.
A complementary aspect of apps is popularity. While we do not have direct measures of popularity, Figure 4.7 plots the cumulative distribution of rating counts (the number of ratings an app has received) for phone apps; rating counts have been shown to be weakly correlated with downloads [41] and can be used as a surrogate for popularity (the graphs look similar for tablet apps as well). This figure suggests that there are small distributional differences in rating counts for fraudulent and non-fraudulent apps: the mean rating counts for phone apps are 83 and 118 respectively, and for tablet apps are 136 and 157 respectively. (Average ratings for tablet apps are higher than those for phone apps because we chose top apps for the tablet.) However, these differences are too small to make a categorical assertion about the relationship between popularity and fraud behavior. We had expected to find at least that lower-popularity apps, or apps with less utility, would be more likely to exhibit fraud, since they have more incentive to do so. These results are a little inconclusive, and either suggest that our intuitions are wrong or that we need more direct measures of popularity (actual download counts) to establish the relationship.

The propensity of publishers to commit fraud. Each app in an app store is developed by a publisher. A single publisher may publish more than one app, and we now examine how instances of fraud are distributed across publishers.

[Figure 4.8: Compliance rate of publishers with multiple apps (phone and tablet).]

[Figure 4.9: Fraudulent app count per phone app publisher.]

Figure 4.8 plots the compliance rate for phone and tablet apps for publishers who have more than one app in the app store.
A compliance rate of 100% means that no frauds were detected across all of the publisher's apps, while a rate of 0% means all of the publisher's apps were fraudulent. The rate of compliance is much higher for tablet apps, but that may be because our tablet sample is much smaller. The phone app compliance may be more reflective of the app ecosystem as a whole: a small number of publishers never comply, but a significant fraction of publishers commit fraud on some of their apps. More interestingly, the distribution of the number of frauds across publishers who commit fraud exhibits a heavy tail (Figure 4.9): a small number of publishers are responsible for most of the fraud.

Takeaways. These measurement results are actionable in the following way. Given the scale of the problem, an ad network is often interested in selectively investing resources in fraud detection, and taken together, our results suggest ways in which the ad network should, and should not, invest resources. The analysis of fraud prevalence by type suggests that ad networks could preferentially devote resources to different types of fraud on different platforms; for instance, the ad count and contextual frauds constitute the lion's share of frauds on tablets, so an ad network may optimize fraud detection throughput by running only these checkers. Similarly, the analysis of fraud by category suggests categories of apps to which ad networks can devote more resources, and points out that these categories may depend on the platform. The analysis also points out that ad networks should not attempt to distinguish by rating or rating count. Finally, and perhaps most interestingly, the distribution of fraud counts by publisher suggests that it may be possible to obtain significant returns on investment by examining apps from a small set of publishers.
Chapter 5

Related Work

5.1 Information Credibility

We are not aware of any prior work in the wireless networking literature that has tackled information credibility assessment. However, other fields have actively explored credibility, defined as the believability of sources or information [51, 60, 96]. Credibility has been investigated in a number of fields, including information science, human communication, human-computer interaction (HCI), marketing, and psychology [117]. In general, research has focused on two threads: the factors that affect credibility, and the dynamics of information credibility.

The seminal work of Hovland et al. [61] may be the earliest attempt to explore credibility; it discusses how the various characteristics of a source can affect a recipient's acceptance of a message, in the context of human communication. Rieh, Hilligoss, and others explore important dimensions of credibility in the context of social interactions [40, 60, 117], such as trustworthiness, expertise, and information validity. McKnight and Kacmar [60] study a unifying framework of credibility assessment in which three distinct levels of credibility are discussed: construct, heuristics, and interaction. Their work is in the context of assessing the credibility of websites as sources of information.

Wright and Laskey [130] discuss how to tackle the fusion of credible information. They present a weighting-based probabilistic model to compute the credibility of uncertain information from diverse sources. Several techniques are combined with this model, such as prior information, evidence when available, and opportunities for learning from data. Sometimes, the terms credibility and trust are used synonymously. However, they are distinct notions: while trust refers to beliefs and behaviors associated with the acceptance of risk, credibility refers to the believability of a source, and a believable source may or may not result in associated trusting behaviors [117].
In addition, there is a body of work that has examined the processes and propagation of credible information. Corroboration as a process of credibility assessment is discussed in [42]. Proximity, both geographic and social, and its role in credibility assessment is discussed in [118]; our use of geographic distance as a measure of credibility is related to this discussion. Saavedra et al. [121] explore the dynamics and the emergence of synchronicity in decision-making when traders use corroboration as a mechanism for trading decisions. Finally, our stochastic optimization method is tangentially related to weighted approaches for time series classification (e.g., [72]), but our problem setting considers dynamic event arrivals.

5.2 Privacy-preserving SVM

There have been several pieces of work on privacy-preserving SVM classifier construction, but each lacks support for a design dimension that is crucial for collaborative SVM classifier construction using mobile sensing.

5.2.1 Feature Perturbation

Closest to our work is the body of work that perturbs feature vectors before transmitting them to a server/cloud. The work of Lin and Chen [77–79] considers privacy-preserving classifier training only for a single user, whereas Pickle explicitly supports multiple users. Some approaches require that all participants share a common perturbation matrix [91, 114], while Pickle does not. Other approaches [63, 92, 135] focus on vertically partitioned data, where elements of the feature vector are spread among participants; by contrast, in our setting, the data is horizontally partitioned. An approach that could plausibly have been used for collaborative learning generates synthetic feature vectors whose statistics match the original feature vectors [124]; we have compared Pickle against this approach and shown that it can result in poor accuracy.
5.2.2 Differential Privacy

Beyond perturbing the input feature vectors, some approaches have explored the use of the differential privacy framework for privacy-preserving SVM construction. In these approaches [37, 120], classifier construction assumes all the original feature vectors are available (unlike Pickle, which perturbs the original feature vectors), and the outputs of the classifiers are perturbed such that individual features are not exposed as a result of small differences between two databases (such as two different versions of the training samples). This is achieved by adding noise either to the classifier's parameter vector after optimization, or to the objective function itself, thus prior to optimization. Intuitively, these approaches attempt to make it difficult to infer who might have contributed feature vectors, while Pickle hides the content of the feature vector itself. Thus, the two approaches are complementary, and exploring a combination of the two methods is left to future work.

5.2.3 Cryptographic Methods

Other methods have attempted to use cryptographic techniques to preserve privacy in SVM construction. A few use homomorphic encryption, but either discuss SVM construction for only two participants [74] or require peer-to-peer communication [89, 137], whereas Pickle permits multiple users and does not require them to communicate with each other. Finally, several pieces of work [73, 126, 136] use a form of secure multiparty communication, but assume that participants do not collude, an assumption that Pickle does not make. (Of course, not all secure multiparty communication methods assume participants do not collude; but, when applied to the Pickle setting, these methods have the drawback that all participants must be online whenever any participant wishes to use the classifier, an unwieldy assumption at best.)
In summary, in the space of prior work on privacy-preserving SVM, Pickle occupies a unique niche, largely driven by the requirements and constraints of collaborative learning using sensor data generated from mobile phones.

5.2.4 Other Related Work on Collaborative Learning and Privacy

Navia-Vasquez et al. [111] consider distributed SVM classifier construction, but do not consider privacy. Much research in the mobile sensing literature has used machine-learning classifiers for various applications (e.g., [22, 30, 33, 86, 104]), and SVM is often a popular choice. A few efforts have examined collaborative learning. Closest to our work is that of Ahmadi et al. [23], who consider the problem of accurately estimating a linear regression model from user-contributed sensor data while still ensuring the privacy of the contributions. While this is an instance of privacy-preserving collaborative learning, it is unclear how to extend the approach to nonlinear classifiers; as we have discussed above, for such classifiers it is necessary to carefully design privacy transforms that preserve certain relationships between contributed feature vectors. MoVi [30] is an application in which users within a social group collaboratively sense their environment, using the cloud, and recognize interesting events. However, MoVi assumes that users within a group trust each other, and that the cloud can be trusted not to reveal data from one group to third parties. Finally, Darwin [103] directly addresses collaborative learning, but does not address privacy and assumes a trustworthy cloud.

Privacy preservation has, in general, received much more attention in the data mining community, which has considered cryptographic methods (e.g., [66, 125]) for clustering and other mining operations. In general, these methods do not scale to many users and require computationally intensive encoding and decoding operations.
That community has also considered anonymization of structured data (such as relational tables) to ensure the privacy of individual entries without significantly compromising query results. By now, it is well known that anonymization is vulnerable to composition attacks using side information [53].

Preserving privacy through perturbation or randomization is most relevant to our work. One body of work has considered data perturbation techniques using various methods [84, 127, 131, 132] for dataset exchange between two parties; it is unclear how to extend this body of work to Pickle's multi-party setting, where the parties are assumed to be able to collude. Additive-noise-based randomization perturbs the original data with additive noise (e.g., [20, 21]), but is susceptible to reconstruction attacks, in which the spectral properties of the perturbed data can be used to filter out the additive noise and recover the original data [71]. Multiplicative-noise-based perturbation (e.g., [38, 83]) can be robust to these reconstruction attacks. In some approaches (e.g., [38]), the multiplicative noise is dimensionality-preserving, while in others [83] it is not. Dimensionality-preserving transformations can preserve inner products and Euclidean distances. Unfortunately, a dimensionality-preserving multiplicative transformation is susceptible to approximate reconstruction [83]. Furthermore, if this method is applied to collaborative learning, then participants must agree upon the matrix R, and collusion attacks may succeed. It is for this reason that Pickle uses a dimensionality-reducing transformation with per-user private matrices, and then uses a regression phase to recover inter-user relationships so that it can approximately infer Euclidean distances and inner products.

5.3 Automated Ad Fraud Detection

DECAF is inspired by prior work on app automation and ad fraud detection.
5.3.1 App Automation

Today, mobile platforms like Android provide UI automation tools [3, 4] to test mobile apps. But these tools rely on the developer to provide automation scripts and do not provide any visibility into the app runtime, so they are inefficient and cannot easily be used for detecting ad fraud. Recent research efforts have built upon these tools to provide full app automation, but their focus has been on different applications: automated testing [26, 27, 52, 62, 87, 105, 133] and automated privacy and security detection [49, 57, 90, 115]. Automated testing efforts evaluate their systems on only a handful of apps, and many of their UI automation techniques are tuned to those apps. Systems that look for privacy and security violations execute on a large collection of apps, but they use only basic UI automation techniques. Closest to our work is AMC [75], which uses automated app navigation to verify UI properties for vehicular Android apps, but reports exploration times of several hours per app and was evaluated on 12 apps. In contrast to all of these, DECAF is designed for performance and scale, to automatically discover ad fraud violations in several thousand apps.

Symbolic and concolic execution [35, 46] are alternative techniques for verifying properties of code, and have been applied to mobile apps [27, 105]. For discovering UI properties, UI graph traversal is a more natural technique than concolic execution, but it may be possible to detect ad fraud using concolic execution; we have left this to future work. Tangentially relevant is work on crowdsourcing GUI testing, and on automated execution frameworks for Ajax web apps (Crawljax [98], AjaxTracker [76] and ATUSA [97]). DECAF can use crowdsourcing to obtain state importance, and the Ajax-based frameworks do not deal with mobile-app-specific constraints.
5.3.2 Ad Fraud

Existing work on ad fraud mainly focuses on click-spam behavior, characterizing the features of click-spam either by targeting specific attacks [25, 32, 101, 107] or by taking a broader view [44]. Some work has examined other elements of the click-spam ecosystem: the quality of purchased traffic [123, 138] and the spam profit model [70, 95]. Very little work exists on click-spam in mobile apps. From controlled experiments, the authors of [44] observed that around one third of mobile ad clicks may constitute click-spam. A contemporaneous paper [50] claimed that its authors are not aware of any mobile malware in the wild that performs advertising click fraud. Unlike these, DECAF focuses on detecting violations of ad network terms and conditions, even before potentially fraudulent clicks have been generated.

With regard to detection, most existing work focuses on bot-driven click-spam, by analyzing search engine query logs to identify outliers in query distributions [134], characterizing network traffic to infer coalitions made by groups of bot-driven fraudsters [99, 100], or authenticating normal user clicks to filter out bot-driven clicks [59, 68, 119]. A recent work, Viceroi [45], designed a more general framework that can detect not only bot-driven spam but also some non-bot-driven spam (like search hijacking). DECAF differs from this body of work in focusing on user-based ad fraud in the mobile app setting rather than click-spam fraud in the browser setting; to the best of our knowledge, ours is the first work to detect ad fraud in mobile apps.

Chapter 6

Conclusions and Future Work

In this dissertation, we have explored possible ways to improve efficiency, privacy and robustness for crowd-sensing applications. First, in Chapter 2, we explored the design space of algorithms for a new problem: optimizing pull corroboration in an emerging application area, crowd sensing.
We have proposed optimal special-case algorithms, computationally efficient approximations, and decentralized stochastically optimal variants. However, our work is merely an initial foray into a broad and unexplored space, with several directions for future work: increasing the realism of our credibility and cost models, incorporating malice, allowing peers to relay reports, examining the performance of meta-heuristics, and exploring other realistic yet efficient and near-optimal special-case solutions. Second, in Chapter 3, we have described Pickle, an approach to preserving privacy in mobile collaborative learning. Pickle perturbs training feature vectors submitted by users, but uses a novel regression technique to learn the relationships between training data that are required to maintain classifier accuracy. Pickle is robust, by design, to many kinds of attacks, including direct inversion, collusion, reconstruction, and poisoning. Despite this, Pickle shows remarkable classification accuracy for the most commonly used classifiers, SVM and kNN. Finally, Pickle requires minimal computing resources on the mobile device, and modest resources on the cloud. Many avenues for future work remain, including an exploration of more sophisticated regression methods and other classifiers, an extension applying Pickle to participatory sensing, a more extensive and refined user study design, and a cryptanalysis of our dimensionality reduction. Finally, we have introduced DECAF in Chapter 4. DECAF is a system for detecting placement fraud in mobile app advertisements. It efficiently explores the UI state transition graph of mobile apps in order to detect violations of terms and conditions laid down by ad networks. DECAF has been used by Microsoft Advertising to detect ad fraud, and our study of several thousand apps in the wild reveals interesting variability in the prevalence of fraud by type, category, and publisher.
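The UI-state-graph idea behind DECAF can be illustrated with a small sketch. This is a toy illustration only, not DECAF's actual implementation; the graph structure, ad records, and the two placement thresholds below are hypothetical stand-ins, not any ad network's real rules:

```python
from collections import deque

# Hypothetical placement rules (illustrative stand-ins only)
MAX_ADS_PER_STATE = 1      # at most one visible ad per UI state
MIN_AD_AREA = 50 * 50      # ads smaller than this are treated as hidden

def find_violations(start, neighbors, ads_in):
    """Breadth-first traversal of a UI state graph, flagging states whose
    visible ads break either rule.  neighbors(s) yields states reachable
    from s (e.g. by tapping a control); ads_in(s) yields (width, height)
    pairs for the ads visible in state s."""
    seen, frontier, violations = {start}, deque([start]), []
    while frontier:
        s = frontier.popleft()
        ads = ads_in(s)
        if len(ads) > MAX_ADS_PER_STATE or any(w * h < MIN_AD_AREA for w, h in ads):
            violations.append(s)
        for t in neighbors(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return violations

# Tiny hypothetical app: 'detail' stacks two ads, 'gallery' shrinks one.
graph = {'home': ['detail', 'gallery'], 'detail': [], 'gallery': ['zoom'], 'zoom': []}
ads = {'home': [(320, 50)], 'detail': [(320, 50), (320, 50)],
       'gallery': [(10, 10)], 'zoom': []}
bad = find_violations('home', lambda s: graph[s], lambda s: ads[s])
# bad == ['detail', 'gallery']
```

In practice the hard problems lie in building `neighbors` and `ads_in` for a real app binary (driving the UI, detecting equivalent states, and extracting ad geometry), which is where DECAF's engineering effort goes.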
In the future, we plan to explore methods to increase the coverage of DECAF's Monkey, expand the suite of frauds that it is capable of detecting, evaluate other metrics for determining state importance, and explore attacks designed to evade DECAF, along with countermeasures for these attacks.

References
[1] AdMob Publisher Guidelines and Policies. http://support.google.com/admob/answer/1307237?hl=en&ref_topic=1307235.
[2] Amazon Mechanical Turk. https://www.mturk.com/.
[3] Android Monkeyrunner. http://developer.android.com/tools/help/monkeyrunner_concepts.html.
[4] Android UI/Application Exerciser Monkey. http://developer.android.com/tools/help/monkey.html.
[5] Bots Mobilize. http://www.dmnews.com/bots-mobilize/article/291566/.
[6] Flurry. http://www.flurry.com/.
[7] Google AdMob. http://www.google.com/ads/admob/.
[8] Google AdMob: What's the Difference Between Estimated and Finalized Earnings? http://support.google.com/adsense/answer/168408/.
[9] iAd App Network. http://developer.apple.com/support/appstore/iad-app-network/.
[10] Microsoft Advertising. http://advertising.microsoft.com/en-us/splitter.
[11] Microsoft Advertising: Build your business. http://advertising.microsoft.com/en-us/splitter.
[12] Microsoft pubCenter Publisher Terms and Conditions. http://pubcenter.microsoft.com/StaticHTML/TC/TC_en.html.
[13] The Truth About Mobile Click Fraud. http://www.imgrind.com/the-truth-about-mobile-click-fraud/.
[14] Up To 40% Of Mobile Ad Clicks May Be Accidents Or Fraud? http://www.mediapost.com/publications/article/182029/up-to-40-of-mobile-ad-clicks-may-be-accidents-or.html#axzz2ed63eE9q.
[15] What is the percentage of mobile app downloads that are free? http://fonegigsblog.com/2011/08/13/what-is-the-percentage-of-mobile-app-downloads-that-are-free/.
[16] Windows Hooks. http://msdn.microsoft.com/en-us/library/windows/desktop/ms632589(v=vs.85).aspx.
[17] Windows Input Simulator. http://inputsimulator.codeplex.com/.
[18] Windows Performance Counters.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa373083(v=vs.85).aspx.
[19] T. Abdelzaher, Y. Anokwa, P. Boda, J. Burke, D. Estrin, L. Guibas, A. Kansal, S. Madden, and J. Reich. Mobiscopes for human spaces. IEEE Pervasive Computing, 6(2), 2007.
[20] D. Agrawal and C. Aggarwal. On the design and quantification of privacy preserving data mining algorithms. In Proceedings of the 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS '01), 2001.
[21] R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proceedings of the 2000 ACM Special Interest Group on Management Of Data (SIGMOD '00), 2000.
[22] H. Ahmadi, N. Pham, R. Ganti, T. Abdelzaher, S. Nath, and J. Han. A framework of energy efficient mobile sensing for automatic human state recognition. In Proceedings of the 7th International Conference on Mobile Systems, Applications, and Services (MobiSys '09), 2009.
[23] H. Ahmadi, N. Pham, R. Ganti, T. Abdelzaher, S. Nath, and J. Han. Privacy-aware regression modeling of participatory sensing data. In Proceedings of the 8th ACM Conference on Embedded Network Sensor Systems (SenSys '10), 2010.
[24] M. Almeida, G. Cavalheiro, A. Pereira, and A. Andrade. Investigation of age-related changes in physiological kinetic tremor. Annals of Biomedical Engineering, 38(11), 2010.
[25] S. Alrwais, A. Gerber, C. Dunn, O. Spatscheck, M. Gupta, and E. Osterweil. Dissecting Ghost Clicks: Ad Fraud Via Misdirected Human Clicks. In ACSAC, 2012.
[26] D. Amalfitano, A. Fasolino, S. Carmine, A. Memon, and P. Tramontana. Using GUI Ripping for Automated Testing of Android Applications. In IEEE/ACM ASE, 2012.
[27] S. Anand, M. Naik, M. Harrold, and H. Yang. Automated Concolic Testing of Smartphone Apps. In ACM FSE, 2012.
[28] M. Azizyan, I. Constandache, and R. Choudhury. Surroundsense: mobile phone localization via ambience fingerprinting.
In Proceedings of the 15th Annual International Conference on Mobile Computing and Networking (Mobicom '09), 2009.
[29] R. Balan, D. Gergle, M. Satyanarayanan, and J. Herbsleb. Simplifying cyber foraging for mobile devices. In Proc. MobiSys, 2007.
[30] X. Bao and R. Choudhury. Movi: mobile phone based video highlights via collaborative sensing. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys '10), 2010.
[31] S. Bay and M. Schwabacher. Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In Proceedings of the 9th ACM Special Interest Group on Knowledge Discovery and Data mining (SIGKDD '03), 2003.
[32] T. Blizard and N. Livic. Click-fraud monetizing malware: A survey and case study. In MALWARE, 2012.
[33] A. Bose, X. Hu, K. Shin, and T. Park. Behavioral detection of malware on mobile handsets. In Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services (MobiSys '08), 2008.
[34] R. Brand. Microdata protection through noise addition. Inference Control in Statistical Databases, 2002.
[35] S. Bugrara and D. Engler. Redundant State Detection for Dynamic Symbolic Execution. In USENIX ATC, 2013.
[36] J. Burke, D. Estrin, M. Hansen, A. Parker, N. Ramanathan, S. Reddy, and M. Srivastava. Participatory sensing. In Proceedings of the 2006 World Sensor Web Workshop, 2006.
[37] K. Chaudhuri, C. Monteleoni, and A. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12, 2011.
[38] K. Chen and L. Liu. Privacy-preserving data classification with rotation perturbation. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM '05), 2005.
[39] K. Chen and L. Liu. Towards attack-resilient geometric data perturbation. In SIAM International Conference on Data Mining (SDM '07), 2007.
[40] P. Chen.
Information credibility assessment and meta data modeling in integrating heterogeneous data sources. Air Force Research Laboratory Technical Report AFRL-IF-RS-TR-2002-298, 2002.
[41] P. Chia, Y. Yamamoto, and N. Asokan. Is this App Safe? A Large Scale Study on Application Permissions and Risk Signals. In WWW, 2012.
[42] R. B. Cialdini. The science of persuasion. Scientific American Mind, 2004.
[43] C. Cortes and V. Vapnik. Support-vector network. Machine Learning, 20(3):273–297, 1995.
[44] V. Dave, S. Guha, and Y. Zhang. Measuring and Fingerprinting Click-Spam in Ad Networks. In ACM SIGCOMM, 2012.
[45] V. Dave, S. Guha, and Y. Zhang. ViceROI: Catching Click-Spam in Search Ad Networks. In ACM CCS, 2013.
[46] C. Cadar, D. Dunbar, and D. Engler. KLEE: Unassisted and Automatic Generation of High-coverage Tests for Complex Systems Programs. In USENIX OSDI, 2008.
[47] S. Eisenman, E. Miluzzo, N. Lane, R. Peterson, G. Ahn, and A. Campbell. The bikenet mobile sensing system for cyclist experience mapping. In Proceedings of the 5th ACM Conference on Embedded Network Sensor Systems (SenSys '07), 2007.
[48] D. Ellis. PLP and RASTA (and MFCC, and inversion) in Matlab, 2005. Online web resource.
[49] W. Enck, P. Gilbert, B. Chun, L. Cox, J. Jung, P. McDaniel, and A. Sheth. TaintDroid: an Information-flow Tracking System for Realtime Privacy Monitoring on Smartphones. In USENIX OSDI, 2010.
[50] A. P. Felt, M. Finifter, E. Chin, S. Hanna, and D. Wagner. A Survey of Mobile Malware in the Wild. In ACM SPSM, 2011.
[51] B. J. Fogg and H. Tseng. The elements of computer credibility. In Proc. CHI, 2004.
[52] S. Ganov, C. Killmar, S. Khurshid, and D. Perry. Event Listener Analysis and Symbolic Execution for Testing GUI Applications. In ICFEM, 2009.
[53] S. Ganta, S. Kasiviswanathan, and A. Smith. Composition attacks and auxiliary information in data privacy. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD '08), 2008.
[54] M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979.
[55] J. Garofolo, L. Lamel, W. Fisher, J. Fiscus, D. Pallett, and N. Dahlgren. DARPA TIMIT acoustic phonetic continuous speech corpus CDROM, 1993.
[56] L. Georgiadis, M. J. Neely, and L. Tassiulas. Resource allocation and cross-layer control in wireless networks. Foundations and Trends in Networking, 1(1):1–149, 2006.
[57] P. Gilbert, B. Chun, L. Cox, and J. Jung. Vision: Automated Security Validation of Mobile apps at App Markets. In MCS, 2011.
[58] M. Grace, W. Zhou, X. Jiang, and A. Sadeghi. Unsafe Exposure Analysis of Mobile In-App Advertisements. In ACM WiSec, 2012.
[59] H. Haddadi. Fighting Online Click-Fraud Using Bluff Ads. ACM Computer Communication Review, 40(2):21–25, 2010.
[60] B. Hilligoss and S. Rieh. Developing a unifying framework of credibility assessment: Construct, heuristics, and interaction in context. Information Processing and Management, 44:1467–1484, 2008.
[61] C. I. Hovland and W. Weiss. The influence of source credibility on communication effectiveness. Public Opinion Quarterly, 15:635–650, 1951.
[62] C. Hu and I. Neamtiu. Automating GUI Testing for Android Applications. In AST, 2011.
[63] Y. Hu, F. Liang, and G. He. Privacy-preserving svm classification on vertically partitioned data without secure multi-party computation. In Proceedings of the 5th International Conference on Natural Computation (ICNC '09), 2009.
[64] Z. Huang, W. Du, and B. Chen. Deriving private information from randomized data. In Proceedings of the 2005 ACM Special Interest Group on Management Of Data (SIGMOD '05), 2005.
[65] B. Hull, V. Bychkovsky, Y. Zhang, K. Chen, M. Goraczko, A. Miu, E. Shih, H. Balakrishnan, and S. Madden. Cartel: a distributed mobile sensor computing system. In Proceedings of the 4th ACM Conference on Embedded Network Sensor Systems (SenSys '06), 2006.
[66] G. Jagannathan and R. Wright.
Privacy-preserving distributed k-means clustering over arbitrarily partitioned data. In Proceedings of the 11th ACM Special Interest Group on Knowledge Discovery and Data mining (SIGKDD '05), 2005.
[67] W. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into Hilbert space. Contemp. Maths., 26, 1984.
[68] A. Juels, S. Stamm, and M. Jakobsson. Combating Click Fraud via Premium Clicks. In USENIX Security, 2007.
[69] S. Kang, J. Lee, H. Jang, and H. Lee. Seemon: scalable and energy-efficient context monitoring framework for sensor-rich mobile environments. In Proc. MobiSys, 2008.
[70] C. Kanich, C. Kreibich, K. Levchenko, B. Enright, G. Voelker, V. Paxson, and S. Savage. Spamalytics: An Empirical Analysis of Spam Marketing Conversion. In ACM CCS, 2008.
[71] S. Kargupta, S. Datta, Q. Wang, and K. Sivakumar. On the privacy preserving properties of random data perturbation techniques. In Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM '03), 2003.
[72] A. Kehagias and V. Petridis. Predictive modular neural networks for time series classification. Neural Networks, 10:31–49, 1997.
[73] D. Kim, M. Azim, and J. Park. Privacy preserving support vector machines in wireless sensor networks. In Proceedings of the 3rd International Conference on Availability, Reliability and Security (ARES '08), 2008.
[74] S. Laur, H. Lipmaa, and T. Mielikainen. Cryptographically private support vector machines. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD '06), 2006.
[75] K. Lee, J. Flinn, T. Giuli, B. Noble, and C. Peplin. AMC: Verifying User Interface Properties for Vehicular Applications. In ACM MobiSys, 2013.
[76] M. Lee, R. Kompella, and S. Singh. Ajaxtracker: Active Measurement System for High-fidelity Characterization of AJAX Applications. In USENIX WebApps, 2010.
[77] K. Lin and M. Chen. Releasing the svm classifier with privacy-preservation.
In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM '08), 2008.
[78] K. Lin and M. Chen. Privacy-preserving outsourcing support vector machines with random transformation. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD '10), 2010.
[79] K. Lin and M. Chen. On the design and analysis of the privacy-preserving svm classifier. IEEE Transactions on Knowledge and Data Engineering, 23(11), 2011.
[80] Bin Liu, Yurong Jiang, Fei Sha, and Ramesh Govindan. Cloud-Enabled Privacy-Preserving Collaborative Learning for Mobile Sensing. In Proc. ACM SenSys, 2012.
[81] Bin Liu, Peter Terlecky, Amotz Bar-Noy, Ramesh Govindan, Michael J. Neely, and Dror Rawitz. Optimizing Information Credibility in Social Swarming Applications. In Proc. IEEE Infocom mini-conference, 2011.
[82] Bin Liu, Peter Terlecky, Amotz Bar-Noy, Ramesh Govindan, Michael J. Neely, and Dror Rawitz. Optimizing Information Credibility in Social Swarming Applications. IEEE Transactions on Parallel and Distributed Systems, 23:1147–1158, 2012.
[83] K. Liu, H. Kargupta, and J. Ryan. Random projection-based multiplicative data perturbation for privacy preserving distributed data mining. IEEE Trans. on Knowledge and Data Engineering, 18(1), 2006.
[84] L. Liu, J. Wang, and J. Zhang. Wavelet-based data perturbation for simultaneous privacy-preserving and statistics-preserving. In Proceedings of the 8th IEEE International Conference on Data Mining Workshops, 2008.
[85] H. Lu, B. Brush, B. Priyantha, A. Karlson, and J. Liu. Speakersense: Energy efficient unobtrusive speaker identification on mobile phones. In Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services (MobiSys '11), 2011.
[86] H. Lu, W. Pan, N. Lane, T. Choudhury, and A. Campbell. Soundsense: scalable sound sensing for people-centric applications on mobile phones.
In Proceedings of the 7th International Conference on Mobile Systems, Applications, and Services (MobiSys '09), 2009.
[87] A. MacHiry, R. Tahiliani, and M. Naik. Dynodroid: An Input Generation System for Android Apps. In ACM FSE, 2013.
[88] T. Maekawa, Y. Yanagisawa, Y. Kishino, K. Ishiguro, K. Kamei, Y. Sakurai, and T. Okadome. Object-based activity recognition with heterogeneous sensors on wrist. 2010.
[89] E. Magkos, M. Maragoudakis, V. Chrissikopoulos, and S. Gritzalis. Accurate and large-scale privacy-preserving data mining using the election paradigm. Data and Knowledge Engineering, 68(11), 2009.
[90] R. Mahmood, N. Esfahani, T. Kacem, N. Mirzaei, S. Malek, and A. Stavrou. A Whitebox Approach for Automated Security Testing of Android Applications on the Cloud. In AST, 2012.
[91] O. Mangasarian and E. Wild. Privacy-preserving classification of horizontally partitioned data via random kernels. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM '08), 2008.
[92] O. Mangasarian, E. Wild, and G. Fung. Privacy-preserving classification of vertically partitioned data via random kernels. ACM Transactions on Knowledge Discovery from Data, 2(3), 2008.
[93] S. Martello and P. Toth. Knapsack Problems: Algorithms and Computer Implementations. Wiley: Chichester, England, 1990.
[94] S. Mathur, W. Trappe, N. Mandayam, C. Ye, and A. Reznik. Radio-telepathy: extracting a secret key from an unauthenticated wireless channel. In Proceedings of the 14th ACM International Conference on Mobile Computing and Networking (Mobicom '08), 2008.
[95] D. McCoy, A. Pitsillidis, G. Jordan, N. Weaver, C. Kreibich, B. Krebs, G. Voelker, S. Savage, and K. Levchenko. PharmaLeaks: Understanding the Business of Online Pharmaceutical Affiliate Programs. In USENIX Security, 2012.
[96] D. H. McKnight and C. J. Kacmar. Factors and effects of information credibility. In Proc. ICEC, 2007.
[97] A. Mesbah and A. van Deursen.
Invariant-based Automatic Testing of AJAX User Interfaces. In ICSE, 2009.
[98] Ali Mesbah, Arie van Deursen, and Stefan Lenselink. Crawling AJAX-based Web Applications through Dynamic Analysis of User Interface State Changes. ACM Transactions on the Web, 6(1):1–30, 2012.
[99] A. Metwally, D. Agrawal, and A. El Abbadi. DETECTIVES: DETEcting Coalition hiT Inflation attacks in adVertising nEtworks Streams. In WWW, 2007.
[100] A. Metwally, F. Emekci, D. Agrawal, and A. El Abbadi. SLEUTH: Single-pubLisher attack dEtection Using correlaTion Hunting. In PVLDB, 2008.
[101] B. Miller, P. Pearce, C. Grier, C. Kreibich, and V. Paxson. What's Clicking What? Techniques and Innovations of Today's Clickbots. In IEEE DIMVA, 2011.
[102] B. Milner and X. Shao. Clean speech reconstruction from MFCC vectors and fundamental frequency using an integrated front-end. Speech Communication, 48(6), 2006.
[103] E. Miluzzo, C. Cornelius, A. Ramaswamy, T. Choudhury, Z. Liu, and A. Campbell. Darwin phones: the evolution of sensing and inference on mobile phones. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys '10), 2010.
[104] E. Miluzzo, N. Lane, K. Fodor, R. Peterson, H. Lu, M. Musolesi, S. Eisenman, X. Zheng, and A. Campbell. Sensing meets mobile social networks: the design, implementation and evaluation of the cenceme application. In Proceedings of the 6th ACM Conference on Embedded Network Sensor Systems (SenSys '08), 2008.
[105] N. Mirzaei, S. Malek, C. Pasareanu, N. Esfahani, and R. Mahmood. Testing Android Apps through Symbolic Execution. ACM SIGSOFT Software Engineering Notes, 37(6):1–5, 2012.
[106] P. Mohan, V. Padmanabhan, and R. Ramjee. Nericell: Rich monitoring of road and traffic conditions using mobile smartphones. In Proceedings of the 6th ACM Conference on Embedded Network Sensor Systems (SenSys '08), 2008.
[107] T. Moore, N. Leontiadis, and N. Christin. Fashion Crimes: Trending-Term Exploitation on the Web.
In ACM CCS, 2011.
[108] M. Mun, S. Reddy, K. Shilton, N. Yau, J. Burke, D. Estrin, M. Hansen, E. Howard, R. West, and P. Boda. Peir, the personal environmental impact report, as a platform for participatory sensing systems research. In Proc. ACM MobiSys, 2009.
[109] M. Musuvathi, D. Park, A. Chou, D. Engler, and D. Dill. CMC: a Pragmatic Approach to Model Checking Real Code. In USENIX OSDI, 2002.
[110] Suman Nath, Felix Lin, Lenin Ravindranath, and Jitu Padhye. SmartAds: Bringing Contextual Ads to Mobile Apps. In ACM MobiSys, 2013.
[111] A. Navia-Vazquez, D. Gutierrez-Gonzalez, E. Parrado-Hernandez, and J. Navarro-Abellan. Distributed support vector machines. IEEE Transactions on Neural Networks, 2006.
[112] M. J. Neely. Stochastic Network Optimization with Application to Communication and Queueing Systems. Morgan & Claypool, 2010.
[113] N. Pham, R. Ganti, Y. Uddin, S. Nath, and T. Abdelzaher. Privacy-preserving reconstruction of multidimensional data maps in vehicular participatory sensing. In Wireless Sensor Networks, 2011.
[114] J. Qiang, B. Yang, Q. Li, and L. Jing. Privacy-preserving svm of horizontally partitioned data for linear classification. In Proceedings of the 4th International Congress on Image and Signal Processing (CISP '11), 2011.
[115] V. Rastogi, Y. Chen, and W. Enck. AppsPlayground: Automatic Security Analysis of Smartphone Applications. In ACM CODASPY, 2013.
[116] L. Ravindranath, J. Padhye, S. Agarwal, R. Mahajan, I. Obermiller, and S. Shayandeh. AppInsight: Mobile App Performance Monitoring in the Wild. In USENIX OSDI, 2012.
[117] S. Rieh and D. Danielson. Credibility models for multi-source fusion. Annual Review of Information Science and Technology, 41:307–364, 2007.
[118] M. Rivera, S. Soderstrom, and B. Uzzi. Dyads: Dynamics of dyads in social networks: Assortative, relational, and proximity mechanisms. Annual Review of Sociology, 36:91–115, 2010.
[119] F. Roesner, T. Kohno, A. Moshchuk, B. Parno, H. Wang, and C. Cowan.
User-Driven Access Control: Rethinking Permission Granting in Modern Operating Systems. In IEEE S&P, 2012.
[120] B. Rubinstein, P. Bartlett, L. Huang, and N. Taft. Learning in a large function space: Privacy-preserving mechanisms for svm learning. Arxiv Preprint arXiv:0911.5708, 2009.
[121] S. Saavedra, B. Uzzi, and K. Hagerty. Synchronicity and the collective genius of profitable day traders. Working Paper, Kellogg School of Management, 2010.
[122] A. Schrijver. Combinatorial Optimization: Polyhedra and Efficiency. Springer, 2004.
[123] K. Springborn and P. Barford. Impression Fraud in Online Advertising via Pay-Per-View Networks. In USENIX Security, 2013.
[124] V. Tan and S. Ng. Privacy-preserving sharing of horizontally-distributed private data for constructing accurate classifiers. In Proceedings of the 1st ACM SIGKDD International Conference on Privacy, Security, and Trust in KDD, 2007.
[125] J. Vaidya and C. Clifton. Privacy-preserving k-means clustering over vertically partitioned data. In Proceedings of the 9th ACM Special Interest Group on Knowledge Discovery and Data mining (SIGKDD '03), 2003.
[126] J. Vaidya, H. Yu, and X. Jiang. Privacy preserving svm classification. Knowledge and Information Systems, 14(2), 2008.
[127] J. Wang and J. Zhang. Nnmf-based factorization techniques for high-accuracy privacy protection on non-negative-valued datasets. In Proceedings of the 6th IEEE International Conference on Data Mining Workshops, 2006.
[128] K. Weinberger and L. Saul. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10(2), 2009.
[129] P. Weinzaepfel, H. Jegou, and P. Perez. Reconstructing an image from its local descriptors. In Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), 2011.
[130] E. Wright and K. Laskey. Trust in virtual teams: Towards an integrative model of trust formation. In Proc. Fusion, 2006.
[131] S. Xu and S. Lai.
Fast Fourier transform based data perturbation method for privacy protection. In IEEE Intelligence and Security Informatics, 2007.
[132] S. Xu, J. Zhang, D. Han, and J. Wang. Singular value decomposition based data distortion strategy for privacy protection. Knowledge and Information Systems, 10(3), 2006.
[133] W. Yang, M. Prasad, and T. Xie. A Grey-box Approach for Automated GUI-model Generation of Mobile Applications. In FASE, 2013.
[134] F. Yu, Y. Xie, and Q. Ke. SBotMiner: Large Scale Search Bot Detection. In ACM WSDM, 2010.
[135] H. Yu, J. Vaidya, and X. Jiang. Privacy-preserving svm classification on vertically partitioned data. In Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD '06), 2006.
[136] H. Yu, J. Vaidya, and X. Jiang. Privacy-preserving svm using nonlinear kernels on horizontally partitioned data. In Proceedings of the 21st Annual ACM Symposium on Applied Computing (SAC '06), 2006.
[137] J. Zhan, L. Chang, and S. Matwin. Privacy-preserving support vector machines learning. In Proceedings of the 2005 International Conference on Electronic Business (ICEB '05), 2005.
[138] Q. Zhang, T. Ristenpart, S. Savage, and G. Voelker. Got Traffic? An Evaluation of Click Traffic Providers. In WebQuality, 2011.

Appendix A
A.1 Problem Formulation
A.1.1 Justification for Location Assumption
In our problem formulation, we ignore the cost of sending periodic location updates to the director. In practice, this may be a reasonable assumption for three reasons. First, the cost of location updates may be amortized over other context-aware applications that may be executing on the smartphone. Second, although this cost may be significant, it adds a fixed cost to our formulations and does not affect the results we present in the paper. Finally, the absolute cost of the location updates themselves is significantly less than the cost of video transmissions, for example.
A.2 Complexity Analysis
A.2.1 Proof of Theorem 2.3.1
Proof. We first show a reduction from the Knapsack Problem to MaxCred. Suppose we are given an instance of the decision version of the Knapsack Problem with $N$ distinct elements, where element $i$ has value $v_i$ and weight $w_i$, and the maximum weight of the knapsack is $B$: is there a collection of items with weight at most $B$ and with value at least $K$? Given such an instance, we construct an instance of MaxCred with budget $B$ in polynomial time. Let $c_{i,j} = v_i$ for $i = j$ and $0$ otherwise, and let $e_i = w_i$ for all $i \in \{1,\dots,N\}$. It is now clear that the maximum value of the Knapsack is at least $K$ iff the maximum credibility is at least $K$.

Furthermore, the Knapsack Problem can be cast as a minimization problem: selecting a set of items with weight at most $B$ and value at least $K$ is equivalent to selecting the complement set, with weight at least $C = \sum_{i=1}^{N} w_i - B$ and value at most $V = \sum_{i=1}^{N} v_i - K$. We reduce this version of the problem to MinCost in the following way: let $e_i = v_i$ for all $i \in \{1,\dots,N\}$, and let $c_{i,j} = w_i$ for $i = j$ and $0$ otherwise. It is now clear that the minimum value of the Knapsack is at most $V$ with weight at least $C$ iff the minimum cost is at most $V$ with credibility at least $C$.

A.2.2 Much Stronger Result on the Complexity
A much stronger result can also be proven. Assume the credibility values are given by Equation 2.1. MaxCred and MinCost are NP-Hard even for this instance, which we shall refer to as MaxCred-G and MinCost-G respectively. The proofs of NP-Hardness for MaxCred-G and MinCost-G use the following reduction from the Partition Problem [54]. Given an instance of Partition, a set of integers $\{x_1, x_2, \dots, x_N\}$ with $P = \frac{1}{2}\sum_{i=1}^{N} x_i$, construct the following instance of MaxCred-G and MinCost-G. Let there be $N$ reporters with $d(p_i, E) = 1$ for all $i \in \{1,\dots,N\}$ and $R = 2N$ formats for each reporter to choose from.
Note that since $d(p_i, E) = 1$ for all $i \in \{1,\dots,N\}$, the credibility values per format are equivalent for each reporter, and furthermore we may allow $\alpha_1, \dots, \alpha_{2N}$ to be arbitrary positive integers satisfying $\alpha_1 \le \alpha_2 \le \dots \le \alpha_{2N}$. Let the credibility of format $2i-1$ be $c_{2i-1} = \alpha_{2i-1} = 2^{t+2i}$ with cost $e_{2i-1} = 2^{t+i}$, and the credibility of format $2i$ be $c_{2i} = \alpha_{2i} = 2^{t+2i} + x_i$ with cost $e_{2i} = 2^{t+i} + x_i$, for $i \in \{1,\dots,N\}$, where $t = \lfloor \log_2 2P \rfloor + 1$. Note that $\alpha_1 \le \alpha_2 \le \dots \le \alpha_{2N}$ since $2^{t+2i} + x_i < 2 \cdot 2^{t+2i} = 2^{t+2i+1} < 2^{t+2(i+1)}$ for $i \in \{1,\dots,2N-1\}$, where the first inequality uses $x_i < 2P < 2^t < 2^{t+2i}$ for $i \in \{1,\dots,N\}$. To complete the reduction, define $B = 2^t \sum_{i=1}^{N} 2^i + P$ and $C = 2^t \sum_{i=1}^{N} 2^{2i} + P$.

This reduction can be done in polynomial time. Let $n$ be the size of the Partition problem in bits. Clearly $N \le n$ and $|x_i| \le n$ for $i \in \{1,\dots,N\}$, where $|x|$ denotes the size of $x$ in bits. Now, $|P| < |\sum_{i=1}^{N} x_i| = O(n+N) = O(n)$, where the first equality uses $|a+b| \le \max\{|a|,|b|\} + 1$. Since $t \le 4P$, $|t| = O(n)$. Since $c_i, e_i \le 2^{t+2N} + \max_{1 \le j \le N} x_j$, we have $|c_i|, |e_i| = O(t + 2N + n) = O(n)$ for $i \in \{1,\dots,2N\}$. Furthermore, $B \le C = 2^t \sum_{i=1}^{N} 2^{2i} + P$; thus $|B|, |C| = O(n)$. The reduction uses $4N+2 = O(n)$ integers, each of size $O(n)$, and hence is done in polynomial time.

The proofs of NP-Hardness also make use of the following lemma.

Lemma A.2.1. In the reduction, if credibility at least $C = 2^t \sum_{i=1}^{N} 2^{2i} + P$ can be obtained with cost at most $B = 2^t \sum_{i=1}^{N} 2^i + P$, then exactly one format from $\{2i-1, 2i\}$ is selected for each $i \in \{1,\dots,N\}$.

Proof. Assume this is not the case, and consider the last index $j$ for which exactly one format from $\{2j-1, 2j\}$ is not selected. If neither format $2j-1$ nor $2j$ is selected, credibility $C$ cannot be obtained: the cost-effectiveness of any format $i < 2j-1$ is at most $2^{t+2(j-1)}/2^{t+j-1} = 2^{j-1}$.
Any assignment of formats $\{1,\dots,2j\}$ to $j$ reporters which does not use formats $\{2j-1, 2j\}$ can therefore gain at most $(2^t \sum_{k=1}^{j} 2^k + P)\,2^{j-1} < 2^t(2^{j+1} - 2 + 1)\,2^{j-1} = 2^{t+2j} - 2^{t+j-1} < 2^{t+2j} - 2P$. Hence any assignment of formats $\{1,\dots,2N\}$ to $N$ reporters for which neither format $2j-1$ nor $2j$ is selected, and where exactly one format from $\{2i-1, 2i\}$ is selected for each $i \in \{j+1,\dots,N\}$, can gain at most $2^{t+2j} - 2P + 2^t \sum_{i>j} 2^{2i} + \sum_{i=j+1}^{N} x_i$, which is less than $C$, a contradiction. If two or more formats are selected from $\{2j-1, 2j\}$, then the cost of such an assignment is greater than $B$: the cost is at least $\sum_{i \ge j} e_{2i-1} + e_{2j-1} = 2^t(\sum_{i \ge j} 2^i + 2^j) = 2^t(\sum_{i \ge j} 2^i + \sum_{i=1}^{j-1} 2^i + 2) = 2^t(\sum_{i=1}^{N} 2^i + 2) = 2^{t+N+1}$; since $B = 2^t \sum_{i=1}^{N} 2^i + P < 2^t(\sum_{i=1}^{N} 2^i + 1) = 2^t(2^{N+1} - 1) < 2^{t+N+1}$, we have a contradiction.

Theorem A.2.2. MaxCred-G is NP-Hard.

Proof. $\{x_1,\dots,x_N\} \in$ Partition iff credibility $C = 2^t \sum_{i=1}^{N} 2^{2i} + P$ can be obtained with cost at most $B$. Assume $\{x_1,\dots,x_N\} \in$ Partition. Then there is a set $I$ such that $\sum_{i \in I} x_i = P$. Select format $2i$ for every $i \in I$ and format $2i-1$ otherwise. Then the credibility obtained for this assignment is $2^t \sum_{i=1}^{N} 2^{2i} + P$ with cost $2^t \sum_{i=1}^{N} 2^i + P = B$. Conversely, assume there is an assignment for which credibility $2^t \sum_{i=1}^{N} 2^{2i} + P$ can be obtained with cost at most $B = 2^t \sum_{i=1}^{N} 2^i + P$. By Lemma A.2.1, exactly one format from $\{2i-1, 2i\}$ is selected for each $i \in \{1,\dots,N\}$. Define $I = \{i : \text{format } 2i \text{ was selected}\}$. Then $\sum_{i \in I} x_i = P$, and hence $\{x_1,\dots,x_N\} \in$ Partition.

Theorem A.2.3. MinCost-G is NP-Hard.

Proof. $\{x_1,\dots,x_N\} \in$ Partition iff cost $B = 2^t \sum_{i=1}^{N} 2^i + P$ can be obtained with credibility at least $C = 2^t \sum_{i=1}^{N} 2^{2i} + P$. Assume $\{x_1,\dots,x_N\} \in$ Partition. Then there is a set $I$ such that $\sum_{i \in I} x_i = P$. Select format $2i$ for every $i \in I$ and format $2i-1$ otherwise. Then the cost obtained for this assignment is $2^t \sum_{i=1}^{N} 2^i + P = B$ with credibility $2^t \sum_{i=1}^{N} 2^{2i} + P = C$.
Conversely, assume cost $B = 2^t \sum_{i=1}^{N} 2^i + P$ can be obtained with credibility at least $C = 2^t \sum_{i=1}^{N} 2^{2i} + P$. By Lemma A.2.1, exactly one format from $\{2i-1, 2i\}$ is selected for each $i \in \{1,\dots,N\}$. Define $I = \{i : \text{format } 2i \text{ was selected}\}$. Then $\sum_{i \in I} x_i = P$, and hence $\{x_1,\dots,x_N\} \in$ Partition.

A.3 Optimal Solutions
A.3.1 Proof of Theorem 2.3.4
Proof. MinCost-2F loops through all possible $(i, Y)$ pairs. For each $(i, Y)$ pair, it finds an assignment of maximum credibility using the same routine as in MaxCred-2F, which was shown to be optimal. If the maximum credibility of this assignment exceeds the threshold $C$, the cost of the assignment is computed; otherwise it is set to $\infty$. The algorithm chooses as the minimizer the assignment of minimum cost whose maximum credibility exceeds the threshold $C$, and hence is optimal.

A.4 Renewals Problem
A.4.1 Proof of Theorem 2.4.2
Proof. To prove (2.29), we use induction. Assume that $Z_1[k] \le V + e_{\max}$ for some frame $k$ (it is true by assumption for $k = 1$); we prove it also holds for frame $k+1$. Consider the case $Z_1[k] \le V$. Then $Z_1[k+1] \le V + e_{\max}$, because $Z_1[k]$ can increase by at most $e_{\max}$ on any frame (see dynamics (2.24)). Now consider the opposite case, $Z_1[k] > V$. Then for any $i, j$ with $e_j[k] > 0$, the variable $x_{i,j}[k]$ has weight $V c_{i,j}[k] - Z_1[k] e_j[k] = e_j[k]\left[V (c_{i,j}[k]/e_j[k]) - Z_1[k]\right] \le e_j[k]\left[V - Z_1[k]\right] \le 0$. It follows that for each reporter $i$, the algorithm either chooses a format $j$ such that $e_j[k] = 0$, or chooses $x_{i,j}[k] = 0$ for all $j$ (so that no format is selected and the reporter is idle). Thus no reporter incurs any cost on frame $k$, the total cost expended on this frame is $0$, and so $Z_1[k]$ cannot increase on the next slot. We thus have $Z_1[k+1] \le Z_1[k] \le V + e_{\max}$. This completes the proof of (2.29).
To prove (2.30), we have from the queue equation (2.24) that for any frame $k$:
$$Z_1[k+1] \ge Z_1[k] - e_{av} + \sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j}[k]\, e_j[k].$$
Summing the above over $k \in \{k_0, \ldots, k_0 + P - 1\}$ yields:
$$Z_1[k_0 + P] - Z_1[k_0] \ge -e_{av} P + \sum_{k=k_0}^{k_0+P-1} \Big[\sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j}[k]\, e_j[k]\Big].$$
Rearranging terms and using the fact that $Z_1[k_0] \ge 0$ yields:
$$\sum_{k=k_0}^{k_0+P-1} \Big[\sum_{i=1}^{N} \sum_{j=1}^{R} x_{i,j}[k]\, e_j[k]\Big] \le e_{av} P + Z_1[k_0 + P].$$
Using the fact that $Z_1[k_0 + P] \le V + e_{\max}$ proves the result.
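Both bounds can be exercised numerically. The sketch below simulates a single reporter under the max-weight rule used in the proof (choose the format maximizing $V c_j[k] - Z_1[k] e_j[k]$, stay idle when every weight is negative) against the queue dynamics $Z_1[k+1] = \max(Z_1[k] - e_{av} + \text{spend}, 0)$. The numeric parameters ($V$, $e_{av}$, $R$, window length $P$) are hypothetical, and credibilities are drawn with $c_j \le e_j$ so the step $V(c_j/e_j) - Z_1 \le V - Z_1$ applies; this is an illustration of the argument, not the dissertation's experimental setup.

```python
import random

def simulate(V, frames=3000, R=4, e_av=0.3, P=50, seed=1):
    """Single-reporter simulation of the max-weight format choice.
    Returns (peak queue backlog, worst cost spent in any P-frame window)."""
    rng = random.Random(seed)
    e_max = 1.0
    Z1, peak, spends = 0.0, 0.0, []
    for _ in range(frames):
        e = [rng.uniform(0.05, e_max) for _ in range(R)]
        c = [rng.uniform(0.0, e[j]) for j in range(R)]  # normalize c_j <= e_j
        j = max(range(R), key=lambda j: V * c[j] - Z1 * e[j])
        # select the best format only if its weight is positive; else idle
        spend = e[j] if V * c[j] - Z1 * e[j] > 0 else 0.0
        spends.append(spend)
        Z1 = max(Z1 - e_av + spend, 0.0)  # cost-queue dynamics, cf. (2.24)
        peak = max(peak, Z1)
    worst = max(sum(spends[k:k + P]) for k in range(frames - P + 1))
    return peak, worst

peak, worst = simulate(V=10.0)
assert peak <= 10.0 + 1.0            # (2.29): Z1[k] <= V + e_max
assert worst <= 0.3 * 50 + 10.0 + 1.0  # (2.30): window cost <= e_av*P + V + e_max
```

The first assertion mirrors the induction: once $Z_1 > V$ every weight is negative, the reporter idles, and the backlog cannot grow past $V + e_{\max}$; the second mirrors the telescoped window bound.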