USC Computer Science Technical Reports, no. 943 (2014)
A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing

Hien To, Computer Science Dept., University of Southern California, hto@usc.edu
Gabriel Ghinita, Dept. of Computer Science, University of Massachusetts, Boston, Gabriel.Ghinita@umb.edu
Cyrus Shahabi, Computer Science Dept., University of Southern California, shahabi@usc.edu

ABSTRACT
Spatial Crowdsourcing (SC) is a novel and transformative platform that engages individuals, groups and communities in the act of collecting, analyzing, and disseminating environmental, social and other spatio-temporal information. The objective of SC is to outsource a set of spatio-temporal tasks to a set of workers, i.e., individuals with mobile devices that perform the tasks by physically traveling to specified locations of interest. However, current solutions require the workers, who in many cases are simply volunteering for a cause, to disclose their locations to untrustworthy entities. In this paper, we introduce a framework for protecting location privacy of workers participating in SC tasks. We argue that existing location privacy techniques are not sufficient for SC, and we propose a mechanism based on differential privacy and geocasting that achieves effective SC services while offering privacy guarantees to workers. We investigate analytical models and task assignment strategies that balance multiple crucial aspects of SC functionality, such as task completion rate, worker travel distance and system overhead. Extensive experimental results on real-world datasets show that the proposed technique protects workers' location privacy without incurring significant performance penalties.

1. INTRODUCTION
Recent years have witnessed a significant growth in the number of mobile smart phone users, as well as fast development in phone hardware performance, software functionality and communication features.
Today's mobile phones are powerful devices that can act as multi-modal sensors collecting and sharing various types of data, e.g., picture, video, location, movement speed, direction and acceleration. In this context, Spatial Crowdsourcing (SC) [15] is emerging as a novel and transformative platform that engages individuals, groups and communities in the act of collecting, analyzing, and disseminating environmental, social and other information for which spatio-temporal features are relevant. With SC, task requesters outsource their spatio-temporal tasks to a set of workers, i.e., individuals with mobile devices that perform the tasks by physically traveling to specified locations of interest. The nature of tasks may vary from environmental sensing to capturing images at social or entertainment events. Typically, requesters and workers register with a centralized spatial crowdsourcing server (SC-server) that acts as a broker between parties, and often also plays a role in how tasks are assigned to workers (i.e., scheduling according to some performance criteria).

SC has numerous applications in domains such as environmental sensing, journalism, crisis response and urban planning. Consider an emergency response scenario where the Red Cross (i.e., requester) is interested in collecting pictures and videos of disaster areas from various locations in a country (e.g., typhoon Haiyan in the Philippines in 2013, or the Haiti earthquake in 2010). The requester can issue a query to an SC-server, and the request is then forwarded to workers that are situated in proximity to the zones of interest. The workers record photos and videos using their mobile phones, and send the results back to the requester. Participatory sensing is another domain where SC is very suitable. Mobile users can leverage their sensor-equipped mobile devices to collect environmental or traffic data.
SC is feasible only if workers and tasks are matched effectively, i.e., tasks are completed in a timely fashion, and workers do not need to travel across very long distances. To that extent, matching at the SC-server must take into account the locations of workers. However, the SC-server may not be trusted, and disclosing individual locations has serious privacy implications [10, 22, 8, 4]. Knowing worker locations, an adversary can stage a broad spectrum of attacks such as physical surveillance and stalking, identity theft, and breach of sensitive information (e.g., an individual's health status, alternative lifestyles, political and religious views, etc.). Thus, ensuring location privacy is an essential aspect of SC, because mobile users will not agree to engage in spatial tasks if their privacy is violated.

Location privacy has been addressed before in the context of location-based services. Several solutions [10, 22, 8] have been proposed to protect location-based queries, i.e., given an individual's location, find points of interest in the proximity without disclosing the actual coordinates. However, in SC, a worker's location is no longer part of the query, but rather the result of a spatial query around the task location. This is a considerably more challenging problem, and previous solutions do not offer satisfactory results. In addition, while some work considers queries on private locations in the context of outsourced databases [31, 30], it is assumed that the data owner entity and the querying entity trust each other, with protection being offered only against intermediate service provider entities. This scenario does not apply in SC, as there is no inherent trust relationship between requesters and workers.

We propose a framework for protecting privacy of worker locations, whereby the SC-server only has access to data sanitized according to differential privacy (DP) [6].
In practice, there may be many SC-servers run by diverse organizations that do not have an established trust relationship with the workers. On the other hand, every worker subscribes to a cellular service provider (CSP) that provides Internet connectivity. The CSP already has access to the worker locations (e.g., through cell tower triangulation), but as opposed to the SC-server, the CSP signs a contract with its subscribers, which stipulates the terms and conditions of location disclosure. Thus, the CSP can collect user locations and release them to third party SC-servers in noisy form, according to DP. However, using DP introduces two difficult challenges, as discussed next.

First, the SC-server must match workers to tasks using noisy data, which requires complex strategies to ensure effective task assignment. To create sanitized data releases at the CSP, we adopt the Private Spatial Decomposition (PSD) approach, first introduced in [4]. A PSD is a sanitized spatial index, where each index node contains a noisy count of the workers rooted at that node. Specifically, we devise a mechanism to create a Worker PSD by extending the Adaptive Grid (AG) technique [26]. To ensure that task assignment has a high success rate, we introduce an analytical model that determines with high probability a PSD partition around the task location that includes sufficient workers to complete the task.

Second, by the nature of the DP protection model, fake entries may need to be created in the PSD. Thus, the SC-server cannot directly contact workers, not even if pseudonyms are used, as merely establishing a network connection to an entity would allow the SC-server to learn whether an entry is real or not, and breach privacy. To address this challenge, we propose the use of geocasting [24] as a means to deliver task requests to workers. Once a PSD partition is identified by the analytical model outlined above, the task request is geocast to all the workers within the partition.
Geocast introduces overhead considerations that need to be carefully considered in the framework design.

Our specific contributions are:
(i) We identify the specific challenges of location privacy in the context of SC, and we propose a framework that achieves differentially-private protection guarantees. To the best of our knowledge, this is the first work to study location privacy for SC.
(ii) We propose an analytical model that measures the probability of task completion with uncertain worker locations, and we devise a search strategy that finds appropriate PSD partitions to ensure high success rate of task assignment.
(iii) We introduce a geocast mechanism for task request dissemination that is necessary to overcome the restrictions imposed by DP, and we factor the geocast system overhead in the PSD partition search strategy.
(iv) We conduct an extensive set of experiments on real-world datasets which shows that the proposed framework is able to protect workers' location privacy without significantly affecting the effectiveness and efficiency of the SC system.

The remainder of this paper is organized as follows: Section 2 presents necessary background. Section 3 introduces the proposed privacy framework, whereas Sections 4 and 5 detail the proposed solution. Experimental results are presented in Section 6, followed by a survey of related work in Section 7, and conclusions in Section 8.

2. BACKGROUND

2.1 Spatial Crowdsourcing
Spatial Crowdsourcing (SC) [15] is a type of online crowdsourcing where performing a task requires the worker to physically travel to the location of the task (termed spatial task). According to the taxonomy in [15], there are two categories of SC, based on how workers are matched to tasks. In Worker Selected Tasks (WST) mode, the SC-server publishes the spatial tasks online, and workers can autonomously choose any tasks in their vicinity without the need to coordinate with the SC-server.
In Server Assigned Tasks (SAT) mode, online workers send their location to the SC-server, and the SC-server assigns tasks to nearby workers.

WST is the simpler protocol, and it does not require workers to share their locations with the SC-server. However, the assignment is often sub-optimal, as workers do not have a global system view. Workers typically choose the closest task to them, which may cause multiple workers to travel to the same task, while many other tasks remain unassigned. The SAT mode incurs the overhead of running complex matching algorithms at the SC-server, but the best-suited worker is selected for a task. This requires the SC-server to know the workers' locations, which poses a privacy threat.

In our work, we consider the SAT mode, but we also provide location privacy protection for the workers. Instead of directly disclosing their coordinates to the SC-server, worker locations are first pooled together by a CSP and sanitized according to differential privacy. This introduces significant challenges, as the SC-server has to employ far more complex task assignment strategies that must take into account the uncertain nature of the received location data.

2.2 Differential Privacy
Differential Privacy (DP) [6] has emerged as the de-facto standard in data privacy, thanks to its strong protection guarantees rooted in statistical analysis. DP is a semantic model which provides protection against realistic adversaries with access to background information. DP ensures that an adversary is not able to learn from the sanitized data whether a particular individual is present or not in the original data, regardless of the adversary's prior knowledge. DP allows interaction with a database only by means of aggregate (e.g., count, sum) queries.
Random noise is added to each query result to preserve privacy, such that an adversary that attempts to attack the privacy of some individual worker $w$ will not be able to distinguish from the set of query results (called a transcript) whether a record representing $w$ is present or not in the database.

Definition 1 ($\epsilon$-indistinguishability). Consider that a database produces transcript $U$ on the set of queries $QS = \{Q_1, Q_2, \ldots, Q_q\}$, and let $\epsilon > 0$ be an arbitrarily small real constant. Then, transcript $U$ satisfies $\epsilon$-indistinguishability if for every pair of sibling datasets $D_1$, $D_2$ such that $|D_1| = |D_2|$ and $D_1$, $D_2$ differ in only one record, it holds that

$$\left| \ln \frac{\Pr[QS^{D_1} = U]}{\Pr[QS^{D_2} = U]} \right| \le \epsilon$$

In other words, an attacker cannot learn whether the transcript was obtained by answering the query set $QS$ on dataset $D_1$ or $D_2$. Parameter $\epsilon$ is called the privacy budget, and specifies the amount of protection required, with smaller values corresponding to stricter privacy protection. To achieve $\epsilon$-indistinguishability, DP injects noise into each query result, and the amount of noise required is proportional to the sensitivity of the query set $QS$, formally defined as:

Definition 2 ($L_1$-Sensitivity). Given any arbitrary sibling datasets $D_1$ and $D_2$, the sensitivity of query set $QS$ is the maximum change in the query results of $D_1$ and $D_2$:

$$\Delta(QS) = \max_{D_1, D_2} \sum_{i=1}^{q} \left| Q_i(D_1) - Q_i(D_2) \right|$$

An essential result from [7] shows that a sufficient condition to achieve differential privacy with parameter $\epsilon$ is to add to each query result randomly distributed Laplace noise with scale $\lambda = \Delta(QS)/\epsilon$.

Typically, the interaction with a dataset consists of a series of analyses (i.e., transcripts) $A_i$, each required to satisfy $\epsilon_i$-differential privacy. Then, the privacy level of the resulting analysis can be computed as follows:

Theorem 1 (Sequential Composition [21]). Let $A_i$ be a set of analyses such that each provides $\epsilon_i$-DP. Then, running in sequence all analyses $A_i$ provides $(\sum_i \epsilon_i)$-DP.

Theorem 2 (Parallel Composition [21]). If $D_i$ are disjoint subsets of the original database, and $A_i$ is a set of analyses each providing $\epsilon_i$-DP, then applying each analysis $A_i$ on partition $D_i$ provides $\max(\epsilon_i)$-DP.

2.3 Private Spatial Decompositions (PSD)
The work in [4] introduced the concept of Private Spatial Decompositions (PSD) to release spatial datasets in a DP-compliant manner. A PSD is a spatial index transformed according to DP, where each index node is obtained by releasing a noisy count of the data points enclosed by that node's extent. Various index types such as grids, quad-trees or k-d trees [27] can be used as a basis for PSD. Accuracy of PSD is heavily influenced by the type of PSD structure and its parameters (e.g., height, fan-out).

With space-based partitioning PSD, the split position for a node does not depend on worker locations. This category includes flat structures such as grids, or hierarchical ones such as BSP-trees (Binary Space Partitioning) and quad-trees [27]. The privacy budget needs to be consumed only when counting the workers in each index node. Typically, all nodes at the same index level have non-overlapping extents, which yields a constant and low sensitivity of 2 per level (i.e., changing a single location in the data may affect at most two partitions in a level). The budget is best distributed across levels according to the geometric allocation [4], where leaf nodes receive more budget than higher levels. The sequential composition theorem applies across nodes on the same root-to-leaf path, whereas parallel composition applies to disjoint paths in the hierarchy. Space-based PSD are simple to construct, but can become unbalanced.

Object-based structures such as k-d trees and R-trees [4] perform splits of nodes based on the placement of data points. To ensure privacy, split decisions must also be done according to DP, and significant budget may be used in the process.
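As a small illustration of the space-based case described above, the following sketch releases noisy counts for a flat grid over the unit square. It is a minimal example under stated assumptions, not the construction used in [4] or [26]: the function names, the unit-square domain, and the sampling of Laplace noise as a difference of two exponential draws are all choices made here for illustration. The only quantities taken from the text are the per-level sensitivity of 2 and the noise scale of sensitivity divided by budget.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_grid_counts(points, m, epsilon, sensitivity=2.0, rng=None):
    """Return an m x m grid of noisy counts for points in the unit square.

    Each cell count receives independent Laplace noise of scale
    sensitivity / epsilon. The default sensitivity of 2 is the per-level
    value quoted in the text; the unit-square domain is an assumption.
    """
    rng = rng or random.Random()
    counts = [[0] * m for _ in range(m)]
    for x, y in points:
        col = min(int(x * m), m - 1)  # clamp x == 1.0 into the last column
        row = min(int(y * m), m - 1)  # clamp y == 1.0 into the last row
        counts[row][col] += 1
    scale = sensitivity / epsilon
    return [[c + laplace_noise(scale, rng) for c in row] for row in counts]


if __name__ == "__main__":
    rng = random.Random(7)
    workers = [(rng.random(), rng.random()) for _ in range(1000)]
    for row in noisy_grid_counts(workers, m=4, epsilon=0.5, rng=rng):
        print(" ".join(f"{v:8.1f}" for v in row))
```

Because the cells of one level are disjoint, a single budget covers the whole grid, which is why every cell uses the same scale here. Noisy counts can be negative or fractional; the sketch leaves them unprocessed.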
Typically, the exponential mechanism [4] is used to assign a merit score to each candidate split point according to some cost function (e.g., distance from median in case of k-d trees), and one value is randomly picked based on its noisy score. The budget must be split between protecting node counts and building the index structure. Data-dependent PSD are more balanced in theory, but they are not very robust, in the sense that accuracy can decrease abruptly with only slight changes of the PSD parameters, or for certain input dataset distributions.

The recent work in [26] compares tree-based methods with multi-level grids, and shows that two-level grids tend to perform better than recursive partitioning counterparts. The paper also proposes an Adaptive Grid (AG) approach, where the granularity of the second-level grid is chosen based on the noisy counts obtained in the first level (sequential composition is applied). AG is a hybrid which inherits the simplicity and robustness of data-independent PSD, but still uses a small amount of data-dependent information in choosing the granularity for the second level. In our work, we adapt the AG method to address SC-specific requirements.

3. PRIVACY FRAMEWORK
Section 3.1 presents the system model and the workflow for privacy-preserving SC. Section 3.2 outlines the privacy model and assumptions. Section 3.3 discusses design challenges and associated performance metrics.

3.1 System Model
We consider the problem of privacy-preserving SC task assignment in the SAT mode. Figure 1 shows the proposed system architecture. Workers send their locations (Step 0) to a trusted cellular service provider (CSP) which collects updates and releases a PSD according to a privacy budget $\epsilon$ mutually agreed upon with the workers. The PSD is accessed by the SC-server (Step 1), which also receives tasks from a number of requesters (Step 2).
For simplicity, we focus on the single-SC-server case, but our system model can support multiple SC-servers.

When the SC-server receives a task $t$, it queries the PSD to determine a geocast region (GR) that encloses with high probability workers in relative proximity to $t$. Due to the uncertain nature of the PSD, this is a challenging process which will be detailed later in Section 5. Next, the SC-server initiates a geocast communication [24] process (Step 3) to disseminate $t$ to all workers within GR. According to DP, sanitizing a dataset requires creation of fake locations in the PSD. If the SC-server is allowed to directly contact workers, then failure to establish a communication channel would breach privacy, as the SC-server is able to distinguish fake workers from real ones. Using geocast is a unique feature of our framework which is necessary to achieve protection. Geocast can be performed either with the help of the CSP infrastructure, or through a mobile ad-hoc network where the CSP contacts a single worker in the GR, and then the message is disseminated on a hop-by-hop basis to the entire GR. The latter approach keeps CSP overhead low, and can reduce operation costs for workers.

Upon receiving request $t$, a worker $w$ decides whether to perform the task or not. If yes (Step 4), she sends a consent message to the SC-server confirming $w$'s availability (alternatively, the consent can be directly sent to the requester). If $w$ is not willing to participate in the task, then no consent is sent, and no information about the worker is disclosed.

[Figure 1: Privacy framework for spatial crowdsourcing. Workers report locations to the cell service provider (Step 0); the CSP publishes a sanitized PSD release to the SC-server (Step 1); requesters submit task request $t$ (Step 2); the SC-server geocasts $\{t, GR\}$ (Step 3); workers in GR send consent (Step 4).]

3.2 Privacy Model and Assumptions
Our specific objective is to protect both the location and the identity of workers during task assignment. Once a worker consents to a task, the worker herself may directly disclose information to the task requester (e.g., to enable a communication channel between worker and requester). However, such additional disclosure is outside our scope, as each worker has the right to disclose his or her individual information. Our focus is on what happens prior to consent, when worker location and identity must be protected from both task requesters and the SC-server.

We emphasize that focusing on the SC assignment step is the correct approach, given the fact that SC workers have to physically travel to the task location. Mere completion of a task discloses the fact that some worker must have been at that location, and this sort of disclosure is unavoidable in SC. After consent, a worker can still enjoy some form of identity protection (e.g., using pseudonyms and anonymous routing), for which solutions are already available (e.g., Tor). On the other hand, no solution exists to date for the more challenging problem of privacy-preserving task assignment, hence we direct our efforts in this direction.

Furthermore, focusing on task assignment also makes sense from a disclosure volume standpoint. During assignment, all workers are candidates for participation, therefore locations of all workers would be exposed, absent a privacy-preserving mechanism. On the other hand, after task request dissemination, only few workers will participate in task completion, and only if they give their explicit consent.

Workers cannot trust the SC-server, especially as there may be many such entities with diverse backgrounds, e.g., private companies, non-profits, government organizations, academic institutions, etc. On the other hand, the CSP already has a signed agreement with workers through the service contract, so there is already a trust relationship established, as well as mutually agreed upon rules for data disclosure.
Furthermore, the CSP already knows where subscribers are, e.g., using cell tower triangulation, so worker location reporting does not introduce additional disclosure. However, the CSP has no expertise, and perhaps no financial interest, to host an SC service, which needs to deal with a diverse set of issues such as interacting with various task requester categories, managing profiles (e.g., some workers may only volunteer for environmental tasks), etc. The role of the CSP is to aggregate locations from subscribed workers, transform them according to DP, and release the data in sanitized form to one or more SC-servers for assignment. As multiple SC-servers can use the same PSD, it is practical for the CSP to provide PSDs for a small fee, e.g., a percentage of the workers' payment, or a tax incentive in the case of public-interest SC applications.

3.3 Design Goals and Performance Metrics
Protecting worker location significantly complicates task assignment, and may reduce the effectiveness and efficiency of worker-task matching. Due to the nature of DP, it is possible for a region to contain no workers, even if the PSD shows a positive count. Therefore, no workers (or an insufficient number thereof) may be notified of the task request, and the task may not be completed. Alternatively, a worker may be notified of the task even though she is a long distance away from the task location, whereas a nearer worker does not receive the request. Finally, in the non-private SAT case, only one selected worker, whose location and identity are known, is notified of the task request. With location protection, many redundant messages may need to be sent, increasing system overhead. Therefore, we focus on the following performance metrics:

Assignment Success Rate (ASR). Due to PSD data uncertainty, the SC-server may incorrectly assign workers to tasks (e.g., no worker is reached, or the task is too far and workers do not accept it).
ASR measures the ratio of tasks accepted by a worker (see footnote 1) to the total number of task requests. The challenge is to maintain ASR close to 100%.

Worker Travel Distance (WTD). The SC-server is no longer able to accurately evaluate worker-task distance, hence workers may have to travel long distances to tasks. The challenge is to keep the worker travel distance low, even when exact worker locations are not known.

System Overhead. Dealing with imprecise locations increases the complexity of assignment algorithms, which poses scalability problems. A significant metric to measure overhead is the average number of notified workers (ANW). This number affects both the communication overhead required to geocast task requests, as well as the computation overhead of the matching algorithm, which depends on how many workers need to be notified of a task request.

Footnote 1: ASR does not capture worker reliability; tasks may still fail to complete after being accepted. Our focus is on assignment success; reliability is outside our scope.

4. BUILDING THE WORKER PSD
The first step in the proposed framework consists of building a PSD (at the CSP side) to be later used for task assignment at the SC-server. Building the PSD is an essential step, because it determines how accurate the released data is, which in turn affects ASR, WTD and ANW. As discussed in Section 2.3, hierarchical object-based partitioning PSD such as k-d trees may be difficult to tune, and they are not robust to changes in data distribution.

In this section, we modify the Adaptive Grid (AG) method proposed in [26] to address the specific requirements of the SC framework. AG has been shown to match and even improve accuracy compared to k-d trees, for the same privacy budget $\epsilon$. Table 1 summarizes the notations used in our presentation.

Symbol                      Definition
$\epsilon$, $\epsilon_i$    Total privacy budget and level-$i$ budget
$\alpha$                    AG budget split, $\alpha = 0.5$ means $\epsilon_1 = \epsilon_2$
$N$                         Total number of workers
$N'$                        Noisy worker count of level-1 cells
$m_i \times m_i$            Level-$i$ grid granularity
$n$                         Expected noisy worker count of a level-2 cell
$t$                         A task or its location, used interchangeably
$c_i$                       A level-2 cell
$n_{c_i}$                   Noisy worker count of $c_i$
$p^a_{c_i}$                 Acceptance rate of workers within $c_i$
$c'_i$                      Sub-cell of cell $c_i$
Table 1: Summary of Notations

PSDs based on uniform grids treat all regions in the dataset identically, despite large variances in location density. As a result, they over-partition the space in sparse regions, and under-partition in dense regions. AG avoids these drawbacks by using a two-level grid and variable cell granularity. At the first level, AG creates a coarse-grained, fixed-size $m_1 \times m_1$ grid over the data domain. AG uses a data-independent heuristic to choose level-1 granularity as

$$m_1 = \max\left(10, \left\lceil \frac{1}{4} \sqrt{\frac{N \epsilon}{k_1}} \right\rceil\right)$$

where $N$ is the total number of locations (this is considered known, but a high-precision estimate can also be found using a small fraction of the privacy budget). The value $k_1 = 10$ is suggested in [26]. Next, AG issues $m_1^2$ count queries, one for each level-1 cell, using a fraction of the total privacy budget: $\epsilon_1 = \alpha \epsilon$, where $0 < \alpha < 1$. AG then partitions each level-1 cell into $m_2 \times m_2$ level-2 cells, where $m_2$ is adaptively chosen based on the noisy count $N'$ of the level-1 cell:

$$m_2 = \left\lceil \sqrt{\frac{N' \epsilon_2}{k_2}} \right\rceil \qquad (1)$$

where $\epsilon_2 = \epsilon - \epsilon_1$ is the remaining budget, and the constant is set empirically by the authors to $k_2 = 5$. The budget parameter $\alpha$ determines how the privacy budget is divided between the two levels, and the authors of [26] recommend $\alpha = 0.5$.

Figure 2 shows a snapshot of an adaptive grid, with four level-1 cells A, B, C, D. Constructing a differentially private AG requires two steps. First, the noisy counts $N'$ of A, B, C, D are computed by adding random Laplace noise with scale $\lambda_1 = 2/\epsilon_1$ ($\epsilon_1$-indistinguishability) to the actual counts of these cells. Second, based on the noisy counts, level-1 cells are further split into level-2 cells. According to Eq. (1), cell D, which has noisy count 200, is partitioned according to a 3x3 grid, while the granularity for other cells is 2x2.
Thereafter, AG adds to each level-2 cell ($c_i$, $i = 1..21$) random Laplace noise with scale $\lambda_2 = 2/\epsilon_2$ ($\epsilon_2$-indistinguishability). Finally, their corresponding noisy counts $n_{c_i}$ together with the structure of the AG are published. According to the composition theorems of Section 2.2 (sequential across the two levels, parallel across the disjoint cells of a level), the sanitized release of AG provides $\epsilon$-DP.

[Figure 2: A snapshot of adaptive grid ($\epsilon = 0.5$, $\alpha = 0.5$). Level-1 cells A, B, C, D have noisy counts $N'_A = N'_B = N'_C = 100$ and $N'_D = 200$; the level-2 cells are $c_1, \ldots, c_{21}$.]

Although AG was shown to yield good results for general-purpose spatial queries [26], it is not directly applicable to SC, due to its rigidity in choosing its parameters. Specifically, the granularity $m_2$ of the level-2 grid is too coarse, leading to large geocast areas and high communication overhead, as we show next. According to Eq. (1), the expected number of workers (i.e., noisy count) in a level-2 cell is:

$$n = N'/m_2^2 = k_2/\epsilon_2$$

Table 2a presents different values of $m_2$ and $n$ when varying the total budget $\epsilon$ with $\alpha = 0.5$.

(a) Original AG ($k_2 = 5$)
$\epsilon$    $\epsilon_2$    $m_2$    $n$
1             0.5             3        11
0.5           0.25            2        25
0.1           0.05            1        100

(b) Modified AG ($k_2 = \sqrt{2}$)
$\epsilon$    $\epsilon_2$    $m_2$    $n$
1             0.5             6        2.8
0.5           0.25            5        5.6
0.1           0.05            2        28.2

Table 2: Granularity $m_2$ and average count per cell $n$ ($N' = 100$)

Note that the values of $n$ are rather large, especially for more restrictive privacy settings (i.e., lower $\epsilon$). For $\epsilon = 0.1$, $n$ is 100. In practice, a geocast region is likely to include multiple PSD cells, hence 100 is a lower bound on the ANW, while its typical values can grow much higher, leading to prohibitive communication cost.

We propose a more suitable heuristic for choosing $k_2$. Recall that the primary requirement of SC task assignment is to achieve high ASR. To that extent, we want to ensure that the task request is geocast in a non-empty region, i.e., the real worker count is strictly positive. According to the Laplace mechanism of DP, each PSD count is the sum of the real count and the added noise.
Given the level-2 privacy budget $\epsilon_2$, we can also quantify the distribution of added noise, which has standard deviation $\sigma = \sqrt{2}/\epsilon_2$. Therefore, if the PSD count is larger than $\sigma$, then with high probability there will be at least one worker in the level-2 cell. Our objective is to increase the granularity $m_2$ in order to decrease overhead, but only to the point where there is at least one worker in a cell. Denote by $count_{PSD}$ the value reported by PSD for a certain level-2 cell. The probability that the real count is larger than zero is expressed as

$$p_h = 1 - \frac{1}{2} \exp\left(-\frac{count_{PSD}}{1/\epsilon_2}\right)$$

Furthermore, we want to have the PSD count larger than the noise, i.e., $n = k_2/\epsilon_2 \ge \sqrt{2}/\epsilon_2$, so at the limit we set $k_2 = \sqrt{2}$. The resulting probability of having non-empty cells is $p_h = 1 - \frac{1}{2}\exp(-\sqrt{2}) = 0.88$. According to Eq. (1), the corresponding granularity is $m_2 = \left\lceil \sqrt{N' \epsilon_2 / \sqrt{2}} \right\rceil$.

In summary, we modify AG by carefully reducing the granularity threshold at level 2 such that ANW is reduced, while the probability for each level-2 cell to contain a real worker is at least 88%. Table 2b shows that this new setting significantly reduces $n$, and as a result ANW. Next, we present a search strategy which looks at grouping cells together such that the ASR achieved is above a given threshold.

5. TASK ASSIGNMENT
When a request for a task $t$ is posted, the SC-server queries the PSD and determines a geocast region GR where the task is disseminated. The goal of the SC-server is to obtain a high success rate for task assignment, while at the same time reducing the worker travel distance WTD and request dissemination overhead ANW.

5.1 Task Localness and Acceptance Rate
Travel distance is critical in SC, as workers need to physically visit the task locations. Workers are more likely to perform tasks closer to their home or workplace [23, 15, 1]. The work in [23] shows that 10% of all workers, denoted as super-agents, perform more than 80% of the tasks.
Among super-agents, 90% have a daily travel distance of less than 40 miles, and the average travel distance per day is 27 miles. This property is referred to as task localness [15]. A related study [11] addresses the localness of contents posted by Flickr and Wikipedia users, and proposes a spatial content production model (SCPM) that computes the mean contribution distance (MCD) of each worker as follows:

$$MCD(w_i) = \frac{\sum_{j=1}^{n} d(L_{w_i}, L_{c_j})}{n} \qquad (2)$$

where $L_{w_i}$ is the location of worker $w_i$, and $L_{c_j}$ are the locations of its $n$ contributions.

Based on Eq. (2), we can find the maximum travel distance (MTD) that a high percentage of workers are willing to travel to perform their assigned tasks. For example, the MTD of super-agents in crowdsourcing markets studied in [23] is 40 miles with 90% cumulative ratio of contributors. Besides communication overhead, task localness is thus another reason to impose an upper bound on geocast region size. Intuitively, the maximum geocast region is a square area with side size equal to $2 \times MTD$. Hereafter, we refer to MTD as both the maximum travel distance and the maximum geocast region size.

We denote by acceptance rate (AR) the probability $p^a$ ($0 \le p^a \le 1$) that a worker accepts to complete a task for which s/he receives a request. We assume that all workers are identical and independent of each other in deciding to perform tasks. The work in [23] studies reward-based SC labor markets and shows that super-agents have an average AR of 90.73% while other agents have an acceptance rate of 69.58%. Acceptance rate is much smaller in self-incentivized SC [15], where the workers voluntarily perform tasks, without receiving incentives.

In practice, a worker is more willing to accept nearby tasks. To that extent, we model acceptance rate as a decreasing function $F$ of travel distance.
We consider two cases: (i) linear, where AR decreases linearly with distance starting from an initial MAR (Maximum AR) value (when the worker is co-located with the task), and (ii) Zipf, where the acceptance rate follows a Zipf distribution with skewness parameter $s$. The higher the value of $s$, the faster $p^a$ drops. In both cases, $p^a$ is maximized when the worker is co-located with the task and becomes negligible at MTD. If the distance is larger than MTD, $p^a = 0$.

5.2 Analytical Utility Model

We develop an analytical utility model that allows the SC-server to quantify the probability that a task request disseminated in a certain GR is accepted by a worker. Intuitively, the utility depends on the AR and on the worker count $w$ estimated to be enclosed within the GR. An SC-server will typically establish an expected utility threshold EU, which is the targeted success rate for a task (note that this is a system goal rather than an outcome, the latter being measured by ASR). Generally, EU is considerably larger than an individual worker's $p^a$, so the GR must contain multiple workers.

We define $X$ as a random variable for the event that a worker accepts a received task: $P(X = \text{True}) = p^a$ and $P(X = \text{False}) = 1 - p^a$. Assuming $w$ independent workers, the number of workers who accept follows $\text{Binomial}(w, p^a)$. We define the utility of a geocast region covering $w$ workers as:

$$U = 1 - (1 - p^a)^w \qquad (3)$$

$U$ measures the probability that at least one worker accepts the task. The utility definition can be extended to the case of redundant task assignment, where multiple workers are required to complete a task. In such a case, $U = 1 - \sum_{i=1}^{k} \binom{w}{i} (p^a)^i (1 - p^a)^{w-i}$, where $k$ is the number of workers required to perform the task. Although redundant task assignment is required in some cases [16], in this work we focus on single-worker task assignment.
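As an illustration of Sections 5.1 and 5.2, the following Python sketch evaluates the two acceptance-rate models and the single-worker utility of Eq. (3). It is a minimal sketch under stated assumptions: all function names are hypothetical, and because the text gives no closed form for the Zipf case, the sketch assumes that distance is bucketed into `bins` ranks over [0, MTD] and that the acceptance rate equals MAR divided by rank to the power s.

```python
import math


def linear_acceptance(dist, mar, mtd):
    """Linear model: MAR when co-located with the task, 0 at or beyond MTD."""
    if dist >= mtd:
        return 0.0
    return mar * (1.0 - dist / mtd)


def zipf_acceptance(dist, mar, mtd, s=1.0, bins=100):
    """Zipf-like model (assumed form): MAR / rank**s, with rank 1 at distance 0."""
    if dist > mtd:
        return 0.0
    rank = 1 + int(bins * dist / mtd)  # assumption: distance bucketed into ranks
    return mar / rank ** s


def utility(w, p_a):
    """Eq. (3): probability that at least one of w workers accepts."""
    return 1.0 - (1.0 - p_a) ** w


def workers_needed(eu, p_a):
    """Worker count obtained by inverting Eq. (3) for a target utility eu."""
    if p_a <= 0.0:
        return math.inf  # nobody ever accepts
    if p_a >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - eu) / math.log(1.0 - p_a))
```

For example, with an acceptance rate of 0.1 and EU = 0.9, `workers_needed` returns 22: `utility(21, 0.1)` is about 0.891 and falls short, while `utility(22, 0.1)` is about 0.902, which illustrates why the GR must typically contain multiple workers.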
5.3 Geocast Region Construction

Given task $t$, the geocast region construction algorithm must balance two conflicting requirements: determine a region that (i) contains sufficient workers such that task $t$ is accepted with high probability, and (ii) is small in size. The input to the algorithm is task $t$ as well as the worker PSD, consisting of the two-level AG with a noisy worker count for each grid cell.

The algorithm chooses as initial GR the level-2 cell that covers the task, and determines its $U$ value. As long as the utility is lower than the threshold EU, it keeps expanding the GR by adding neighboring cells. Cells are added one at a time, based on their estimated increase in GR utility. Following the task localness property, we take into account the distance of each candidate neighboring cell to the location of $t$, and give priority to closer cells. The algorithm stops either when the utility of the obtained GR exceeds the threshold EU, or when the size of the GR is larger than MTD, hence utility can no longer be increased. The GR construction algorithm is a greedy heuristic, as it always chooses the candidate cell that produces the highest utility increase at each step.

Algorithm 1 Greedy Algorithm (GDY)
1: Input: task $t$, MTD, $0 < EU < 1$
2: Output: geocast region GR
3: MTD is the square of size $2 \times MTD$ centered at $t$
4: Init $GR = \{\}$, utility $U = 0$
5: Init max-heap $Q = \{$level-2 cell that covers $t\}$
6: Remove $\{c_i, U_{c_i}\} \leftarrow Q$; $U_{c_i}$ is computed from Eq. (3)
7: If $c_i = Nil$, return GR {geocast region is larger than MTD}
8: $GR = GR \cup c_i$
9: If $U_{c_i} > 0$, $U = 1 - (1 - U)(1 - U_{c_i})$
10: If $U \geq EU$, return GR
11: Find $neighbors = (\{c_i\text{'s neighbors}\} - GR) \cap MTD$
12: $Q = Q \cup neighbors$
13: Goto Line 6

The pseudo-code of the greedy algorithm is depicted in Algorithm 1. In Line 5, $Q$ is a heap of cells $\{c_i\}$, sorted in decreasing order of cell utility $U_{c_i}$. $U_{c_i}$ is computed according to Eq.
(3), namely $U_{c_i} = 1 - (1 - p^a_{c_i})^{n_{c_i}}$, where $n_{c_i}$ is the noisy worker count of $c_i$, and $p^a_{c_i}$ is the acceptance rate of the workers inside $c_i$. Since worker locations within a cell are not known, we assume they all have the same acceptance rate. Moreover, we assume the worker-task distance is equal to the average distance between the task and each of the four corners of cell $c_i$. When a candidate cell is removed from $Q$ (Line 6), it is added to GR (Line 8), and the GR utility is updated in Line 9. The updated utility $U'$ is the probability that a worker in either the current geocast region, or the newly added cell, or in both, performs the task:

$$U' = U(1 - U_{c_i}) + (1 - U)U_{c_i} + U U_{c_i} = 1 - (1 - U)(1 - U_{c_i})$$

Line 11 computes the new neighboring cells that are not in GR and are not situated farther than MTD. These cells are added to $Q$ according to their respective utilities. If a cell resides partially outside MTD, it is pruned to its fraction contained within the MTD, and its noisy count is updated proportionally to the pruned area.

In summary, the geocast region construction algorithm greedily expands the GR by choosing to include at each step the grid cell that results in the highest estimated increase in utility. Cell utility takes into account the noisy worker count, as well as the distance between the cell and the task location. Next, we consider two refinements to the heuristic: first, in Section 5.4 we investigate a finer-grained solution search space by allowing partial cell inclusion; second, in Section 5.5 we consider the effect that the GR shape has on hop-by-hop task request dissemination.

5.4 Partial Cell Selection

Even though the adaptation of AG proposed in Section 4 significantly reduces the granularity of level-2 cells, the number of workers can still be rather large, and the resulting ANW can lead to high task request dissemination costs. Such workers may be unnecessarily included in the GR, even if the required EU could be achieved with far fewer workers.
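Before turning to the optimization, the greedy expansion of Algorithm 1 (Section 5.3) can be illustrated with a short Python sketch. It is not the authors' implementation, and it makes several simplifying assumptions that are also flagged in the comments: the PSD is a single uniform grid stored as a hypothetical dictionary `counts` mapping `(row, col)` to a noisy count (rather than the two-level AG), neighbors are the four edge-adjacent cells, and cells that fall partly outside the MTD square are skipped instead of being pruned proportionally. The function names and the `accept` callback (distance to acceptance rate) are likewise hypothetical.

```python
import heapq
import math


def cell_utility(cell, counts, side, task, accept):
    """Eq. (3) for one cell, using the mean task-to-corner distance."""
    row, col = cell
    corners = [((col + dc) * side, (row + dr) * side)
               for dr in (0, 1) for dc in (0, 1)]
    dist = sum(math.dist(task, p) for p in corners) / 4.0
    n = counts[cell]
    if n <= 0:  # noisy counts can be negative; treat such cells as empty
        return 0.0
    return 1.0 - (1.0 - accept(dist)) ** n


def greedy_geocast(task, counts, side, mtd, eu, accept):
    """Return (list of cells in the GR, estimated utility); task is (x, y)."""
    def inside_mtd(cell):
        # Simplification: keep only cells fully inside the MTD square.
        row, col = cell
        return (task[0] - mtd <= col * side
                and (col + 1) * side <= task[0] + mtd
                and task[1] - mtd <= row * side
                and (row + 1) * side <= task[1] + mtd)

    start = (int(task[1] // side), int(task[0] // side))  # cell covering t
    heap = [(-cell_utility(start, counts, side, task, accept), start)]
    seen = {start}
    region, u = [], 0.0
    while heap:
        neg_uc, cell = heapq.heappop(heap)  # candidate with highest utility
        region.append(cell)
        u = 1.0 - (1.0 - u) * (1.0 + neg_uc)  # Line 9, with U_c = -neg_uc
        if u >= eu:
            break
        row, col = cell
        for nb in ((row + 1, col), (row - 1, col),
                   (row, col + 1), (row, col - 1)):
            if nb in counts and nb not in seen and inside_mtd(nb):
                seen.add(nb)
                heapq.heappush(
                    heap, (-cell_utility(nb, counts, side, task, accept), nb))
    return region, u
```

If the heap empties before EU is reached, the function returns the region built so far together with its utility, which corresponds to the case in which the GR can no longer grow within MTD.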
We propose an optimization that allows partial inclusion in the GR of a level-2 cell.

Before adding a new cell $c_i$ to the GR (Line 10 of Algorithm 1), the optimization checks whether the utility increase provided by $c_i$ will exceed the required utility EU. If so, the algorithm computes a sub-region of $c_i$ whose utility is sufficient to reach EU. The pseudocode of the heuristic is depicted in Algorithm 2, which includes two steps. First, it computes the percentage of $c_i$'s area (Lines 3-7) that is likely to enclose sufficient users. Next, it finds a sub-cell with that area (Lines 8-9), which is uniquely determined by its shape and location. The optimization in Algorithm 2 can be inserted as a function call before Line 10 in the main Algorithm 1.

Algorithm 2 Partial Cell Selection Heuristic
1: Input: task location $t$, last cell $c_i$, current utility $U_{curr}$
2: Output: sub-cell $c'_i$ of $c_i$
3: $dist = distance(t, c_i)$
4: $p^a_{sub} = acc\_rate(dist)$
5: $U_{required} = \frac{EU - U_{curr}}{1 - U_{curr}}$
6: worker count needed to achieve $U_{required}$: $w_{required} = \log_{1 - p^a_{sub}}(1 - U_{required})$
7: Area percentile $= w_{required} / w_{c_i}$
8: If $c_i$ covers $t$, find sub-cell given area percentile
9: Otherwise, find sub-cell adjacent with current region

To compute a sub-cell, two constraints need to be satisfied. First, the sub-cell needs to be completely inside the parent cell. Second, the sub-cell must be adjacent to the current GR to form a continuous region. Therefore, depending on whether the current GR contains one or multiple cells, we use two strategies to find the sub-cell.

Figure 3: Examples of partial cell selection. (a) Case 1: splitting $c_i$; (b) Case 2: splitting cell $c_7$.

Figure 3a depicts the first case, where the GR includes only one grid cell $c_i$ (i.e., the task $t_0$ is inside $c_i$, the parent cell).
Intuitively, to cover the workers closest to the task, the shape of the sub-cell $c'_i$ (dashed line) must be a square. The boundary of $c'_i$ can therefore be completely determined given its area. To satisfy the first constraint, the center of $c'_i$ needs to be in the shaded square, whose center is the same as that of $c_i$ and whose side length is equal to the difference between the side lengths of $c_i$ and $c'_i$. In addition, $c'_i$ is positioned such that the distance between its center and the task is minimized. The distance is zero when the task (e.g., $t_0$) is inside the shaded region (the task is co-located with the center of $c'_i$). Otherwise, if the task is outside the shaded square, the center of its closest sub-cell must be on the border of the shaded square. Subsequently, depending on the position of the task relative to the shaded square (i.e., the eight possibilities $t_1$-$t_8$), we can find the corresponding sub-cell center. For example, the closest sub-cell center for $t_1$ is the bottom-left corner of the shaded square.

Figure 3b presents the second case, in which the GR comprises multiple cells {4, 7, 10, 13}. This example is a flat version of the AG in Figure 2. The arrows in the figure depict the expansion process of the geocast algorithm. For example, cells 4 and 14 are expanded from cell 10, while cell 7 is expanded from cell 13. To ensure that the GR is a continuous region, we require the long edge of the sub-cell (dashed rectangle) to be adjacent to the neighbor cell (i.e., 13) that the splitting cell (i.e., 7) is expanded from. When its long edge is fixed, the sub-cell is uniquely specified given its area.

5.5 Communication Cost

Dissemination of a task request within the GR can be implemented in two ways:

Infrastructure-based Mode. In this mode, the CSP sends an individual message to each worker within the GR. The cost is proportional to ANW, which may be large.

Infrastructure-less Mode.
Workers within the GR can relay the task request hop-by-hop, using a mobile ad-hoc network protocol over WiFi or Bluetooth. In this case, the CSP only needs to send several messages to workers (one single message may suffice if the worker network is connected).

Geocasting using hop-by-hop communication is an attractive alternative. The SC-server does not know the actual worker placement, so the GR construction strategy cannot rely on detailed routing information, but fortunately, the shape of the GR is often a good predictor of ad-hoc routing performance. Intuitively, it is cheaper to geocast within a shape with less skew, such as a circle or a square, as opposed to skewed regions such as line-shaped areas, which have a large network diameter. For instance, in Figure 3b, the region of cells {1, 2, 3, 4} is more favorable for geocast than {2, 4, 5, 6}, despite the fact that the two areas have equal size.

We assume that the geocasting cost is proportional to the minimum bounding circle that covers the GR. Thus, the more compact the GR, the lower the cost. Several measures of compactness for two-dimensional shapes are discussed in [19]. One widely accepted measure proposed in [18] is the Digital Compactness Measurement (DCM), which measures region compactness as the ratio between the area of the region and the area of its smallest circumscribing circle. An efficient solution for finding the smallest enclosing circle is a randomized algorithm [28] that runs in time linear in the number of data points in the region. The maximum value of DCM is 1, attained when the shape is a circle.

We modify Algorithm 1 to choose new cells to add to the GR based on compactness, instead of utility. At each iteration, the cell that increases the compactness of the GR most is chosen from the list of candidates. Due to the inclusion of the new cell, the potential compactness increase of all other candidates may need to be re-computed, to account for the change in shape.
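The DCM of a region made of grid cells can be illustrated with the following Python sketch. Note the swapped-in technique: instead of the randomized linear-time algorithm of [28], the sketch finds the smallest enclosing circle by brute force over all circles defined by two or three cell corners, which is adequate only for the handful of cells in a typical GR. The function names and the `(row, col)` cell representation are hypothetical.

```python
import itertools
import math


def circle_from_two(a, b):
    """Circle having segment ab as its diameter."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0, math.dist(a, b) / 2.0)


def circle_from_three(a, b, c):
    """Circumcircle of a, b, c, or None if the points are collinear."""
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    sa, sb, sc = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return (ux, uy, math.dist((ux, uy), a))


def smallest_enclosing_circle(points):
    """Brute-force search; assumes at least two distinct points."""
    candidates = [circle_from_two(a, b)
                  for a, b in itertools.combinations(points, 2)]
    for triple in itertools.combinations(points, 3):
        circle = circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    best = None
    for cx, cy, r in candidates:
        covers_all = all(math.dist((cx, cy), p) <= r + 1e-9 for p in points)
        if covers_all and (best is None or r < best[2]):
            best = (cx, cy, r)
    return best


def dcm(cells, side=1.0):
    """Region area divided by the area of its smallest enclosing circle."""
    corners = {((col + dc) * side, (row + dr) * side)
               for row, col in cells for dr in (0, 1) for dc in (0, 1)}
    _, _, radius = smallest_enclosing_circle(sorted(corners))
    return len(cells) * side ** 2 / (math.pi * radius ** 2)
```

Under these assumptions, a 2 x 2 block of unit cells has a DCM of 2/pi (about 0.64), whereas four cells in a row have a DCM of about 0.30, in line with the observation that square-like regions are preferable to line-shaped ones of equal area.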
We also consider a hybrid method that factors in both utility and compactness in cell selection. The merit function of the hybrid is a linear combination of the resulting GR utility and compactness.

To evaluate the effectiveness of using compactness in the GR search strategy, we use as metric an estimation of the hop count required to disseminate the task request to all workers, given the communication range of the wireless network (e.g., 50-100 meters for WiFi). We approximate the hop count as the diameter of the network divided by the communication range:

$$\text{Hop count} = \frac{\text{Farthest distance between two workers}}{2 \times \text{Communication range}} \qquad (4)$$

In practice, the worker network needs to be connected for the ad-hoc based geocast to succeed. In other words, a message from any worker (i.e., seed) should be able to reach any other worker in the GR, using hop-by-hop wireless communication. Otherwise, if the network contains multiple disconnected components, the task cannot be sent to all workers from a single seed. In the latter case, the CSP would need to send the task to multiple seeds within the ad-hoc network. However, this level of detail goes beyond the scope of our work, and we restrict ourselves to using the hop count metric as an estimation of geocast cost, in conjunction with ANW.

6. PERFORMANCE EVALUATION

We evaluate experimentally the performance of the proposed framework for worker location protection in SC. We present the experimental methodology in Section 6.1, followed by results and discussions in Section 6.2.

6.1 Experimental Methodology

Name      #Tasks    #Workers   MTD (km)
Gowalla   151,075   6,160      3.6
Yelp      15,583    70,817     13.5
Table 3: Dataset Characteristics

We use two real-world datasets: Gowalla and Yelp. Gowalla contains the check-in history of users in a location-based social network. For our experiments, we use the check-in data in the area of San Francisco, California.
We assume that Gowalla users are the workers of the SC system, and their locations are those of the most recent check-in points. We also model each check-in point as a task that was previously accepted for execution by a worker. Based on this model, we determine the mean contribution distances (MCDs) according to Eq. (2), and we compute the maximum travel distance (MTD) as the 90th percentile of the MCD values, leading to a value of 3.6 km. The Yelp data corresponds to the greater area of Phoenix, Arizona, and includes the locations of 15,583 restaurants, 70,817 users and 335,022 user reviews. We use restaurant locations as tasks, and a user review is equivalent to accepting an SC task. The resulting MTD for Yelp is 13.5 km.

To evaluate the overhead of privacy, we compare our proposed solution with a non-private algorithm that has access to exact worker locations. Given a task and the actual worker locations, the algorithm keeps adding nearby workers one by one (1NN, 2NN, etc.) until the obtained utility exceeds the threshold EU, or until the size of the GR is larger than MTD. The geocast query is the minimum bounding circle of the nearest workers.

We consider privacy budget values $\varepsilon \in \{0.1, 0.4, 0.7, 1\}$, ranging from strict to loose privacy requirements. We set the expected utility $EU \in \{0.3, 0.5, 0.7, 0.9\}$ and the maximum acceptance rate $MAR \in \{0.1, 0.4, 0.7, 1\}$. Default values are shown in boldface. For the Zipf acceptance rate decrease function, the skew parameter $s$ is set to 1. The wireless communication range is 100 meters.

We randomly generated 1,000 tasks and measured the performance of the proposed solution with respect to the metrics introduced in Section 3.3: ASR, ANW, and WTD. For WTD, we consider two scenarios: in WTD-NN, the SC-server collects all consents and chooses the closest worker to the task site, whereas in WTD-FC the first consenting worker is assigned to the task. We also measure the average hop count HOP required for geocast, according to Eq. (4).
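The HOP estimate of Eq. (4) can be sketched in Python as follows. This is an illustration only: the function name and the representation of worker positions as planar `(x, y)` coordinates in meters are assumptions, and the farthest pair is found by brute force over all pairs.

```python
import itertools
import math


def hop_count(workers, comm_range):
    """Eq. (4): farthest pairwise worker distance over twice the range."""
    if len(workers) < 2:
        return 0.0  # a single worker needs no relaying
    diameter = max(math.dist(a, b)
                   for a, b in itertools.combinations(workers, 2))
    return diameter / (2.0 * comm_range)
```

For instance, for workers at (0, 0), (300, 400) and (100, 100) and a communication range of 100 meters, the farthest pair is 500 meters apart and the estimate is 2.5 hops.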
To compute ASR, we simulate a binomial model as discussed in Section 5.1: each worker flips a biased coin and decides whether or not to accept a received task request, based on a personalized threshold $p^a$ (recall that $p^a$ takes into account the distance to the task). A task is considered accepted if at least one worker agrees to perform it. Finally, we also show the results obtained for the average number of cells in a GR (CELL) and the compactness of the GR. Although these metrics are not directly perceived by the end users, they help to better understand the underpinnings of the proposed solution. All measured results are averaged over ten random seeds.

6.2 Experimental Results

6.2.1 Evaluation of GR Construction Heuristics

We evaluate the performance of the greedy algorithm for GR construction from Section 5.3 and its variations. GDY refers to the algorithm running on the original AG PSD from [26], whereas G-GR uses our customized granularity AG solution. The optimization allowing partial cell selection is denoted by G-PA, and the combination of both G-GR and G-PA by G-GP. Figure 4 illustrates the results.

G-GP generally performs best in terms of minimizing ANW, WTD and HOP in all combinations of datasets (Gowalla, Yelp) and acceptance rate functions (Linear, Zipf). Moreover, by comparing G-GP and G-PA with GDY and G-GR, we observe that customized AG granularity contributes mostly to the improvements. Partial cell selection proves useful mostly when the privacy budget is small (i.e., the resulting grid is coarse). Compared to GDY, G-GP reduces ANW by up to a factor of 5, and the improvement is more significant when the privacy budget is low. Specifically, increasing $\varepsilon$ provides a more accurate estimation of the worker counts in the PSD, and the granularity of the level-2 AG also grows. As a result, ANW can be more tightly controlled. Moreover, G-GP also yields reduced WTD and HOP, by up to a factor of 8 and 7, respectively.
On the other hand, G-PA obtains lower ASR than the expected utility of 90%, particularly for small $\varepsilon$. This can be explained by the fact that applying partial cell selection tends to aggressively reduce the number of workers included in the GR, which may result in under-provisioning (i.e., an insufficient number of workers receive task requests). All other methods achieve close to the target EU of 90%, but most often this is a result of over-provisioning, which in turn increases ANW.

Figure 5 captures in more detail the effect of G-GP and grid granularity on ASR, as well as the under/over-provisioning tendencies. With coarser-grained grids (i.e., large $k_2$) over-provision occurs, whereas finer-grained grids suffer from an excessive noise-to-real-count ratio, resulting in under-provision. Note that our choice of $k_2 = \sqrt{2} \approx 1.41$ obtains a good trade-off: it achieves near 90% utility and also reduces ANW, WTD and HOP.

Figure 5: Average ASR over $\varepsilon \in \{0.1, 0.4, 0.7, 1\}$, varying $k_2$ (series: Gowalla-Linear, Gowalla-Zipf, Yelp-Linear, Yelp-Zipf).

6.2.2 Evaluation of Compactness-Based Heuristics

We evaluate the effect of the compactness-guided heuristic for GR construction. For brevity, we only include Gowalla results (the Yelp dataset shows similar trends). As shown in Figure 6, the compactness-based approach (G-GP-Compact) significantly increases the compactness measure compared to its utility-based counterpart (G-GP-Pure). The hop count is also reduced, by up to 36%, particularly when the privacy budget is large. However, the compactness-only approach does not fare that well for lower privacy budgets.
On the other hand, the hybrid heuristic that combines utility and compactness in the ranking of candidates (G-GP-Hybrid) manages to perform better than its counterparts for all $\varepsilon$ values. We conclude that such a balanced approach is the best solution for GR construction.

              ANW    HOP    WTD-NN   WTD-FC
Gow.-Linear   161%   54%    25%      18%
Gow.-Zipf     103%   30%    22%      23%
Yelp-Linear   202%   92%    19%      20%
Yelp-Zipf     132%   41%    17%      25%
Table 4: The average relative increase in percentage of different measurements when varying $\varepsilon$

6.2.3 Overhead of Achieving Privacy

We compare the proposed solution with a non-private algorithm for task assignment. Figure 7 presents the overhead incurred by privacy when varying $\varepsilon$ (for brevity, we only show Gowalla results). As expected, when $\varepsilon$ increases, the PSD offers more accurate data, and the overhead (in terms of ANW, WTD and HOP) decreases. Interestingly though, ASR drops in value. This can be explained by the significant over-provisioning that occurs for lower budgets, when the greedy heuristic enlarges the GR in the quest for achieving the desired EU. As a result, more workers are notified, and the chances of task acceptance are higher. However, overhead is also much higher.

We also observe that privacy does not significantly increase WTD, showing that the greedy GR construction algorithm does a good job in selecting nearby workers for a task. Table 4 summarizes the variation of the considered metrics when adding privacy. Note that the travel distance, which is perhaps the most important factor in SC, is not considerably impacted by privacy. We also observed that the overhead incurred is generally higher for the sparser Yelp data, which is not surprising, as it is a well-known fact that differentially private algorithms perform better on dense datasets.

Table 4 also shows the effect of different acceptance rate functions. Zipf incurs lower overhead compared to Linear.
The reason is that with the Zipf distribution, the acceptance rate of the workers drops faster for the same distance to the task compared with the linear case. The smaller acceptance rate leads to larger ANW in both private and non-private cases; however, ANW increases at a faster rate in the non-private case (Figure 7).

6.2.4 The Effect of Varying MAR and EU

We evaluate the performance of G-GP-Hybrid on the Yelp dataset by varying the maximum acceptance rate (MAR) and the expected utility EU (similar trends were observed for Gowalla). Figure 8a shows the results when varying MAR. As expected, a higher acceptance rate yields lower overhead and shorter travel distance, as workers are more willing to accept tasks. The GR size is also smaller, thus leading to a smaller network diameter and HOP value. Interestingly, Figures 8c and 8d show that MAR has a significant effect on decreasing WTD. This effect is more pronounced than the drop due to the increase in privacy budget $\varepsilon$, as observed in previous experiments. Figure 8e shows that the number of grid cells in the GR drops as MAR increases, due to the increased utility of each cell. For the largest MAR value, a single cell is sufficient as GR, so CELL = 1.
Figure 4: Comparison of GR construction heuristics (GDY, G-GR, G-PA, G-GP) by varying $\varepsilon \in \{0.1, 0.4, 0.7, 1\}$. Panels (a)-(e): ANW, WTD-NN, WTD-FC, HOP and ASR for Gowalla-Linear; panels (f)-(j): the same metrics for Gowalla-Zipf; panels (k)-(o): Yelp-Linear; panels (p)-(t): Yelp-Zipf.
Figure 6: Comparison of compactness-based heuristics (G-GP-Pure, G-GP-Hybrid, G-GP-Compact) by varying $\varepsilon$. Panels (a)-(e): CMP, HOP, ANW, WTD-NN and WTD-FC for Gowalla-Linear; panels (f)-(j): the same metrics for Gowalla-Zipf.

Figure 9 measures the impact of increasing EU. To obtain a higher probability of task acceptance, the GR construction algorithm will generate a larger geocast region, leading to increased overhead, as measured by ANW, HOP and WTD.

Figure 7: Overhead of privacy (G-GP-Hybrid) compared to the non-private algorithm. Panels (a)-(e): ANW, HOP, WTD-NN, WTD-FC and ASR for Gowalla-Linear; panels (f)-(j): the same metrics for Gowalla-Zipf.

Figure 8: Performance of geocast algorithm (e.g., G-GP-Hybrid) by varying MAR (Yelp-Linear). Panels: (a) ANW, (b) HOP, (c) WTD-NN, (d) WTD-FC, (e) CELL.

Figure 9: Performance of geocast algorithm (e.g., G-GP-Hybrid) by varying EU (Yelp-Linear). Panels: (a) ANW, (b) HOP, (c) WTD-NN, (d) WTD-FC, (e) CELL.

7. RELATED WORK

While crowdsourcing has largely been used by both research communities (e.g., image processing [2], databases [25]) and industry (e.g., oDesk and Amazon Mechanical Turk), spatial crowdsourcing has only recently received rising attention (e.g., [23], [16] and [15]).

Location privacy has been studied extensively. One group of techniques focuses on evaluating the query in a transformed space [17, 31, 8], where both data and query are encrypted while preserving spatial relationships for query answering. Another group protects location data through anonymous usage of information, such as location cloaking, which reduces the spatial resolution of location information. A typical example of spatial cloaking is spatial k-anonymity [20], where the location of a user is cloaked among k other users. The mentioned techniques assume a centralized architecture with a trusted third party, known as the location anonymizer. An issue with these approaches is that the anonymizer becomes a single point of attack. Thus, other techniques focus on peer-to-peer systems [3, 9].

While location privacy has largely been studied in the context of location-based services, only a few works have studied privacy in participatory sensing (PS) [14, 12, 13, 5]. The focus of [14] is to privately assign a set of spatial tasks to each worker, while other works [12, 13] focus on preserving privacy in a PS campaign during the data contribution (i.e., how participants upload the collected data to the server without revealing their identities). The closest work to ours is discussed in [5], in which a privacy-preserving framework in WST mode is proposed, and the participants collect data in an opportunistic manner without the need to coordinate with the server.
The difference in our focus is that the requesters direct the task distribution phase rather than the SC-server.

What distinguishes our work from all of the above studies is the use of differential privacy (DP), which is one of the most effective privacy techniques. When compared to k-anonymity-based approaches [3, 9, 20], DP is a newer model with a strong privacy guarantee and a theoretical foundation. Another difference between DP and k-anonymity is that a user's identity is revealed in k-anonymity, while DP only reveals a synopsis of the data, thus identity is protected. Compared to crypto-based approaches (e.g., [17, 31, 8]), DP techniques (e.g., [4, 26, 29]) are more suitable for our private spatial crowdsourcing framework. The reason is that the crypto-based approaches are often used when one wants to retrieve query results from the data without publishing the data, for example answering kNN queries without revealing the locations of the query points; thus they may not support much distance-based processing (i.e., finding a good region query around a task). In addition, the crypto-based approaches are more computationally expensive than the DP-based methods. On the other hand, DP follows the k-anonymity model in that a sanitized version of the data is published. Thus, more distance-based processing can be performed on top of the differentially private data.

8. CONCLUSION

In this paper, we introduced a novel privacy-aware framework for spatial crowdsourcing, which enables the participation of workers without compromising their location privacy. We identified geocasting as a needed step to ensure that privacy is protected prior to workers consenting to a task. We also provided heuristics and optimizations for determining effective geocast regions that achieve a high task assignment rate with low overhead. Our experimental results on real data demonstrated that the proposed techniques are effective, and the cost of privacy is practical.
As future work, we aim to extend our framework to the case where the privacy of both workers and tasks needs to be protected. Another challenging problem is to address PSD in the context of multiple time snapshots. Finally, we will focus on finding more sophisticated PSD structures that provide better accuracy than AG.

9. REFERENCES

[1] F. Alt, A. S. Shirazi, A. Schmidt, U. Kramer, and Z. Nawaz. Location-based crowdsourcing: extending crowdsourcing to the real world. In 6th Nordic Conference on Human-Computer Interaction, pages 13–22, 2010.
[2] K.-T. Chen, C.-C. Wu, Y.-C. Chang, and C.-L. Lei. A crowdsourceable QoE evaluation framework for multimedia content. In Proceedings of the 17th ACM international conference on Multimedia, pages 491–500. ACM, 2009.
[3] C.-Y. Chow, M. F. Mokbel, and X. Liu. Spatial cloaking for anonymous location-based services in mobile peer-to-peer environments. GeoInformatica, 15(2):351–380, 2011.
[4] G. Cormode, C. Procopiuc, D. Srivastava, E. Shen, and T. Yu. Differentially private spatial decompositions. In ICDE, pages 20–31, 2012.
[5] C. Cornelius, A. Kapadia, D. Kotz, D. Peebles, M. Shin, and N. Triandopoulos. Anonysense: privacy-aware people-centric sensing. In Intl. Conf. on Mobile systems, applications, and services, pages 211–224, 2008.
[6] C. Dwork. Differential privacy. In Automata, languages and programming, pages 1–12. Springer, 2006.
[7] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, pages 265–284. Springer, 2006.
[8] G. Ghinita, P. Kalnis, A. Khoshgozaran, C. Shahabi, and K.-L. Tan. Private queries in location based services: anonymizers are not necessary. In SIGMOD, pages 121–132, 2008.
[9] G. Ghinita, P. Kalnis, and S. Skiadopoulos. Mobihide: a mobile peer-to-peer system for anonymous location-based queries. In Advances in Spatial and Temporal Databases, pages 221–238. Springer, 2007.
[10] M. Gruteser and D. Grunwald. Anonymous Usage of Location-Based Services Through Spatial and Temporal Cloaking. In USENIX MobiSys, 2003.
[11] B. J. Hecht and D. Gergle. On the localness of user-generated content. In Proceedings of the 2010 ACM conference on Computer supported cooperative work, pages 229–232. ACM, 2010.
[12] L. Hu and C. Shahabi. Privacy assurance in mobile sensing networks: go beyond trusted servers. In Pervasive Computing and Communications, pages 613–619, 2010.
[13] K. L. Huang, S. S. Kanhere, and W. Hu. Towards privacy-sensitive participatory sensing. In Pervasive Computing and Communications, pages 1–6, 2009.
[14] L. Kazemi and C. Shahabi. Towards preserving privacy in participatory sensing. In Pervasive Computing and Communications, pages 328–331. IEEE, 2011.
[15] L. Kazemi and C. Shahabi. Geocrowd: enabling query answering with spatial crowdsourcing. In ACM SIGSPATIAL GIS, pages 189–198, 2012.
[16] L. Kazemi, C. Shahabi, and L. Chen. Geotrucrowd: Trustworthy query answering with spatial crowdsourcing. 2013.
[17] A. Khoshgozaran, C. Shahabi, and H. Shirani-Mehr. Location privacy: going beyond k-anonymity, cloaking and anonymizers. Knowledge and Information Systems, 26(3):435–465, 2011.
[18] C. E. Kim and T. A. Anderson. Digital disks and a digital compactness measure. In Proceedings of the sixteenth annual ACM symposium on Theory of computing, pages 117–124. ACM, 1984.
[19] W. Li, M. F. Goodchild, and R. Church. An efficient measure of compactness for two-dimensional shapes and its application in regionalization problems. International Journal of Geographical Information Science, (ahead-of-print):1–24, 2013.
[20] A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):3, 2007.
[21] F. McSherry and I. Mironov. Differentially private recommender systems: building privacy into the net. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 627–636. ACM, 2009.
[22] M. F. Mokbel, C.-Y. Chow, and W. G. Aref. The New Casper: Query Processing for Location Services without Compromising Privacy. In Proc. of VLDB, 2006.
[23] M. Musthag and D. Ganesan. Labor dynamics in a mobile micro-task market. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 641–650. ACM, 2013.
[24] J. C. Navas and T. Imielinski. Geocast - geographic addressing and routing. In Proceedings of the 3rd annual ACM/IEEE international conference on Mobile computing and networking, pages 66–76. ACM, 1997.
[25] A. G. Parameswaran, H. Garcia-Molina, H. Park, N. Polyzotis, A. Ramesh, and J. Widom. Crowdscreen: Algorithms for filtering data with humans. In ACM SIGMOD, pages 361–372. ACM, 2012.
[26] W. Qardaji, W. Yang, and N. Li. Differentially private grids for geospatial data. In Data Engineering (ICDE), 2012 IEEE 28th International Conference on. IEEE, 2013.
[27] H. Samet. The Design and Analysis of Spatial Data Structures. Addison-Wesley, 1990.
[28] E. Welzl. Smallest enclosing disks (balls and ellipsoids). Springer, 1991.
[29] Y. Xiao, L. Xiong, and C. Yuan. Differentially private data release through multidimensional partitioning. In Secure Data Management, pages 150–168. Springer, 2010.
[30] B. Yao, F. Li, and X. Xiao. Secure Nearest Neighbor Revisited. In Proc. of ICDE, 2013.
[31] M. L. Yiu, G. Ghinita, C. S. Jensen, and P. Kalnis. Enabling search services on outsourced private spatial data. The VLDB Journal, 19(3):363–384, 2010.
Description
Hien To, Gabriel Ghinita, and Cyrus Shahabi. "A framework for protecting worker location privacy in spatial crowdsourcing." Computer Science Technical Reports (Los Angeles, California, USA: University of Southern California. Department of Computer Science) no. 943 (2014).
Asset Metadata
Creator
Ghinita, Gabriel
(author),
Shahabi, Cyrus
(author),
To, Hien
(author)
Core Title
USC Computer Science Technical Reports, no. 943 (2014)
Alternative Title
A framework for protecting worker location privacy in spatial crowdsourcing (title)
Publisher
Department of Computer Science, USC Viterbi School of Engineering, University of Southern California, 3650 McClintock Avenue, Los Angeles, California, 90089, USA (publisher)
Tag
OAI-PMH Harvest
Format
12 pages (extent), technical reports (aat)
Language
English
Unique identifier
UC16270510
Identifier
14-943 A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing (filename)
Legacy Identifier
usc-cstr-14-943
Rights
Department of Computer Science (University of Southern California) and the author(s).
Internet Media Type
application/pdf
Copyright
In copyright - Non-commercial use permitted (https://rightsstatements.org/vocab/InC-NC/1.0/)
Source
20180426-rozan-cstechreports-shoaf (batch), Computer Science Technical Report Archive (collection), University of Southern California. Department of Computer Science. Technical Reports (series)
Access Conditions
The author(s) retain rights to their work according to U.S. copyright law. Electronic access is being provided by the USC Libraries, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
USC Viterbi School of Engineering Department of Computer Science
Repository Location
Department of Computer Science. USC Viterbi School of Engineering. Los Angeles, CA, 90089
Repository Email
csdept@usc.edu
Inherited Values
Title
Computer Science Technical Report Archive
Description
Archive of computer science technical reports published by the USC Department of Computer Science from 1991 - 2017.
Coverage Temporal
1991/2017