Enabling query answering in a trustworthy privacy-aware spatial crowdsourcing
(USC Thesis Other)
ENABLING QUERY ANSWERING IN A TRUSTWORTHY PRIVACY-AWARE SPATIAL CROWDSOURCING

by
Leyla Kazemi

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

December 2012

Copyright 2012 Leyla Kazemi

to my parents for their endless love, support, and encouragement

Acknowledgments

I would like to take this opportunity to express my gratitude to a number of people who helped and supported me during my PhD studies.

First of all, I would like to thank my academic advisor, Professor Cyrus Shahabi, without whom this work would not have been possible. His enthusiastic support has always been a great motivation for me during this journey. He taught me how to see a problem, solve it, and more importantly how to see the big picture of the problem. He taught me that a good research problem is one which reflects a real-world problem. I am grateful for his help and support, and am truly honored to have had him as my advisor during my PhD studies.

I am also greatly indebted to the folks at InfoLab for their support and encouragement in the past few years. Among them, special thanks go to Dr. Mehdi Sharifzadeh, who helped me in the first two years after I joined InfoLab, and with whom I had the honor to coauthor two papers. Next, I would like to thank Dr. Farnoush Banaei-Kashani for his mentorship during my PhD studies. He has always been a great listener, and his advice has been a great encouragement during the difficult times. I am also thankful to Dr. Mehrdad Jahangiri and Dr. Seon Ho Kim for their support and help. Finally, I would also like to express my appreciation to Ali, Jeff, and especially Houtan for always helping me whenever I was facing problems throughout my research.

I am thankful to Professor Gaurav Sukhatme, Professor Yan Liu, Professor Aiichiro Nakano, and Professor Shri Narayanan, who served on my qualification committee and provided many insightful comments on my research work.
I would also like to thank Professor Gaurav Sukhatme and Professor Shri Narayanan for serving on my defense committee and offering valuable feedback on improving my thesis.

Last but not least, I sincerely appreciate the love, support, and encouragement of those to whom this dissertation is dedicated. To my dearest mom and dad, Effat Fekri and Morteza Kazemi, for their endless love and support throughout all these years. Words cannot describe the gratitude I owe to you. To my sister, Mimi, my brothers Mohammad and Mehdi, and my brother-in-law, Farshad, who have always loved me and encouraged me in this journey. To my dearest loving friend, Amirhossein, for always being there for me. I'm truly blessed to have you all, and I am grateful for your continuous support and love.

Contents

Dedication
Acknowledgments
List of Figures
List of Tables
Abstract
1 Introduction
  1.1 Motivation
  1.2 Thesis Statement
  1.3 Self-Incentivised Server Constrained Server Assigned (SISSA) Spatial Crowdsourcing
    1.3.1 Privacy in SISSA Spatial Crowdsourcing
    1.3.2 Trust in Privacy-Aware SISSA Spatial Crowdsourcing
  1.4 Self-Incentivised Worker Constrained Server Assigned (SIWSA) Spatial Crowdsourcing
    1.4.1 Trust in SIWSA Spatial Crowdsourcing
    1.4.2 Privacy in Trustworthy SIWSA Spatial Crowdsourcing
  1.5 Road Map
2 A Taxonomy of Spatial Crowdsourcing
  2.1 Terminologies
  2.2 Spatial Crowdsourcing Classification
    2.2.1 Reward-based Spatial Crowdsourcing
    2.2.2 Self-incentivised Spatial Crowdsourcing
  2.3 Spatial Task Publishing Modes
    2.3.1 Worker Selected Tasks (WST) Mode
    2.3.2 Server Assigned Tasks (SAT) Mode
      2.3.2.1 Server Spatial Constrained
      2.3.2.2 Worker Spatial Constrained
3 Related Work
  3.1 Related Work in Crowdsourcing
  3.2 Related Work in Privacy
  3.3 Related Work in Trust
4 Self-Incentivised Server Constrained Server Assigned (SISSA) Spatial Crowdsourcing
  4.1 Preliminaries
    4.1.1 Background
    4.1.2 System Model
  4.2 Privacy in SISSA Spatial Crowdsourcing
    4.2.1 Formal Problem Definition
    4.2.2 PiRi
      4.2.2.1 Query Formation
      4.2.2.2 Query Selection
  4.3 Trust in Privacy-Aware SISSA Spatial Crowdsourcing
    4.3.1 Formal Problem Definition
    4.3.2 TruPa
      4.3.2.1 Limited Pruning Technique (LPT)
      4.3.2.2 Bounded Anonymity Level (BAL)
      4.3.2.3 Heuristics-based Bounded Anonymity Level (HBAL)
5 Self-Incentivised Worker Constrained Server Assigned (SIWSA) Spatial Crowdsourcing
  5.1 Task Assignment
    5.1.1 Preliminaries
    5.1.2 Assignment Protocol
      5.1.2.1 Greedy (GR) Strategy
      5.1.2.2 Least Location Entropy Priority (LLEP) Strategy
      5.1.2.3 Nearest Neighbor Priority (NNP) Strategy
  5.2 Trust in SIWSA Spatial Crowdsourcing
    5.2.1 Preliminaries
      5.2.1.1 Terminologies
      5.2.1.2 Reputation Scheme
      5.2.1.3 Problem Definition
    5.2.2 Complexity Analysis
    5.2.3 Assignment Protocol
      5.2.3.1 Greedy (GR) Approach
      5.2.3.2 Local Optimization (LO) Approach
      5.2.3.3 Heuristic-based Local Optimization (HLO) Approach
  5.3 Privacy in Trustworthy SIWSA Spatial Crowdsourcing
    5.3.1 Preliminaries
      5.3.1.1 Problem Definition
      5.3.1.2 System Design
      5.3.1.3 Threat Model
      5.3.1.4 Trust Model
    5.3.2 Assignment Protocol
6 Experimental Evaluations
  6.1 Privacy in SISSA Spatial Crowdsourcing
    6.1.1 Experimental Methodology
    6.1.2 Scalability
    6.1.3 Effect of Privacy Requirement
    6.1.4 Effect of Transmission Range
  6.2 Trust in Privacy-Aware SISSA Spatial Crowdsourcing
    6.2.1 Experimental Methodology
    6.2.2 Scalability
    6.2.3 Effect of Trust Level
    6.2.4 Effect of Privacy Requirement
  6.3 SIWSA Spatial Crowdsourcing
    6.3.1 Experimental Methodology
    6.3.2 Scalability
    6.3.3 Effect of Maximum Acceptable Tasks Constraint
    6.3.4 Effect of Spatial Region Constraint
  6.4 Trust in SIWSA Spatial Crowdsourcing
    6.4.1 Experimental Methodology
    6.4.2 Effect of Number of Workers per Task (W/T)
    6.4.3 Effect of Number of Tasks per Worker (T/W)
    6.4.4 Effect of Maximum Acceptable Tasks (maxT) Constraint
7 Conclusions
Reference List

List of Figures

2.1 A taxonomy of spatial crowdsourcing. The focus of this work is shown in grey.
4.1 Illustrating an example of privacy-aware range query
4.2 Illustrating the assignment of spatial tasks to the workers
4.3 Illustrating an example of query formation for a single worker
4.4 Illustrating an example of range dependency
4.5 Query formation algorithm
4.6 Illustrating an example of all-inclusivity leak
4.7 Query selection algorithm
4.8 Illustrating examples of kN-worker and RkN-worker sets with k = 2
4.9 Illustrating an example of PaRkNW problem
4.10 Public kNN query over private data algorithm
4.11 Illustrating the first iteration in PkNNP algorithm
4.12 LPT filtering algorithm
4.13 LPT refining algorithm
5.1 The spatial crowdsourcing framework
5.2 An example of the reduction of the maximum task assignment instance problem to the maximum flow problem at instance s_i
5.3 A trustworthy spatial crowdsourcing framework
5.4 An example of a trustworthy spatial crowdsourcing system
5.5 Greedy algorithm
5.6 Local Optimization algorithm
6.1 Scalability
6.2 Effect of privacy requirement
6.3 Effect of transmission range
6.4 Scalability
6.5 Effect of trust level
6.6 Effect of privacy requirement
6.7 Scalability - Synthetic data
6.8 Scalability - Real data
6.9 Effect of maxT - Synthetic data
6.10 Effect of R - Synthetic data
6.11 Effect of W/T on synthetic data
6.12 Effect of W/T on real data
6.13 Effect of T/W - Synthetic data
6.14 Effect of maxT - SYN-UNIFORM

List of Tables

4.1 Score assignment to the query regions
4.2 Score distribution among the container regions
4.3 Voting result
4.4 R2N-worker set of set W
4.5 2N-ASR set for set T in LPT
4.6 R2N-ASR set for set T in LPT
4.7 2N-worker A for set T in LPT
4.8 R2N-worker set for set W in LPT
4.9 2N-ASR set for set T in BAL
4.10 R2N-worker set for set W in BAL
4.11 R2N-worker for set W in HBAL
5.1 Illustrating the potential match sets for the spatial tasks of Figure 5.4
5.2 Illustrating the steps of GR approach for the example of Figure 5.4
5.3 Illustrating the steps of LO approach for the example of Figure 5.4
6.1 Distribution of the synthetic data for W/T
6.2 Distribution of the synthetic data for T/W

Abstract

With the ubiquity of mobile devices, spatial crowdsourcing is emerging as a new platform, enabling spatial tasks (i.e., tasks related to a location) to be assigned to and performed by human workers. However, privacy and trust are the two significant barriers to the success of any spatial crowdsourcing system. First, the workers may not want to associate themselves with the tasks they perform. Second, the validity of the contributed data is not verified, since the intentions of the workers are not always clear. In this dissertation, for the first time we introduce a taxonomy for spatial crowdsourcing. Subsequently, we study one class of this taxonomy, in which workers send their locations to a centralized server and thereafter the server assigns to every worker his nearby tasks. Thereafter, we formally define the problem of privacy and trust in spatial crowdsourcing systems and examine its challenges. We propose a trustworthy privacy-aware framework for spatial crowdsourcing systems, which enables the participation of the workers without compromising their privacy while improving the trustworthiness of the performed tasks.
Chapter 1 Introduction

1.1 Motivation

Due to the ubiquity of sensors, every person with a mobile phone can now act as a multi-modal sensor collecting various types of data instantaneously (e.g., picture, video, audio, location, time, speed, direction, acceleration). Many studies suggest significant future growth in the number of mobile smart phone users, the phone's hardware and software features, and the broadband bandwidth. Therefore, it is critical to fully utilize this new platform for various tasks, among which the most promising is spatial crowdsourcing. In this work, we introduce spatial crowdsourcing as the process of crowdsourcing a set of spatial tasks (i.e., tasks related to a location) to a set of workers, which requires the workers to perform the spatial tasks by physically traveling to those locations. Consider a scenario in which a requester is interested in collecting pictures and videos of anti-government demonstrations from various locations of a city. With spatial crowdsourcing, the requester, instead of traveling to the location of each event, issues his query to a spatial crowdsourcing server (or SC-server). Consequently, the SC-server crowdsources the query among the available workers in the vicinity of the events. Once the workers document their nearby events, the results are sent back to the requester.

While crowdsourcing has recently attracted both research communities (e.g., database [FKK+11], image processing [CWCL, SF08], NLP [SOJN]) and industry (e.g., Amazon's Mechanical Turk [tur] and CrowdFlower [flo]), only a few works [ASS+10, ber09] have studied spatial crowdsourcing. Moreover, most existing work on spatial crowdsourcing focuses on a particular class of spatial crowdsourcing called participatory sensing. With participatory sensing, the goal is to exploit the mobile users, for a given campaign, by leveraging their sensor-equipped mobile devices to collect and share data. Some real-world examples of participatory sensing projects include [cyc, ber09, HBZ+, MPR].
For example, the Mobile Millennium project [ber09] by UC Berkeley is a state-of-the-art system that uses GPS-enabled mobile phones to collect en route traffic information and upload it to a server in real time. The server processes the contributed traffic data, estimates future traffic flows, and sends traffic suggestions and predictions back to the mobile users. Similar projects were implemented earlier by CarTel [HBZ+] and Nericell [MPR], which used mobile sensors/smart phones mounted on vehicles to collect information about traffic, WiFi access points on the route, and road conditions. In CycleSense [cyc], bikers report their biking routes to a server during their daily commute in the Los Angeles area, along with information about air quality, hazards, traffic conditions, accidents, etc.

All these previous studies on participatory sensing focus on a single campaign and try to address challenges specific to that campaign. More examples of single campaigns include [DKCB], which is a campaign for watching petrol prices, and [PHG07], which is a campaign for monitoring urban air pollution. However, our focus is on devising a generic crowdsourcing framework, similar to Amazon's Mechanical Turk, but spatial, where multiple campaigns can be handled simultaneously. Moreover, most existing studies on participatory sensing focus on small campaigns with a limited number of workers, and are not scalable to large spatial crowdsourcing applications. Finally, spatial crowdsourcing subsumes participatory sensing by introducing a general framework, which allows any form of spatial task to be assigned to and performed by humans.

However, two major impediments may hinder spatial crowdsourcing's practicality and success in the real world: privacy and trust. First, with spatial crowdsourcing the workers may not want to associate themselves with the spatial tasks that are answered and transmitted.
For example, during the 2009 unrest in Iran, many participants uploaded pictures and videos of the anti-government demonstrations to various news agency servers (e.g., CNN), crippling the establishment's censorship of news to the outside world. Unfortunately, the same act revealed the identity of some of these workers, resulting in their arrest and conviction (and in some unfortunate cases leading to their execution). Second, the data contributed by workers cannot always be trusted, because the motivation of the workers for data contribution is not always clear. For example, in the same unrest in Iran, undercover agents also uploaded pictures and videos painting a totally different image of what was occurring. Some skeptics of spatial crowdsourcing go as far as calling it a garbage-in-garbage-out system due to the issues of trust. Finally, the interplay of privacy and trust makes the problem even more challenging: once we can successfully hide workers' identities, evaluating the reputation of the workers becomes even harder.

In this dissertation, for the first time we introduce a taxonomy for spatial crowdsourcing. First, we classify spatial crowdsourcing based on people's motivation. Subsequently, we focus on one class of this taxonomy, in which workers are self-incentivised to perform tasks voluntarily. Here, people usually have incentives other than receiving a reward, such as documenting an event or promoting their cultural, political, or religious views. Next, we define two modes of task publishing in self-incentivised spatial crowdsourcing. Consequently, we focus on one mode of task publishing, in which workers send their locations to a centralized server and thereafter the server assigns to every worker his nearby tasks.
This mode of task publishing, referred to as Self-Incentivised Server Assigned, can further be classified, based on how the assignment is performed, into two subclasses: Self-Incentivised Server Constrained Server Assigned (SISSA) and Self-Incentivised Worker Constrained Server Assigned (SIWSA). With SISSA, the server enforces the spatial constraint on how the tasks should be assigned by assigning every spatial task to its nearest worker, whereas with SIWSA, the worker enforces the spatial constraint by specifying a spatial region in which the worker is willing to perform tasks. Consequently, in this dissertation we first study the two subclasses in more detail. Thereafter, for each of the subclasses we focus on the issues of privacy and trust.

1.2 Thesis Statement

In this dissertation, we introduce spatial crowdsourcing as a new emerging platform, which allows requesters to issue spatial queries that can be crowdsourced to a set of workers. We identify privacy and trust as the key underlying impediments to the success of any spatial crowdsourcing system. We also study the issues of privacy and trust in more detail on one class of spatial crowdsourcing, self-incentivised server assigned spatial crowdsourcing. More formally, our goal is to develop an efficient framework for self-incentivised server assigned spatial crowdsourcing which enables the participation of the workers without compromising their privacy while improving the trustworthiness of the performed tasks.

1.3 Self-Incentivised Server Constrained Server Assigned (SISSA) Spatial Crowdsourcing

With SISSA spatial crowdsourcing, workers send their locations to the spatial crowdsourcing server. The server then assigns to each worker all the spatial tasks which are closer to him than to any other worker. Once the spatial tasks are assigned, the worker voluntarily performs the tasks by physically traveling to those locations.

1.3.1 Privacy in SISSA Spatial Crowdsourcing

First, we study the issue of privacy in SISSA spatial crowdsourcing.
We define the privacy problem as disassociating a worker from the task he performs. Consider a scenario where the goal of a requester is to collect pictures/videos from the anti-government riots at different locations of a city with the coordinated effort of the workers. Accordingly, each worker w should query the server for the set of nearby spatial tasks. These are the spatial tasks that are closer to w than to any other worker. One may argue that a simple anonymization of the worker's identity will achieve this. However, we argue that hiding the user's identity alone, without hiding his location, would not address these privacy issues. An attacker or a (potentially malicious) SC-server can infer the identity of the query source from its (subsequent) location information. For example, a user's location information can be tracked through several stationary connection points (e.g., cell towers). After a while, the user leaves "a trail of packet crumbs" which could be associated with a certain residence or office location and easily lead to determining the user's identity. Several other types of surprisingly private information, such as health issues (e.g., presence in a cancer treatment center) or religious preferences (e.g., presence in a church), can also be revealed by just observing anonymous users' movement and usage patterns over time [blu]. Frequent changes of pseudonyms have long been suggested as a way of protecting a user's identity. Recently, in [JWH], it has been shown that this might not be enough if a server has access to users' locations and can record them for a period of time. In general, Barabási et al. showed in a recent publication in Nature [GRB] that there is a close correlation between people's identities and their movement patterns. Thus, the server can identify a query issuer by associating the query with the location from which the query is issued. We refer to this process as a location-based attack. Therefore, in this research effort we focus only on protecting the workers from location-based attacks by disassociating a query from the query location.
Note that hiding a user's location is much more challenging than hiding his/her identity. This is because with server assigned spatial crowdsourcing, the location of the user is needed for a successful data collection and contribution, while the identity of the user is not necessary.

We also show that with spatial crowdsourcing, hiding location information is more challenging than it is in location-based services. While the existing location privacy preserving techniques address the privacy concerns in the context of location-based services (LBS) [KSSM, CMA09, CML], they do not apply to the SISSA location privacy problem due to the following two characteristics of SISSA spatial crowdsourcing. First, with SISSA spatial crowdsourcing, in order to enable query answering through a coordinated effort, all the online workers query the SC-server for their nearby tasks. This is in contrast to LBS, which serves millions of users, any arbitrary subset of whom might ask a query at a given time and location. We refer to this characteristic as the all-inclusivity property. Second, each worker queries for all those spatial tasks that are closer to him than to any other worker. Thus, the second property of SISSA spatial crowdsourcing is that each worker asks a range query from the server which is dependent on the location of other users. We refer to this property as range dependency. These two properties, which reveal extra information to the server as compared to conventional LBS, result in major privacy leaks.

We devise a privacy-aware framework for SISSA spatial crowdsourcing ([KS11b, KS11a]), which addresses these two major privacy leaks. Our approach, termed PiRi, has the following two properties: Partial-inclusivity and Range independence. PiRi is based on the observation that the range queries sent by workers have significant overlaps. Therefore, instead of each worker asking a separate query, only a group of representative workers ask queries from the server, and share their results with those who have not posed any query.
Moreover, instead of each worker submitting a range query, which is dependent on other workers' locations, we propose an adjustment technique that adjusts the range query such that the query becomes independent of the others. Our extensive experiments show that our PiRi approach is 98% more resilient to location-based attacks, while the extra communication cost is tolerable.

1.3.2 Trust in Privacy-Aware SISSA Spatial Crowdsourcing

Second, we study the issue of trust. Trust in spatial crowdsourcing refers to whether the SC-server should trust the contributed data from different users. One might solve this by incorporating a trusted software/hardware module in the user's mobile device [GCJW, LKZM08, DBFH, GBMR, MPP+]. While this protects the sensed data from malicious software manipulation before sending it to the server, it does not protect the data from users who either intentionally (i.e., malicious users) or unintentionally (e.g., making mistakes) perform the tasks incorrectly. Moreover, these approaches work for sensor-dependent tasks and cannot be generalized to any type of spatial task.

Therefore, we propose a Trustworthy Privacy-aware framework for SISSA spatial crowdsourcing [KS], TruPa. Our key idea here is to increase the chance of validity of the collected data by having multiple workers perform a given spatial task redundantly. Here, the intuitive assumption, based on the idea of the wisdom of crowds [Sur04], is that the majority of workers generate correct data. Thus, the answer with the majority vote is verified as correct. This indicates that instead of each spatial task being assigned to a particular worker, it is assigned to k workers, where k is defined by the requester and determines the level of trust for the spatial task. Consequently, the higher the value of k, the higher the chance that a task is correctly performed. One of the main questions we try to address in this research effort is how to pick the k workers for a given task. Intuitively, workers who are closer to a spatial task are better candidates to perform the task.
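
The redundancy idea described above can be illustrated with a minimal sketch. It assumes exact worker coordinates and Euclidean distance, whereas the framework itself operates on privacy-protected locations; the function names, the sample coordinates, and the strict-majority rule are illustrative assumptions rather than the dissertation's algorithms.

```python
from collections import Counter
from math import dist


def k_nearest_workers(task, workers, k):
    """Return the ids of the k workers closest to the task location.

    `workers` maps a (hypothetical) worker id to an (x, y) position.
    """
    ranked = sorted(workers, key=lambda wid: dist(workers[wid], task))
    return ranked[:k]


def majority_answer(answers):
    """Return the answer given by a strict majority, or None if there is none."""
    if not answers:
        return None
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes > len(answers) / 2 else None


# Hypothetical data: four workers and one spatial task.
workers = {"w1": (0.0, 0.0), "w2": (1.0, 0.0), "w3": (5.0, 5.0), "w4": (0.0, 2.0)}
task = (0.2, 0.1)

assigned = k_nearest_workers(task, workers, k=3)
verified = majority_answer(["open", "open", "closed"])
```

With k = 3, the task is handed to the three closest workers, and an answer is accepted only when more than half of them agree.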
Thus, for a given worker w, the goal is to find all those tasks which have w among their k nearest workers. Thus, our problem is to solve the reverse k nearest worker (RkNW) problem for every worker, while preserving the worker's location privacy. We refer to this problem as the private all reverse k nearest worker (private aRkNW or PaRkNW) problem. We propose TruPa, a class of three approaches, namely LPT, BAL, and HBAL, to solve this problem. In all three approaches, in order to preserve privacy, workers follow the PiRi approach so that a subset of them (representatives) blur their locations in a cloaked area among m-1 other workers, and send cloaked regions to the server. Since the server does not have the exact workers' locations, only an approximate result can be retrieved. While our goal is to minimize the uncertainty in our proposed approaches, we show that the approximation only results in more workers performing one single task, thus increasing the chance of data validity. In our experiments, we verify the applicability of the proposed approaches by confirming no missing hits in all three approaches. We also show that with the HBAL approach every worker is assigned, on average, 25% extra spatial tasks (false hits).

1.4 Self-Incentivised Worker Constrained Server Assigned (SIWSA) Spatial Crowdsourcing

With SIWSA spatial crowdsourcing, a set of workers send their task inquiries to an SC-server. The task inquiry of a worker, which includes his location along with a set of constraints (e.g., a region), is a request that the worker issues to inform the SC-server of his availability to work. Consequently, the SC-server, which receives the locations of the workers, assigns to every worker his nearby tasks. In this class of spatial crowdsourcing, the main optimization goal is to maximize the overall task assignment while conforming to the constraints of the workers. We refer to this problem as the maximum task assignment (MTA) problem. The solution to the MTA problem could be straightforward if the SC-server had global knowledge of both the spatial tasks and the workers.
However, the SC-server is continuously receiving spatial tasks from requesters and also task inquiries from the workers. Therefore, the SC-server can only maximize the task assignment at every time instance (i.e., local optimization) with no knowledge of the future.

We propose three alternative solutions to the MTA problem [KS12]. Our first approach, namely Greedy (GR), follows the local optimization strategy by maximizing the task assignment at every time instance. The Greedy approach utilizes the constraints of the workers to assign to every worker his nearby tasks. Our second approach, called Least Location Entropy Priority (LLEP), improves the Greedy approach by utilizing the entropy of the location. The location entropy heuristic is based on the intuition that spatial tasks are more likely to be performed in the future if they are located in areas with a higher population of workers (i.e., higher location entropy). Therefore, the LLEP approach improves the overall task assignment by assigning higher priority to spatial tasks located in places with lower location entropy, as they are less likely to be completed in the future. With spatial crowdsourcing, since workers should physically travel to a location in order to perform a task, the travel cost of the workers is also an important factor. Therefore, our third approach, referred to as Nearest Neighbor Priority (NNP), incorporates the travel cost of the workers into the task assignment by assigning higher priority to the tasks with lower travel cost. Our extensive experiments on both real and synthetic data show that in comparison with GR, our LLEP approach can improve the number of assigned tasks by up to 36%, while the NNP approach can improve the travel cost of the workers by up to 41%. Consequently, based on the objective of the application, either LLEP or NNP can be applied to solve the MTA problem.

1.4.1 Trust in SIWSA Spatial Crowdsourcing

Next, we address the issue of trust in SIWSA spatial crowdsourcing. In the real world, people have reputations.
Therefore, we associate a reputation score with every worker, which states the probability that a worker performs a task correctly. We also define a confidence level, determined by the requester, for every spatial task, which states that the answer to the given spatial task is only acceptable if its confidence is higher than a given threshold. Consequently, a spatial task may need to be assigned to more than one worker. Consider an example in which a spatial task cannot be assigned to any individual worker because its confidence is not satisfied by any of them. However, if we select a subset of workers whose aggregate reputation satisfies the confidence of the task, the task can be performed. We propose a mechanism to aggregate the reputation scores of the workers by computing the probability that the majority of workers perform the task correctly. This intuition is based on the idea of the wisdom of crowds [Sur04] that the majority of the workers are trusted. Consequently, our problem turns into maximizing the number of assigned tasks while satisfying both the confidence of every task and the constraints of the workers. We refer to this problem as the Maximum Correct Task Assignment (or MCTA) problem.

We prove that the MCTA problem is NP-hard by reduction from the 3D-matching problem. This makes the optimal algorithms impractical. Consequently, we propose three approximation algorithms to solve the MCTA problem. Our first solution, dubbed Greedy (GR), is based on a greedy approach that solves the 3D-matching problem. Our second approach, namely Local Optimization (LO), tries to improve the Greedy approach by performing some local optimization. Finally, our third approach, referred to as Heuristic-based Greedy (HGR), tries to apply some heuristics to efficiently improve the approximation.
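
The aggregate reputation used in this section, i.e., the probability that a majority of the assigned workers perform the task correctly, can be sketched as follows. This is a minimal illustration under the assumption that workers answer independently and that "majority" means strictly more than half; the function name is hypothetical and the code is not the dissertation's implementation.

```python
def majority_correct_probability(reputations):
    """Probability that more than half of the workers answer correctly.

    Assumption: each worker is correct independently of the others, with
    probability equal to that worker's reputation score.
    """
    # counts[c] = probability that exactly c of the workers seen so far are correct
    counts = [1.0]
    for r in reputations:
        nxt = [0.0] * (len(counts) + 1)
        for c, p in enumerate(counts):
            nxt[c] += p * (1.0 - r)   # this worker answers incorrectly
            nxt[c + 1] += p * r       # this worker answers correctly
        counts = nxt
    needed = len(reputations) // 2 + 1   # size of a strict majority
    return sum(counts[needed:])


single = majority_correct_probability([0.7])
triple = majority_correct_probability([0.7, 0.7, 0.7])
```

Under these assumptions, a task with a confidence threshold of 0.75 could not be given to a single worker with reputation 0.7, whereas three such workers together reach an aggregate of 0.784 and satisfy the threshold.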
With our extensive experiments on both real and synthetic data, we evaluated the performance of our proposed approaches: GR, LO, and HGR. Our experiments show that our HGR approach can efficiently solve the MCTA problem, while achieving results similar to those of the optimization approaches.

1.4.2 Privacy in Trustworthy SIWSA Spatial Crowdsourcing

Finally, we address the issue of privacy in trustworthy SIWSA spatial crowdsourcing. The challenge is that even if the worker hides his location (as discussed in Section 1.3.1), since every worker is associated with a reputation score, the SC-server can utilize the worker’s reputation to identify which worker has performed which task. The reason is that reputation is only meaningful if bound to an identity, which is not the case in a private spatial crowdsourcing. Thus, anonymity is hard to achieve when both location and reputation should be incorporated into the worker’s query. We define our problem as the privacy-aware MCTA (or PAMCTA) problem, in which the goal is to solve the MCTA problem while protecting the privacy of the workers.

We propose a protocol to address the privacy issue in the PAMCTA problem. Our idea is to cloak both the location and the reputation of the workers. Note that cloaking the location by itself may not be sufficient to protect the privacy of the workers, since the reputation score of the workers may also reveal information about the identity of the workers. Consequently, instead of sending their exact reputation scores¹, the workers cloak their reputation with a number of workers, and only send the minimum reputation score of those workers. Although this seems similar to spatial cloaking, a worker cannot simply cloak his reputation by communicating with his close-by peers. The reason is that unlike location information, the worker is not trusted to store his reputation score, since the worker can easily increase his reputation score. Consequently, we need a trusted party to authenticate the reputation scores of the workers by digitally signing them, so that they cannot be tampered with by the workers.
We refer to this trusted third party as the certification authority (CA). However, the problem is that even though the CA signs the reputation score of every worker, the worker is still responsible for cloaking the reputations, for which the worker is not trusted. Alternatively, the CA can store the reputation scores of all the workers. Consequently, before the task inquiry the worker sends a request to the CA with a cloaking parameter k_r, where the CA cloaks his reputation with k_r - 1 other workers, signs the lower-bound of the cloaked range, and sends it to the worker. Thereafter, during the task inquiry, the worker sends a spatial region and the signed lower-bound reputation score instead of his exact location and reputation to the SC-server to protect his privacy. Once the server receives a lower-bound reputation score along with other constraints of the worker, the server can employ any of the proposed approaches that solves the MCTA problem.

¹ It is straightforward that the reputation scores cannot be maintained by the SC-server due to privacy concerns.

1.5 Road Map

The remainder of this dissertation is organized as follows. Chapter 2 introduces our taxonomy for spatial crowdsourcing. Chapter 3 reviews related work. In Chapters 4 and 5, we discuss privacy and trust in the SISSA and SIWSA spatial crowdsourcing frameworks, identify their challenges, and propose a set of approaches which address those challenges. Chapter 6 presents our empirical evaluation of the proposed approaches. Finally, we conclude this dissertation with a summary of our contributions and future work in Chapter 7.

Chapter 2
A Taxonomy of Spatial Crowdsourcing

Spatial crowdsourcing opens up a new mechanism for spatial tasks (i.e., tasks related to a location) to be performed by humans. In this chapter, we define a taxonomy¹ for spatial crowdsourcing (Figure 2.1). First, we define a set of terminologies that we will use in the rest of the dissertation. Thereafter, we classify spatial crowdsourcing based on people’s motivation.
Next, we define two modes of task publishing in spatial crowdsourcing. Finally, we define two ways for spatial task verification in spatial crowdsourcing.

¹ Note that even though the taxonomy can be generalized to any type of crowdsourcing (i.e., spatial or non-spatial), in this dissertation we focus on the taxonomy in the context of spatial crowdsourcing.

2.1 Terminologies

In this section, we introduce some preliminary terminologies that we will use later. First, we formally define a spatial task.

Definition 1 (Spatial Task) A spatial task t of form <l, d, s, δ> is a query with description d to be answered at location l, where l is a point in the 2D space. The query is asked at time s and expires at time s + δ.

Note that the spatial task t can be performed by a human only if the human is physically located at location l. For example, consider a scenario in which the spatial task is to take a picture of a particular building. This means that the worker needs to physically go to the exact location of the building in order to take the picture. For simplicity, we assume that all tasks take the same amount of time to finish. With this definition, we now define the spatial crowdsourced query.

Definition 2 (Spatial Crowdsourced Query) A spatial crowdsourced query (or SC-query) of form (<t_1, t_2, ...>, k) is a set of spatial tasks and a parameter k issued by a requester, where every spatial task t_i is to be crowdsourced k times.

After receiving the SC-queries from all the requesters, the spatial crowdsourcing server (or SC-server) assigns the spatial tasks of these SC-queries to the available workers. In the following we formally define a worker.

Definition 3 (Worker) A worker, denoted by w, is a carrier of a mobile device who volunteers to perform spatial tasks. A worker can be in either an online or an offline mode. A worker is online when he is ready to accept tasks.

2.2 Spatial Crowdsourcing Classification

Spatial crowdsourcing is the process of crowdsourcing a set of spatial tasks to a set of workers, which requires each worker to be physically located at the location of a task in order to perform it.
Spatial crowdsourcing can be classified based on the motivation of the workers into two classes: reward-based and self-incentivised (see Figure 2.1).

2.2.1 Reward-based Spatial Crowdsourcing

With reward-based spatial crowdsourcing, every spatial task has a price and workers will receive a certain reward for every spatial task they perform correctly. An example of reward-based spatial crowdsourcing is [XCW09], where every worker can receive a small reward for completing and sharing a sensing task.

Figure 2.1: A taxonomy of spatial crowdsourcing. The focus of this work is shown in grey.

2.2.2 Self-incentivised Spatial Crowdsourcing

This class of spatial crowdsourcing is for people who are self-incentivised to perform tasks voluntarily. Here, people usually have incentives other than receiving a reward, such as documenting an event or promoting their cultural, political, or religious views. An example of this class is a participatory sensing campaign [cyc, ber09], in which a group of people are willing to voluntarily report traffic events (e.g., accidents) by leveraging their sensor-equipped mobile devices. Our focus in this dissertation is on this class of spatial crowdsourcing.

2.3 Spatial Task Publishing Modes

With spatial crowdsourcing, tasks can be published in two different modes: Worker Selected Tasks (WST) and Server Assigned Tasks (SAT).

2.3.1 Worker Selected Tasks (WST) Mode

With this mode, the SC-server publicly publishes the spatial tasks and online workers can choose any spatial task in their vicinity without the need to coordinate with the SC-server. One advantage of this mode is that since the workers can choose any arbitrary task in their vicinity autonomously, they do not need to reveal their locations to the SC-server for every assignment. However, one drawback of this mode is that the SC-server does not have any control over the allocation of spatial tasks. This may result in some spatial tasks never being assigned, while others get assigned redundantly. Another drawback of WST is that workers choose tasks based on their own objectives (e.g., choosing the k closest spatial tasks to minimize their travel cost), which is not necessarily the ultimate objective of the SC-server (i.e., maximizing the overall task assignment). An example of the WST mode is [ASS+10], where users browse for available spatial tasks, and pick the ones in their neighborhood.

2.3.2 Server Assigned Tasks (SAT) Mode

In this mode, the SC-server does not publish the spatial tasks to the workers. Instead, every online worker sends his location to the SC-server. After receiving the locations of all online workers, the SC-server assigns to every worker his close-by tasks. The advantage of SAT is that unlike WST, the SC-server has the big picture, and therefore, can assign to every worker his close-by tasks while maximizing the overall task assignment (i.e., global optimization). However, the drawback is that workers should report their locations to the SC-server for every assignment, which can pose a privacy threat. Our focus in this dissertation is on this mode of spatial crowdsourcing. We further classify the SAT mode into two classes, based on who (i.e., either the server or the worker) enforces the spatial constraint during the task assignment. Consider each class in turn.

2.3.2.1 Server Spatial Constrained

With this class, the server enforces the spatial constraint on how the tasks should be assigned. That is, the workers send their locations to the server. Thereafter, the server assigns to every worker a set of spatial tasks which are closer to that worker than to any other worker. We study this class in more detail in Chapter 4.

2.3.2.2 Worker Spatial Constrained

With this class, the worker enforces the spatial constraint on how the tasks should be assigned. That is, every worker sends a spatial region to the server, which states that the worker is only willing to perform tasks inside that region. Thereafter, the server assigns to the worker a set of tasks inside his spatial region.
Note that with this class, although the worker enforces the constraint on which tasks he is willing to perform, the server is still the one responsible for assigning tasks. Therefore, it can optimally assign tasks while satisfying the constraints of the workers. We focus on this class in more detail in Chapter 5.

Chapter 3
Related Work

In this chapter, we review three groups of related studies. The first group studies the existing work on crowdsourcing. The second group focuses on privacy-related problems, whereas the focus of the third group is on the issue of trust.

3.1 Related Work in Crowdsourcing

As discussed earlier, crowdsourcing has been gathering extensive attention in the research community. A related survey in this area can be found in [DRH11]. With the increasing popularity of crowdsourcing, recently, a set of crowdsourcing services such as Amazon’s Mechanical Turk (AMT) [tur] and CrowdFlower [flo] have emerged which allow requesters to issue tasks that workers can perform for a certain reward. Crowdsourcing has been largely used in a wide range of applications. Examples of such applications are image search [YKG], natural language annotations [SOJN], video and image annotations [CWCL, SF08], social games [GPD+, vAD08], graph search [PSGM+11], and search relevance [ARS08]. Moreover, the database community has utilized crowdsourcing in relational query processing [FKK+11, MWMM11, PP11]. In [FKK+11], a relational query processing system is proposed that uses crowdsourcing to answer queries that cannot otherwise be answered.

Despite all the studies on crowdsourcing, only a few studies [ASS+10, BYD11] have focused on spatial crowdsourcing. In [BYD11], the problem of crowdsourcing location-based queries over Twitter has been studied, which employs a location-based service (e.g., Foursquare) to find the appropriate people to answer the given query. Even though this work focuses on location-based queries, it does not assign to users any spatial task, for which the user should go to that location and perform the corresponding task.
Instead, it chooses users based on their historical Foursquare check-ins. Moreover, in [ASS+10], a crowdsourcing platform with WST mode is proposed, which utilizes location as a parameter to distribute tasks among workers.

One class of spatial crowdsourcing is known as participatory sensing (PS), in which workers form a campaign to perform sensing tasks. Examples of participatory sensing campaigns include [cyc, ber09, HBZ+, MPR], which use GPS-enabled mobile phones to collect traffic information. Another example is [CKK+], in which a participatory sensing framework with WST mode is proposed. However, the major drawback of all the existing work on participatory sensing is that it focuses on a single campaign and tries to address the challenges specific to that campaign. Another drawback of most existing work on participatory sensing is that it is designed for small campaigns, with a small number of participants, and is not scalable to large spatial crowdsourcing applications. Finally, while most existing work on participatory sensing systems focuses on a particular application, our work introduces a generalized framework for any type of spatial crowdsourcing system.

Another class of spatial crowdsourcing is known as volunteered geographic information (or VGI), in which the goal is to create geographic information provided voluntarily by individuals. Some examples include WikiMapia [wik], OpenStreetMap [ope], and Google Map Maker [goo]. These projects allow the users to generate their own geographic content, and add it to a pre-built map. For example, a user can add the features of a location, or the events that occurred at that location. However, the major difference between VGI and spatial crowdsourcing is that in VGI, users participate unsolicited by randomly contributing data, whereas in spatial crowdsourcing, a set of spatial tasks are queried by the requesters, and workers are required to perform those tasks. Moreover, with most VGI projects ([goo, wik]), users are not required to physically go to a particular location in order to generate data with respect to that location. Finally, as the name suggests, VGI falls into the class of self-incentivised spatial crowdsourcing.

3.2 Related Work in Privacy

Privacy preserving techniques have been studied in the context of location-based services. One category of techniques [GKK+, YGJKb, KSSM] focuses on evaluating the query in a transformed space, where both the data and query are encrypted, and their spatial relationship is preserved to answer the location-based query. However, many of the transformation techniques fail to guarantee practical query accuracy. Another group of well-known techniques for preserving users’ privacy is spatial cloaking [GKS, CML, BLPW, GL, MCA, KGMP], where the user’s location is blurred in a cloaked area, while satisfying the user’s privacy requirements. An example of spatial cloaking is spatial m-anonymity (SMA) [Swe], where the location of the user is cloaked among m-1 other users. While any of the privacy preserving techniques can be utilized to protect the users’ privacy, in this work without loss of generality we use cloaking techniques for the following reasons: 1) accuracy and 2) popularity in different environments (i.e., centralized, distributed, peer-to-peer).

Most of the SMA techniques assume a centralized architecture [BLPW, GL, MCA, KGMP], which utilizes a trusted third party known as the location anonymizer. The anonymizer is responsible for first cloaking the user’s location in an area, while satisfying the user’s privacy requirements, and then contacting the location-based server. The server computes the result based on the cloaked region rather than the user’s exact location. Thus, the result might contain false hits. The centralized approach has two drawbacks. First, the centralized approach does not scale because the users should repeatedly report their location to the anonymizer.
Second, by storing all the users’ locations, the anonymizer becomes a single point of attack. To address these shortcomings, recent techniques [GKS] focus on distributed environments, where the users employ some complex data structures to anonymize their location among themselves via fixed infrastructures (e.g., base stations). However, because of high update cost, these approaches are not designed for the cases where users frequently move or join/leave the system. Therefore, alternative approaches have been proposed [CML] for unstructured peer-to-peer networks where users cloak their location in a region by communicating with their neighboring peers without requiring a shared data structure. In this work, we employ the P2P spatial cloaking techniques to hide the user’s location when querying the SC-server.

Despite all the studies about privacy in the context of LBS, only a few studies [HKH, SBE+, HS, CKK+] have examined privacy in participatory sensing. In [SBE+], the concept of participatory privacy regulation is introduced, which allows the participants to decide the limits of disclosure. Moreover, in [HKH, HS], different approaches are proposed, which focus on preserving privacy in a PS campaign during the data contribution, rather than the coordination phase. That is, these approaches deal with how participants upload the collected data to the server without revealing their identity, whereas our focus is on how to privately assign a set of spatial tasks to each worker. The combination of private task assignment and private data contribution forms an end-to-end privacy-aware framework for spatial crowdsourcing systems. The closest work to our privacy problem is discussed in [CKK+], which proposes a privacy-preserving framework in WST mode where the participants collect data in an opportunistic manner without the need to coordinate with the server. This indicates that under certain conditions, some data will never be collected, whereas other data might be collected redundantly.
This is different from our focus, in which the server should direct the task assignment phase to answer the requesters’ queries.

3.3 Related Work in Trust

In this section, we focus on four groups of studies that are related to the issue of trust. The first group studies trust in participatory sensing. Existing work in this area proposes approaches which incorporate trusted software/hardware into the mobile device. The role of the trusted module is to sign the data sensed by the mobile sensor. The goal is to prevent any malicious software from manipulating the sensed data before sending it to the server [DBFH, GCJW, GBMR, LKZM08, SW]. While this achieves trust at some level, it has two drawbacks. One is that these approaches work for sensor-dependent tasks and cannot be generalized to any type of spatial task. The more important drawback is that these approaches detect whether malicious software modifies the sensed data, but they do not consider the cases where users are malicious or unintentionally collect wrong data (e.g., by making mistakes).

Another group of studies focuses on query integrity in data outsourcing [Sio, YPPK, YGJKa, KHSW]. In data outsourcing, a publisher owns a dataset, but it outsources the data to a service provider, which answers the queries asked by the users. However, the service provider is not trusted, and therefore might not correctly answer the query issuer. The goal here is to guarantee that the query answer is both complete (i.e., no missing data) and correct (i.e., no wrong data). These studies differ from our problem because with spatial crowdsourcing, it is not the query result but the data, generated by the workers, which might not be correct.

The third group studies reputation systems in P2P networks [GJA, OLT]. In this group of work, members of online communities with no prior knowledge of each other use the feedback from their peers to assess the trustworthiness of the peers in the community. This is related to our work in the sense that people are more interested in using data from peers with higher reputation.
However, there are two new challenges with our problem. First, spatial proximity is not an issue with P2P reputation systems. That is, a peer is usually interested in downloading data from a peer which is trusted, but not necessarily spatially close. Second, unlike private spatial crowdsourcing, with P2P reputation systems, the identities of peers are known and hence their reputations are publicly stored and advertised.

Finally, the fourth group of studies focuses on the worker quality in crowdsourcing. As already discussed, one of the major issues in crowdsourcing is how to verify the validity of the answer provided by the workers. In [IPW10], a quality management approach for AMT is proposed, which improves the quality of the result by distinguishing spam workers from biased workers. Moreover, a probabilistic model is discussed in [RYZ+10], which infers the error rates of crowdsourcing workers. In [GAMS10], a set of techniques is explored which predicts the truth from a set of conflicting views. Furthermore, in [VVE11] an analysis of crowdsourcing results is performed with the objective of reducing spam and increasing accuracy. Finally, the jury selection problem in the crowd is studied in [CST12], where the goal is to select a subset of the crowd under a limited budget, whose aggregated wisdom has the lowest probability of drawing a wrong answer. While the focus of most of these studies is on improving the result, our objective is to assign a set of spatial tasks to a set of workers while guaranteeing the correct answer with some confidence. Moreover, we focus on spatial crowdsourcing, which requires the workers to physically go to the location of the spatial tasks to perform the task.

Chapter 4
Self-Incentivised Server Constrained Server Assigned (SISSA) Spatial Crowdsourcing

As discussed in Chapter 2, with server spatial constrained SAT mode, the server enforces the spatial constraint on how to assign tasks. Consequently, every worker can send his location to the SC-server.
The server then assigns to each worker all the spatial tasks which are closer to him than to any other worker. We already discussed that the two major challenges in any spatial crowdsourcing system are protecting workers’ privacy and guaranteeing the validity of the task results. In this chapter, we focus on the issues of privacy and trust in a Self-Incentivised Server constrained Server Assigned (SISSA) spatial crowdsourcing. We first review a set of preliminaries (Section 4.1). Thereafter, in Section 4.2 we focus on privacy issues. Finally, we focus on trust in Section 4.3.

4.1 Preliminaries

4.1.1 Background

As already discussed, one of the well-known techniques in preserving location privacy is spatial m-anonymity, which is defined as follows:

Definition 4 (Spatial m-anonymity (SMA)) Given a set of users U, let A ⊆ U be a set of m users spatially enclosed in an anonymizing spatial region (m-ASR or ASR). A user u ∈ A forms an m-anonymous query if the probability of distinguishing him from other users in A does not exceed 1/m, where m is the required degree of anonymity.

Many existing approaches focus on spatial cloaking algorithms in centralized or distributed environments. However, [CML] developed a spatial cloaking algorithm for a P2P environment, where no fixed infrastructure or centralized/distributed servers are available. We start by using the P2P SMA to address the privacy problem in spatial crowdsourcing. Here, we provide a background on the P2P SMA approach.

The idea of the P2P SMA approach (see [CML]) is that a user communicates with his neighboring peers via multi-hop routing to find at least m-1 other peers. Each user has two privacy requirements: m and a, where m is the minimum number of users in the ASR, and a is the minimum area of the cloaked region. After satisfying the m-anonymity requirement, the user extends the ASR to a, so that the minimum area privacy requirement is also satisfied. Consequently, the user sends his spatial query along with the ASR to the server.
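
The two privacy requirements above can be made concrete with a small sketch of how an ASR might be formed. This is not the algorithm of [CML]: the multi-hop peer discovery is replaced here by simply taking the m-1 closest known peers, the ASR is assumed to be an axis-aligned rectangle, and the way the region is grown to the minimum area (scaling around its centre) is an illustrative assumption. The function name `form_asr` is hypothetical.

```python
import math


def form_asr(user, peers, m, min_area):
    """Return an axis-aligned ASR (xmin, ymin, xmax, ymax) enclosing `user`.

    Assumptions (not from [CML]): peers are already known, the m-1 closest
    ones are used, and the rectangle is grown around its centre until its
    area is at least `min_area`.
    """
    if len(peers) < m - 1:
        raise ValueError("not enough peers to satisfy m-anonymity")
    # Stand-in for multi-hop peer discovery: take the m-1 closest peers.
    chosen = sorted(peers, key=lambda p: math.dist(user, p))[: m - 1]
    points = [user] + chosen
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    width, height = xmax - xmin, ymax - ymin

    if width * height < min_area:
        if width > 0 and height > 0:
            # Scale both sides by the same factor to reach the minimum area.
            scale = math.sqrt(min_area / (width * height))
            width, height = width * scale, height * scale
        else:
            # Degenerate box (all points on one line): widen the flat side(s).
            side = math.sqrt(min_area)
            width, height = max(width, side), max(height, side)
        cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
        xmin, xmax = cx - width / 2, cx + width / 2
        ymin, ymax = cy - height / 2, cy + height / 2
    return (xmin, ymin, xmax, ymax)
```

For instance, with the user at (0, 0), peers at (1, 0), (0, 2), (5, 5) and (2, 1), m = 4 and a minimum area of 16, the three closest peers give a 2 x 2 bounding box that is then grown to the 4 x 4 region (-1, -1, 3, 3).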
The server is equipped with a privacy-aware query processor, which computes a minimal answer set that contains the user’s exact result. After receiving the answer set from the server, the user refines the answer set to retrieve the exact result.

Figure 4.1 illustrates an example of a privacy-aware range query, where user U_1 issues a query with m = 4 and a radius of 3 (i.e., r = 3). He first collaborates with his neighbors through multi-hop routing to form the ASR with 3 other peers. After sending the ASR (solid lined rectangle) along with the range query to the server, the query processor determines the minimal answer set (i.e., the answer to the range query for every point in the ASR). The reason is that the server does not know which of the 4 users asked the query. According to [CML], the minimal answer set includes all the objects inside the region as well as all the objects within the radius of 3 from every point on the edges of the ASR (i.e., all the objects inside the dotted line rectangle). This guarantees no missing hits, but probably includes some false hits. Consequently, once U_1 receives the answer set, he can refine it to retrieve all the objects within the radius of 3 from his location.

Figure 4.1: Illustrating an example of privacy-aware range query

4.1.2 System Model

In this section, we first describe our privacy threat model, and then discuss our system architecture which consists of three entities: workers, requesters, and the SC-server.

Our assumption is that workers trust their local peers to share their location information, and do not reveal any sensitive information about their peers. However, they do not trust the SC-server. We refer to any such entity as an adversary. Moreover, the adversary, if needed, can obtain the locations of all workers [GZPK]. The reason is that workers often issue their queries from the same locations (office, home), which can be identified through physical observation, triangulation, etc.
In general, since it is difficult to model the exact knowledge available to the adversary, this is a necessary assumption to guarantee that the privacy preserving technique is secure under the most pessimistic scenario. That is, even though the workers’ locations might be known to the adversary, it should not pose a threat (i.e., location-based attack) to the system if the system can successfully disassociate the queries from their locations. The adversary is also aware of the anonymization technique which is used by the workers. However, each worker determines his own privacy level, which is only available to himself. Moreover, each user must register with the server, receive a password, and become a registered worker before communicating with other workers. In order to guarantee the pseudonymity of workers’ location information, each query is assigned a unique pseudonymous identity, which is totally unrelated to the worker’s personal identity. Finally, we make no guarantees if the answer to the spatial task contains any sensitive information (e.g., a photo containing someone’s home).

Our SC-server, which contains the list of spatial tasks, is equipped with a privacy-aware query processor, which processes the queries issued by the workers. Each worker can determine his privacy level by specifying two parameters: m and A, where m determines the m-anonymity, and A specifies the minimum resolution of the cloaked region. Each worker is equipped with two wireless network interface cards. One is dedicated to the communication with the SC-server through a base station or wireless modem. The other one is dedicated to the P2P communication among the peers through a wireless LAN, e.g., Bluetooth or IEEE 802.11. Also, each worker is equipped with a positioning device, e.g., GPS, which can determine its current location.

4.2 Privacy in SISSA Spatial Crowdsourcing

Consider a scenario where the goal of a requester is to collect pictures/videos from the anti-government riots at different locations of a city with the coordinated effort of the workers.
The requester sends his query to the SC-server. Moreover, each worker w sends a query (aka inquiry) to the server for the set of close-by spatial tasks. These are the spatial tasks that are closer to w than to any other worker. However, w may not be willing to disclose his identity due to safety reasons. An alternative is that w sends his query to a trusted server, known as the anonymizer. The anonymizer removes the user’s ID from the query and forwards the query to the server. However, the server requires w’s location information to answer the query. Due to the strong correlation between people and their movements (see [GRB]), a malicious server can identify w by associating his location information to w. For example, if w issues the query from his home, his identity can be easily revealed by linking the home address to w using the online white page services. Thus, the server can identify the worker who issued the query by associating the query to the location from which the query is issued. We refer to this process as a location-based attack. Our goal in this work is to protect the workers from location-based attacks by disassociating a query from its location.

4.2.1 Formal Problem Definition

Intuitively, in order to perform a spatial task, we are more interested in the worker with the smaller Euclidean distance¹ to the corresponding task. Hence, we define the notion of RN-worker set.

Definition 5 (RN-worker set) Given a worker w_i, we refer to the reverse nearest (RN) worker set of worker w_i as the set of all spatial tasks for which w_i is the closest worker.

A major focus with the spatial crowdsourcing system is to design a framework in which each worker is assigned a set of close-by spatial tasks.

Definition 6 (Participant Assignment (PA)) Given a set of spatial tasks T and a set of workers W, the Participant Assignment (PA) problem is to identify the RN-worker set of every worker (i.e., assigning to each worker w ∈ W any spatial task t ∈ T, such that t is closer to w than to any other worker in W).

¹ Other distance metrics, such as network distance, can be incorporated as well.
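
Definitions 5 and 6 can be illustrated with a brute-force sketch that a fully trusted server could run when it knows every worker's exact location. The dictionary-based representation, the identifiers, and the tie-breaking rule (the first worker in iteration order wins) are assumptions made for this example only, and at least one worker is assumed to exist.

```python
import math


def participant_assignment(workers, tasks):
    """Map each worker id to its RN-worker set.

    workers, tasks: dicts mapping an id to an (x, y) location.
    Assumption: Euclidean distance; ties go to the first worker in
    iteration order.
    """
    assignment = {wid: [] for wid in workers}
    for tid, task_loc in tasks.items():
        # The closest worker to this task receives it.
        closest = min(workers, key=lambda wid: math.dist(workers[wid], task_loc))
        assignment[closest].append(tid)
    return assignment


workers = {"w1": (0.0, 0.0), "w2": (10.0, 0.0)}
tasks = {"t1": (1.0, 1.0), "t2": (9.0, 0.0), "t3": (4.0, 0.0)}
result = participant_assignment(workers, tasks)
```

This checks every worker for every task, which is enough to show what the PA problem asks for but reveals all exact worker locations to whoever runs it.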
Note that for simplification, we define the assignment problem for a given snapshot of time and location. That is, we do not assume the workers move during the assignment. This seems intuitive, since workers usually plan their paths from their residential location (e.g., home, office) before starting their movement. Moreover, workers are the current active users of the system willing to participate in the process.

In order to solve the PA problem, a straightforward solution is that each worker sends his location to the server. The server then assigns to each worker his RN-worker set by computing the Voronoi diagram of the workers. Figure 4.2 depicts such a scenario. The formal definition of the Voronoi diagram is as follows.

Definition 7 (Voronoi Diagram) Given an environment E(W) ⊆ R^2, with W as the set of workers, the Voronoi diagram of W is a partitioning of E into a set of cells, where each cell V_w belongs to a worker w, and any point p ∈ E in the cell V_w is closer to w than to any other worker in the environment. Here, the closeness between two points is defined in terms of Euclidean distance.

Figure 4.2: Illustrating the assignment of spatial tasks to the workers. (a) Set of workers W and set of tasks T. (b) Voronoi diagram of W.

Once the server computes the Voronoi diagram of the workers, it forwards to each worker w all the spatial tasks lying inside the corresponding cell V_w. However, in many scenarios the server is not trusted, and therefore, a worker may not be willing to reveal his identity to the server. Even if the worker hides his identity from the server (i.e., only reveals his location), due to the strong correlation between people and their movements ([GRB]), a worker can still be identified by his location. In the following, we first formally define our privacy attack. Thereafter, we define the privacy problem.

Definition 8 (Location-based attack) A location-based attack is to identify the worker who issues a query by associating the query to the query location (i.e., the location from which the query is issued).
Definition 9 (Privacy-Aware Participatory Assignment (PAPA)) The Privacy-Aware Participatory Assignment (PAPA) problem is a variation of the PA problem (Definition 6), in which the goal is to protect workers’ identity from location-based attacks.

4.2.2 PiRi

As already discussed, to solve the PAPA problem, workers cannot share their locations with the untrustworthy server for the assignment of spatial tasks. Therefore, the centralized solution to the PA problem is no longer applicable to the PAPA problem. Thus, one baseline solution is that workers communicate among their peers to compute their Voronoi cells. Thereafter, each worker performs a privacy-aware range query to retrieve all the spatial tasks inside his Voronoi cell. The worker asks such a query by applying the P2P SMA technique (see Section 4.1.1). That is, the worker blurs his location in an ASR among m-1 other peers, and sends the ASR, along with a radius r as the range query, to the server. Note that the radius r represents the radius of the smallest circle which contains the worker’s Voronoi cell. Consequently, the server responds to each worker by sending him all the spatial tasks that answer the range query submitted from any point inside the worker’s ASR. Finally, the worker obtains his RN-worker set by refining the result retrieved from the server.

However, this baseline approach has major privacy leaks, which originate from the two characteristics of the server constrained assignment: all-inclusivity and range dependency. These properties give enough information to the server with which the server can easily identify each worker by linking his query to the query location. This gets even easier if the server knows the exact locations of all the workers. The reason is that on one hand the server receives a set of query regions, and on the other hand, the server has the query locations. Each query region overlaps with a set of workers, one of which has issued the query.
Therefore, the server can associate the query with its location by solving a matching problem between these two sets of data. As a result, the more information the server has, the more correct matches it can find between the queries and query locations. Consequently, the baseline approach is not applicable to our PAPA problem.

Our proposed approach, termed PiRi, overcomes the drawbacks of the baseline approach by preventing these privacy leaks. PiRi has the following two properties: partial-inclusivity and range independence. The intuition is to avoid sharing any extra information with the server, as compared to conventional LBS, such that the adversary cannot use the data gathered in the server to compromise the system. Hence, our algorithm has two major steps. The first step is Query Formation, where each worker computes his Voronoi cell in a distributed fashion, and forms his ASR. In this step, an adjustment technique is applied to the query, which guarantees the range independency. In Section 4.2.2.1, we explain this step in more detail. In the second step, Query Selection (Section 4.2.2.2), a voting mechanism is devised to select the set of representative workers, whose ASRs should be sent out to the server. These query results will later be shared with the rest of the workers. This step prevents the all-inclusivity leak.

4.2.2.1 Query Formation

To solve the PAPA problem, the set of spatial tasks inside each worker's Voronoi cell (i.e., his RN-worker set) should be assigned to that worker. This indicates that each worker should first compute his Voronoi cell to form the spatial range query. Thereafter, by employing the P2P SMA technique, the worker forms a privacy-aware range query. However, the problem is that the range query depends on the size of the worker's Voronoi cell (range dependency), which is a potential information leak. Therefore, at this phase, we adjust the size of the range query, such that the privacy-awareness of the range query is guaranteed.
In the following, we first briefly explain how the Voronoi cell of every worker is computed. Subsequently, we explain the cloaking step.

Figure 4.3: Illustrating an example of query formation for a single worker

In order to compute the Voronoi diagram, any distributed algorithm for Voronoi diagram computation can be applied [AINX, BD, SS]. In this work, we employ the technique from [AINX], called Completely Cooperative (CC). In order to compute the Voronoi cell of a worker, the CC approach has two major steps: 1) finding the Voronoi neighbors of the worker, and 2) computing the boundary of the cell by solving the geometric intersection of bisectors between the worker and the neighbors. The idea behind the CC approach is that instead of workers sending out queries to the network to discover their Voronoi neighbors, the neighbors inform each other about any potential Voronoi neighbor. Once the Voronoi neighbors of a worker are identified, the worker computes his Voronoi cell by intersecting the bisectors of the neighbors.

Figure 4.4: Illustrating an example of range dependency

Figure 4.5 depicts the pseudo-code for the query formation step. After the Voronoi cell computation, every worker w forms a spatial range query, which contains the Voronoi cell, along with an ASR to send out to the server. That is, w computes a radius r_w, which is the radius of the smallest enclosing circle of his Voronoi cell (line 2). This forms the spatial range query. Figure 4.3 depicts an example of the query formation for the worker w_1 of Figure 4.2, where w_1 computes his Voronoi cell and the radius r_1, respectively. Thus, the circle with radius r_1 is the smallest enclosing circle of w_1's Voronoi cell. Next, as stated in line 4 of Figure 4.5, the worker, using the technique explained in Section 4.1.1, forms an ASR, in which his location is blurred among m-1 other peers (with m = 3, the solid-lined rectangle in Figure 4.3).

Consequently, the worker w_i can send the ASR along with the radius r_i to the server to retrieve all the tasks that lie inside his Voronoi cell.
The problem is that each of the m workers in the ASR, termed a local peer, has a different Voronoi cell size, and therefore, a different r is associated with each. Considering an extreme scenario where the server knows the locations of the workers, it also knows their Voronoi cells and therefore the radius r of each of them. Consequently, the server can easily identify the query issuer (i.e., the set of all workers in the ASR with radius r). Figure 4.4 depicts such a scenario, where w_1 (black-filled circle) cloaks himself with w_2, and sends the ASR along with radius r_1 to the server (see the size of r_1 as compared to r_2). The server, knowing the locations of the workers, and hence their Voronoi cells (i.e., r_1 and r_2), relates the query with radius r_1 to its query location (i.e., the location of a worker with a Voronoi cell of the same radius).

QueryFormation (worker w_i)
01. let V_i = Voronoi cell of w_i;
02. let r_i = radius of smallest enclosing circle of V_i;
03. let r_max = 0;
04. let A_i = m-ASR of w_i;
05. for each peer w_j inside A_i
06.     let r_j = radius of smallest enclosing circle of V_j;
07.     let r_max = max(r_max, r_j);
08. return (A_i, r_max);
Figure 4.5: Query formation algorithm

In order to avoid the range dependency leak, each worker w_i should not only cloak his location among m-1 other peers, but also cloak his range query among those of the other m-1 peers. In other words, instead of forming his range query with radius r_i, the worker forms his query with radius r_max, where r_max is the maximum radius among all the m peers inside the ASR (lines 5-8 in Figure 4.5). This guarantees m-anonymity at all times. In Figure 4.3, R_1 (the dotted-line rectangle) shows the query region formed by r_max.

4.2.2.2 Query Selection

Once all workers have formed their query regions, they can send them out to the server. Since the server is receiving queries from all workers, it can utilize the gathered information (i.e., query regions) from all workers to attack the system (all-inclusivity leak). Figure 4.6 illustrates such a scenario.
For simplicity, we assume that only workers w_1..w_3 participate in the system. The figure shows that w_1 cloaks himself with w_2. Similarly, w_2 forms a cloaked region with w_1. Consequently, both w_1 and w_2 form identical query regions. The figure also depicts that w_3 cloaks himself with w_1. Accordingly, the server can easily identify w_3 by relating him to the query region R_3, since w_3 appears only once (i.e., in R_3) in all the three query regions submitted to the server. This indicates that the more workers submit queries to the server, the more information the server has to infer the workers' identities. Our algorithm attempts to prevent this leak by minimizing the number of queries submitted to the server, while still assigning the close-by tasks to every single worker.

Figure 4.6: Illustrating an example of all-inclusivity leak

In order to address this issue, we observe that there is a large overlap among the query regions of the workers. Therefore, by receiving the result from the server, one can share his result with all the peers whose Voronoi cells lie completely inside his query region. The question is how to select the group of representative workers. To answer this question, we should solve the following optimization problem.

Definition 10 (V-Cover) Given a set of workers W, and a set of spatial tasks T, let R and V be the set of query regions and Voronoi cells for the set W, respectively, where R_i corresponds to the query region for user w_i, and V_i is the Voronoi cell for w_i. The V-Cover problem finds a set C ⊆ R that covers the entire set V with minimum cardinality.

We now prove that the V-Cover problem is NP-hard by reduction from the minimum set cover problem. First, we state the minimum set cover problem.

Definition 11 (Minimum Set Cover) Let S = {S_1, S_2, ..., S_t} be a collection of finite sets S_i, whose elements are drawn from a universal set U (i.e., ∪_{i=1}^{t} S_i = U). Minimum set cover finds a set C with minimum cardinality where C ⊆ S and ∪_{S_j ∈ C} S_j = U.

For example, assume U = {1,2,3,4,5} and S = {S_1, S_2, S_3, S_4}, where S_1 = {1,2,3}, S_2 = {2,4}, S_3 = {3,4}, and S_4 = {4,5}. The minimum set cover is C = {S_1, S_4}.
The minimum set cover problem is NP-hard. Consequently, the following theorem is entailed.

Theorem 1 The V-Cover problem is NP-hard.

Proof 1 We prove the theorem by providing a polynomial time reduction from the minimum set cover problem. Towards that end, we prove that given an instance of the minimum set cover problem, denoted by I_s, there exists an instance of the V-Cover problem, denoted by I_v, such that the solution to I_v can be converted to the solution of I_s in polynomial time. Consider a given I_s having U as the universal set, S = {S_1, S_2, ..., S_t} where S_i ⊆ U, and ∪_{i=1}^{t} S_i = U. To solve I_s, we select a set C ⊆ S, with minimum cardinality, to cover all the elements in U. Correspondingly, to solve I_v, we look for a set C_v ⊆ R, with minimum cardinality, such that all the Voronoi cells in V are covered by the query regions in C_v. Therefore, we propose the following mapping from I_s components to I_v components to reduce I_s to I_v. Suppose the universal set U corresponds to the set of Voronoi cells V. The intuition behind this mapping is that with I_s we want to cover each element in U, and accordingly we aim to cover all the Voronoi cells in V. Each S_i ∈ S is mapped to a query region R_i ∈ R, as the selection of sets in I_s corresponds to selecting the query regions in I_v. We next explain each mapping in detail.

For mapping S to R, we assume there exists a query region R_i ∈ R corresponding to S_i ∈ S. Next, we assume a Voronoi cell V_j ∈ V exists corresponding to U_j ∈ U. V_j is covered by R_i ∈ R (i.e., it falls completely inside R_i) if and only if U_j ∈ S_i. It is easy to observe that if the answer to I_v is the set C_v, the answer to I_s will be the set C = {S_i | R_i ∈ C_v}. This completes the proof.

According to the above theorem, we can employ any heuristic that solves the set cover problem to solve the V-Cover problem. One of the well-known approaches for solving the set cover problem is a greedy algorithm which is based on the following heuristic: at each stage of the algorithm, pick the set with the largest number of uncovered elements [KT05a].
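The greedy heuristic can be illustrated with a minimal Python sketch, run on the example instance of Definition 11. The function name `greedy_set_cover` and the dictionary layout are illustrative assumptions, not part of the original work; in V-Cover terms the universe would be the set of Voronoi cells and each subset the cells covered by one query region.

```python
def greedy_set_cover(universe, subsets):
    """Greedy heuristic: repeatedly pick the subset that covers the most
    still-uncovered elements. `subsets` maps a (hypothetical) name to a set."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Subset with the largest overlap with the uncovered elements.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Example instance from Definition 11.
U = {1, 2, 3, 4, 5}
S = {"S1": {1, 2, 3}, "S2": {2, 4}, "S3": {3, 4}, "S4": {4, 5}}
cover = greedy_set_cover(U, S)
print(cover)  # ['S1', 'S4']
```

On this instance the greedy choice happens to coincide with the minimum cover; in general the heuristic only approximates it.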
Consequently, in order to solve the V-Cover problem, during each step of the algorithm, we should pick a representative worker whose query region covers the largest number of uncovered Voronoi cells from V. However, this approach is applicable only in a centralized structure, where global knowledge of the environment is available. In the V-Cover problem with a distributed architecture, each worker only has knowledge about his local peers and their Voronoi cells. Therefore, making globally optimal choices (i.e., picking the query region which covers the largest number of Voronoi cells) at every step is nontrivial, and also costly.

To address this issue, our goal is to extend the greedy heuristic to support the distributed architecture. Hence, we implement a voting mechanism, so that the workers agree locally among their neighbors on selecting a set of representatives. That is, each worker picks a peer from the set of his local peers, based on the score value associated with them. Intuitively, the score value captures how significant a worker is in representing other peers, which is defined based on 1) the number of local peers covered² by his query region (m), and 2) the number of query regions covering each of his local peers. According to (1), a worker with a large query region (i.e., large m) is assigned a high score value. However, as (2) suggests, the number of query regions that cover each of those local peers also affects the score value. Consider the example of Figure 4.6, where the query region of each of the three workers covers two peers. However, as the figure shows, R_3 is the only query region covering w_3, and therefore a higher score should be assigned to w_3.

² Henceforth, for brevity we use the expression "covering a peer" to refer to covering the Voronoi cell of that peer.

The pseudo-code of our algorithm is shown in Figure 4.7. We explain the details of the voting mechanism with the example of Figure 4.2, where only the workers w_1..w_8 are the active users.
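As a rough illustration of the score-and-vote scheme, the following Python sketch simulates it centrally on the region memberships that the running example lists in Table 4.1. It is not the distributed protocol itself: message passing is replaced by plain dictionaries, the cloaking parameter m is assumed to equal the number of workers in a region, partial scores are truncated to integers, and ties are broken by picking the first region rather than a random one. All identifiers are hypothetical.

```python
# Workers inside each query region R_i (memberships as listed in Table 4.1).
regions = {
    "R1": ["w1", "w2", "w3"],
    "R2": ["w2", "w8"],
    "R3": ["w1", "w3", "w4"],
    "R4": ["w1", "w4", "w5"],
    "R5": ["w5", "w6"],
    "R6": ["w1", "w5", "w6"],
    "R7": ["w2", "w7", "w8"],
    "R8": ["w7", "w8"],
}


def vote(regions):
    # Assumption: the cloaking parameter m of a region is its member count.
    m = {r: len(members) for r, members in regions.items()}
    workers = sorted({w for members in regions.values() for w in members})
    # Container regions of a worker: all query regions in which he resides.
    containers = {w: [r for r in regions if w in regions[r]] for w in workers}

    # Each worker splits 100 points over his container regions in
    # proportion to their m values (truncated to integers).
    final = {r: 0 for r in regions}
    for w in workers:
        total = sum(m[r] for r in containers[w])
        for r in containers[w]:
            final[r] += m[r] * 100 // total

    # Each worker votes for his highest-scoring container region
    # (first maximum on ties; the text uses a random tie breaker).
    votes = {w: max(containers[w], key=lambda r: final[r]) for w in workers}
    return final, votes


final_scores, votes = vote(regions)
representatives = sorted(set(votes.values()))
print(final_scores)
print(representatives)  # ['R3', 'R6', 'R7']
```

Under these assumptions the sketch selects the regions of w3, w6, and w7 as representatives.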
The voting mechanism starts by assigning the score values to the workers. The score value for each worker is determined by his local peers based on the importance of the worker to each of them. Consequently, each worker computes his final score by summing up all the partial scores he receives from the local peers. The algorithm starts by each worker sending his cloaking parameter m to all his local peers (lines 7-8 of Figure 4.7). Table 4.1 depicts the value m for each worker w_i along with the set of peers that his query region R_i contains. For example, w_1 forms a 3-anonymous query, and his query region, R_1, contains w_1, w_2, and w_3. Therefore, w_1 sends the value m = 3 to both w_2 and w_3. Accordingly, every worker receives the parameter m with respect to all query regions in which he resides (termed container regions). Table 4.2 illustrates the container regions for every worker (e.g., R_1, R_2, and R_7 are the container regions for w_2).

Subsequently, each worker assigns a partial score value to all his container regions, based on their m value, such that regions with larger m values are assigned higher scores. Note that the sum of the scores that each worker gives to his container regions is normalized to 100 (lines 9-15 of Figure 4.7). For example, as Table 4.2³ depicts, w_3 assigns a score value of 50 to both of his container regions R_1 and R_3, since both have m = 3. Thereafter, each worker computes his final score by summing up all partial scores he receives from his local peers (lines 16-17 of Figure 4.7). The final scores of the users are shown in the last column of Table 4.1. As the table shows, w_1 receives the scores {25, 37, 50} from his peers {w_1, w_2, w_3}, respectively, and therefore his final score adds up to 112.

³ For simplicity, scores are rounded, and therefore, the sum of the scores might not add up to 100.

QuerySelection (worker w_i)
01. let R_i = query region of w_i;
02. let CR_i = set of container regions of w_i;
03. let sum_m = 0;
04. let score_i = 0;
05. let max-score_i = 0;
06. let rep = null;
07. for each peer w_j inside R_i
08.     send m_i to w_j;
09. for each container region R_j ∈ CR_i
10.     let w_j = owner of the region R_j;
11.     let m_j = cloaking parameter for w_j;
12.     sum_m = sum_m + m_j;
13. for each container region R_j ∈ CR_i
14.     let score^i_j = m_j / sum_m * 100;
15.     send score^i_j to w_j;
16. for each peer w_j inside R_i
17.     let score_i = score_i + score^j_i;
18. for each peer w_j inside R_i
19.     send score_i to w_j;
20. for each container region R_j ∈ CR_i
21.     if score_j > max-score_i
22.         max-score_i = score_j;
23.         rep = w_j;
24. return (rep);
Figure 4.7: Query selection algorithm

Finally, every worker sends his final score to all his local peers. By receiving the final scores of the container regions, each worker w_i votes for the container region with the maximum score (lines 18-23 of Figure 4.7). Note that for container regions with equal scores, as a tie breaker, the worker randomly votes for one. The voting results are shown in Table 4.3. For example, worker w_4 chooses R_3 among his container regions, since it has the maximum score. According to Table 4.3, the final representatives are {w_3, w_6, w_7}. This indicates that only three of the workers should query the server. We also refer to a worker who voted for a region as a voting worker of that region. For the same example, w_5 and w_6 are the voting workers of R_6. During the final process of voting, each voting worker w_i informs the corresponding elected worker by sending him a message, which also includes his radius r_i. The reason for sending the radius r_i is that once the representative receives the result from the server, he would know which part of the result set belongs to w_i (the representative already knows w_i's location from the SMA process). Once the query is issued, the representative filters the result on behalf of every voting worker, and sends each of them the corresponding result.

Worker  Query Region  m  Workers            Score
w_1     R_1           3  {w_1, w_2, w_3}    25+37+50=112
w_2     R_2           2  {w_2, w_8}         25+28=53
w_3     R_3           3  {w_1, w_3, w_4}    25+50+50=125
w_4     R_4           3  {w_1, w_4, w_5}    25+50+37=112
w_5     R_5           2  {w_5, w_6}         25+40=65
w_6     R_6           3  {w_1, w_5, w_6}    25+37+60=122
w_7     R_7           3  {w_2, w_7, w_8}    37+60+42=139
w_8     R_8           2  {w_7, w_8}         40+28=68
Table 4.1: Score assignment to the query regions

Worker  Container Regions       Score Distribution
w_1     {R_1, R_3, R_4, R_6}    {25, 25, 25, 25}
w_2     {R_1, R_2, R_7}         {37, 25, 37}
w_3     {R_1, R_3}              {50, 50}
w_4     {R_3, R_4}              {50, 50}
w_5     {R_4, R_5, R_6}         {37, 25, 37}
w_6     {R_5, R_6}              {40, 60}
w_7     {R_7, R_8}              {60, 40}
w_8     {R_2, R_7, R_8}         {28, 42, 28}
Table 4.2: Score distribution among the container regions

User  Vote
w_1   Max{R_1(112), R_3(125), R_4(112), R_6(122)}: R_3(125)
w_2   Max{R_1(112), R_2(53), R_7(139)}: R_7(139)
w_3   Max{R_1(112), R_3(125)}: R_3(125)
w_4   Max{R_3(125), R_4(112)}: R_3(125)
w_5   Max{R_4(112), R_5(65), R_6(122)}: R_6(122)
w_6   Max{R_5(65), R_6(122)}: R_6(122)
w_7   Max{R_7(139), R_8(68)}: R_7(139)
w_8   Max{R_2(53), R_7(139), R_8(68)}: R_7(139)
Table 4.3: Voting result

4.3 Trust in Privacy-Aware SISSA Spatial Crowdsourcing

Following our running application scenario of Section 4.2, the goal of the SC-server was to collect pictures/videos from the anti-government riots at different locations of a city. Once the data is collected and contributed, an important issue is how to verify that the collected pictures/videos are valid.
In general, while the main objective of spatial crowdsourcing systems is to leverage mobile users to collect data, verifying the correctness of the data collected by these users becomes critical. In this section, we define the problem of trust in SISSA spatial crowdsourcing systems while preserving the privacy of the workers. In the following, we first formally define the problem. Thereafter, we propose TruPa, a class of three solutions to this problem.

4.3.1 Formal Problem Definition

Unlike the PiRi approach, where each spatial task is assigned to one worker, to ensure trust we need multiple workers assigned to each spatial task. That is, every spatial task in the submitted SC-query (Section 2.1) is associated with a trust level k. Hence, in order to extend PiRi to incorporate a trust level of k, every spatial task should be assigned to at least k workers. Thus, we need to define the notions of kN-worker set and RkN-worker set.

Definition 12 (kN-worker set) Given a set T of spatial tasks, and a set W of workers, let t_i ∈ T. We refer to the k-nearest (kN) worker set of task t_i (kN-worker_i ⊂ W) as a set of k workers (i.e., |kN-worker_i| = k), of which every worker in kN-worker_i corresponds to one of the k closest workers of t_i.

Figure 4.8a depicts the 2N-worker set for t_1 (i.e., k = 2), where the elements of the set are shown with hollow squares (2N-worker_1 = {w_1, w_4}).

Definition 13 (RkN-worker set) Given a worker w_i, we refer to the reverse k-nearest (RkN) worker set of worker w_i (RkN-worker_i ⊂ T) as the set of all spatial tasks for which w_i is one of the k closest workers.

Figure 4.8b depicts the R2N-worker set (i.e., k = 2) for w_1, where the elements of the set are shown with hollow circles (R2N-worker_1 = {t_1, t_4}).

Figure 4.8: Illustrating examples of kN-worker and RkN-worker sets with k = 2. a) kN-worker set for spatial task t_1; b) RkN-worker set for worker w_1.

Definition 14 (aRknW problem) Given a set of workers W, a set of spatial tasks T, and a trust value k, the problem is to find the RkN-worker set of every worker. We refer to this problem as the all reverse k-nearest worker (aRknW) problem.
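Definitions 12 to 14 can be made concrete with a small Python sketch that assumes exact locations are available: it computes the kN-worker set of every task and inverts the result to obtain the RkN-worker set of every worker. The coordinates, identifiers, and function names below are made up for illustration and do not correspond to Figure 4.8.

```python
import math


def kn_worker_sets(workers, tasks, k):
    """kN-worker set of every task: its k closest workers (Euclidean)."""
    return {
        t: sorted(workers, key=lambda w: math.dist(workers[w], p))[:k]
        for t, p in tasks.items()
    }


def rkn_worker_sets(workers, tasks, k):
    """RkN-worker set of every worker, obtained by inverting the
    kN-worker sets (the aRknW problem with exact locations)."""
    kn = kn_worker_sets(workers, tasks, k)
    return {w: [t for t in tasks if w in kn[t]] for w in workers}


# Hypothetical coordinates, chosen so that no distances tie.
workers = {"w1": (0, 0), "w2": (4, 0), "w3": (8, 0)}
tasks = {"t1": (1, 0), "t2": (5, 0), "t3": (9, 0)}

kn = kn_worker_sets(workers, tasks, k=2)
rkn = rkn_worker_sets(workers, tasks, k=2)
print(kn)
print(rkn)
```

Each task appears in exactly k RkN-worker sets, which is what guarantees k answers per task when exact locations are known.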
Worker  R2N-worker set
w_1     {t_1, t_4}
w_2     {t_3, t_7}
w_3     {t_3, t_4, t_5}
w_4     {t_1, t_2, t_5, t_6}
w_5     {t_2, t_6}
w_6     {t_8, t_9}
w_7     {t_8, t_9}
w_8     {t_7}
Table 4.4: R2N-worker set of set W

The aRknW problem can be restated as a special case of the bichromatic reverse k-nearest neighbor (bRkNN) problem, in which the reverse k-nearest neighbors of all workers should be retrieved. Since bRkNN should be solved for every worker, the problem is analogous to solving kNN for every spatial task, which is a less complex problem. Therefore, a straightforward solution is that each worker sends his location to the server. The server then computes the k closest workers of every spatial task (i.e., kN-worker set), inverts the result, and sends to every worker his bRkNN result (i.e., RkN-worker set). Note that the queries are issued from the workers. Therefore, from the view of the worker, the bRkNN problem should be solved. However, due to the nature of the aRknW problem, since all workers query the server, the server can solve aRknW by solving kNN for every spatial task. Consider the example of Figure 4.8 for k = 2. The solution to the aRknW problem is shown in Table 4.4, which shows the R2N-worker set of every worker.

To defend against the location-based attack, we cannot reduce aRknW to multiple kNN queries at the server. Hence, we need to solve the aRknW problem privately to ensure both trust and privacy.

Definition 15 (Private all Reverse k-nearest Worker (PaRknW)) The private all reverse k-nearest worker (PaRknW) problem is a variation of the aRknW problem (Definition 14), in which the goal is to protect workers' identity from location-based attacks.

Figure 4.9: Illustrating an example of PaRknW problem

In order to solve the PaRknW problem, the server should compute the k closest workers for every spatial task, for which it needs the workers' locations. However, workers cannot share their locations with the untrustworthy server.
Thus, instead of sending their exact locations, workers follow the PiRi approach (Section 4.2.2) to blur their location in a cloaked area among m-1 other workers, from which a subset (i.e., chosen by the voting mechanism) is selected to send cloaked regions to the server. We cannot directly apply the PiRi approach to solve the PaRknW problem, since in the PiRi approach we only assign one worker to each spatial task.⁴ Thus, we only utilize PiRi for preserving privacy. Note that unlike the PiRi approach, where workers should compute their Voronoi cell to form their query region (i.e., querying all the spatial tasks inside their Voronoi cell), to solve the PaRknW problem, computing the order-k Voronoi cell is not practical for the workers. Therefore, the query formation step is only partly performed at the worker side. This means that the representative workers only send their ASRs (instead of their query regions) to the server.

⁴ The direct adaptation of PiRi requires the order-k Voronoi cell computation, which is computationally expensive.

Figure 4.9 illustrates a set of ASRs, each containing a set of workers. For example, workers w_1, w_2, w_3 form an ASR A_1 with m = 3, whereas w_7 and w_8 form an ASR A_3 with m = 2. With these definitions, we reformulate our problem as follows.

Definition 16 (Private all Reverse k-nearest Worker (PaRknW)) Given a set of workers W, and a set of spatial tasks T, let A be the set of ASRs sent to the server, where every worker w_i is cloaked in at least one ASR of the set A. The problem is to find the RkN-worker set of all workers.

Figure 4.9 depicts an example of the PaRknW problem, in which the workers of Figure 4.8 form groups of different sizes (i.e., m). Thereafter, each group sends the corresponding ASR to the server. The goal is to find the RkN-worker set of every worker.

4.3.2 TruPa

In order to solve the PaRknW problem, all of our approaches follow a filter and refinement technique, where the filtering step is performed at the server side, and the refinement step is performed at the worker side.
In the filtering step, since the server receives a set of ASRs, the idea is to prune a subset of spatial tasks that cannot be in the RkN-worker set of the workers of a given ASR. Thereafter, the filtered results are sent to the workers, where the goal is to exploit some local information to refine the retrieved result. The challenge here is twofold. First, the server receives a set of ASRs instead of the exact locations of workers. Thus, it can only compute the RkN-worker set for the ASRs rather than the workers. Second, in order to refine the results, global knowledge of the workers' locations is required. However, a worker only has limited knowledge about his local peers. This results in an approximate answer. In the following, we propose three solutions to this problem. Our first approach is based on a limited pruning technique to reduce the uncertainty. Our second approach improves the first approach by enforcing a realistic assumption that results in better pruning. Finally, our third approach applies some heuristics to achieve a more accurate approximation.

4.3.2.1 Limited Pruning Technique (LPT)

In this approach, workers first communicate locally among themselves, blur their locations among m-1 other workers, and send their ASRs to the server. The server receives a set of ASRs, of which every ASR is associated with a different anonymity level m. This value m depends on the privacy requirement of the workers inside the corresponding ASR. Due to the unavailability of the workers' locations, the server computes the kN-worker set of every spatial task with the given ASRs during the filtering step. Clearly, the server needs to explore at least the k closest ASRs as a lower bound to assure that the k closest workers reside in the result set. The reason is that even though an ASR contains m workers, the server does not know the value m for each ASR. Considering the worst case, when m = 1 for all ASRs, the kN-worker set of a spatial task is located in at least the k closest ASRs.
Once the set of candidate ASRs, which includes the exact query answer for each spatial task, is retrieved (termed kN-ASR), the server inverts the result to find the RkN-ASR set for every ASR. These are all the spatial tasks which potentially have the workers in the given ASR as part of their kN-worker set. Since we used a lower bound to find the kN-worker set of every spatial task, the RkN-ASR set of every ASR contains the exact answer and possibly a set of false hits. This means that there is no spatial task in the RkN-worker set of a worker which is not in the RkN-ASR set of his corresponding ASR.

Once the result of the filtering step is sent to every representative worker, in order to prune the false hits, the representative refines the result by verifying the kN-worker set of every spatial task in the result set. In order to verify the correctness of the result, the representative should have the locations of all workers. However, the representative worker only has the locations of his local peers, and therefore prunes only a subset of the false hits. Finally, the representative sends the corresponding partially refined result to each of his voting workers (i.e., those local peers who voted for him). This indicates that every worker is assigned a set of spatial tasks so that for each spatial task, at least k answers will be collected, thus satisfying the k level of trust for the SC-query submitted by the requester. In the following, we explain both the filtering and refinement steps in more detail. Thereafter, we prove the completeness of the LPT approach.

PkNNP Algorithm (set of ASRs A, spatial task t, k)
01. for each ASR A_j ∈ A
02.     let maxdist_t^{A_j} = distance from t to farthest point on A_j;
03. let A_sorted = list of all ASRs in A sorted by their maxdist_t in increasing order;
04. let c = 1;
05. let Res = {};
06. while (c ≤ k)
07.     let A_sorted[c] = the ASR with c-th smallest maxdist_t;
08.     let Res_c = all the ASRs in radius maxdist_t^{A_sorted[c]} from t;
09.     add Res_c to Res;
10.     c = c + 1;
11. return Res;
Figure 4.10: Public kNN query over private data algorithm
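The pseudo-code of Figure 4.10 can be sketched in Python as follows. This is only an illustration under stated assumptions: ASRs are modeled as axis-aligned rectangles (xmin, ymin, xmax, ymax), "all the ASRs in radius d from t" is interpreted as all ASRs whose closest point to t lies within distance d, and the helper names and coordinates are hypothetical.

```python
import math


def max_dist(t, rect):
    """Distance from point t to the farthest corner of the rectangle."""
    x, y = t
    xmin, ymin, xmax, ymax = rect
    dx = max(abs(x - xmin), abs(x - xmax))
    dy = max(abs(y - ymin), abs(y - ymax))
    return math.hypot(dx, dy)


def min_dist(t, rect):
    """Distance from point t to the closest point of the rectangle."""
    x, y = t
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return math.hypot(dx, dy)


def pknnp(asrs, t, k):
    """Candidate ASRs (kN-ASR set) for task t, following Figure 4.10."""
    ranked = sorted(asrs, key=lambda name: max_dist(t, asrs[name]))
    result = set()
    for c in range(min(k, len(ranked))):
        radius = max_dist(t, asrs[ranked[c]])
        # Range query: every ASR that intersects the circle around t.
        result |= {n for n, rect in asrs.items() if min_dist(t, rect) <= radius}
    return sorted(result)


# Hypothetical ASRs and task location.
asrs = {"A1": (0, 0, 2, 2), "A2": (3, 0, 4, 1), "A3": (10, 10, 12, 12)}
task = (2.5, 0.5)
print(pknnp(asrs, task, 1))  # ['A1', 'A2']
print(pknnp(asrs, task, 3))  # ['A1', 'A2', 'A3']
```

Even for k = 1 the sketch returns two ASRs here, because A1 intersects the circle defined by the farthest corner of the closest ASR A2, so the nearest worker could be in either.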
Filtering Step The filtering step starts by computing the kN-ASR set of every spatial task, which can be interpreted as a public kNN query over private data. The reason is that from the server's point of view, the query point (i.e., spatial task) is public, whereas the data points (i.e., workers) are private and are represented by a set of ASRs. To solve this problem for k = 1, we can use the approach proposed in [CMA09]. We extend this approach for our filtering case where k > 1.

The pseudo-code of this algorithm (i.e., the PkNNP algorithm) is shown in Figure 4.10. We explain the details of this algorithm with the example of Figure 4.9 with k = 2. The PkNNP algorithm for a given spatial task t works in an incremental fashion by first finding the closest neighbor, and incrementally expanding the search until the k closest neighbors are found (lines 6-10). In order to compute the nearest worker to t, as an upper bound, we assume that the location of the worker is in the farthest corner of the given ASR from t. In other words, the distance between a spatial task t and an ASR A_j is the distance from t to the farthest corner of A_j, denoted by maxdist_t^{A_j} (line 2). Figure 4.11 depicts A_2 as the closest ASR, and the distance between t and A_2 is represented by a dashed line. Note that finding the closest ASR to t with the distance of maxdist_t^{A_min} does not guarantee that the closest worker is found. The reason is that maxdist_t^{A_min} is only an upper bound, and not every worker is located in the farthest corner. To address this, once the closest ASR is found, a range query with a radius of maxdist_t^{A_min} should be computed to retrieve all the possible results. In the example of Figure 4.11, a range query with radius maxdist_t^{A_2} is performed, which returns A_1. This indicates that the nearest worker is located in either A_1 or A_2. The next step is to repeat the iteration for k = 2. In this step, we find the second closest ASR (i.e., A_1), and perform a range query with the radius maxdist_t^{A_1}. This will add A_3 to the result set.
At this point, the algorithm terminates, and returns A_1, A_2, and A_3 as the result set. Table 4.5 depicts the 2N-ASR set for every spatial task.

Figure 4.11: Illustrating the first iteration in PkNNP algorithm

The pseudo-code of the filtering algorithm is depicted in Figure 4.12. After the computation of the kN-ASR set for every spatial task, the server inverts the result to find the RkN-ASR set for every ASR (lines 3-4). Finally, each RkN-ASR set of an ASR is sent to its corresponding owner (lines 5-6). Table 4.6 shows the result of this step.

LPT Filter Algorithm (set of ASRs A, set of spatial tasks T)
01. for each spatial task t_i ∈ T
02.     let kN-ASR_{t_i} = PkNNP(A, t_i, k);
03. for each ASR A_i ∈ A
04.     let RkN-ASR_{A_i} = set of all spatial tasks which have A_i in their kN-ASR set;
05.     let rw_i = representative worker and owner of A_i;
06.     send RkN-ASR_{A_i} to rw_i;
Figure 4.12: LPT filtering algorithm

Refinement Step Once the RkN-ASR set of a given ASR is sent to its corresponding representative worker, the representative performs the refinement step (Figure 4.13) before sending the result to each of the voting workers. The goal of the refinement is to prune the extra spatial tasks from the RkN-worker set of each of the peers. To do so, the representative validates the kN-worker set of every spatial task in the result. However, since the representative only has the locations of his local peers, the kN-worker set of the spatial tasks can only be verified with respect to the workers inside the given ASR (lines 1-2 in Figure 4.13). Table 4.7 depicts the kN-worker set of every spatial task with respect to each of the ASRs. For example, the kN-worker set of t_1 with respect to A_1 (denoted by kN-worker_{t_1}^{A_1}) includes w_1 and w_2. The reason is that among the workers inside A_1 (i.e., w_1, w_2, and w_3), w_1 and w_2 are closer to t_1. Therefore, the refinement step eliminates t_1 from the RkN-worker set of w_3.
Spatial task  kN-ASR
t_1           {A_1, A_2, A_3}
t_2           {A_1, A_2, A_3}
t_3           {A_1, A_2, A_3}
t_4           {A_1, A_2, A_3}
t_5           {A_1, A_2}
t_6           {A_1, A_2}
t_7           {A_1, A_2, A_3}
t_8           {A_1, A_2, A_3}
t_9           {A_1, A_2, A_3}
Table 4.5: 2N-ASR set for set T in LPT

ASR   RkN-ASR
A_1   {t_1, ..., t_9}
A_2   {t_1, ..., t_9}
A_3   {t_1, t_2, t_3, t_4, t_7, t_8, t_9}
Table 4.6: R2N-ASR set for set T in LPT

After the validation step, the algorithm inverts the result, and sends the corresponding RkN-worker set to every voting worker in the ASR (lines 3-8). Table 4.8 shows the final result sent to every worker. Every worker receives the exact result (his R2N-worker set of Table 4.4), along with a set of false positives. This means that all spatial tasks meet the kN-workers' requirement (i.e., LPT successfully finds the k closest workers for every spatial task to collect k answers). However, due to privacy concerns, every worker is assigned more spatial tasks than expected. Below, we define the notion of wasteful collection (WC).

Definition 17 (Wasteful Collection) Given a worker w_i, we refer to the percentage of extra spatial task assignments to w_i as the wasteful collection of w_i, denoted by WC_i.

$$ WC_i = \frac{\mathit{falsepositives}_i}{\mathit{truepositives}_i + \mathit{falsepositives}_i} \times 100 \qquad (4.1) $$

LPT Refine Algorithm (A_i, RkN-ASR_{A_i})
01. for each spatial task t_j ∈ RkN-ASR_{A_i}
02.     let kN-worker_{t_j}^{A_i} = set of k closest workers of t_j inside A_i;
03. for every voting worker w_m inside A_i
04.     let RkN-worker_{w_m} = {};
05.     for each spatial task t_j ∈ RkN-ASR_{A_i}
06.         if w_m ∈ kN-worker_{t_j}^{A_i}
07.             add t_j to RkN-worker_{w_m};
08.     send RkN-worker_{w_m} to w_m;
Figure 4.13: LPT refining algorithm

Spatial task  2N-worker^{A_1}  2N-worker^{A_2}  2N-worker^{A_3}
t_1           {w_1, w_2}       {w_4, w_6}       {w_7, w_8}
t_2           {w_1, w_3}       {w_4, w_5}       {w_7, w_8}
t_3           {w_2, w_3}       {w_4, w_6}       {w_7, w_8}
t_4           {w_1, w_3}       {w_4, w_6}       {w_7, w_8}
t_5           {w_1, w_3}       {w_4, w_5}       N/A
t_6           {w_1, w_3}       {w_4, w_5}       N/A
t_7           {w_1, w_2}       {w_4, w_6}       {w_7, w_8}
t_8           {w_1, w_2}       {w_4, w_6}       {w_7, w_8}
t_9           {w_1, w_2}       {w_5, w_6}       {w_7, w_8}
Table 4.7: 2N-worker^A for set T in LPT

The wasteful collection is defined per individual worker. We compute the average of the wasteful collections over all workers, denoted by WC, as the overall wasteful collection of the system. It is evident that larger values of WC result in more answers per spatial task. In Table 4.8, the wasteful collection for every worker is calculated. The average WC is 62%, which means that on average 62% of the spatial tasks assigned to a worker are extra. Our goal is to improve the technique by minimizing this extra assignment.

Worker  R2N-worker                                  WC
w_1     {t_1, t_2, t_4, t_5, t_6, t_7, t_8, t_9}    75%
w_2     {t_1, t_3, t_7, t_8, t_9}                   60%
w_3     {t_2, t_3, t_4, t_5, t_6}                   40%
w_4     {t_1, t_2, t_3, t_4, t_5, t_6, t_7, t_8}    50%
w_5     {t_2, t_5, t_6, t_9}                        50%
w_6     {t_1, t_3, t_4, t_7, t_8, t_9}              67%
w_7     {t_1, t_2, t_3, t_4, t_7, t_8, t_9}         71%
w_8     {t_1, t_2, t_3, t_4, t_7, t_8, t_9}         86%
Table 4.8: R2N-worker set for set W in LPT

LPT Completeness In order to prove the completeness of LPT, we first state the following lemma.

Lemma 1 By utilizing the PiRi approach, any ASR sent to the server contains at least one voting worker, while every worker votes for one and only one ASR.

Theorem 2 The LPT approach is complete (i.e., no missing data).

Proof 2 In order to prove the completeness, we should prove that the PkNNP algorithm retrieves all the ASRs which contain the k closest workers to a given spatial task t. We prove this by contradiction. Assume the k-th closest worker is outside the radius maxdist_t^{A_sorted[k]}. This means that all ASRs in the given radius contain at most k−1 workers.
However, there are at least $k$ ASRs in the given radius. Moreover, based on Lemma 1, every ASR contains at least one voting worker. Therefore, at least $k$ distinct workers exist within the radius $maxdist^{A_{sorted}[k]}_{t}$ from $t$, which contradicts our prior assumption.

4.3.2.2 Bounded Anonymity Level (BAL)

Our LPT approach does not make any assumption about the anonymity level $m$ of any ASR. This means $m$ can take any value, depending on the privacy requirements of the workers in the given ASR. Therefore, in order to guarantee that the $k$ closest workers for every spatial task are retrieved, the server needs to explore at least the $k$ closest ASRs, considering the worst case where $m = 1$. However, due to privacy concerns, cloaking usually occurs among more than one person. In this case, if the value of $m$ becomes available to the server, the server can find the $k$ closest workers by exploring fewer ASRs. Consequently, the number of extra assignments to every worker (i.e., wasteful collection) would drop. However, knowing the anonymity level of a given ASR results in a privacy leak. Instead, the server can enforce a constraint under which the anonymity level of any ASR stays at or above a certain threshold. In other words, the server can define a system value, denoted by $m_{min}$, such that the anonymity level of any ASR is at least $m_{min}$. However, this only works if the ASRs do not overlap. In the case of an overlap, a worker might be in more than one ASR. Thus, every ASR should have at least $m_{min}$ voting workers to ensure a sufficient number of distinct workers (i.e., $m_{min}$) in every ASR. Since this cannot be guaranteed, the constraint can be enforced only when all the representative workers agree upon it.

Our second approach, referred to as bounded anonymity level (BAL), is based on this assumption. Enforcing the minimum anonymity level constraint has a few advantages. First, the server is still unaware of the exact anonymity level of any ASR. Second, for $m_{min} > 1$, fewer ASRs might be explored, resulting in fewer false hits in the result.
Third, LPT is a special case of BAL, where $m_{min} = 1$. In the following, we explain the details of the BAL approach.

Filtering Step. Similar to the LPT approach, the filtering step in the BAL approach starts with computing the kN-ASR set of every spatial task. This means that an incremental approach is used: the server first finds the closest neighbor and expands the search until the $k$ closest ones are found. The difference is that here $m_{min}$ is enforced on every ASR. Consequently, once an ASR is explored, the server knows that in the worst case $m_{min}$ workers reside in the given ASR. Thus, the algorithm might stop at an earlier iteration.

Spatial task    kN-ASR
$t_1$           $\{A_1, A_2\}$
$t_2$           $\{A_1, A_2\}$
$t_3$           $\{A_1\}$
$t_4$           $\{A_1, A_2\}$
$t_5$           $\{A_1, A_2\}$
$t_6$           $\{A_1, A_2\}$
$t_7$           $\{A_1, A_2, A_3\}$
$t_8$           $\{A_1, A_2, A_3\}$
$t_9$           $\{A_1, A_2, A_3\}$
Table 4.9: 2N-ASR set for set $T$ in BAL

In our example of Figure 4.9, considering $m_{min} = 2$, the algorithm finds $A_2$ as the closest ASR to $t_1$. The server knows that $A_2$ has an anonymity level of at least 2. However, the algorithm does not stop at this point, since a worker in another ASR within a distance of $maxdist^{A_2}_{t_1}$ might be closer to $t_1$. Thus, the algorithm performs a range query with a radius of $maxdist^{A_2}_{t_1}$ and adds any intersecting ASR to the result set (i.e., $A_1$). At this point, the algorithm stops, since the two closest workers cannot reside outside the radius $maxdist^{A_2}_{t_1}$ from $t_1$. Finally, the algorithm returns $A_1$ and $A_2$ as the result. Table 4.9 depicts the 2N-ASR sets of all spatial tasks using the BAL approach. As the table shows, fewer ASRs are explored compared to the LPT approach.

Once the kN-ASR set of every spatial task is computed, the server inverts the result and sends the corresponding RkN-ASR sets to the owners for the refinement process.

Refinement Step. The refinement process is identical to that of the LPT approach. After receiving the RkN-ASR set, the representative worker validates the kN-worker set of every spatial task in the result with respect to all workers in the given ASR.
Thereafter, the representative inverts the result and sends the corresponding RkN-worker set to every voting worker in the given ASR. Table 4.10 depicts the final result sent to every worker along with the WC percentage. As expected, we see a slight decrease in the extra spatial task assignments to the workers. The average WC is reduced to 53%.

Worker    R2N-worker                                         WC
$w_1$     $\{t_1, t_2, t_4, t_5, t_6, t_7, t_8, t_9\}$       75%
$w_2$     $\{t_1, t_3, t_7, t_8, t_9\}$                      60%
$w_3$     $\{t_2, t_3, t_4, t_5, t_6\}$                      40%
$w_4$     $\{t_1, t_2, t_4, t_5, t_6, t_7, t_8\}$            43%
$w_5$     $\{t_2, t_5, t_6, t_9\}$                           50%
$w_6$     $\{t_1, t_4, t_7, t_8, t_9\}$                      60%
$w_7$     $\{t_7, t_8, t_9\}$                                33%
$w_8$     $\{t_7, t_8, t_9\}$                                67%
Table 4.10: R2N-worker set for set $W$ in BAL

BAL Completeness. In the following, we prove the completeness of BAL.

Theorem 3 The BAL approach is complete.

Proof 3 The proof is similar to that of Theorem 2, and is therefore omitted.

4.3.2.3 Heuristics-based Bounded Anonymity Level (HBAL)

In both LPT and BAL, the refinement step is performed based on the local knowledge that each representative worker has about his local peers. Therefore, the validation of the kN-worker set for every spatial task is based only on the local workers in the given ASR. This results in a limited pruning capability. In order to improve this, the representative workers require some knowledge about other, non-local workers. However, the server does not have the location information of other workers. Instead, it can share some information about their ASRs. Therefore, we can employ a set of heuristics to expand the pruning with the extra information sent to the representative workers. We refer to this approach as Heuristics-based Bounded Anonymity Level (HBAL). We explain more details in the following sections.

Filtering Step. The filtering step of HBAL is similar to that of the BAL approach in that the server computes the kN-ASR set of all spatial tasks. Next, the server inverts the result and sends the RkN-ASR set of every ASR to its corresponding representative worker.
However, for every spatial task $t$ in the RkN-ASR set of a given ASR, the server also sends the kN-ASR set of $t$ to the corresponding ASR. This extra knowledge helps the refinement step with more pruning. Following our example of Figure 4.9, once Table 4.9 is generated, the server not only sends $t_1, \ldots, t_9$ to $A_1$, it also sends their kN-ASR sets. This means that the server sends the kN-ASR set of $t_1$ (i.e., $A_1$ and $A_2$) to both $A_1$ and $A_2$.

Refinement Step. Once the representative worker receives the RkN-ASR set, it refines the result in two phases. The first phase is similar to the refinement step of both LPT and BAL, where the kN-worker sets of the spatial tasks are validated against all local workers in the given ASR. In the second phase, the results of the previous phase are examined against the set of non-local ASRs contained in the kN-ASR set of the corresponding spatial task. For example, by validating the 2N-worker set of $t_4$ with respect to $A_1$, we retrieved $w_1$ and $w_3$ in the first phase. In the next phase, since $A_2 \in$ 2N-ASR$_{t_4}$ (see Table 4.9), we employ some heuristics to validate the first-phase result against $A_2$. The question is how to employ the non-local ASRs to do the refinement. The following lemmas answer this question.

Lemma 2 Given a spatial task $t$ and an ASR $A_i$, let kN-worker$^{A_i}_{t}$ be the kN-worker set of $t$ with respect to the elements in $A_i$. Also, let $w \in$ kN-worker$^{A_i}_{t}$. We say $w$ belongs to kN-worker$_t$ if the distance between $t$ and $w$ is smaller than the distance between $t$ and the closest point on any of the non-local ASRs in the kN-ASR set of $t$ (i.e., $dist(t, w) < mindist^{A_j}_{t} \;\; \forall A_{j \neq i} \in$ kN-ASR$_t$).

Based on Lemma 2, we can guarantee that a worker is in the kN-worker set of a spatial task if their distance is smaller than the minimum distance to any other ASR. Following our example of validating the 2N-worker set of $t_4$, we retrieved $w_1$ and $w_3$ in the first phase. In the next phase, we have to validate them against $A_2$. According to the above lemma, we can guarantee that $w_1$ and $w_3$ are the 2N-workers of $t_4$, since no worker in $A_2$ is at a closer distance to $t_4$.
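The check in Lemma 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ASRs are modelled as axis-aligned rectangles `(xmin, ymin, xmax, ymax)`, distances are Euclidean, and the names `mindist` and `confirmed_by_lemma2` are hypothetical helpers rather than part of the original algorithm. A result of `False` means only that Lemma 2 gives no guarantee for that worker, not that the worker is pruned.

```python
from math import hypot


def mindist(point, rect):
    """Smallest Euclidean distance from a point to an axis-aligned rectangle.

    rect is (xmin, ymin, xmax, ymax); the distance is 0 if the point is inside.
    """
    x, y = point
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return hypot(dx, dy)


def confirmed_by_lemma2(task, worker, non_local_asrs):
    """Return True if a locally validated worker is guaranteed to be in kN-worker_t.

    task and worker are (x, y) locations; non_local_asrs holds the rectangles
    of the other ASRs in the kN-ASR set of the task (assumed representation).
    """
    d = hypot(task[0] - worker[0], task[1] - worker[1])
    # The worker is confirmed only if it is strictly closer to the task
    # than the nearest point of every non-local ASR.
    return all(d < mindist(task, asr) for asr in non_local_asrs)
```

The representative worker would call `confirmed_by_lemma2` once per locally validated worker, using only the ASR extents that the server shares in the HBAL filtering step.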
Lemma 3 Given a spatial task $t$ and an ASR $A_i$, let kN-worker$^{A_i}_{t}$ be the kN-worker set of $t$ with respect to the elements in $A_i$. Also, let $w$ be the $j$th nearest worker in kN-worker$^{A_i}_{t}$. We say $w \notin$ kN-worker$_t$ if the distance between $t$ and $w$ is larger than the distance between $t$ and the farthest point on $n$ of the ASRs in the kN-ASR set of $t$, where $(n \times m_{min}) + (j - 1) \geq k$.

This indicates that we can prune a worker from the kN-worker set of a spatial task if their distance is larger than the maximum distance to a set of ASRs in which the total number of possible results is at least $k$. Following our example of Figure 4.9, by validating the 2N-worker set of $t_8$ with respect to $A_1$, we retrieved $w_1$ and $w_2$ in the first phase. In the second phase, we should validate them against $A_2$ and $A_3$ (see Table 4.9). As the figure depicts, the distance between $t_8$ and $w_2$ is larger than $maxdist^{A_3}_{t_8}$. Since $A_3$ contains at least two workers, the $k$ requirement is satisfied. This indicates that $w_2$ cannot be either of the two closest workers to $t_8$. Thus, we can prune $w_2$ from the result set. Similarly, $w_1$ is also pruned from the result set. Consequently, $t_8$ is no longer in the R2N-worker set of any of the workers in $A_1$. In other words, none of the workers in $A_1$ are assigned to perform the task $t_8$, since according to Lemma 3, they cannot be the two closest workers to $t_8$.

Table 4.11 depicts the final result for each worker in the HBAL approach. The table shows a drop in the number of false hits, and thus in the percentage of WC. The average percentage of WC is reduced to 38%.

Worker    R2N-worker                               WC
$w_1$     $\{t_1, t_2, t_4, t_5, t_6, t_7\}$       67%
$w_2$     $\{t_1, t_3, t_7\}$                      33%
$w_3$     $\{t_3, t_4, t_5, t_6\}$                 25%
$w_4$     $\{t_1, t_2, t_5, t_6\}$                 0%
$w_5$     $\{t_2, t_5, t_6, t_9\}$                 50%
$w_6$     $\{t_1, t_8, t_9\}$                      33%
$w_7$     $\{t_7, t_8, t_9\}$                      33%
$w_8$     $\{t_7, t_8, t_9\}$                      67%
Table 4.11: R2N-worker set for set $W$ in HBAL

HBAL Completeness. The following theorem proves the completeness of HBAL.

Theorem 4 The HBAL approach is complete.

Proof 4 The proof is trivial, and is therefore omitted.
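As an illustration of the pruning rule in Lemma 3, the following Python sketch counts how many non-local ASRs lie entirely closer to the task than a candidate worker and applies the condition $(n \times m_{min}) + (j - 1) \geq k$. It rests on assumptions that are not in the original text: ASRs are axis-aligned rectangles `(xmin, ymin, xmax, ymax)`, only non-local ASRs are counted, and the names `maxdist` and `pruned_by_lemma3` are hypothetical.

```python
from math import hypot


def maxdist(point, rect):
    """Largest Euclidean distance from a point to an axis-aligned rectangle.

    The farthest point of a rectangle is always one of its corners.
    """
    x, y = point
    xmin, ymin, xmax, ymax = rect
    dx = max(abs(x - xmin), abs(x - xmax))
    dy = max(abs(y - ymin), abs(y - ymax))
    return hypot(dx, dy)


def pruned_by_lemma3(task, worker, j, non_local_asrs, k, m_min):
    """Return True if the j-th nearest local worker cannot be in kN-worker_t.

    j is the 1-based rank of the worker among the local k nearest workers;
    m_min is the enforced minimum anonymity level of every ASR.
    """
    d = hypot(task[0] - worker[0], task[1] - worker[1])
    # n = number of non-local ASRs that lie entirely closer to the task
    # than the worker does; each is assumed to hold at least m_min workers.
    n = sum(1 for asr in non_local_asrs if d > maxdist(task, asr))
    return n * m_min + (j - 1) >= k
```

With $k = 2$ and $m_{min} = 2$, a single non-local ASR that is entirely closer to the task than the worker is enough to prune that worker, which mirrors the $t_8$ example above.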
Chapter 5
Self-Incentivised Worker Constrained Server Assigned (SIWSA) Spatial Crowdsourcing

In Chapter 4, we focused on the issues of privacy and trust in SISSA spatial crowdsourcing. In this chapter, we study Self-Incentivised Worker Constrained Server Assigned (SIWSA) spatial crowdsourcing. With SIWSA spatial crowdsourcing, every worker specifies a spatial range in which he is willing to perform spatial tasks. Moreover, every worker has a capacity, which is the number of tasks he is willing to perform. Consequently, the problem turns into maximizing the number of task assignments while satisfying the constraints of the workers. In the following, we first focus on the assignment optimization in Section 5.1. Thereafter, we study the issues of trust (Section 5.2) and privacy (Section 5.3) in SIWSA spatial crowdsourcing, respectively.

5.1 Task Assignment

With SIWSA spatial crowdsourcing, a set of workers send their task inquiries to an SC-server. The task inquiry of a worker, which includes his location along with a set of constraints (e.g., a region), is a request that the worker issues to inform the SC-server of his availability to work. Consequently, the SC-server, which receives the locations of the workers, assigns to every worker his nearby tasks. In this class of spatial crowdsourcing, the main optimization goal is to maximize the overall task assignment while conforming to the constraints of the workers. We refer to this problem as the maximum task assignment (MTA) problem. The solution to the MTA problem could be straightforward if the SC-server had global knowledge of both the spatial tasks and the workers. However, the SC-server is continuously receiving spatial tasks from requesters and task inquiries from workers. Therefore, the SC-server can only maximize the task assignment at every time instance (i.e., local optimization) with no knowledge of the future. In this section, our goal is to propose a solution which addresses the challenges of the MTA problem.
5.1.1 Preliminaries

In this section, we define a set of preliminaries in the context of SIWSA spatial crowdsourcing. With SIWSA, once a worker goes online, he sends a task inquiry to the SC-server (Figure 5.1). We now formally define the task inquiry.

Definition 18 (Task Inquiry or TI) A task inquiry is a request that an online worker $w$ sends to the SC-server when ready to work. The inquiry includes the location of $w$, $l$, along with two constraints: a spatial region $R$ and the maximum number of acceptable tasks $maxT$. The spatial region $R$, represented by a rectangle, is the area in which the worker can accept spatial tasks. In other words, any task outside the region will be rejected by the worker. Moreover, $maxT$ is the maximum number of tasks that the worker is willing to perform.

Figure 5.1: The spatial crowdsourcing framework

Note that the task inquiry is defined for the worker constrained mode, where workers enforce their constraints on how tasks should be assigned. The workers can also specify other constraints in their task inquiry (e.g., category of the task, amount of time they have). However, in this work we only consider two constraints for every worker (i.e., $R$ and $maxT$). Note that with self-incentivised spatial crowdsourcing, workers volunteer to perform spatial tasks without expecting any reward.

Once the workers send their task inquiries, the SC-server assigns to every worker a set of tasks, while satisfying each worker's constraints. However, the task assignment is not a one-time process. The SC-server continuously receives SC-queries from requesters and task inquiries from workers. Therefore, we define the notion of a task assignment instance set, which is the set of assigned tasks for a given instance of time.

Definition 19 (Task Assignment Instance Set) Let $W_i = \{w_1, w_2, \ldots\}$ be the set of online workers at time $s_i$. Also, let $T_i = \{t_1, t_2, \ldots\}$ be the set of available tasks at time $s_i$. The task assignment instance set, denoted by $I_i$, is the set of tuples of the form $\langle w, t \rangle$, where a spatial task $t$ is assigned to a worker $w$, while satisfying the workers' constraints.
Also, $|I_i|$ denotes the number of tasks which are assigned at time instance $s_i$.

Consequently, the task assignment instance set must conform to the constraints of the workers. This means that for every tuple $\langle w, t \rangle \in I_i$, the spatial task $t$ must be located inside the spatial region $R$ of worker $w$. Moreover, every worker $w$ can be assigned at most $maxT$ tasks (i.e., the number of tuples in $I_i$ including $w$ is at most $maxT$). Note that the goal is to assign every spatial task to one worker (i.e., single task assignment). That is, we have $k = 1$ in the given SC-query submitted by the requester.

Based on the above definition, we now define the maximum task assignment problem.

Definition 20 (Maximum Task Assignment (MTA)) Given a time interval $\phi = \{s_1, s_2, \ldots, s_n\}$, let $|I_i|$ be the number of assigned tasks at time instance $s_i$. The maximum task assignment problem is the process of assigning tasks to the workers during the time interval $\phi$ such that the total number of assigned tasks (i.e., $\sum_{i=1}^{n} |I_i|$) is maximized.

Note that in the ideal case, all tasks will be assigned to the workers. However, this might not be practical due to the constraints of the workers. Therefore, our optimization goal is to maximize the number of assigned tasks.

5.1.2 Assignment Protocol

In order to solve the MTA problem, the SC-server should have global knowledge of all the spatial tasks and the workers ([UYMM08, WTFX07]). This would allow the SC-server to assign tasks to workers optimally, so that the total number of assigned tasks is maximized. However, the SC-server does not have such knowledge. At every instance of time, the SC-server receives a set of new tasks from the requesters and a set of new task inquiries from the workers. Therefore, the SC-server only has a local view of the available tasks and workers at any instance of time. This means that a globally optimal assignment is not feasible. Instead, the SC-server tries to optimize the task assignment locally at every instance of time.
The SC-server does this by utilizing the spatial information that workers share during their task inquiries. In the following, we propose three solutions to this problem. All the solutions follow the local optimal assignment strategy. Our first approach tries to solve MTA in a greedy way by maximizing the task assignment at every instance of time. Our second approach tries to improve the optimization by applying a heuristic, which utilizes the location entropy of an area, to maximize the overall assignment. Finally, our third approach tries to maximize the task assignment while taking into account the travel cost of the workers.

5.1.2.1 Greedy (GR) Strategy

As discussed earlier, the idea of this approach is to do the maximum assignment at every instance of time. The approach is called Greedy because at every instance of time it only tries to maximize the current assignment (i.e., local optimization instead of global optimization). Note that this does not necessarily result in a globally optimal answer. Given a set of online workers $W_i = \{w_1, w_2, \ldots\}$ and a set of available tasks $T_i = \{t_1, t_2, \ldots\}$ at time instance $s_i$, the goal is to assign the maximum number of tasks in $T_i$ to workers in $W_i$ for every instance $s_i$, which is equivalent to maximizing $|I_i|$. We refer to this as the maximum task assignment instance problem. Thus, our goal in this approach is to maximize the overall assignment by solving the maximum task assignment instance problem for every instance of time.

In order to solve the maximum task assignment instance problem, the idea is to utilize the constraints of the workers to guarantee that tasks are properly assigned. Note that without the constraints, a worker might be assigned to a spatial task far from his location. However, with spatial crowdsourcing, since workers need to physically go to a location to perform a spatial task, the goal is to assign to the workers only tasks within a given distance.
During the task inquiry, every online worker specifies two constraints: the spatial region $R$ and the maximum number of tasks $maxT$. This means that every worker is willing to perform at most $maxT$ tasks, none of which should be outside his spatial region $R$. With the following theorem, we can solve the maximum task assignment instance problem by reducing it to the maximum flow problem.

Theorem 5 The maximum task assignment instance problem is reducible to the maximum flow problem.

Proof 5 We prove this for time instance $s_i$ with $W_i = \{w_1, w_2, \ldots\}$ as the set of online workers and $T_i = \{t_1, t_2, \ldots\}$ as the set of available spatial tasks. Let $G_i = (V, E)$ be the flow network graph with $V$ as the set of vertices and $E$ as the set of edges at time instance $s_i$. The set $V$ contains $|W_i| + |T_i| + 2$ vertices. Each worker $w_j$ maps to a vertex $v_j$. Each spatial task $t_j$ maps to a vertex $v_{|W_i|+j}$. We create a new source vertex $src$ labeled as $v_0$ and a new destination vertex $dst$ labeled as $v_{|W_i|+|T_i|+1}$.

The set $E$ contains $|W_i| + |T_i| + m$ edges. There are $|W_i|$ edges connecting the new $src$ vertex to the vertices mapped from $W_i$. For a given edge connecting the $src$ vertex to vertex $v_j$ (mapped from $w_j$), denoted by $(src, v_j)$, we set the capacity to $maxT_j$ (i.e., $c(src, v_j) = maxT_j$), since every worker is only capable of performing $maxT$ tasks. There are also $|T_i|$ edges connecting the vertices mapped from $T_i$ to the new $dst$ vertex. We set the capacity of each of these edges to 1, since every task is to be assigned to one worker (i.e., single task assignment). Every worker $w_j$ has a spatial region constraint $R_j$ and can only perform tasks inside his spatial region. Thus, for every worker $w_j$ we add an edge from $v_j$ to each of the vertices mapped from the tasks in $T_i$ that are inside the spatial region $R_j$. For each of these $m$ edges, we also set the capacity to 1.

Figure 5.2 illustrates this reduction. Figure 5.2a shows an example of a set of workers $W_i$ and a set of available tasks $T_i$ at time instance $s_i$.
Every worker $w_j$ is associated with a spatial region $R_j$. The corresponding flow network graph $G_i$ is depicted in Figure 5.2b. As shown in the figure, worker $w_1$ can only accept tasks inside his spatial region (i.e., $t_2$, $t_5$, and $t_7$). Therefore, the vertex mapped from $w_1$ can transfer flow to only the three vertices mapped from those tasks (i.e., $v_5$, $v_8$, and $v_{10}$). Moreover, $w_1$ is only willing to accept two tasks since $maxT_1 = 2$. Therefore, the capacity of the edge $(src, v_1)$ is 2. Finally, the capacity of each edge connecting the vertices mapped from spatial tasks (i.e., $v_4, \ldots, v_{13}$) to the destination vertex $dst$ is 1, since every spatial task is to be assigned to one worker.

a) An example of $W_i$ and $T_i$    b) Flow network graph $G_i = (V, E)$
Figure 5.2: An example of the reduction of the maximum task assignment instance problem to the maximum flow problem at instance $s_i$

By reducing to the maximum flow problem, we can now use any algorithm that computes the maximum flow in the network to solve the maximum task assignment instance problem. One of the well-known techniques for computing the maximum flow is the Ford-Fulkerson algorithm [KT05b]. The idea behind the Ford-Fulkerson algorithm is that it keeps sending flow from the source vertex to the destination vertex as long as there is a path between the two with available capacity. Consequently, in order to solve the MTA problem we repeat this step for every instance of time.

The Greedy approach can be marginally improved by incorporating conventional non-spatial task scheduling approaches [sch] such as FIFO or FEFO (first expired, first out).¹ With FEFO, the expiration time of every task can be utilized as a tiebreaker in the assignment process by prioritizing the tasks based on their expiration. However, in this paper our goal is to exploit the spatial properties of the problem space. Therefore, we introduce two spatial heuristics in our following two approaches.
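The reduction of Theorem 5 can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this work: it assumes rectangular regions given as `(xmin, ymin, xmax, ymax)`, point tasks given as `(x, y)`, and it uses the breadth-first (Edmonds-Karp) variant of Ford-Fulkerson on a dense capacity matrix. The function name `max_task_assignment` and the input layout are hypothetical.

```python
from collections import deque


def max_task_assignment(workers, tasks):
    """Solve one task assignment instance via maximum flow.

    workers: list of (region, max_t), region = (xmin, ymin, xmax, ymax)
    tasks:   list of (x, y) task locations
    Returns a list of (worker_index, task_index) pairs (0-based).
    """
    n_w, n_t = len(workers), len(tasks)
    src, dst = 0, n_w + n_t + 1           # v_0 and v_{|W|+|T|+1}
    size = n_w + n_t + 2
    cap = [[0] * size for _ in range(size)]  # residual capacities

    for j, (region, max_t) in enumerate(workers, start=1):
        cap[src][j] = max_t                # c(src, v_j) = maxT_j
        xmin, ymin, xmax, ymax = region
        for i, (x, y) in enumerate(tasks, start=1):
            if xmin <= x <= xmax and ymin <= y <= ymax:
                cap[j][n_w + i] = 1        # task lies inside region R_j
    for i in range(1, n_t + 1):
        cap[n_w + i][dst] = 1              # single task assignment

    while True:
        # Breadth-first search for an augmenting path from src to dst.
        parent = [-1] * size
        parent[src] = src
        queue = deque([src])
        while queue and parent[dst] == -1:
            u = queue.popleft()
            for v in range(size):
                if cap[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    queue.append(v)
        if parent[dst] == -1:
            break                          # no augmenting path remains
        # Every path ends on a task -> dst edge of capacity 1,
        # so each augmentation pushes exactly one unit of flow.
        v = dst
        while v != src:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u

    # A unit of residual capacity on (task, worker) means flow on (worker, task).
    return [(j - 1, i - 1)
            for j in range(1, n_w + 1)
            for i in range(1, n_t + 1)
            if cap[n_w + i][j] == 1]
```

Calling `max_task_assignment` once per time instance $s_i$, with the workers and tasks available at that instance, corresponds to the Greedy strategy described above. A production system would more likely use an adjacency-list flow implementation, since the dense matrix here is chosen only for brevity.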
5.1.2.2 Least Location Entropy Priority (LLEP) Strategy

The problem with the Greedy strategy is that at every instance of time it only tries to maximize the current assignment, without considering future optimizations. Even though we are clairvoyant on neither the future SC-queries from the requesters nor the future task inquiries from the workers, we can use some heuristics to maximize the overall assignments. One of the heuristics that can improve the task assignment process is to exploit the spatial characteristics of the environment during the assignment, one of which is the distribution of the workers in that area. Since every spatial task is linked to a location in the environment, a task is more likely to be completed when located in areas with higher worker densities. Therefore, the idea is to assign higher priority to tasks which are located in worker-sparse areas.

We use the entropy of a location to measure the total number of workers in that location as well as the relative proportion of their future visits to that location. We refer to this as location entropy. Location entropy was first introduced in [CTH+10]. A location has a high entropy if many workers visit that location with equal proportions. Conversely, a location will have a low entropy if the distribution of the visits to that location is restricted to only a few workers. Thus, our heuristic is to give higher priority to tasks which are located in areas with smaller location entropy, because those tasks have a lower chance of being completed by other workers.

¹ Our experiments showed that the impact of incorporating these conventional task scheduling approaches is marginal.

We now formally define the location entropy. For a given location $l$, let $O_l$ be the set of visits to location $l$. Thus, $|O_l|$ gives the total number of visits to $l$. Also, let $W_l$ be the set of distinct workers that visited $l$. Moreover, let $O_{w,l}$ be the set of visits that worker $w$ has made to the location $l$.
The probability that a random draw from $O_l$ belongs to $O_{w,l}$ is $P_l(w) = \frac{|O_{w,l}|}{|O_l|}$, which is the fraction of total visits to $l$ that belongs to worker $w$. The location entropy for $l$ is computed as follows:

\[ Entropy(l) = - \sum_{w \in W_l} P_l(w) \times \log P_l(w) \tag{5.1} \]

By computing the entropy of every location, we can associate with every task $t_i$ of the form $\langle l_i, q_i, s_i, \delta_i \rangle$ a certain cost, which is the entropy of its location $l_i$. Accordingly, tasks with lower costs have higher priority, since they have a smaller chance of being completed. Thus, our goal in this approach is to assign the maximum number of tasks during every instance of time while the total cost associated with the assigned tasks is the lowest. We refer to this problem as the minimum-cost maximum task assignment instance problem. With the following theorem, we can solve the minimum-cost maximum task assignment instance problem by reducing it to the minimum-cost maximum flow problem [min]. A minimum-cost maximum flow of a network $G = (V, E)$ is a maximum flow with the smallest possible cost.

Theorem 6 The minimum-cost maximum task assignment instance problem is reducible to the minimum-cost maximum flow problem.

Proof 6 We already proved in Theorem 5 that the maximum task assignment instance problem is reducible to the maximum flow problem. In the minimum-cost maximum task assignment instance problem, every task is associated with a cost. We prove this for time instance $s_i$ with $W_i = \{w_1, w_2, \ldots\}$ as the set of online workers and $T_i = \{t_1, t_2, \ldots\}$ as the set of available tasks. Let $G_i = (V, E)$ be the flow network graph constructed in the proof of Theorem 5. For every task $t_j$, let $V_j$ be the set of all vertices mapped from workers in $W_i$ which have edges connected to the vertex mapped from $t_j$ (i.e., $v_{|W_i|+j}$). For every vertex $u \in V_j$, let $(u, v_{|W_i|+j})$ be the edge connected to $v_{|W_i|+j}$. We associate with $(u, v_{|W_i|+j})$ the cost of $t_j$ (i.e., $a(u, v_{|W_i|+j}) = Entropy(l_j)$). Moreover, we set the cost of all other edges in $E$ to 0.
Thus, by finding the minimum-cost maximum flow in graph $G_i$, we have assigned the maximum number of tasks with the minimum cost.

In the example of Figure 5.2, let $Entropy(l_5)$ be the location entropy of the spatial task $t_5$. Since $t_5$ is located in the spatial regions of the workers $w_1$ and $w_2$, we set the cost of both edges $(v_1, v_8)$ and $(v_2, v_8)$ to $Entropy(l_5)$.

According to the above theorem, solving our problem is equivalent to solving the minimum-cost maximum flow problem at every time instance. In order to solve the minimum-cost maximum flow problem, one of the well-known techniques [min] is to first find the maximum flow of the network using Ford-Fulkerson or any other algorithm which computes the maximum flow. Thereafter, the cost of the flow can be minimized by applying linear programming.

Let $G_i = (V, E)$ be the flow network graph constructed in the proof of Theorem 6 for time instance $s_i$. Every edge $(u, v) \in E$ has capacity $c(u, v) > 0$, flow $f(u, v) \geq 0$, and cost $a(u, v) \geq 0$, where the cost of sending the flow $f(u, v)$ is $f(u, v) \times a(u, v)$. Let $f_{max}$ be the maximum flow sent from $src$ to $dst$ using the Ford-Fulkerson algorithm. The goal is to minimize the total cost of the flow, which can be defined as follows:

\[ \sum_{(u,v) \in E} f(u,v) \times a(u,v) \tag{5.2} \]

with the constraints

\[ f(u,v) \leq c(u,v) \tag{5.3} \]

\[ f(u,v) = -f(v,u) \tag{5.4} \]

\[ \sum_{w \in V} f(u,w) = 0 \quad \text{for all } u \neq src, dst \tag{5.5} \]

\[ \sum_{w \in V} f(src,w) = f_{max} \quad \text{and} \quad \sum_{w \in V} f(w,dst) = f_{max} \tag{5.6} \]

Since all constraints are linear, and our goal is to optimize a linear function, we can solve this by linear programming. Therefore, our LLEP strategy solves the MTA problem by computing the minimum-cost maximum flow for every time instance, where the cost is defined in terms of the location entropy of the tasks.

5.1.2.3 Nearest Neighbor Priority (NNP) Strategy

With both the GR and LLEP approaches, our goal was to maximize the overall task assignment. However, we did not consider the travel cost (e.g., in time or distance) of the workers during the assignment process.
With spatial crowdsourcing, the travel cost becomes a critical issue since workers should physically go to the location of the spatial task in order to perform the task. Even though the task assignment process satisfies the spatial constraint of every worker by assigning him only those tasks inside his spatial region, it does not necessarily assign to every worker those tasks with the smallest travel costs.

With this approach, we incorporate the travel cost of the workers in the assignment process. Our goal is to maximize the task assignment at every time instance while minimizing the travel cost of the workers whenever possible. Intuitively, tasks which are closer to a worker have smaller travel costs. This means that we still try to maximize the overall task assignment. However, we assign higher priorities to tasks which are closer in spatial distance to the worker.

We define the travel cost between a worker $w$ and a spatial task $t$ in terms of the Euclidean distance² between the two, denoted by $d(w, t)$. Consequently, by computing the distance between every worker and his allowable spatial tasks (i.e., those inside his spatial region), we can associate higher priorities with the closer tasks. We do this by associating with every edge between a worker $w$ and a spatial task $t$ a certain cost, which is the distance between the two (i.e., $d(w, t)$). Thus, our problem is to assign the maximum number of tasks during every time instance, while the total cost of the assignment is the lowest. Consequently, the problem turns into the minimum-cost maximum task assignment instance problem. Therefore, a solution similar to that of Section 5.1.2.2, but with a different cost function, can be applied to solve this problem.

5.2 Trust in SIWSA Spatial Crowdsourcing

With spatial crowdsourcing, the general assumption is that every spatial task is performed correctly (Section 5.1). However, in many scenarios the workers are not trusted. A worker may intentionally (i.e., malicious users) or unintentionally (e.g., making mistakes) provide a wrong answer to a given query.
Thus, every worker is associated with a reputation score, which states the probability that the worker performs the task correctly. Moreover, we define a confidence level for every spatial task, which states that the answer to the given spatial task is only accepted if its confidence is higher than a certain threshold. This means that every spatial task should be assigned to enough workers such that their aggregate reputation satisfies the confidence of the task. Similar to Section 5.1, our goal is to maximize the number of assigned tasks. However, in this problem the confidence of every spatial task should also be satisfied. In this section, we formally define the problem of Maximum Correct Task Assignment (MCTA). Thereafter, we provide a set of solutions for this problem.

² Other metrics such as network distance are also applicable.

5.2.1 Preliminaries

In this section, we first introduce a set of terminologies. Thereafter, we discuss our reputation scheme. Finally, we formally define our problem.

5.2.1.1 Terminologies

As discussed before, in many scenarios the workers are not trusted. A worker may intentionally or unintentionally provide a wrong answer to a given query. Therefore, we define a confidence level for every spatial task, which states that the answer to the given spatial task is only accepted if its confidence is higher than a certain threshold. We now define the notion of $\alpha$-confidence.

Definition 21 ($\alpha$-confident spatial task) A spatial task $t$ is $\alpha$-confident if the probability of the task $t$ being performed correctly is at least $\alpha$.

With this definition, we now define the probabilistic spatial crowdsourced query.

Definition 22 (Probabilistic Spatial Crowdsourced Query) A probabilistic spatial crowdsourced query (or PSC-query) of the form $(\langle t_1, \alpha_1 \rangle, \langle t_2, \alpha_2 \rangle, \ldots)$ is a query consisting of a set of tuples of the form $\langle t_i, \alpha_i \rangle$ issued by a requester, where every spatial task $t_i$ is to be crowdsourced with at least $\alpha_i$-confidence.
After receiving the PSC-queries from all the requesters, the spatial crowdsourcing server (or SC-server) assigns the spatial tasks of these PSC-queries to the available workers, while satisfying the confidence probability of every spatial task. We refer to this as a trustworthy spatial crowdsourcing framework (Figure 5.3). In the following, we formally redefine a worker.

Figure 5.3: A trustworthy spatial crowdsourcing framework

Definition 23 (Worker) A worker, denoted by $w$, is a carrier of a mobile device who volunteers to perform spatial tasks. A worker can be in either online or offline mode. A worker is online when he is ready to accept tasks. Moreover, every worker is associated with a reputation score $r$ ($0 \leq r \leq 1$), which gives the probability that the worker performs a task correctly.

Since workers may not be trusted, every worker is associated with a reputation score, which shows the trustworthiness of the worker in performing a task. Consequently, the higher the reputation score, the higher the chance that the worker performs a given task correctly. We assume the reputation scores are stored and maintained at the SC-server. Once a worker goes online, he sends a task inquiry to the SC-server (Figure 5.3). Thereafter, the SC-server should assign to every worker a set of tasks, while satisfying both the constraints of the workers and the confidence probability of the tasks.

Figure 5.4 illustrates an example of a trustworthy spatial crowdsourcing system with a set of spatial tasks $T = \{t_1, \ldots, t_{10}\}$ and a set of workers $W = \{w_1, w_2, w_3\}$. The confidence probability of the tasks and the reputation score of the workers are shown in two different tables. An example of an assignment is to assign $w_1$ to the tasks $t_2$ and $t_3$, since both tasks are inside the spatial region of $w_1$ (i.e., $R_1$). Moreover, the reputation score of $w_1$ satisfies the confidence probability of both $t_2$ and $t_3$ (i.e., $r_1 > \alpha_2$, $r_1 > \alpha_3$). Finally, the maximum number of acceptable tasks for $w_1$ is 2 (i.e., $maxT_1 = 2$).

Figure 5.4: An example of a trustworthy spatial crowdsourcing system
5.2.1.2 Reputation Scheme

In Section 5.1, the goal was to maximize the number of tasks assigned to workers, while satisfying the constraints of the workers. Moreover, since the assumption there was that all workers are trusted, every task only needed to be assigned to one worker. However, with a PSC-query we should also take into account the confidence probability of the tasks. Therefore, a task may need to be assigned to more than one worker. Consider the example of Figure 5.4, in which $t_1$ is located inside the spatial regions of all three workers. Here, $t_1$ cannot be assigned to any of the individual workers, because its confidence probability is not satisfied by any of them. Instead, it may be possible to assign the task to a number of workers simultaneously, where the aggregation of the workers' reputation scores satisfies $\alpha_1$. Consequently, by assigning multiple workers to a task, two issues arise: 1) how to decide the correct answer based on the different answers of the workers, and 2) how to aggregate the reputation scores of the workers to check if the required confidence is satisfied. In the following, we discuss the two issues.

With spatial crowdsourcing applications, one of the major challenges is how to aggregate the answers provided by different workers. Note that different spatial tasks may support different modalities of answers (e.g., binary/numerical value, text, photo). In this work, for simplicity we assume that the answer to a spatial task is a binary value (0/1). An example of a spatial task with a binary value is the following: given a photo, is this an image of a particular building? However, this can be generalized to any data modality. One straightforward approach is to quantify (if possible) any modality of data to a binary value. Consequently, one of the well-known mechanisms to make a single decision based on the answers of a group of workers is majority voting, which accepts the answer supported by the majority of workers. This intuition is based on the idea of the wisdom of crowds [Sur04] that the majority of the workers are trusted.
We now formally define the concept of majority voting.

Definition 24 (Majority Voting) Given a spatial task $t_i \in T$, let $W_i \subseteq W$ be the set of all workers who perform the task $t_i$. Also, let $V_i$ be the set of boolean answers provided by those workers. That is, for every worker $w_j \in W_i$, $v_j \in V_i$ is the answer that $w_j$ provides with respect to the task $t_i$. Consequently, majority voting is computed as follows:

$$MV(V_i) = \begin{cases} 1 & \text{if } \sum_{v_j \in V_i} v_j \ge \left\lfloor \frac{|V_i|}{2} \right\rfloor + 1 \\ 0 & \text{otherwise} \end{cases} \qquad (5.7)$$

In this work, we use majority voting for any decision process when multiple workers perform a single task simultaneously. Thereafter, in order to aggregate the reputation scores of the workers, we need to compute the probability that the majority of workers perform the task correctly (see [CST12]). Thus, we now define the Aggregate Reputation Score.

Definition 25 (Aggregate Reputation Score (ARS)) Given a spatial task $t \in T$, the aggregate reputation score of the set $Q \subseteq W$ is the probability that at least $\left\lfloor \frac{|Q|}{2} \right\rfloor + 1$ of the workers perform the task $t$ correctly:

$$ARS(Q) = \sum_{k=\lfloor |Q|/2 \rfloor + 1}^{|Q|} \; \sum_{A \in F_k} \; \prod_{w_j \in A} r_j \prod_{w_j \in Q \setminus A} (1 - r_j) \qquad (5.8)$$

where $F_k$ is the set of all subsets of $Q$ with size $k$, and $r_j$ is the reputation of the worker $w_j$.

Consider $t_1$ in the example of Figure 5.4. As the figure shows, $t_1$ is located inside the spatial regions of all three workers $w_1$, $w_2$, and $w_3$ with reputation scores 0.7, 0.6, and 0.7, respectively. Thus, for the set $Q = \{w_1,w_2,w_3\}$, the aggregate reputation score of the three workers is computed as follows:

$$ARS(Q) = (0.7 \cdot 0.6 \cdot 0.7) + (0.7 \cdot 0.4 \cdot 0.7) + (0.7 \cdot 0.6 \cdot 0.3 \cdot 2) = 0.742 \approx 0.74$$

Here the first term covers all three workers being correct, the second covers only $w_2$ being wrong, and the third covers the two symmetric cases in which only $w_1$ or only $w_3$ is wrong. Consequently, by aggregating the reputation scores of the three workers, $t_1$ can be performed by assigning it to all three workers simultaneously, since $\alpha_1 < 0.74$.

5.2.1.3 Problem Definition

In this section, we first define the notions of a correct match and a potential match set. Thereafter, we formally define our problem.
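Equations (5.7) and (5.8) of Section 5.2.1.2 can be checked numerically with a short sketch. The function names below are illustrative and not part of the original work, and the sketch assumes that $|Q|/2$ is rounded down, which is what the three-worker example implies. The enumeration is exponential in $|Q|$, so it is only meant for small worker sets.

```python
from itertools import combinations
from math import prod


def majority_vote(votes):
    """Equation (5.7): return 1 if at least floor(|V|/2) + 1 of the 0/1 votes are 1."""
    return 1 if sum(votes) >= len(votes) // 2 + 1 else 0


def ars(reputations):
    """Equation (5.8): probability that a strict majority of the workers is correct.

    reputations: list of reputation scores r_j, one per worker in Q.
    """
    n = len(reputations)
    total = 0.0
    # k ranges over every majority size, from floor(n/2) + 1 up to n.
    for k in range(n // 2 + 1, n + 1):
        # Each combination is one set A of workers assumed to answer correctly.
        for correct in combinations(range(n), k):
            chosen = set(correct)
            total += prod(
                reputations[j] if j in chosen else 1.0 - reputations[j]
                for j in range(n)
            )
    return total


# Worked example from the text: reputations 0.7, 0.6 and 0.7.
example_score = ars([0.7, 0.6, 0.7])
```

For the example above, `example_score` evaluates to 0.742 up to floating-point rounding, which agrees with the hand computation. For a pair of workers the rule requires both to be correct, so `ars([0.7, 0.7])` is 0.49.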
Definition 26 (Correct Match) Given a task $t \in T$ and a set of workers $W$, we refer to the set $C \subseteq W$ as a correct match for the task $t$ if $t$ is located inside the spatial region of every worker $w \in C$, and the aggregate reputation score of the workers in $C$ satisfies the confidence probability of $t$ (i.e., $ARS(C) \ge \alpha$). We denote the set $C$ by $\langle w_i w_j \ldots\rangle$. Moreover, we represent a correct match between a given task and a set $C$ of workers by $(t,\langle w_i w_j \ldots\rangle)$ (or $(t,C)$).

Note that formulating a correct match with a set of AND operations among the workers (i.e., $\langle w_i w_j \ldots\rangle$) is based on the fact that these workers should all perform the task simultaneously to meet its confidence probability. An example of a correct match in Figure 5.4 is $(t_1,\langle w_1 w_2 w_3\rangle)$, since all three workers have $t_1$ in their spatial regions, and $ARS(\{w_1,w_2,w_3\}) > \alpha_1$.

Definition 27 (Potential Match Set) Given a task $t_i \in T$ and a set of workers $W$, let $P(W)$ be the power set of the set $W$. We refer to the set $M_i \subseteq P(W)$ as the potential match set for $t_i$ if $M_i$ contains all the correct matches for $t_i$.

Table 5.1 depicts the potential match sets for all the spatial tasks of Figure 5.4. For example, the potential match set for $t_2$ is $M_2 = \{(t_2,\langle w_1\rangle), (t_2,\langle w_3\rangle), (t_2,\langle w_1 w_3\rangle)\}$, since $t_2$ is only located inside the spatial regions of $w_1$ and $w_3$. Moreover, the aggregate reputation score of every $C \in M_2$ satisfies the confidence probability of $t_2$ (i.e., $r_1 > \alpha_2$, $r_3 > \alpha_2$, and $ARS(\{w_1,w_3\}) > \alpha_2$). Another example is the potential match set for $t_4$, which is an empty set. The reason is that its confidence probability is not satisfied by any of the workers whose spatial region contains $t_4$ (i.e., $r_2 < \alpha_4$).

Definition 28 (Problem Definition) Given a set of workers $W = \{w_1,w_2,\ldots\}$ and a set of spatial tasks $T = \{t_1,t_2,\ldots\}$, let $M = \bigcup_{i=1}^{|T|} M_i$ be the union of the potential match sets for all spatial tasks, where every correct match in $M$ is of form $(t_i,\langle w_j w_k \ldots\rangle)$.
The maximum correct task assignment (or MCTA) problem is to maximize the number of assigned tasks by selecting a subset of the correct matches, in which every spatial task $t_i$ is assigned to at most one correct match in $M$, while satisfying the workers' constraints.

Task      Potential Match Set
$t_1$     $\{(t_1,\langle w_1 w_2 w_3\rangle)\}$
$t_2$     $\{(t_2,\langle w_1 w_3\rangle), (t_2,\langle w_1\rangle), (t_2,\langle w_3\rangle)\}$
$t_3$     $\{(t_3,\langle w_1\rangle)\}$
$t_4$     $\{\}$
$t_5$     $\{(t_5,\langle w_1\rangle)\}$
$t_6$     $\{\}$
$t_7$     $\{\}$
$t_8$     $\{(t_8,\langle w_1 w_2 w_3\rangle)\}$
$t_9$     $\{(t_9,\langle w_3\rangle)\}$
$t_{10}$  $\{(t_{10},\langle w_2 w_3\rangle), (t_{10},\langle w_2\rangle), (t_{10},\langle w_3\rangle)\}$

Table 5.1: Illustrating the potential match sets for the spatial tasks of Figure 5.4

5.2.2 Complexity Analysis

In this section, we prove that the maximum correct task assignment is an NP-hard problem by reduction from the maximum 3-dimensional matching problem [GJ79], which is also an NP-hard problem. The maximum 3-dimensional matching problem can be formalized as follows:

Definition 29 (Maximum 3D-Matching Problem) Let $X$, $Y$, and $Z$ be finite, disjoint sets, and let $T$ be a subset of $X \times Y \times Z$. That is, for every triple $(x,y,z) \in T$, we have $x \in X$, $y \in Y$, and $z \in Z$. We say $M \subseteq T$ is a 3-dimensional matching if for any two distinct triples $(x_1,y_1,z_1) \in M$ and $(x_2,y_2,z_2) \in M$, we have $x_1 \ne x_2$, $y_1 \ne y_2$, and $z_1 \ne z_2$. Thus, the maximum 3-dimensional matching problem is to find a 3-dimensional matching $M \subseteq T$ that maximizes $|M|$.

In order to prove that the MCTA problem is NP-hard, we first prove that the MCTA$_1$ problem is NP-hard; we define MCTA$_1$ as a special instance of the MCTA problem in which the maximum number of acceptable tasks (i.e., $maxT$) for every worker is one. Thereafter, we readily conclude that the MCTA problem is NP-hard. The following lemma proves that MCTA$_1$ is NP-hard.

Lemma 4 The MCTA$_1$ problem is NP-hard.

Proof 7 We prove the lemma by providing a polynomial reduction from the maximum 3D-matching problem. Towards that end, we prove that given an instance of the maximum 3D-matching problem, denoted by $I_m$, there exists an instance of the MCTA$_1$ problem, denoted by $I_a$, such that the solution to $I_a$ can be converted to the solution of $I_m$ in polynomial time.
Consider a given $I_m$, in which each of the sets $X$, $Y$, and $Z$ has $n$ elements. Also, let $T$ be a subset of $X \times Y \times Z$. To solve $I_m$, we select a set $M \subseteq T$, in which $M$ is the largest 3D matching. Correspondingly, to solve $I_a$, we select $A \subseteq \bigcup_{i=1}^{|T|} M_i$ with maximum cardinality, in which no two matches in $A$ should overlap.

Therefore, we propose the following mapping from $I_m$ components to $I_a$ components to reduce $I_m$ to $I_a$. For every element in $X$, we create a spatial task. Thereafter, for every element in $Y$ and $Z$, we create a worker. That is, we create a total of $n$ spatial tasks and $2n$ workers. Every task $t_i$ has a potential match set $M_i$, which is the set of all possible correct matches. Moreover, every correct match in $\bigcup_{i=1}^{|T|} M_i$ is a triple of form $(t_x,\langle w_y w_z\rangle)$, where $0 < x \le n$, $0 < y \le n$, and $n < z \le 2n$. Consequently, to solve $I_a$, we need to find a set $A \subseteq M$, in which $A$ is the largest 3D matching. That is, for every two matches in $A$, $(t_{x_1},\langle w_{y_1} w_{z_1}\rangle)$ and $(t_{x_2},\langle w_{y_2} w_{z_2}\rangle)$, we have $t_{x_1} \ne t_{x_2}$, $w_{y_1} \ne w_{y_2}$, and $w_{z_1} \ne w_{z_2}$. It is easy to observe that if the answer to $I_a$ is the set $A$, the answer to $I_m$ will be the set $M$ with maximum cardinality. This completes the proof.

The following theorem follows from Lemma 4:

Theorem 7 The MCTA problem is NP-hard.

Proof 8 We prove by restriction from MCTA$_1$. MCTA$_1$ is a special instance of MCTA and is NP-hard based on Lemma 4. Therefore, MCTA is also NP-hard.

5.2.3 Assignment Protocol

Based on Theorem 7, the MCTA problem is NP-hard, which makes the optimal algorithms impractical. Consequently, we can employ any approximation algorithm that solves the 3D-matching problem to solve the MCTA problem. In the following, we propose three solutions to this problem. Our first solution is based on a greedy approach that solves the 3D-matching problem. Our second approach tries to improve the greedy approach by performing some local optimization. Finally, our third approach tries to apply some heuristics to improve the approximation.
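All three solutions take the set $M$ of correct matches (Definitions 26 to 28) as input. A minimal sketch of how a potential match set could be enumerated is given below. The function names are illustrative, the reputations 0.7, 0.6 and 0.7 come from the example of Figure 5.4, and the confidence thresholds 0.72 and 0.45 are assumed values chosen only so that the output agrees with Table 5.1, since the figure's own table of thresholds is not reproduced in the text. The enumeration visits every subset of the candidate workers, so it is exponential in their number.

```python
from itertools import combinations
from math import prod


def ars(reputations):
    """Aggregate reputation score: probability that a strict majority is correct."""
    n = len(reputations)
    total = 0.0
    for k in range(n // 2 + 1, n + 1):
        for correct in combinations(range(n), k):
            chosen = set(correct)
            total += prod(
                reputations[j] if j in chosen else 1.0 - reputations[j]
                for j in range(n)
            )
    return total


def potential_match_set(task, alpha, candidates, reputation):
    """Return every correct match (task, C) with C drawn from `candidates`.

    candidates: workers whose spatial region contains the task
    reputation: dict mapping worker id to reputation score
    """
    matches = []
    for size in range(1, len(candidates) + 1):
        for group in combinations(candidates, size):
            if ars([reputation[w] for w in group]) >= alpha:
                matches.append((task, group))
    return matches


REPUTATION = {"w1": 0.7, "w2": 0.6, "w3": 0.7}

# Assumed thresholds: 0.72 for t1 and 0.45 for t2 (hypothetical values).
M1 = potential_match_set("t1", 0.72, ["w1", "w2", "w3"], REPUTATION)
M2 = potential_match_set("t2", 0.45, ["w1", "w3"], REPUTATION)
```

Under these assumptions `M1` contains only the three-worker match, while `M2` contains both single-worker matches and the pair.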
5.2.3.1 Greedy (GR) Approach

One of the well-known approaches for solving the 3D-matching problem is a greedy algorithm which iteratively expands the matching set until no more expansion is possible [DKH11]. Correspondingly, to solve the MCTA problem, we can iteratively assign a task to one of its correct matches, until no more assignment is possible. Note that with the MCTA problem, the maximum number of acceptable tasks for every worker may not necessarily be one. Consequently, we address this by transforming every worker with $maxT$ capacity into $maxT$ workers with a capacity of 1. This allows a worker to be assigned to at most $maxT$ tasks. Moreover, unlike the 3D-matching problem where every match is in the form of a triple, in the MCTA problem every correct match may contain any number of workers (i.e., from 1 to $|W|$).

The pseudo-code of the Greedy (GR) algorithm is shown in Figure 5.5. We explain the details of the GR approach with the example of Figure 5.4. The algorithm starts by iterating through every correct match in the set $M$, which is the union of the potential match sets for all spatial tasks, and adds the correct match to the result set $A$ if it does not contradict any of the already added correct matches (lines 3-17). We say a correct match $(t_i,C)$ contradicts a correct match $(t_{i'},C')$ in $A$ if either of these two cases occurs: 1) the task has already been assigned (i.e., $t_i = t_{i'}$), or 2) for any worker in the set $C$, the worker has already used all his capacity (lines 6-12). That is, the worker has been assigned $maxT$ number of times.

GR-Approach (Set W, Set T, Set M)
01. Let A = {};
02. Let Used[] = {0};
03. For each correct match (t_i, C) ∈ M
04.     contradict = false;
05.     For each correct match (t_i', C') ∈ A
06.         If (t_i = t_i')
07.             contradict = true; break;
08.         Else
09.             For every w ∈ (C ∩ C')
10.                 If (Used[w] = w.maxT)
11.                     contradict = true; break;
12.             If (contradict) break;
13.     If (!contradict)
14.         A = A ∪ (t_i, C);
15.         For every w ∈ C
16.             Increment Used[w] by 1;
17. End for;
18. Return A;

Figure 5.5: Greedy algorithm
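The greedy scan described above can be sketched in a few lines. This is an illustrative rendering rather than the original implementation: the function and variable names are invented, and the capacity check is applied to every worker in $C$ directly, which gives the same result as the nested loop of the pseudo-code whenever every $maxT$ is at least 1. The correct matches follow Table 5.1. The value $maxT_1 = 2$ is from the text, while the capacities of $w_2$ and $w_3$ are assumed to be 2 as well.

```python
from collections import Counter


def greedy_assignment(matches, max_t):
    """Keep each correct match, in scan order, that does not contradict the result so far.

    matches: list of (task, workers) pairs, workers being a tuple of worker ids
    max_t:   dict mapping worker id to the maximum number of tasks it accepts
    """
    assigned = []           # the result set A
    assigned_tasks = set()  # tasks that already have a correct match in A
    used = Counter()        # Used[w]: number of tasks given to w so far

    for task, workers in matches:
        if task in assigned_tasks:
            continue  # case 1: the task has already been assigned
        if any(used[w] >= max_t[w] for w in workers):
            continue  # case 2: some worker in C has no capacity left
        assigned.append((task, workers))
        assigned_tasks.add(task)
        for w in workers:
            used[w] += 1
    return assigned


# Correct matches in the order of Table 5.1 (tasks with empty sets omitted).
M = [
    ("t1", ("w1", "w2", "w3")),
    ("t2", ("w1", "w3")), ("t2", ("w1",)), ("t2", ("w3",)),
    ("t3", ("w1",)),
    ("t5", ("w1",)),
    ("t8", ("w1", "w2", "w3")),
    ("t9", ("w3",)),
    ("t10", ("w2", "w3")), ("t10", ("w2",)), ("t10", ("w3",)),
]
# maxT for w1 is stated in the text; the values for w2 and w3 are assumptions.
MAX_T = {"w1": 2, "w2": 2, "w3": 2}

A = greedy_assignment(M, MAX_T)
```

With this scan order the sketch selects the matches for $t_1$, $t_2$ and $t_{10}$, and a different ordering of `M` would generally produce a different assignment.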
Table 5.2 depicts the status of the set $A$ for the example of Figure 5.4 at every step. We can see that at every step the most recently added correct match is shown in bold. According to Table 5.2, in the first step the algorithm assigns $t_1$ to $\langle w_1 w_2 w_3\rangle$. Thereafter, the algorithm assigns the task $t_2$ to $\langle w_1 w_3\rangle$ (step 2). At this point, the algorithm reaches $t_3$. However, since $t_3$ can only be assigned to $w_1$, and $w_1$ has already used all his capacity, $t_3$ remains unassigned. The algorithm repeats this step to find all the non-contradicting correct matches. Consequently, in step 3, the GR algorithm adds $(t_{10},\langle w_2\rangle)$ to the set $A$. The algorithm stops when it has scanned through all the correct matches.

One of the properties of the GR approach is that it finds a maximal assignment. An assignment $A$ is maximal if every correct match in $M - A$ contradicts $A$. The reason is that if there existed a correct match that did not contradict $A$, it could be added to $A$. As depicted in Table 5.2, we can see that the final set $A$ is maximal, and it contains 3 correct matches. Note that a maximal assignment is not equivalent to the maximum assignment (i.e., the optimal answer to the MCTA problem).

Steps  A
1      $\{(t_1,\langle w_1 w_2 w_3\rangle)\}$
2      $\{(t_1,\langle w_1 w_2 w_3\rangle), (t_2,\langle w_1 w_3\rangle)\}$
3      $\{(t_1,\langle w_1 w_2 w_3\rangle), (t_2,\langle w_1 w_3\rangle), (t_{10},\langle w_2\rangle)\}$

Table 5.2: Illustrating the steps of the GR approach for the example of Figure 5.4

5.2.3.2 Local Optimization (LO) Approach

The problem with the GR approach is that the assignment is performed in an ad-hoc fashion, and is totally dependent on the order in which the correct matches are scanned. In other words, the spatial tasks are assigned arbitrarily without considering any heuristic to improve the result. The Local Optimization approach adopted from [HS89] tries to improve the Greedy approach by finding an optimal solution within a neighborhood set of solutions. Consequently, the LO approach first uses the GR approach to find an assignment. Thereafter, it tries to improve the assignment by performing some local search. Figure 5.6 depicts the pseudo-code for the LO algorithm.
We explain the details of the LO algorithm with the example of Figure 5.4 (see Table 5.3). The algorithm starts by applying the Greedy approach to find a maximal assignment $A$ (line 1). Since $A$ is maximal, it cannot be directly expanded by adding more correct matches. However, it is still possible that if we remove a correct match from $A$, we may be able to replace it with more than one correct match in order to increase the number of assigned tasks. Consequently, the algorithm iterates through all the correct matches in the set $A$ (lines 5-14), and for every correct match $(t_i,C)$, the LO algorithm removes it from the result set $A$. As shown in step 2 of Table 5.3, $(t_1,\langle w_1 w_2 w_3\rangle)$ is removed from $A$. Thereafter, the algorithm searches for the set $M'$, which is the set of all the non-contradicting correct matches in $M$ that could be added to $A - (t_i,C)$. For example, the set $M'$ after removing $(t_1,\langle w_1 w_2 w_3\rangle)$ from the set $A$ includes $(t_3,\langle w_1\rangle)$, $(t_5,\langle w_1\rangle)$, $(t_8,\langle w_1 w_2 w_3\rangle)$, and $(t_9,\langle w_3\rangle)$. Note that even though these correct matches do not contradict the set $A$, they may contradict each other. For example, $(t_3,\langle w_1\rangle)$ and $(t_5,\langle w_1\rangle)$ in the set $M'$ contradict each other. Therefore, the algorithm needs to compute the set $A'$ with the maximum number of non-contradicting correct matches, given the set $M'$. That is, it needs to solve the MCTA problem for the set $M'$. Note that the set $M'$ is a much smaller set as compared to $M$. Therefore, computing the maximum assignment using any of the optimal approaches is feasible³. In our example, the set $A'$ constructed from $M'$ includes $(t_3,\langle w_1\rangle)$ and $(t_9,\langle w_3\rangle)$. Consequently, the algorithm trades $A'$ for $(t_i,C)$ only if $|A'| > 1$ (lines 9-11). That is, the algorithm adds $A'$ to $A$ only if the set $A$ could be expanded by more than one correct match. Otherwise, the already removed correct match $(t_i,C)$ is put back into the result set (lines 12-13). As depicted in step 3 of Table 5.3, the set $A'$ is added to the set $A$, since it contains two correct matches.
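The trading step described above can be sketched as follows. This is a simplified illustration and not the original implementation: all names are invented, the set $A'$ is found by brute force over subsets of $M'$ (acceptable only because $M'$ is small), and the capacities of $w_2$ and $w_3$ are assumed to be 2, as only $maxT_1 = 2$ is stated in the text.

```python
from itertools import combinations


def feasible(matches, max_t):
    """True if no task appears twice and no worker exceeds its capacity."""
    tasks, load = set(), {}
    for task, workers in matches:
        if task in tasks:
            return False
        tasks.add(task)
        for w in workers:
            load[w] = load.get(w, 0) + 1
            if load[w] > max_t[w]:
                return False
    return True


def greedy(matches, max_t):
    """Greedy pass: keep every match that stays feasible with those already kept."""
    result = []
    for m in matches:
        if feasible(result + [m], max_t):
            result.append(m)
    return result


def best_subset(candidates, base, max_t):
    """Largest subset of `candidates` that can be added to `base` (brute force)."""
    for size in range(len(candidates), 0, -1):
        for subset in combinations(candidates, size):
            if feasible(base + list(subset), max_t):
                return list(subset)
    return []


def local_optimization(matches, max_t):
    """Trade one assigned match for two or more until no such trade exists."""
    assignment = greedy(matches, max_t)
    improve = True
    while improve:
        improve = False
        for m in list(assignment):
            rest = [x for x in assignment if x != m]
            m_prime = [x for x in matches if feasible(rest + [x], max_t)]
            a_prime = best_subset(m_prime, rest, max_t)
            if len(a_prime) > 1:  # otherwise m stays in place
                assignment = rest + a_prime
                improve = True
    return assignment


M = [
    ("t1", ("w1", "w2", "w3")),
    ("t2", ("w1", "w3")), ("t2", ("w1",)), ("t2", ("w3",)),
    ("t3", ("w1",)), ("t5", ("w1",)),
    ("t8", ("w1", "w2", "w3")), ("t9", ("w3",)),
    ("t10", ("w2", "w3")), ("t10", ("w2",)), ("t10", ("w3",)),
]
MAX_T = {"w1": 2, "w2": 2, "w3": 2}  # values for w2 and w3 are assumptions

result = local_optimization(M, MAX_T)
```

Under these assumptions the sketch first trades the match for $t_1$ against those for $t_3$ and $t_9$, then trades $(t_2,\langle w_1 w_3\rangle)$ against $(t_2,\langle w_3\rangle)$ and $(t_5,\langle w_1\rangle)$, ending with five assigned tasks as in Table 5.3.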
Finally, the LO algorithm stops when no more such trading is possible (lines 3-15).

³ Even if the set $M'$ was large, the GR approach could be applied to compute the set $A'$.

One property of the LO algorithm is that the final result $A$ is a maximal assignment, in which no correct match can be traded for more than one correct match in $M - A$. The LO algorithm can be extended to a more generalized case, in which we construct a maximal matching $A$ such that no $k$ correct matches can be traded with more than $k$ correct matches in $M - A$ (see [HS89]).

LO-Approach (Set W, Set T, Set M)
01. Let A = GR-Approach(W, T, M);
02. Let improve = true;
03. While (improve) do
04.     improve = false;
05.     For each correct match (t_i, C) ∈ A
06.         A = A - (t_i, C);
07.         Let M' be the set of correct matches in M not contradicting A;
08.         Construct the set A' of maximum cardinality from M' where no two elements contradict;
09.         If (|A'| > 1)
10.             A = A ∪ A';
11.             improve = true;
12.         Else
13.             A = A ∪ (t_i, C);
14.     End for;
15. End while;
16. Return A;

Figure 5.6: Local Optimization algorithm

Steps  A
1      $\{(t_1,\langle w_1 w_2 w_3\rangle), (t_2,\langle w_1 w_3\rangle), (t_{10},\langle w_2\rangle)\}$
2      $\{(t_2,\langle w_1 w_3\rangle), (t_{10},\langle w_2\rangle)\}$
3      $\{(t_2,\langle w_1 w_3\rangle), (t_{10},\langle w_2\rangle), (t_3,\langle w_1\rangle), (t_9,\langle w_3\rangle)\}$
4      $\{(t_{10},\langle w_2\rangle), (t_3,\langle w_1\rangle), (t_9,\langle w_3\rangle)\}$
5      $\{(t_{10},\langle w_2\rangle), (t_3,\langle w_1\rangle), (t_9,\langle w_3\rangle), (t_2,\langle w_3\rangle), (t_5,\langle w_1\rangle)\}$

Table 5.3: Illustrating the steps of the LO approach for the example of Figure 5.4

5.2.3.3 Heuristic-based Local Optimization (HLO) Approach

In this approach, our goal is to employ a set of heuristics to increase the number of assigned tasks. Our first heuristic filters out a set of correct matches to improve the final result. Our second heuristic is based on the intuition that it would be more beneficial to utilize fewer workers when assigning a task. This would allow those workers to be assigned to other tasks, thus increasing the total number of assigned tasks. Our third heuristic takes into account the travel cost (e.g., in time or distance) of the workers during the assignment process.
Therefore, the intuition here is to give more priority to the workers who are at a smaller distance from a given spatial task. In the following sections, we discuss each of the heuristics in turn. Thereafter, we discuss the steps of the HLO algorithm in more detail.

Filtering Heuristic

In order to solve the MCTA problem, we need to compute the potential match set for every spatial task $t$. This requires computing the aggregate reputation score for every combination of workers whose spatial regions contain the task $t$. Consequently, repeating this step for all the spatial tasks can create a large number of correct matches. This would make the existing approaches less efficient. Our idea is to prune a set of correct matches whose pruning may improve the final answer. In the following, we first define the term domination. Thereafter, we state a lemma, which shows how we can filter out a set of correct matches.

Definition 30 (Domination) Given two correct matches $(t,C) \in M$ and $(t,C') \in M$, we say the correct match $(t,C)$ dominates the correct match $(t,C')$ if $C \subseteq C'$.

Lemma 5 Given the set $M$ (Definition 28), let $A$ be the output of the LO algorithm. Moreover, let $D \subseteq M$ be the set of all correct matches being dominated by the rest of the correct matches in $M - D$. Let $\hat{A}$ be the output of the LO algorithm, given the set $\hat{M} = M - D$. We have $|\hat{A}| \ge |A|$. That is, the set $D$ can be safely pruned from $M$ without degrading the final result.

Proof 9 The proof is trivial. Let $(t,C') \in D$. Also, let $(t,C')$ be dominated by $(t,C) \in M - D$. Now, let us assume that the task $t$ is assigned to the set $C'$ in $A$. We can always replace $(t,C')$ with $(t,C)$ in $A$, since $C$ is a subset of the workers in $C'$. Moreover, since there exists a set of workers in $C'$ who are not in $C$, replacing $(t,C')$ with $(t,C)$ will release some workers to be assigned to other tasks. Consequently, this may result in increasing the number of assignments. Thus, for $\hat{M} = M - D$, we have $|\hat{A}| \ge |A|$.
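The pruning that Lemma 5 permits can be sketched as below. The function name is illustrative. The sketch treats a match as dominated only when another match for the same task uses a proper subset of its workers, which is an interpretation of Definition 30 that prevents a match from pruning itself. The correct matches are those of Table 5.1.

```python
def prune_dominated(matches):
    """Return the correct matches that are not dominated by another match.

    A match (t, C') is dropped when some other match (t, C) for the same
    task uses a proper subset of the workers in C'.
    """
    kept = []
    for task, workers in matches:
        group = set(workers)
        dominated = any(
            other_task == task
            and set(other).issubset(group)
            and set(other) != group
            for other_task, other in matches
        )
        if not dominated:
            kept.append((task, workers))
    return kept


M = [
    ("t1", ("w1", "w2", "w3")),
    ("t2", ("w1", "w3")), ("t2", ("w1",)), ("t2", ("w3",)),
    ("t3", ("w1",)), ("t5", ("w1",)),
    ("t8", ("w1", "w2", "w3")), ("t9", ("w3",)),
    ("t10", ("w2", "w3")), ("t10", ("w2",)), ("t10", ("w3",)),
]

M_hat = prune_dominated(M)
D = [m for m in M if m not in M_hat]
```

For the matches of Table 5.1, `D` contains the two-worker matches for $t_2$ and $t_{10}$, and `M_hat` keeps the remaining nine matches. The pairwise comparison is quadratic in $|M|$; grouping matches by task first would reduce the work.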
Given Lemma 5, we can remove all the correct matches in the set $M$ which are already dominated by other correct matches in $M$. For the example of Figure 5.4, the set of correct matches which can be pruned from the set $M$ is $D = \{(t_2,\langle w_1 w_3\rangle), (t_{10},\langle w_2 w_3\rangle)\}$.

Least Worker Assigned (LWA) Heuristic

One of the drawbacks of the GR approach was that the task assignment was performed arbitrarily. Although the LO approach tries to improve the output of GR, utilizing a set of heuristics may further improve the result. Our next heuristic is to assign higher priorities to the correct matches with fewer workers. That is, given two correct matches $(t,C)$ and $(t',C')$, where $|C| < |C'|$, $(t,C)$ has a higher priority. Consider every worker as a resource. The intuition is that these resources are limited (i.e., workers have limited capacities). Consequently, it is wiser to spend fewer resources on a given spatial task whenever possible, so that those resources can be used by the rest of the tasks, thus increasing the total number of assigned tasks.

Least Aggregate Distance (LAD) Heuristic

So far, we have not considered the travel cost (e.g., in time or distance) of the workers during the assignment process. With spatial crowdsourcing, the travel cost becomes a critical issue since workers should physically go to the location of the spatial task in order to perform the task. Consequently, based on this heuristic, the idea is to give more priority to workers whose aggregate distance to a given spatial task is less than those of other workers.

We define the travel cost between a worker $w$ and a spatial task $t$ in terms of the Euclidean distance⁴ between the two, denoted by $d(t,w)$. Moreover, given a set of workers $C$, who should be assigned to the task $t$ simultaneously, we define the aggregate distance, denoted by $ADist(t,C)$, as the sum of the Euclidean distances between the spatial task $t$ and all the workers in $C$ (i.e., $ADist(t,C) = \sum_{w \in C} d(t,w)$).
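The aggregate distance just defined, together with the worker-count priority of the LWA heuristic, can be sketched as follows. The coordinates are hypothetical, the function names are invented for illustration, and the sort key shown (fewer workers first, smaller aggregate distance as a tie-breaker) is one possible way of combining the two heuristics.

```python
from math import dist


def aggregate_distance(task_xy, worker_xys):
    """ADist(t, C): sum of Euclidean distances from the task to each worker in C."""
    return sum(dist(task_xy, xy) for xy in worker_xys)


def priority_key(match, task_location, worker_location):
    """Sort key: fewer workers first, then smaller aggregate distance."""
    task, workers = match
    adist = aggregate_distance(
        task_location[task], [worker_location[w] for w in workers]
    )
    return (len(workers), adist)


# Hypothetical planar coordinates, for illustration only.
TASK_LOCATION = {"t10": (0.0, 0.0)}
WORKER_LOCATION = {"w2": (6.0, 8.0), "w3": (3.0, 4.0)}

matches = [("t10", ("w2", "w3")), ("t10", ("w2",)), ("t10", ("w3",))]
ordered = sorted(
    matches, key=lambda m: priority_key(m, TASK_LOCATION, WORKER_LOCATION)
)
```

With these coordinates the single-worker match for $w_3$ (distance 5) comes first, followed by the one for $w_2$ (distance 10), and the two-worker match (aggregate distance 15) comes last.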
Consequently, by computing the aggregate distance between a spatial task $t$ and all the workers in $C$ given a correct match $(t,C)$, we can associate higher priorities with the correct matches with smaller aggregate distances. Note that the LAD heuristic partly captures the LWA heuristic, since in most cases fewer workers result in a smaller aggregate distance. However, in general both can be merged together, which we explain in the next section.

⁴ Other metrics such as network distance are also applicable.

HLO Algorithm

In this section, we explain the details of the HLO algorithm by employing the already discussed heuristics. The HLO algorithm works similarly to the LO algorithm with three differences. First, the input to the algorithm is the set of non-dominated correct matches $\hat{M}$ instead of the set $M$. Second, the set $\hat{M}$ is ordered by the number of workers and the aggregate distance, respectively. That is, we first give more priority to the correct matches with fewer workers. Thereafter, among those with an equal number of workers, we give higher priority to those with smaller aggregate distances. Finally, instead of arbitrary scanning, the set $\hat{M}$ is scanned in descending order of priority.

5.3 Privacy in Trustworthy SIWSA Spatial Crowdsourcing

In Section 5.2, our goal was to maximize the number of assigned tasks, while guaranteeing a given confidence for every spatial task. However, we did not take into account the privacy of the workers. The challenge is that even if the worker hides his location by sending a range (i.e., worker constrained), since every worker is associated with a reputation score, the SC-server can utilize the worker's reputation to identify who has performed which task. The reason is that reputation is only meaningful if bound to an identity, which is not the case in private spatial crowdsourcing. Thus, anonymity is hard to achieve when both location and reputation should be incorporated into the worker's query.
In this section, we study this problem in more detail.

5.3.1 Preliminaries

In this section, we begin by formally defining our problem. Thereafter, we present the system design, followed by the security properties we desire for our design, and the underlying trust assumptions and threat model.

5.3.1.1 Problem Definition

In Section 5.2, we studied the issue of trust by solving the MCTA problem. However, in many scenarios the server is not trusted, and therefore, a worker may not be willing to reveal his identity to the server. Even if the worker hides his identity from the server (i.e., only reveals his location), he can still be threatened by the location-based attack. We now formally define the privacy problem.

Definition 31 (Problem Definition) Given a set of workers $W = \{w_1,w_2,\ldots\}$ and a set of spatial tasks $T = \{t_1,t_2,\ldots\}$, the goal is to solve the MCTA problem while protecting the privacy of the workers. We refer to this as the privacy-aware MCTA (or PAMCTA) problem.

5.3.1.2 System Design

With this problem, our goal is to enable both privacy and trust in spatial crowdsourcing. To enable privacy, a worker should not reveal his identity to the server. Even if the worker hides his identity from the server (i.e., only reveals his location and reputation), due to the strong correlation between people and their movements, a worker can still be identified by his location. Therefore, for privacy reasons, instead of sending their exact locations, the workers only send the spatial regions in which they can perform tasks. The workers can utilize any of the spatial cloaking techniques [GL, MCA, CML] (e.g., k-anonymity) to blur their location in a region with $k-1$ other mobile users. A well-known approach is P2P spatial cloaking [CML], which we discussed in Section 4.1.1. An alternative is that the workers simply form their spatial regions individually without communicating with others (e.g., a zip code area).
Moreover, cloaking the locations by itself may not be sufficient to protect the privacy of the workers, since the reputation score of the workers may also reveal information about the identity of the workers, especially if it is linked to a spatial region. Consequently, instead of sending their exact reputation scores, the workers cloak their reputation with a number of workers, and only send the minimum reputation score of those workers. Although this seems similar to spatial cloaking, a worker cannot simply cloak his reputation by communicating with his close-by peers. The reason is that unlike location information, the worker is not trusted to store his reputation score, since the worker can easily increase his reputation score. Consequently, we need a trusted party to authenticate the reputation scores of the workers by digitally signing them, so that they cannot be tampered with by the workers. We refer to this trusted third party as the certification authority (CA). The CA is responsible for registering both workers and requesters. In order to register the workers, the CA verifies that the spatial crowdsourcing software is properly installed on the worker's mobile device. Moreover, the CA issues certificates to both workers and requesters, so that the SC-server can verify their authenticity. The CA is also responsible for maintaining and updating the reputation scores of all the workers. However, the problem is that even though the CA signs the reputation score of every worker, the worker is still responsible for cloaking the reputations, for which the worker is not trusted. Alternatively, the CA can store the reputation scores of all the workers. Consequently, before the task inquiry the worker sends a request to the CA with a cloaking parameter $k_r$, where the CA cloaks his reputation with $k_r - 1$ other workers, signs the lower bound of the cloaked range, and sends it to the worker.
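The CA-side cloaking step can be sketched as below. Several choices here are assumptions rather than part of the described design: the text does not say how the $k_r - 1$ other workers are selected, so the sketch picks those closest in reputation, and it uses an HMAC from the standard library purely as a stand-in for a digital signature. A real deployment would need a public-key signature so that the SC-server can verify the value without holding the CA's secret. All names are illustrative.

```python
import hashlib
import hmac


def cloak_reputation(worker, k_r, reputations):
    """Return the lower bound of a group of k_r reputations that includes `worker`.

    Assumption: the group is formed with the k_r - 1 workers whose scores
    are closest to the worker's own score.
    """
    if k_r < 1 or k_r > len(reputations):
        raise ValueError("k_r must be between 1 and the number of workers")
    own = reputations[worker]
    others = sorted(
        (w for w in reputations if w != worker),
        key=lambda w: abs(reputations[w] - own),
    )
    group = [worker] + others[: k_r - 1]
    return min(reputations[w] for w in group)


def sign_lower_bound(ca_key, lower_bound):
    """Stand-in for the CA's signature over the lower-bound reputation (HMAC-SHA256)."""
    message = f"{lower_bound:.4f}".encode()
    return hmac.new(ca_key, message, hashlib.sha256).hexdigest()


def verify_lower_bound(ca_key, lower_bound, tag):
    """Check a tag produced by sign_lower_bound in constant time."""
    return hmac.compare_digest(sign_lower_bound(ca_key, lower_bound), tag)


# Hypothetical reputation table held by the CA.
REPUTATIONS = {"w1": 0.7, "w2": 0.6, "w3": 0.7, "w4": 0.9}
CA_KEY = b"demo-key-for-illustration-only"

lower = cloak_reputation("w1", 3, REPUTATIONS)
tag = sign_lower_bound(CA_KEY, lower)
```

Because the value sent is the minimum over the group, it never exceeds the worker's true score, so an assignment that satisfies a task's confidence using the lower bound also satisfies it under the true reputations.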
Thereafter, during the task inquiry, the worker sends a spatial region and the signed lower-bound reputation score, instead of his exact location and reputation, to the SC-server to protect his privacy.

Once the task is performed by the worker, the worker needs to submit the result to the SC-server. Since all the spatial tasks are geo-tagged, this makes it easier for an attacker to associate a submitted task with the worker who performed it. The MIX network (MIX) works as an anonymization channel between the workers and the SC-server. It is responsible for disassociating a submitted task result from the worker who performed the spatial task. Therefore, the task result will be sent to the SC-server anonymously. The MIX network was originally invented by Chaum [Cha81] as an anonymous remailing scheme, where every message is sent as a multiply-encrypted message by the client, in a way that every MIX node "peels off" a layer of the encryption and forwards the message to the next MIX node. The MIX node waits for enough messages to arrive before forwarding them. Assuming a sufficient number of users are using the MIX, this process of delaying and mixing the messages makes it difficult for the adversary to correlate the incoming and outgoing messages.

5.3.1.3 Threat Model

In this work, our goal is to provide two security properties: protecting the privacy of the workers and requesters, and ensuring the correctness of the performed tasks. In this section, we study the likely threats related to each goal. The adversary can appear as any of the system components or as a third party.

Worker/Requester Privacy

We seek to provide anonymity for the requesters. The adversary may wish to identify the requester who issues a given spatial crowdsourcing query. We assume that the adversary may eavesdrop on communication between the requester and the SC-server, collude with the SC-server to identify the requester, or attempt to impersonate the SC-server to receive the spatial tasks.
Moreover, we seek to protect the privacy of the workers to encourage their participation in spatial crowdsourcing query answering. We assume that the adversary may eavesdrop on the communications between the worker and the CA or attempt to impersonate the CA to obtain the real identity of the worker. The adversary may also attempt to impersonate the SC-server, eavesdrop on the communication between the worker and the SC-server, or collude with the SC-server to obtain the sensitive information of the workers during both task inquiry and task submission. Furthermore, the adversary may wish to identify a worker by correlating his sensitive information (e.g., location, reputation) with the spatial task that the worker performs. The adversary may also identify a worker by linking his reputation to his location information.

Spatial Task Correctness

We seek to guarantee the correctness of the spatial tasks with a certain confidence, once they are performed. The adversary may pose as a worker, and intentionally provide a wrong answer to a spatial task. The adversary may also pose as several workers simultaneously by performing a spatial task redundantly to increase the number of votes given to a wrong answer. The adversary may try to increase his reputation score to increase the confidence of a given task. Moreover, since the reputation scores of the workers are constantly changing, the adversary may try to use an older (higher) version of his reputation score.

5.3.1.4 Trust Model

In this section, we define our trust model. For each of the system components, we identify what that entity trusts about the others.

Worker

Workers trust only the CA with their real identity. They also share their reputation scores with the CA (i.e., reputation scores are assigned by the CA). However, the workers do not share their location with the CA. Instead, they share their location with close-by mobile users. This is based on the intuition that location information is less sensitive for close-by people, since they are already in the same area.
Moreover, workers only share their location with close-by users if they want to cloak their location via k-anonymity techniques to form their spatial regions. Furthermore, workers trust the CA to verify the authenticity of the SC-server. They also trust the SC-server enough to accept spatial tasks from it. However, they do not trust the SC-server with either their location or their reputation score. Instead, they send their cloaked region along with their lower-bound reputation score. Finally, we assume that the workers do not tamper with the mobile device's hardware and software.

Requester

Requesters trust only the CA (not the SC-server) with their identity. Requesters also trust the CA to certify the SC-server, so that they can authenticate the SC-server before submitting their query. Moreover, requesters trust the SC-server with both submitting their spatial tasks and accepting the results once they are performed. That is, the requesters trust the SC-server that the performed spatial tasks satisfy the required confidence probability, and that the SC-server does not tamper with the result.

Certification Authority (CA)

The CA is the most trusted component in our system. Moreover, we assume that the CA is administratively independent from the SC-server.

Spatial Crowdsourcing Server

The SC-server trusts the CA to certify valid workers and requesters in the system. The SC-server trusts the valid requesters to accept their queries, and trusts the valid workers to assign spatial tasks to them. Moreover, the SC-server trusts the majority of workers to perform the spatial tasks correctly.

5.3.2 Assignment Protocol

In this section, we describe our privacy-preserving protocol for task assignment, which provides security despite the threats of Section 5.3.1.3 under the trust assumptions of Section 5.3.1.4. We assume that the registration process for all the entities has been handled offline.
Query Submission: The requester sends his PSC-query along with a certified pseudonym (provided by the CA during registration) through a secure channel to the SC-server. The requester thereby ensures that the true SC-server receives the query without it being tampered with by a third party. Moreover, the SC-server knows the query comes from a valid requester by verifying the authenticity of the requester.

Task Inquiry: The worker sends an inquiry to the CA through a secure channel. The inquiry contains his certified pseudonym along with a parameter k_r. Consequently, the CA renews his pseudonym. Moreover, the CA cloaks his reputation with those of k_r - 1 other workers, signs both the worker's pseudonym and the lower bound of the cloaked range, and sends them to the worker. The CA sends back only the lower bound instead of the cloaked range because only the lower-bound value of the reputation score matters for guaranteeing that the confidence of a task is satisfied. Thereafter, during the task inquiry, the worker sends a spatial region along with maxT, as well as his signed pseudonym and his signed lower-bound reputation score, to the SC-server to protect his privacy. The task inquiry is also sent through a secure channel.

Task Assignment: After receiving the task inquiries from all the workers, the SC-server solves the MCTA problem for the given set of workers and set of tasks. The SC-server does not have the exact reputation scores of the workers. However, it can use the lower-bound values of the reputation scores to ensure that the confidence of a task is satisfied.

Task Submission: After performing a spatial task, the worker uses the MIX network to send the task along with his reputation to the SC-server. This prevents the adversary from linking a given task to the worker who performed it.

Task Verification: The SC-server uses the majority voting technique to verify the validity of the task results. Consequently, the SC-server modifies the reputation scores of the workers depending on whether or not they have completed a given task correctly. Note that the SC-server does not have the reputation scores of the workers.
However, it knows the pseudonyms of the workers along with how much their reputation should increase or decrease. Consequently, the SC-server sends this information to the CA, so that the CA, which has the exact reputation of the workers, can update their scores accordingly. Moreover, for every spatial task, the SC-server sends the pseudonyms of all the workers who performed that task to the CA. This lets the CA check whether any worker has performed a given task multiple times (to increase his vote).

Chapter 6

Experimental Evaluations

In this chapter, we evaluate the performance of our proposed frameworks, considering each in turn.

6.1 Privacy in SISSA Spatial Crowdsourcing

We conducted several simulation-based experiments to evaluate the performance of our PiRi framework. Below, we first discuss our experimental methodology. Next, we present our experimental results.

6.1.1 Experimental Methodology

We performed three sets of experiments. With the first set of experiments, we evaluated the scalability of our proposed technique. For the rest of the experiments, we evaluated the impact of the worker's privacy requirement and the transmission range on our approach. With these experiments, we used two performance measures: 1) communication cost, and 2) privacy leak. We measured the communication cost of our approach in terms of the number of messages incurred by our algorithms per worker. In order to measure the privacy leak, we defined a new metric for quantifying the privacy leak in spatial crowdsourcing.

We propose a new privacy leak (PL) metric to determine how successful the server is in associating the submitted queries with the query locations. We assume the worst-case scenario, where the server knows the locations of the workers. Consequently, on one hand the server receives a set of query regions (R), and on the other hand, the server has the query locations (L). Each query region overlaps with a set of workers, one of which has issued the query.
Therefore, the server can associate the query with its location by solving a matching problem between these two sets of data. Accordingly, a bipartite graph is formed with vertices composed of two disjoint sets L and R, where an edge connects L_i to R_j if L_i is located inside R_j. Since every R_j is issued by a worker j, finding a maximum bipartite matching assigns every query to exactly one query location. The PL metric, which measures the percentage of correct matches between L and R, is defined as follows:

PL = \frac{\text{Number of correct matches between } L \text{ and } R}{|R|} \times 100    (6.1)

where |R| is the number of query regions.

Since we found no existing competitive work, we compared our work with the baseline approach (BA) proposed in Section 4.2.2. Moreover, in order to separately show the effectiveness of each of the two properties of the PiRi approach, we compared the baseline approach with each of our two proposed techniques corresponding to each property, denoted by Pi and Ri. That is, the Pi approach is a variation of PiRi that only addresses the all-inclusivity property (i.e., is only partial-inclusive and does not have the range-independence feature). Similarly, the Ri approach is a variation of the PiRi approach that is only range-independent. We ran 1000 cases and reported the average of the results.

We conducted our experiments with the objective of collecting a set of photos from 800 locations in part of Los Angeles county. The locations of these spatial tasks were randomly selected. Moreover, our worker data set consists of 500 randomly generated users in the same area. We set the default number of workers to 300, and vary it between 50 and 500. Moreover, we set the transmission range to 250 meters, and vary it between 50 and 250 meters. The degree of anonymity (m) for each worker varies between 5 and 20, with 5 as the default value. The minimum area requirement was set to zero in all cases.

6.1.2 Scalability

With the first set of experiments, we evaluated the scalability of our PiRi approach by varying the number of workers from 50 to 500.
As Figure 6.1a depicts, the privacy leak is not much affected by the number of workers. The reason is that even though the overall information sent to the server increases as the number of workers grows, the amount of information per worker remains the same, and hence the privacy leak is unaffected. With the BA approach, the privacy leak is around 75% in all cases, whereas this value decreases to 2% with the PiRi approach. This shows a significant improvement of PiRi over BA in preserving privacy. Moreover, as the figure shows, the Pi approach is the next best approach, with a large effect on PL (PL ≃ 10%). This confirms that query selection has the most significant impact within the PiRi approach. The reason is that Pi focuses on minimizing the number of workers sending out queries, which lowers the chance of an accurate matching between the workers and the queries. The Ri approach, on the other hand, has the least impact on PL, decreasing it only to 65% compared with the baseline approach. The reason is that query regions do not usually need much expansion (i.e., adjustment) to meet the privacy requirements. This shows that the impact of the adjustment is not substantial.

Figure 6.1b shows the impact of varying the number of workers on the number of messages. As the figure shows, the number of messages slightly increases in most cases. In a denser network, more communication is required among the peers to perform their queries. We observe the largest increase with the PiRi and Pi approaches, whereas the BA approach is only slightly affected. Moreover, the figure shows that the number of messages in the PiRi approach (35-50 messages per worker) is 3.5 to 5 times more than that of the BA approach. This is because of the extra steps applied in PiRi to preserve privacy.

[Figure 6.1: Scalability. (a) Privacy leak vs. number of participants; (b) number of messages vs. number of participants. Series: BA, Ri, Pi, PiRi.]
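As an illustration of the PL metric in Equation 6.1, the following Python sketch builds the bipartite graph between query locations and query regions, finds a maximum matching with a standard augmenting-path search, and reports the percentage of regions matched to their true issuer. It is a minimal sketch and not the code used in the experiments: the rectangular region format, the function names, and the `issuer` list (the ground-truth index of the location that issued each region) are assumptions made for this example. Note that a maximum matching is generally not unique, so when regions overlap heavily the value can depend on how ties are broken.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Assumed region shape for this sketch: (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def inside(p: Point, r: Rect) -> bool:
    """True if point p lies inside (or on the border of) rectangle r."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def max_bipartite_matching(adj: List[List[int]], n_locations: int) -> List[int]:
    """Augmenting-path maximum matching.

    adj[j] lists the location indices that fall inside region j.
    Returns a list m where m[j] is the location matched to region j, or -1.
    """
    loc_to_region = [-1] * n_locations

    def try_assign(j: int, seen: set) -> bool:
        for i in adj[j]:
            if i in seen:
                continue
            seen.add(i)
            # Take a free location, or re-route the region currently holding it.
            if loc_to_region[i] == -1 or try_assign(loc_to_region[i], seen):
                loc_to_region[i] = j
                return True
        return False

    for j in range(len(adj)):
        try_assign(j, set())

    region_to_loc = [-1] * len(adj)
    for i, j in enumerate(loc_to_region):
        if j != -1:
            region_to_loc[j] = i
    return region_to_loc


def privacy_leak(locations: List[Point], regions: List[Rect],
                 issuer: List[int]) -> float:
    """PL = (correct matches between L and R) / |R| * 100."""
    if not regions:
        return 0.0
    adj = [[i for i, p in enumerate(locations) if inside(p, r)] for r in regions]
    matched = max_bipartite_matching(adj, len(locations))
    correct = sum(1 for j, i in enumerate(matched) if i == issuer[j])
    return 100.0 * correct / len(regions)
```

Called as `privacy_leak(locations, regions, issuer)`, the function returns a percentage in [0, 100]; larger cloaked regions add edges to the graph and so give the matching more ways to pair a region with the wrong location.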
6.1.3 Effect of Privacy Requirement

In the next set of experiments, we evaluated the performance of our PiRi approach with respect to the worker's privacy requirement m, varied from 5 to 20. Figure 6.2a illustrates a decrease in the privacy leak as m grows. The reason is that an increase in m results in more privacy-aware queries, and therefore less privacy leak. The PL value in the BA approach decreases from 80% to 45%, whereas this value remains almost fixed for the PiRi approach (i.e., PL ≃ 0). Similar to the previous experiments, Pi closely follows PiRi in having the most impact on the privacy leak in all cases. Moreover, Figure 6.2b shows the effect of varying m on the communication cost. The figure illustrates that the number of messages increases with an increase in m. This is because as m grows, the size of the query regions increases, and therefore more messages are transmitted in both phases of the PiRi approach.

[Figure 6.2: Effect of privacy requirement. (a) Privacy leak vs. anonymity level; (b) number of messages vs. anonymity level. Series: BA, Ri, Pi, PiRi.]

[Figure 6.3: Effect of transmission range. (a) Privacy leak vs. transmission range; (b) number of messages vs. transmission range. Series: BA, Ri, Pi, PiRi.]

6.1.4 Effect of Transmission Range

In the final set of experiments, we measured the performance of our approach with respect to increasing the transmission range from 50 to 250 meters. As Figure 6.3 shows, the privacy leak is not affected by varying the transmission range. However, we see a decrease in the communication cost as the transmission range increases. The reason is that with a higher transmission range, workers can communicate with their peers at a shorter hop distance. This reduces the communication cost. However, the overall information sent to the server, which might reveal the query issuers' identity, remains the same.

Our main observation from our experiments is that privacy is achievable in SISSA spatial crowdsourcing systems at an extra cost. In general, there is a tradeoff between privacy and communication overhead. According to the experiments, we observed a significant drop (up to 90%) in the privacy leak of the PiRi approach compared with that of the BA approach, whereas the communication overhead was higher than that of the BA approach. However, we argue that this cost is not a burden to the workers, since it is only a one-time cost associated with assigning spatial tasks to the workers during the planning phase. Moreover, this communication overhead can be interpreted in two ways: a) messaging charges and b) power consumption, of which (a) is negligible since most P2P communications (e.g., Bluetooth) are either free or users pay fixed monthly charges. For the case of (b), since the focus is on planning for workers with fixed locations (i.e., home or office), we can assume that most workers have access to stable power sources. Thus, battery consumption is less critical than when workers are constantly moving.

6.2 Trust in Privacy-Aware SISSA Spatial Crowdsourcing

We also conducted several simulation-based experiments to evaluate the performance of our TruPa framework. Below, we first discuss our experimental methodology. Next, we present our experimental results on the three approaches: LPT, BAL, and HBAL.

6.2.1 Experimental Methodology

We performed three sets of experiments. With the first set of experiments, we evaluated the scalability of our proposed approaches. For the rest of the experiments, we evaluated the impact of the spatial task's trust level and the worker's privacy requirement on our approaches. With these experiments, we used two performance measures: 1) WC, and 2) communication cost, in which the communication cost is measured in terms of the number of messages incurred by our algorithms for each representative worker.

We conducted our experiments with the objective of collecting a set of photos from 500 locations in part of Los Angeles county. These points were randomly selected in this area.
Moreover, our worker data set consists of 400 randomly generated users in the same area. We set the default number of workers to 200, and vary it between 100 and 400. Moreover, we vary the trust level of every spatial task between 2 and 5, with 3 as the default value (i.e., k = 3). We also assumed one trust level for all the spatial tasks. We set the transmission range to 250 meters. At the transport layer, we set the MTU (Maximum Transmission Unit) to 500 bytes. The degree of anonymity (m) for each worker varies between 5 and 20, with 5 as the default value. Finally, for both BAL and HBAL, we set m_min to 2. We assume that every ASR contains at least m_min = 2 voting workers, and therefore the value of m_min is agreed upon among the workers.

6.2.2 Scalability

In the first set of experiments, we evaluated the scalability of our approach by varying the number of workers from 100 to 400. As Figure 6.4a depicts, we see a slight increase in the WC percentage for all three approaches as the number of workers grows. The reason is that a larger number of workers results in denser ASRs, and therefore more overlap between ASRs. In other words, since more ASRs are returned as the kN-ASR set of every spatial task, the number of false hits increases. In general, we see LPT with the highest percentage of WC in all cases. Moreover, BAL shows only a slight improvement over LPT. The reason is that due to privacy concerns, we chose a low value for m_min (i.e., m_min = 2). Choosing higher values would result in more pruning during the filtering step, thus improving the WC percentage of BAL. Finally, we see HBAL with the lowest percentage of WC (up to 2.5 times better than LPT).

[Figure 6.4: Scalability. (a) WC vs. number of participants; (b) number of messages vs. number of participants. Series: LPT, BAL, HBAL.]

Figure 6.4b shows the impact of varying the number of workers on the number of messages. As the figure shows, the number of messages increases in all cases.
In a denser network, more communication is required among the peers to perform their queries. It is worth mentioning that the dominant communication overhead (on average 70%) in all three approaches is due to the P2P communication for preserving privacy. Moreover, we see that with HBAL, due to the extra information sent to the representatives for pruning during the refinement, the communication cost is higher than that of both LPT and BAL. However, the extra cost is only 30% higher than LPT in the worst case. Another interesting observation is that BAL has a lower communication cost than LPT in all cases. The reason is that with the BAL approach the extra pruning is performed at the server side, resulting in less information being sent to the representatives.

6.2.3 Effect of Trust Level

In the next set of experiments, we evaluated the performance of our approaches with respect to the task's trust level, varied from 2 to 5. Figure 6.5a illustrates an increase in the WC percentage as k grows in most cases. The reason is that as k increases, fewer workers are pruned during the local validation phase of the refinement step. However, the increase is less significant for HBAL due to the extra pruning in the refinement step. Moreover, with an increase in k, the kN-ASR set of every spatial task becomes larger, thus increasing the communication cost in all cases (Figure 6.5b). Similar to the previous experiments, HBAL performs best in terms of improving the WC percentage (up to 2.8 times better than LPT), while the extra communication cost stays between 15% and 30% of that of LPT.

[Figure 6.5: Effect of trust level. (a) WC vs. k; (b) number of messages vs. k. Series: LPT, BAL, HBAL.]

6.2.4 Effect of Privacy Requirement

In our final set of experiments, we measured the performance of our approaches with respect to increasing the privacy requirement (m) from 5 to 20. As Figure 6.6a shows, with an increase in m, the percentage of WC reduces in all cases.
The reason is that for larger values of m, each ASR contains more workers. Consequently, the kN-worker set of a spatial task exists in fewer ASRs, which results in more pruning during the validation phase of the refinement step. Moreover, Figure 6.6b shows the effect of varying m on the communication cost. The figure illustrates that the number of messages increases with an increase in m. This is because as m grows, more communication is required among the peers of a given ASR. Similarly, we see HBAL outperforming LPT and BAL in terms of the WC percentage, while its communication overhead is only slightly higher than those of the other two.

[Figure 6.6: Effect of privacy requirement. (a) WC vs. m; (b) number of messages vs. m. Series: LPT, BAL, HBAL.]

6.3 SIWSA Spatial Crowdsourcing

We conducted several experiments on both real-world and synthetic data to evaluate the performance of our proposed approaches: GR, LLEP, and NNP. Below, we first discuss our experimental methodology. Next, we present our experimental results.

6.3.1 Experimental Methodology

We performed three sets of experiments. In the first set of experiments, we evaluated the scalability of our proposed approaches by varying the number of spatial tasks. In the rest of the experiments, we evaluated the impact of the workers' constraints (i.e., R and maxT) on the performance of our approaches. With these experiments, we used two performance measures: 1) the total number of assigned tasks, and 2) the average travel cost for a worker to perform a spatial task, in which the travel cost is measured in terms of the Euclidean distance between the worker and the location of the task.

We conducted our experiments with both real-world (REAL) and synthetic (SYN) data sets. The real-world data set is obtained from Gowalla [gow], a location-based social network, where users are able to check in to different spots in their vicinity. The check-ins include the location and the time that the users entered the spots.
For our experiments, we used the check-in data over a period of 100 days in 2010, covering the state of California. Moreover, we assumed that Gowalla users are the workers of our spatial crowdsourcing system. We picked the granularity of a time instance as one day. Consequently, we treated all the users who checked in during a day as our available workers for that day. Moreover, since users may have various check-ins during a day, for every user w, we set maxT as the number of check-ins of the user in that day, and we set R as the minimum bounding rectangle of those checked-in locations. Intuitively, checking in at a spot is equivalent to accepting a spatial task at that location. Moreover, the spatial tasks were randomly generated for the given spots in the area. In order to compute the location entropy, we discretized the latitude and longitude space into a 0.0002 × 0.0002 grid (approximately 30 meters × 30 meters). For every grid cell, we computed location entropy based on the definition explained in Section 5.1.2.2. With our synthetic experiments, we randomly generated data from a uniform distribution for both workers and spatial tasks. Moreover, we used a grid structure similar to that of REAL for a spatial area of 50 kilometers × 50 kilometers.

In all of our experiments, we varied the number of tasks between 50k and 200k, with 100k as the default value. We also set the duration of every spatial task to 40 days (i.e., δ = 40). Moreover, we fixed the time interval ϕ to 100 days. With the SYN experiments, we fixed the number of workers at 10k. Furthermore, unless mentioned otherwise, we randomly selected the value of maxT between 1 and 20, and the spatial region R between 0.01 and 0.05 of the entire area. Note that since both maxT and R are fixed in REAL (i.e., they depend on the worker's check-ins during one day), we only conducted our first set of experiments with both REAL and SYN data sets. In the rest of the experiments, in which we need to vary either maxT or R, we only used SYN. Finally, for each of our experiments, we ran 500 cases and reported the average of the results.
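The derivation of the worker constraints from the check-in data described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `(user_id, lat, lon)` tuple format and the function names are hypothetical, and the sketch covers only the per-day maxT and R derivation and the 0.0002-degree grid indexing, not the location entropy computation of Section 5.1.2.2.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

CELL = 0.0002  # grid resolution in degrees, as stated in the text

# Assumed record format for one day of check-ins: (user_id, lat, lon).
CheckIn = Tuple[str, float, float]
# Minimum bounding rectangle: (min_lat, min_lon, max_lat, max_lon).
MBR = Tuple[float, float, float, float]


def worker_constraints(checkins: Iterable[CheckIn]) -> Dict[str, Tuple[int, MBR]]:
    """For each user who checked in that day, return (maxT, R).

    maxT is the number of check-ins of the user in that day and R is the
    minimum bounding rectangle of the checked-in locations.
    """
    by_user = defaultdict(list)
    for user, lat, lon in checkins:
        by_user[user].append((lat, lon))

    constraints = {}
    for user, points in by_user.items():
        lats = [p[0] for p in points]
        lons = [p[1] for p in points]
        constraints[user] = (len(points),
                             (min(lats), min(lons), max(lats), max(lons)))
    return constraints


def grid_cell(lat: float, lon: float, cell: float = CELL) -> Tuple[int, int]:
    """Index of the grid cell containing (lat, lon)."""
    return (math.floor(lat / cell), math.floor(lon / cell))
```

A user with a single check-in gets maxT = 1 and a degenerate rectangle that collapses to that one point, which is consistent with the rule in the text but means such a worker can only be matched to tasks at exactly that spot.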
6.3.2 Scalability

In the first set of experiments, we evaluated the scalability of our approaches by varying the number of spatial tasks from 50k to 200k. Figure 6.7a depicts the result of our experiments using the synthetic data. As the figure demonstrates, the assignment increases as the number of tasks grows. The figure also shows that LLEP outperforms both GR and NNP in terms of the number of assigned tasks (up to 35%), due to applying the location entropy heuristic. Furthermore, as the number of tasks grows, the impact of the location entropy heuristic becomes more significant. The reason is that with a large number of tasks, more tasks appear in the spatial region of every worker, and thus a wise selection of the tasks becomes more critical. Figure 6.8a depicts similar experiments using our REAL data. Similarly, the assignment increases as the number of tasks grows. Moreover, the figure shows the superiority of LLEP over GR and NNP in terms of the number of assigned tasks in all cases (up to 30%). Note that our experiments on both REAL and SYN data show that a large number of tasks (more than 50%) remain unassigned. This happens for different reasons, such as the constraints of the workers (e.g., the spatial region of a worker may overlap with only a small number of tasks) or the expiration of the unassigned tasks.

Figures 6.7b and 6.8b depict the impact of varying the number of tasks on the average travel cost of the workers using SYN and REAL data, respectively. As the figures show, the average travel cost of the workers decreases in all cases, because in a task-dense area there is a higher probability that an assigned task is closer to a worker. Moreover, we observe that NNP improves the travel cost of the workers compared with GR and LLEP by up to 45% using the SYN data and up to 42% using the REAL data, which demonstrates the effectiveness of the travel cost heuristic.
[Figure 6.7: Scalability - Synthetic data. Panels (a) and (b).]

[Figure 6.8: Scalability - Real data. Panels (a) and (b).]

6.3.3 Effect of Maximum Acceptable Tasks Constraint

In the next set of experiments, we evaluated the impact of the maximum acceptable tasks (i.e., maxT) constraint using the synthetic data. We widened the range of maxT from [1-10] to [1-40]. Figure 6.9a illustrates an increase in the number of assigned tasks as maxT grows. The reason is that with an increase in maxT, workers are willing to do more tasks, and thus the number of assignments increases. Moreover, similar to the previous experiments, LLEP is the superior approach in terms of improving the number of task assignments (up to 36% better than GR). However, the impact of the location entropy heuristic is more significant for smaller values of maxT (Figure 6.9a). The reason is that with smaller values of maxT, only a small number of tasks should be selected from those inside the spatial region of a worker. Therefore, a wise selection of tasks using the location entropy heuristic becomes more significant.

[Figure 6.9: Effect of maxT - Synthetic data. Panels (a) and (b).]

Figure 6.9b depicts an increase in the travel cost as maxT grows. The reason is that the higher maxT is, the more tasks are assigned to every worker, resulting in a higher travel cost. Moreover, while NNP outperforms both GR and LLEP in terms of the travel cost (between 27% and 39%), its superiority is more significant with smaller values of maxT. The reason is that as maxT grows, more tasks are selected inside the spatial region of the worker. Thus, the impact of the travel cost heuristic becomes less critical.

6.3.4 Effect of Spatial Region Constraint

In our final set of experiments, we measured the performance of our approaches with respect to expanding the spatial region of every worker from [0.01%, 0.05%] to [0.01%, 0.2%]. As Figure 6.10a shows, with an expansion in R, the number of assigned tasks increases. The reason is that larger spatial regions cover more spatial tasks.
That is, more edges connect the vertices mapped from the workers to the vertices mapped from the tasks, which leads to more flow in the corresponding flow network graph. Moreover, Figure 6.10b shows the effect of varying R on the travel cost. The figure illustrates that the travel cost increases with an expansion in R. This is because as R grows, farther tasks will be assigned to the workers with higher probability, which increases the average travel cost.

[Figure 6.10: Effect of R - Synthetic data. Panels (a) and (b).]

The main observation from this set of experiments is that LLEP outperforms both GR and NNP in terms of the number of task assignments, while the NNP approach is superior in terms of the travel cost. This shows that, based on the objective of the crowdsourcing application (i.e., maximizing the assignment or maximizing the assignment with the minimum possible travel cost), either the LLEP or the NNP approach can be selected.¹

¹ An alternative solution is a hybrid approach which utilizes both the location entropy and the travel cost in the assignment process, which is the focus of our future work.

6.4 Trust in SIWSA Spatial Crowdsourcing

We conducted several experiments on both real-world and synthetic data to evaluate the performance of our proposed approaches: GR, LO, and HGR. Moreover, in order to evaluate the impact of our heuristics on the LO approach, we devised another approach, referred to as HLO, which applies the three preprocessing steps of Section ?? on the set M before running the LO algorithm. Below, we first discuss our experimental methodology. Next, we present our experimental results.

6.4.1 Experimental Methodology

We performed three sets of experiments. In the first two sets of experiments, we evaluated the scalability of our proposed approaches by varying both the average number of workers whose spatial regions contain a given spatial task, namely workers per task (W/T), and the average number of spatial tasks which are inside the spatial region of a given worker, denoted by tasks per worker (T/W).
In the rest of the experiments, we evaluated the impact of the workers' capacity constraints on the performance of our approaches. Note that every worker has two constraints: maxT and R. However, we only evaluated the impact of one of them (i.e., maxT) on our approaches, since both constraints have similar effects. With these experiments, we used three performance measures: 1) the total number of assigned tasks, 2) CPU cost, which is the time (in seconds) it takes to solve the MCTA problem, and 3) the average of the aggregate travel cost for a given task, which is the sum of the travel costs of all the workers who are assigned to the task. The travel cost is measured in terms of the Euclidean distance between the worker and the location of the task. Finally, we conducted our experiments on both synthetic (SYN) and real-world (REAL) data sets. With our synthetic experiments we used two distributions: uniform (SYN-UNIFORM) and skewed (SYN-SKEWED). In the following, we discuss our data sets in more detail.

With our first set of synthetic experiments, in order to evaluate the impact of W/T, we considered three cases (Table 6.1), sparse, medium, and dense, in which the average number of W/T is 2, 4, and 8, respectively. This means that we consider an area to be worker-dense if the average number of workers who are eligible to perform a spatial task is 8, whereas in a sparse case, the average number of W/T is 2. In our experiments on SYN-UNIFORM, the average number of W/T varies with a small standard deviation (from 1.1 to 2.5), whereas in our experiments on SYN-SKEWED, the average number of W/T varies with a large standard deviation (between 4 and 16). With our second set of synthetic experiments, in order to evaluate the impact of T/W, we considered three cases (Table 6.2), sparse, medium, and dense, in which the average number of T/W is 5, 15, and 25, respectively. Note that our intuitive assumption is that the number of tasks is usually higher than the number of available workers at a given time instance.
Therefore, in a task-dense area, every worker has on average 25 tasks inside his region, whereas in a worker-dense area, the average number of workers eligible to perform a task is 8. Similar to the previous set of experiments, with the uniform distribution (SYN-UNIFORM), the average number of T/W varies with a small standard deviation (from 2 to 5), whereas with the skewed distribution (SYN-SKEWED), the average number of T/W varies with a large standard deviation (between 25 and 80). Finally, with our last set of experiments, we varied the average number of maxT for every worker between 5 and 15. With this set of experiments, we only reported our experiments on SYN-UNIFORM, in which the value of maxT varies with a small standard deviation (between 1 and 3), since similar results were obtained in the skewed case.

The real-world data set is obtained from Gowalla [gow], a location-based social network, where users are able to check in to different spots in their vicinity. The check-ins include the location and the time that the users entered the spots. We defined the spatial tasks for 115580 spots (e.g., restaurants) in the state of California. An example of a spatial task description is as follows: "Does the cleanliness of the spot match its ratings?". Moreover, we assumed that Gowalla users are the workers of our spatial crowdsourcing system, since users who check in to different spots may be good candidates to perform spatial tasks in the vicinity of those spots. For our experiments, we used the check-in data over a period of one day, covering the state of California. For this particular set of experiments, the average number of W/T was around 4 with a standard deviation of 9.
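The W/T and T/W statistics used to characterize the data sets can be computed directly from the workers' spatial regions and the task locations. The Python sketch below is only an illustration: axis-aligned rectangular regions, the function names, and the use of the population standard deviation are assumptions, since the text does not specify these details.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Point = Tuple[float, float]
# Assumed region shape for this sketch: (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def inside(p: Point, r: Rect) -> bool:
    """True if point p lies inside (or on the border of) rectangle r."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def density_stats(worker_regions: List[Rect], tasks: List[Point]):
    """Return ((avg W/T, SD W/T), (avg T/W, SD T/W)).

    W/T: for each task, the number of workers whose region contains it.
    T/W: for each worker, the number of tasks inside his region.
    Both input lists must be non-empty.
    """
    workers_per_task = [sum(inside(t, r) for r in worker_regions) for t in tasks]
    tasks_per_worker = [sum(inside(t, r) for t in tasks) for r in worker_regions]
    return ((mean(workers_per_task), pstdev(workers_per_task)),
            (mean(tasks_per_worker), pstdev(tasks_per_worker)))
```

Under these assumptions, a data set would fall in the "dense" W/T case of the text when the first average is about 8, and in the "dense" T/W case when the second average is about 25.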
Finally, in all of our experiments, for both the reputation score of every worker and the confidence probability of every spatial task, we randomly selected a number between 0 and 1 from a uniform distribution. Furthermore, unless mentioned otherwise, we fixed the average W/T to 2, the average T/W to 15, and the average value of maxT to 10, with standard deviations 1.1, 2, and 1, respectively. For each of our experiments, we ran 500 cases and reported the average of the results. Finally, experiments were run on an Intel(R) Core(TM)2 @ 2.66 GHz processor with 4 GB of RAM.

Table 6.1: Distribution of the synthetic data for W/T

W/T      SYN-UNIFORM         SYN-SKEWED
Sparse   Avg: 2, SD: 1.1     Avg: 2, SD: 4
Medium   Avg: 4, SD: 1.7     Avg: 4, SD: 10
Dense    Avg: 8, SD: 2.5     Avg: 8, SD: 16

Table 6.2: Distribution of the synthetic data for T/W

T/W      SYN-UNIFORM         SYN-SKEWED
Sparse   Avg: 5, SD: 2       Avg: 5, SD: 25
Medium   Avg: 15, SD: 3      Avg: 15, SD: 50
Dense    Avg: 25, SD: 5      Avg: 25, SD: 80

6.4.2 Effect of Number of Workers per Task (W/T)

In the first set of experiments, we evaluated the scalability of our approaches by varying the number of workers whose spatial regions contain a given spatial task. Figures 6.11a and 6.11b depict the result of our experiments on both SYN-UNIFORM and SYN-SKEWED. As the figures demonstrate, the assignment increases as the number of W/T grows. The reason is that more resources become available to perform tasks. The figures also show that HGR outperforms GR by up to 2 times, which shows the effectiveness of our heuristics. Moreover, our experiments demonstrate that HGR performs similarly to the LO approach, which indicates that by only applying the heuristics to the GR approach, we can obtain results similar to the case where we iteratively perform local optimization. Another observation from this set of experiments is that the impact of the heuristics becomes more significant for larger numbers of W/T. The reason is that in a worker-dense area, there is a higher chance that more than one worker is assigned to a given task. Thus, applying pruning and LWA heuristics becomes more critical.
Moreover, the figures show a slight improvement in the number of assigned tasks by applying the heuristics to the LO approach (i.e., HLO). Finally, we observe that the overall number of assigned tasks is higher for the uniform data than for the skewed data. The reason is that in the skewed case, many tasks fall outside the spatial regions of the workers, and therefore cannot be assigned.

Figures 6.11c and 6.11d depict the impact of varying the number of W/T on the CPU cost (logarithmic scale) using uniform and skewed data, respectively. Our first observation is that both the GR and HGR approaches perform significantly better than the LO and HLO approaches in terms of the CPU cost. The reason is that while both GR and HGR scan once through the list of correct matches, the LO and HLO algorithms iteratively scan the list until no more local optimization is possible. Moreover, we see the superiority of HGR over GR in terms of the CPU cost by up to 2.7 times using the uniform data set and up to 2.2 times using the skewed data set. This happens due to the pruning heuristic, since a large number of correct matches are pruned, and therefore do not need to be processed. Finally, while neither LO nor HLO is applicable to real-world crowdsourcing applications due to their large CPU cost, our experiments show that HLO outperforms LO by more than 3.4 times. This shows that our proposed heuristics result in obtaining the near-optimal answer more rapidly.

Figures 6.11e and 6.11f demonstrate the impact of varying the number of W/T on the aggregate travel cost of the workers in performing a given task using uniform and skewed data, respectively. The figures show that as the number of W/T grows, there is a higher chance that more than one worker is assigned to a given task, and therefore the aggregate travel cost of the workers increases.

[Figure 6.11: Effect of W/T on synthetic data. (a) No. of assigned tasks - SYN-UNIFORM; (b) No. of assigned tasks - SYN-SKEWED; (c) CPU cost - SYN-UNIFORM; (d) CPU cost - SYN-SKEWED; (e) Aggregate distance - SYN-UNIFORM; (f) Aggregate distance - SYN-SKEWED.]
Moreover, we observe that both HGR and HLO perform significantly better than GR and LO (up to 3.1 times using the uniform data and up to 5 times using the skewed data). Moreover, the experiments show that the LAD heuristic becomes more useful in a worker-dense area, where more workers are assigned to a given task. Finally, our experiments show better improvements of our heuristics on the skewed data set, since with the skewed data set, the average number of W/T changes with a high variance. Therefore, a task may be assigned to a large number of workers, which renders our heuristics more useful.

Finally, Figure 6.12 depicts our experiments on real data, in which the average number of W/T is 4. The experiments show similar results in terms of HGR outperforming the GR approach in all cases, which shows the effectiveness of our heuristics with a real distribution of workers and spatial tasks.

Figure 6.12: Effect of W/T on real data. Panels: a) No. of assigned tasks; b) CPU cost; c) Aggregate distance.

6.4.3 Effect of Number of Tasks per Worker (T/W)

In the next set of experiments, we evaluated the scalability of our approaches by varying the average number of tasks which are located inside the spatial region of a given worker. Figures 6.13a and 6.13b depict the result of our experiments on both SYN-UNIFORM and SYN-SKEWED. With this set of experiments, we only reported the impact of varying T/W on the number of assigned tasks, since the rest was similar to our previous set of experiments. As the figures show, the total number of assigned tasks increases as T/W grows. The reason is that more tasks are available to be performed by workers. Moreover, experiments on both uniform and skewed data sets demonstrate that HGR outperforms the GR approach by up to 30% with the uniform data, and up to 26% with the skewed data. Furthermore, as the figures show, the impact of our heuristics becomes more significant in medium and dense areas, whereas in sparse areas all approaches perform similarly. The reason is that in all of our experiments we fixed the average value of maxT to 10.
In a task-sparse area, every worker has on average 5 tasks inside his region. Therefore, due to the abundance of resources, all the assignment algorithms achieve similar results.

Figure 6.13: Effect of T/W on synthetic data. Panels: a) No. of assigned tasks, SYN-UNIFORM; b) No. of assigned tasks, SYN-SKEWED.

6.4.4 Effect of Maximum Acceptable Tasks (maxT) Constraint

In our final set of experiments, we measured the performance of our approaches with respect to increasing the average value of maxT for every worker from 5 to 15. Figure 6.14a illustrates an increase in the number of assigned tasks as maxT grows. The reason is that with an increase in maxT, workers are willing to do more tasks, and thus, the number of assignments increases. Similar to previous experiments, we see that HGR outperforms GR in terms of the task assignment, while achieving results close to the optimization approaches. Moreover, Figure 6.14b depicts an increase in the CPU cost as maxT grows. Similar to the previous set of experiments, we see the superiority of the greedy approaches (GR and HGR) as compared to the optimization approaches (HLO and LO) in terms of the CPU cost. Finally, as Figure 6.14c depicts, both HGR and HLO outperform both GR and LO in terms of the aggregate travel cost by up to 1.5 times.

The main observation from this set of experiments is that the HGR approach outperforms the GR approach in all cases, while its performance in terms of task assignment is close to the local optimization approaches. Moreover, due to the high CPU cost, none of the optimization approaches are applicable to real-world applications. This indicates that our HGR approach can efficiently solve the MCTA problem, while achieving results similar to those of the optimization approaches.

Figure 6.14: Effect of maxT on SYN-UNIFORM. Panels: a) No. of assigned tasks; b) CPU cost; c) Aggregate distance.

Chapter 7 Conclusions

In this dissertation, we identified spatial crowdsourcing as a new emerging platform.
We introduced spatial crowdsourcing as the process of crowdsourcing a set of spatial tasks to a set of workers. We also identified that the key underlying impediments to the success of any spatial crowdsourcing system are privacy and trust, and focused on these two issues in more detail. Moreover, we defined a taxonomy for spatial crowdsourcing. We studied one class of spatial crowdsourcing, in which instead of the worker selecting the tasks arbitrarily, the server assigns the close-by tasks to every worker. Furthermore, our focus was on self-incentivised spatial crowdsourcing, in which workers are willing to participate voluntarily. We studied two variations of server assigned tasks, SISSA and SIWSA. With SISSA, the server enforces the spatial constraint on how to perform the task assignment. We focused on the issue of privacy, and formally defined the PAPA problem. We proposed the PiRi approach, a solution to the PAPA problem, which addresses the major privacy leaks in a SISSA spatial crowdsourcing system. Next, we focused on the trust issue in privacy-aware SISSA spatial crowdsourcing. Consequently, we identified the PaRknW problem, and proposed TruPa, a class of three solutions to this problem. With SIWSA, the worker enforces the constraint on which tasks he is willing to perform. Thus, we formally defined the MTA problem as the problem of maximizing the number of assigned tasks, while satisfying the workers' constraints. Subsequently, we proposed our assignment protocol that included three different solutions to the MTA problem, namely GR, LLEP, and NNP. Furthermore, we focused on the trust issue in SIWSA spatial crowdsourcing, introduced the MCTA problem, and proposed an assignment protocol with three solutions, namely GR, LO, and HLO, to the MCTA problem. Finally, we studied the issue of privacy in trustworthy SIWSA spatial crowdsourcing, and proposed a framework to address the issue of privacy.
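To make the MTA objective recalled above more concrete, the following is a minimal sketch of maximizing the number of assigned tasks for a single snapshot of workers and tasks. It is not the GR, LLEP, or NNP protocol from this dissertation. It assumes, purely for illustration, that each worker's spatial region is an axis-aligned rectangle, that each task needs exactly one worker, and that each worker accepts at most maxT tasks. The names `Worker`, `in_region`, and `assign_tasks` are hypothetical. The sketch uses a standard augmenting-path search for bipartite matching with capacities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    name: str
    region: tuple  # assumed rectangle: (xmin, ymin, xmax, ymax)
    max_t: int     # maximum number of tasks the worker accepts (maxT)


def in_region(region, point):
    """Return True if the task location lies inside the rectangle."""
    xmin, ymin, xmax, ymax = region
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax


def assign_tasks(workers, tasks):
    """Assign each task to at most one eligible worker.

    tasks maps a task id to its (x, y) location. The result maps each
    assigned task id to a worker name; unassignable tasks are omitted.
    """
    eligible = {
        t: [w for w in workers if in_region(w.region, p)]
        for t, p in tasks.items()
    }
    held = {w.name: [] for w in workers}  # tasks currently held per worker
    owner = {}

    def place(task, visited):
        # Try every eligible worker not yet explored in this search.
        for w in eligible[task]:
            if w.name in visited:
                continue
            visited.add(w.name)
            if len(held[w.name]) < w.max_t:
                held[w.name].append(task)
                owner[task] = w.name
                return True
            # Worker is full: try to move one of its tasks elsewhere.
            for other in list(held[w.name]):
                if place(other, visited):
                    held[w.name].remove(other)
                    held[w.name].append(task)
                    owner[task] = w.name
                    return True
        return False

    for task in tasks:
        place(task, set())
    return owner


if __name__ == "__main__":
    workers = [Worker("A", (0, 0, 10, 10), 1), Worker("B", (5, 5, 15, 15), 1)]
    tasks = {"t1": (6, 6), "t2": (2, 2), "t3": (100, 100)}
    print(assign_tasks(workers, tasks))
```

In this toy instance, t1 is first given to worker A and then moved to worker B so that t2, which only A can reach, can also be assigned. Task t3 lies outside every region and stays unassigned, which mirrors the observation in the experiments that tasks falling outside all workers' regions cannot be assigned.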
As part of our future work, for our PiRi technique, we plan to conduct a more thorough analysis and study alternative approaches to address the corresponding problems. We would also like to compare these approaches in terms of performance. For example, while the one-time communication cost of our preliminary approaches is still tolerable in most cases, we will focus on exploring alternative approaches to enhance the performance of our initial approaches.

Moreover, we plan to focus on other classes of spatial crowdsourcing, and study their privacy and trust issues. In particular, with worker selected mode, workers arbitrarily perform tasks without any coordination with the server. Consequently, enforcing the assignment of a set of tasks is more challenging. Moreover, with reward-based spatial crowdsourcing, workers gain rewards for every spatial task they perform. While this might better motivate the workers to perform tasks, protecting their privacy is much more challenging. The reason is that the server needs to know the identity of the worker in order to offer him rewards.
Asset Metadata
Creator
Kazemi, Leyla (author)
Core Title
Enabling query answering in a trustworthy privacy-aware spatial crowdsourcing
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science
Publication Date
11/26/2012
Defense Date
10/24/2012
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
crowdsourcing,OAI-PMH Harvest,privacy,spatial,Trust
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Shahabi, Cyrus (committee chair), Narayanan, Shrikanth S. (committee member), Sukhatme, Gaurav S. (committee member)
Creator Email
leylakazemi@gmail.com,lkazemi@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-119471
Unique identifier
UC11291437
Identifier
usctheses-c3-119471 (legacy record id)
Legacy Identifier
etd-KazemiLeyl-1344.pdf
Dmrecord
119471
Document Type
Dissertation
Rights
Kazemi, Leyla
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA