USC Computer Science Technical Reports, no. 966 (2016)
Auction-SC: An Auction-Based Framework for Real-Time Task Assignment in Spatial Crowdsourcing

Mohammad Asghari, Cyrus Shahabi, Liyue Fan
University of Southern California, Los Angeles, CA, USA
{masghari, shahabi, liyuefan}@usc.edu

ABSTRACT

A new platform, termed spatial crowdsourcing (SC), is emerging that enables a requester to commission workers to physically travel to specified locations to perform a set of spatial tasks (i.e., tasks related to a geographical location and time). For spatial crowdsourcing to scale to millions of workers and tasks, it should be able to efficiently assign tasks to workers, which in turn consists of both matching tasks to workers and computing a schedule for each worker. The two current approaches for task assignment in spatial crowdsourcing, batched assignment and centralized online assignment, cannot scale, as either task matching or task scheduling becomes a bottleneck. Instead, we propose a distributed online assignment approach utilizing an auction-based framework in which workers bid on every arriving task and the server determines the highest bidder. This splits the assignment responsibility between the workers (for scheduling) and the server (for matching) and thus eliminates both bottlenecks. As a benchmark to evaluate the results of our online framework, we introduce a clairvoyant offline algorithm that gives the globally optimal assignment. Through several experiments, we evaluate the accuracy of the real-time algorithm by comparing it with the clairvoyant algorithm for small numbers of tasks. For larger real-world and synthetic datasets, we compare the accuracy and efficiency of our real-time algorithm with centralized online algorithms proposed for similar problems. We show that other algorithms cannot generate assignments of comparable quality because they fail to manage the dynamism and/or take advantage of the spatiotemporal characteristics of SC.

1. INTRODUCTION

Smartphones are ubiquitous: we are witnessing astonishing growth in mobile phone subscriptions. The International Telecommunication Union estimates there are nearly 7 billion mobile subscriptions worldwide [1]. Meanwhile, mobile phones' sensors (e.g., cameras) are advancing and network bandwidth is constantly increasing. Consequently, every person with a mobile phone can now act as a multi-modal sensor, collecting and sharing various types of high-fidelity spatiotemporal data instantaneously (e.g., picture, video, audio, location, time, speed, direction, and acceleration).

Exploiting this large crowd of potential workers and their mobility, a new mechanism for efficient and scalable data collection has emerged: Spatial Crowdsourcing (SC) [2]. Spatial crowdsourcing requires workers (e.g., willing individuals) to perform a set of tasks by physically traveling to certain locations at particular times. Spatial crowdsourcing is exploited in numerous industries, e.g., Uber, TaskRabbit, Waze, Gigwalk, etc., and has applications in numerous domains such as citizen journalism, tourism, intelligence, disaster response and urban planning. With spatial crowdsourcing, a requester submits a set of spatiotemporal tasks to a spatial crowdsourcing server (SC-Server). Subsequently, the SC-Server has to select a worker to perform each task.

With spatial crowdsourcing, it is not enough to only match a task with a worker. An SC-Server must consider the schedule of every worker when matching a task to workers and only consider those workers who are able to fit the task into their schedule. For example, in [3, 4] the focus is on matching tasks with workers and the workers' schedules are ignored. On the other hand, Deng et al. [5] consider scheduling tasks for a single worker while assuming the tasks have already been matched. They show there is no guarantee that the worker will be able to schedule all of its matched tasks.
In this paper, we define task assignment in SC as consisting of two phases, a matching phase and a scheduling phase, which need to happen in tandem. Neither phase can be ignored; otherwise, the resulting solution is rendered infeasible for real-world applications. Two recent studies consider both matching and scheduling in spatial crowdsourcing [6, 7]. However, neither study can provide real-time assignments. This is because, in order to match a task to the best worker, the server has to perform the scheduling phase for multiple workers. Scheduling multiple workers is time consuming and cannot be done in real time. Consequently, both studies utilize a batched assignment scheme, where the assignment is delayed for a period of time (i.e., the batching time interval) during which all arriving tasks are batched to be matched and scheduled during the next time interval. That is, while the SC-Server is busy processing one set of tasks, the next batch of tasks arrives at the server. Once the server completes the previous batch, it starts processing the tasks that have been queued. Once tasks are batched and processed together, the matching phase becomes complex because many tasks need to be matched to many workers. This, in turn, adds to the running time and increases the batching time interval. A long batching time interval (e.g., 10 minutes) has two main disadvantages. First, the duration of the batching time interval must be subtracted from each task's deadline, leaving the task with less available time to be scheduled. Second, the batched scheme can no longer generate real-time assignments, which is a requirement of many SC applications. For example, an Uber user wants to know whether there is a driver that can serve her as soon as she submits the request. Instead, with online assignment, a task is assigned to a worker as soon as it arrives at the SC-Server. This requires the server to perform matching and scheduling in real time.
With online assignment, at each point in time the SC-Server is processing only one task; hence, the matching phase becomes a one-to-many matching where there are multiple workers and only one task. Consequently, the complex many-to-many matching phase of batched assignment is reduced to selecting the best worker that can fit the task into its schedule. Even though matching is fast with online assignment, the server still has to perform scheduling for multiple workers. Therefore, the scheduling phase becomes the bottleneck in online assignment. As shown in [8], scheduling for a single worker can be done quickly; however, a centralized SC-Server is not capable of scheduling multiple workers in real time. Toward this end, we propose Auction-SC, an auction-based framework for real-time task matching and scheduling. In this framework, we decentralize the scheduling problem by utilizing the workers. With Auction-SC, the server broadcasts a task to the workers upon the task's arrival. Each worker¹ submits a bid for that task based on its current schedule and location. To compute its bid, each worker has to consider only its own schedule, so the bid computation phase can be done in real time. Once every worker submits its bid to the SC-Server, matching the task to a worker reduces to selecting the best bid.

We introduce a branch-and-bound scheduling algorithm where, for each new task, the worker performs an exhaustive search to find out whether it can fit the incoming task into its schedule. We show that at each point in time the number of remaining tasks for each worker (the number of tasks that the worker has scheduled but not yet completed) is in a range for which even the branch-and-bound algorithm can be completed in real time. However, our experiments show that replacing the branch-and-bound algorithm with a polynomial-time approximate algorithm does not significantly affect the quality of the assignment.
In addition to the branch-and-bound algorithm, we propose a more complex bidding technique that takes into consideration the spatial distribution of the tasks seen so far. The key idea is that having more workers in areas with more tasks can increase the quality of the assignment. This consideration increases the complexity of the bid computation phase such that it may impact the scalability of the framework. However, we show that the SC-Server can still manage a throughput of ~200 tasks per second with this complex bidding technique.

With a real-world SC system, the SC-Server does not know the time and location of future tasks and workers until they arrive. Therefore, it is impossible for the SC-Server to generate a globally optimal assignment. As a benchmark to better evaluate the quality of the real-time assignments made by Auction-SC, for the first time, we propose an algorithm for finding the globally optimal assignment in an SC environment. We assume there exists a clairvoyant which knows exactly at what time and at which location new tasks and workers will arrive and for how long they will be available. We prove that the clairvoyant algorithm is computationally expensive and cannot be run on large workloads with thousands of tasks and workers. However, in our experiments, for small workloads, we compare the results of Auction-SC with the globally optimal assignment.

¹ Hereafter, we use the term "worker" interchangeably to refer to both the human worker and the software running on her mobile device, unless a clear distinction is needed.

We conduct many experiments on both real-world and synthetically generated workloads to evaluate different aspects of Auction-SC. For our real-world data we use a dataset of hundreds of thousands of geo-tagged Flickr images. We show that with a centralized approach the SC-Server cannot process more than 5 tasks per second.
However, with the auction-based framework, the throughput of the system can go as high as 200 tasks per second.

The remainder of this paper is organized as follows. In Section 2 we formally define the task assignment problem in SC and prove it is NP-complete. We review the related work in Section 3. Next, we propose our offline clairvoyant algorithm in Section 4 and analyze its time complexity. We introduce Auction-SC in Section 5 and propose several bidding rules within the auction-based framework. We show the results of our experiments on both real-world and synthetic data in Section 6 and conclude the paper with guidelines for future work (Section 7).

2. PRELIMINARIES

In this section, we define the terminology used in the paper and give a formal definition of the problem under consideration. In addition, we analyze the complexity of the problem and its hardness.

2.1 Problem Definition

We start by defining some terminology in order to formally define the task assignment problem in spatial crowdsourcing.

DEFINITION 1 (SPATIAL TASK). A spatial task t is a task to be performed at location t.l with geographical coordinates, i.e., latitude and longitude. The task becomes available at t.r (release time) and expires at t.d (deadline). Also, t.v is the reward obtained after completing t.

It should be pointed out that in a spatial crowdsourcing environment, a spatial task t can be executed only if a worker is at location t.l. For example, if the query is to report the traffic situation at a specific location, someone has to actually be present at the location to be able to report the traffic. From here on, whenever we use task we are actually referring to a spatial task. Now, we formally define a worker.

DEFINITION 2 (WORKER). A worker w is any entity, e.g., a person, willing to perform spatial tasks. We denote the current location of the worker by w.l.
Each worker has a list of tasks assigned to it, w.T, and a maximum number of tasks it is willing to perform, w.max. Also, w.s and w.e denote the availability of the worker, such that the worker is available during the time interval (w.s, w.e]. Throughout the paper we assume every worker moves one unit of length per unit of time. Therefore, we can assume that distance(⟨a, b⟩) is also the time required to move from point a to point b.

DEFINITION 3 (SCHEDULE). We call an ordered list of tasks a schedule. We write s as ⟨t1, ..., tn⟩, where n is the number of tasks in s. Denoting the i-th task in s by s_i, we say worker w is able to perform schedule s if and only if:

∀i, 1 ≤ i ≤ n:  Σ_{j=1}^{i} distance(s_{j−1}, s_j) ≤ s_i.d

where s_0.l is the current location of w.

At each point in time, each worker keeps track of a schedule and completes the tasks based on their order in its schedule.

t.l     location of task t
t.r     release time of task t
t.d     deadline of task t
t.v     value of task t
w.l     location of worker w
w.s     the time worker w becomes available
w.e     the time worker w becomes unavailable
w.T     list of tasks assigned to worker w
w.max   maximum number of tasks worker w performs
w.PTS   list of all potential task subsets worker w can perform

Table 1: List of notations

DEFINITION 4 (MATCHING). Assuming we have a set of workers W and a set of tasks T, we call M ⊆ W × T a matching if for each t ∈ T there is at most one w ∈ W such that (w, t) ∈ M. We call (w, t) ∈ M a match and say t has been matched to w. For each matching M, we define the value (benefit) of M as:

Value(M) = Σ_{(w,t)∈M} t.v

A matching M is valid if and only if, for every worker w, there exists a schedule s such that (w, t_i) ∈ M ⟹ t_i ∈ s.

Now we can formally define Task Assignment in Spatial Crowdsourcing (TASC) as follows:

DEFINITION 5 (TASK ASSIGNMENT IN SC). Given a set of workers W, a set of spatial tasks T and a cost function d : (W ∪ T) × T → R, where d(⟨a, b⟩) is the distance between a and b, the goal of the TASC⟨W, T, d⟩ problem is to find a valid matching M with maximum value.

It is important to note that with task assignment in SC, the goal is to find a valid matching. This means that in addition to finding a matching between tasks and workers, the SC-Server also has to find a schedule for each worker to perform the tasks. Throughout this paper we use the terms matching phase and scheduling phase to refer to these two aspects of task assignment in SC. We focus on maximizing the total number of assignments and hence assume every task has a value equal to 1. Table 1 lists the notations we use frequently in this paper.

2.2 Complexity Analysis

The equivalent decision problem for TASC is to decide whether there exists a matching M with value K, denoted TASC⟨W, T, d, K⟩. In this section we use a slightly modified version of the well-known Hamiltonian Path problem. We call it the Minimum Length Hamiltonian Path problem (Min-Ham-Path) and define it as follows:

DEFINITION 6 (MIN-HAM-PATH). Given a directed graph G(V, E) where each edge e ∈ E is assigned a length l : E → R, a source node s and a length L ∈ R, the Min-Ham-Path problem ⟨G, l, s, L⟩ is to decide whether there exists a path in G that starts from s, visits every other node exactly once and has a length of at most L.

THEOREM 1. The Min-Ham-Path problem is NP-hard.

PROOF. In order to prove the NP-hardness of Min-Ham-Path we show Ham-Path ≤_p Min-Ham-Path. The Hamiltonian Path problem asks the following question: given a directed graph G(V, E), does there exist a path that goes through every node exactly once? Given an instance of the Ham-Path problem ⟨G⟩, we modify graph G(V, E) and generate a new graph G′(V′, E′) where V′ = V ∪ {o} and E′ = E ∪ {⟨o, v⟩ : v ∈ V}. Also, for every e ∈ E′ we assume l(e) = 1. Now we show that Ham-Path⟨G⟩ is true if and only if Min-Ham-Path⟨G′, l, o, n⟩ is true, where n is the number of vertices in G.
If Min-Ham-Path returns a path of length n, we can remove the first edge from the path, which results in a Hamiltonian path for the Ham-Path⟨G⟩ problem. Conversely, every Hamiltonian path on graph G has length n − 1. By adding vertex o and connecting it to the starting vertex, we end up with a Hamiltonian path of length n on G′.

THEOREM 2. The TASC problem is NP-complete.

PROOF. We start the proof by showing that the decision problem of TASC is verifiable in polynomial time. Given a matching M, we can check in polynomial time that no task is assigned to more than one worker. Furthermore, based on the definition of a valid match, checking the validity of a matching M can be done in polynomial time as well. Finally, we can find the value of M by adding the values of all tasks in M.

We show TASC is NP-hard by proving the reduction Min-Ham-Path ≤_p TASC. Given an instance of the Min-Ham-Path problem ⟨G(V, E), l, o, K⟩, we reduce it to an instance of the TASC⟨W, T, l′, n − 1⟩ problem such that W = {o} and T = V \ {o}. For every task t we set t.v = 1, t.r = 0 and t.d = K. Also, for every e ∈ E, l′(e) = l(e). In addition, for every e′ ∈ (W × T) ∪ (T × T) where e′ ∉ E, we set l′(e′) = ∞. Finally, we show that the result of Min-Ham-Path⟨G, l, o, K⟩ is true if TASC⟨W, T, l′, n − 1⟩ is true, where n is the number of vertices in G. Considering that |T| = n − 1 and t.v = 1 for every t ∈ T, if there exists a matching of size n − 1, every task has been assigned to the single worker. Also, since the deadline of every task is K, the worker visits every task no later than K. Therefore, the path that the worker traverses starts at o and goes through every other vertex v ∈ V \ {o}, and the length of the path is no more than K.

3. RELATED WORK

Spatial crowdsourcing research focuses on task assignment [2, 4, 9, 5, 8, 7, 6], trust [3, 9] and privacy issues [10].
Kazemi and Shahabi [2] formulate task assignment in spatial crowdsourcing as a matching problem with the primary objective of maximizing the number of matched tasks, and Alfarrarjeh et al. [4] scaled up the matching algorithm in a distributed setting. Reliable task assignment addressing trust issues in spatial crowdsourcing has been studied in [3, 9]. In [9], Cheng et al. proposed a partitioning heuristic which recursively divides tasks and workers via k-means clustering to improve assignment efficiency. All these studies utilize the batched scheme. The methods used in [2, 9, 4] cannot be applied to a single task for real-time assignment. Furthermore, none of these studies consider the schedule of a worker when matching tasks and workers. In their settings, a worker is able to complete any task that is matched to it, which is not the case in our setting. On the other hand, Deng et al. [5] studied the scheduling problem from the worker's perspective and developed both exact and heuristic algorithms to help the worker find the best schedule for all the tasks assigned to it. However, their work assumes that each worker has been pre-assigned some tasks, and thus only deals with scheduling tasks for a single worker.

Recent studies have focused on simultaneous task assignment and scheduling [8, 6, 9]. In [8], Li et al. perform real-time task assignment for a single worker. Knowing the worker's destination, they recommend a route that lets the worker complete as many tasks as possible on its way to the destination point. Deng et al. [6] and Chen et al. [7] perform assignment and scheduling for multiple workers. However, similar to [2, 9, 4], they tackle the assignment problem by batching tasks' and workers' arrivals and departures, and then performing matching and scheduling periodically (e.g., every 10 minutes) for each batch.

Our work is also related to some combinatorial optimization problems such as the Vehicle Routing Problem (VRP) [11].
The general setting of VRP is to serve a number of customers with a fleet of vehicles, and the objective is to minimize the total travel cost of those vehicles. Compared with VRP, in spatial crowdsourcing our objective is to maximize the number of completed tasks, whereas VRP aims to minimize the total travel time. In addition, the workers in our problem setting are not located at one or several fixed depots; each worker can show up at any unique location. In our setting, the spatial tasks are also not guaranteed to be completed by the workers. Finally, a spatial crowdsourcing platform needs to provide efficient solutions for potentially millions of tasks and workers. This is different from the solutions in VRP, which take hundreds of seconds even for one hundred delivery points.

Auction frameworks have been used in various domains such as dynamic multi-agent environments [12, 13] and online advertising [14]. Bidding strategies from other domains cannot be used in spatial crowdsourcing as they do not consider the spatial and temporal characteristics of SC. In this paper, we develop bidding strategies that are unique to a spatial crowdsourcing environment. One aspect of auction frameworks that has been widely studied is how to prevent bidders from submitting malicious bids to give themselves an unfair advantage [15]. This problem does not apply to our Auction-SC framework because here the workers do not directly interact with the SC-Server. Instead, the software on a worker's phone receives tasks, computes bids and submits them to the server. If a task is assigned to a worker, the worker is notified with an updated schedule through the software. The worker has no control over the software and hence cannot cheat the system.

4. EXACT CLAIRVOYANT ALGORITHM

In order to have a benchmark to evaluate our real-time algorithms, in this section we define a clairvoyant offline algorithm.
We assume there exists a clairvoyant which has global knowledge of tasks and workers. In this case, the clairvoyant can assign tasks to workers such that the value of the assignment is maximized. We propose a two-step algorithm which finds the optimal solution to the TASC problem. In the first step we find all potential subsets of tasks that a single worker is able to complete. Using the output of step 1, in the next step we find an assignment with maximum value. In the following, we describe the steps of the algorithm and then discuss its complexity. We also show how to adapt the algorithm to address other spatial crowdsourcing frameworks. This adaptation does not deteriorate the performance of the algorithm, and in most cases the performance is even improved.

4.1 Discovering Potential Task Subsets

In the first step of the algorithm we focus on finding all task subsets that a worker w is able to complete. We define a Potential Task Subset as:

DEFINITION 7 (POTENTIAL TASK SUBSET (PTS)). We call PTS ⊆ T a potential task subset for worker w iff there exists a valid schedule s for worker w such that t ∈ PTS implies that s contains t. We define the value of a PTS as:

Value(PTS) = Σ_{t∈PTS} t.v

For each worker w, we define w.PTS as the set of all such potential task subsets.

Figure 1: PTS search space for |T| = 4

In order to find all PTSs for a single worker, a straightforward method is to go through all subsets of T and check whether each is a PTS for the worker. It is easy to see that such an approach requires O(n!) permutations. Therefore, we utilize a branch-and-bound algorithm and use the following propositions for pruning the search space.

PROPOSITION 1. For every t ∈ T and w ∈ W, if (t.r, t.d] ∩ (w.s, w.e] = ∅, then for every PTS ∈ w.PTS we have t ∉ PTS.

PROPOSITION 2. For any w ∈ W, for every PTS ∈ w.PTS we have |PTS| ≤ w.max.

PROPOSITION 3. For every w ∈ W, if PTS ∈ w.PTS, then for every pts ⊆ PTS we have pts ∈ w.PTS.

We model the entire solution space as a tree (Fig. 2) and use a depth-first approach to search the solution space. By Proposition 2 we know that we only have to extend the tree up to depth w.max for worker w. Each node of the tree corresponds to a subset of T. Therefore, for any node n of the tree, if the corresponding subset of n is not in w.PTS, then according to Proposition 3 we can prune the entire subtree rooted at n.

Algorithm 1 FindPTSs(T, W)
Input: T is the set of all tasks and W is the set of all workers
Output: PTSs[] is the list containing all potential task subsets for the different workers
1: for w in W do
2:     PTSs[w] = SearchBranch(∅, T, w)
3: end for
4: return PTSs

Algorithm 2 finds all the PTSs under a specific branch of the search space. The current subset, pts, is expanded with a new task t if w temporally overlaps with t (lines 4–8). If the new subset pts′ is a PTS itself, we continue the search for more PTSs in the subtree rooted at pts′ (lines 9–15). The subset pts′ is a PTS only if a valid schedule for worker w exists that contains every t ∈ pts′. Finally, in Algorithm 1, for each worker we search for PTSs starting at the root of the tree.

Algorithm 2 SearchBranch(pts, Tre, w)
Input: pts is the current subset, Tre is the list of remaining tasks and w is the worker
Output: PTSs is the set of all potential task subsets for w
1: PTSs = ∅
2: Tcopy = Tre
3: for t in Tre do
4:     Tcopy = Tcopy \ t
5:     if t does not temporally overlap with w then
6:         continue
7:     end if
8:     pts′ = pts ∪ t
9:     if ExistsValidSchedule(pts′, w) then
10:        PTSs = PTSs ∪ pts′
11:        if |pts′| < w.max and Tcopy ≠ ∅ then
12:            S = SearchBranch(pts′, Tcopy, w)
13:            PTSs = PTSs ∪ S
14:        end if
15:    end if
16: end for
17: return PTSs

Figure 2: A sample two-layered PTS-Graph

4.2 Finding the Maximal Assignment

Having found all PTSs for each worker in step 1, the next step of the algorithm focuses on finding the assignment with the maximum value. For this purpose, we construct a graph in which each PTS from step 1 is a node.
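Before turning to step 2, the subset search of Algorithms 1 and 2 can be sketched in Python. This is a minimal illustration, not the paper's implementation: ExistsValidSchedule is realized as a naive permutation scan of Definition 3's feasibility condition (also requiring the worker to finish before w.e), and the dictionary field names mirror Table 1 but are our own choice.

```python
from itertools import permutations

def exists_valid_schedule(pts, worker, dist):
    """Definition 3 check: is there an ordering of pts under which the worker
    reaches every task before its deadline (and before the worker leaves)?
    Naive O(|pts|!) permutation scan; unit speed, so distance equals travel time."""
    for order in permutations(pts):
        now, loc, feasible = worker["s"], worker["l"], True
        for task in order:
            now += dist(loc, task["l"])   # travel to the next task's location
            if now > task["d"] or now > worker["e"]:
                feasible = False
                break
            loc = task["l"]
        if feasible:
            return True
    return False

def search_branch(pts, remaining, worker, dist):
    """Algorithm 2: depth-first search of the subset tree, pruned by
    Propositions 1-3 (temporal overlap, w.max bound, subset closure)."""
    ptss = []
    for i, task in enumerate(remaining):
        # Proposition 1: skip tasks that do not temporally overlap the worker
        if task["d"] <= worker["s"] or task["r"] >= worker["e"]:
            continue
        cand = pts + [task]
        if exists_valid_schedule(cand, worker, dist):
            ptss.append(cand)
            rest = remaining[i + 1:]
            # Proposition 2: never grow a subset beyond w.max tasks
            if len(cand) < worker["max"] and rest:
                ptss.extend(search_branch(cand, rest, worker, dist))
        # Proposition 3: if cand is not a PTS, the whole subtree is pruned
    return ptss

def find_ptss(tasks, workers, dist):
    """Algorithm 1: all potential task subsets, per worker."""
    return {w["id"]: search_branch([], tasks, w, dist) for w in workers}
```

For example, one worker at the origin of a line with two reachable tasks yields the three PTSs {t1}, {t2} and {t1, t2}, exactly the non-empty subsets that survive the schedule check.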
We call this graph the PTS-Graph. Subsequently, we show that the maximum weighted clique in the PTS-Graph corresponds to the assignment with the maximum value. The PTS-Graph is a multi-layered graph with one layer per worker. Within each layer l, for each PTS in w_l.PTS we add a node to layer l. The weight of this node is equal to the value of the corresponding PTS. We use an arbitrary ordering to name the nodes within each layer of the PTS-Graph. Assuming layer l contains m nodes, we name them n_{l1} through n_{lm}. Also, we define the set of edges of the PTS-Graph as:

E = {⟨n_{ai}, n_{bj}⟩ | a ≠ b ∧ (PTS_{ai} ∩ PTS_{bj} = ∅)}

where PTS_{lm} refers to the corresponding PTS of node n_{lm}. In other words, there is an edge between two nodes in different layers if the corresponding PTSs of the two nodes do not contain the same task. Given the PTS-Graph G(V, E), we associate an assignment of tasks to workers with a set of vertices N ⊆ V as follows: if n_{lm} ∈ N, then every task t ∈ PTS_{lm} is assigned to w_l. We claim that the optimal assignment corresponds to the nodes of the maximum weight clique in the PTS-Graph.

DEFINITION 8 (MAXIMUM WEIGHT CLIQUE). For an undirected graph G = (V, E) where each v ∈ V is assigned a weight w(v), the maximum weight clique is the clique for which the sum of the weights of all vertices in the clique is larger than that of any other clique in the graph.

THEOREM 3. If C* is a maximum weighted clique in the PTS-Graph, then the assignment of tasks to workers corresponding to C*, denoted A*, is a maximum value assignment for the TASC problem.

PROOF. Based on how we generated the PTS-Graph, no two vertices in C* can contain the same task in their corresponding PTSs. Therefore, any task t is assigned to at most one worker. Next, we need to show that no other assignment A can have a larger value than A*. If such an assignment A existed, then the clique C corresponding to A would have a larger weight than C*, which contradicts the fact that C* is the maximum weighted clique.
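To make the PTS-Graph construction and Theorem 3 concrete, here is a brute-force sketch for small instances. It is not Östergård's branch-and-bound algorithm from [17]: it simply builds the layered graph and enumerates all vertex subsets, and the node representation (worker id, task-id set, weight) is our own.

```python
from itertools import combinations

def build_pts_graph(ptss_per_worker):
    """Nodes are (worker, task-id set, weight) triples, one per PTS; edges
    connect nodes of different workers whose PTSs share no task."""
    nodes = []
    for w, ptss in ptss_per_worker.items():
        for pts in ptss:
            nodes.append((w, frozenset(t["id"] for t in pts),
                          sum(t["v"] for t in pts)))
    edges = {
        (i, j)
        for i, j in combinations(range(len(nodes)), 2)
        if nodes[i][0] != nodes[j][0] and not (nodes[i][1] & nodes[j][1])
    }
    return nodes, edges

def max_weight_clique(nodes, edges):
    """Exhaustive clique search: exponential in |V|, small inputs only."""
    best, best_w = [], 0
    for r in range(1, len(nodes) + 1):
        for subset in combinations(range(len(nodes)), r):
            # subset is a clique iff every pair of its vertices is an edge
            if all((i, j) in edges for i, j in combinations(subset, 2)):
                weight = sum(nodes[i][2] for i in subset)
                if weight > best_w:
                    best, best_w = list(subset), weight
    return best, best_w
```

On a toy instance with worker w1 holding PTSs {t1}, {t2}, {t1, t2} and worker w2 holding {t3} (unit task values), the maximum weight clique pairs w1's {t1, t2} with w2's {t3}, for a total value of 3, i.e., every task is assigned.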
There exists a vast body of literature on solving the maximum weight clique problem [16]. Here we briefly explain one of these algorithms and refer the interested reader to [17]. In Algorithm 3 we start with an arbitrary node ordering. Östergård [17] suggests an ordering in which v_1 is the node with the largest total sum of the weights of its adjacent nodes, and so forth. Algorithm 3 computes a value C(k), 1 ≤ k ≤ n, which is the largest weight of a clique considering only the nodes {v_k, v_{k+1}, ..., v_n}. If w(i) denotes the weight of v_i, then C(n) = w(n), and the other values of C(·) are computed in a backtracking search. For 1 ≤ k ≤ n − 1, C(k) > C(k + 1) only when a corresponding clique exists that contains v_k. If no such clique exists, then C(k) = C(k + 1). To find such a C(k) clique, a branch-and-bound approach is used to search all possible cliques. Details of the branch-and-bound algorithm can be found in [17].

Algorithm 3 MaximumWeightClique(V, E)
Input: V is an ordering of the nodes and E is the set of edges in the graph
Output: C, a subset of V, as the maximum weighted clique
1: C = {v_n}
2: max = w(n)
3: for k : n − 1 down to 1 do
4:     ⟨C_k, w⟩ = FindMaxClique(V, E, v_k)
5:     if w > max then
6:         C = C_k
7:         max = w
8:     end if
9: end for
10: return C

4.3 Complexity Analysis

We analyze the time complexity of our proposed exact algorithm as follows. For simplicity we assume |T| = n, |W| = m and w.max = p. For Algorithm 2 we can assume that we run the ExistsValidSchedule() method for each node of the search space tree (Fig. 2) at most once. At level i of the search space tree we have at most C(n, i) nodes. Also, for each node at level i the size of the task subset is i. Hence, a naive implementation of the ExistsValidSchedule() method runs in O(i^{i+1}). Therefore, the amortized time complexity of Algorithm 2 is:

Σ_{i=1}^{p} C(n, i) · O(i^{i+1}) = O(n^p · p^{p+1})

Since we run the SearchBranch method once for each worker, we can conclude that the time complexity of Algorithm 1 is O(m · n^p · p^{p+1}). In Algorithm 3 we implement the FindMaxClique() method utilizing the branch-and-bound algorithm described in [17]. The time complexity of this algorithm is O(n^n) for a graph of size n. Therefore, for a graph of size n, the overall time complexity of Algorithm 3 is O(n^{n+1}).

We end the section with a discussion of how generalizing the spatial crowdsourcing framework can affect the algorithm introduced in this section. In [3] it is assumed that each task requires a certain level of confidence and only workers with a trust level higher than that are able to perform the task. Moreover, in some spatial crowdsourcing frameworks [2, 3, 5], workers define a spatial range and only perform tasks within this region. Constraints like these only remove a subset of tasks from the list of tasks a certain worker is able to perform. Consequently, the search space of Algorithm 1 is reduced and we end up with fewer PTSs per worker. This, in turn, reduces the size of the PTS-Graph in Section 4.2.

5. AUCTION-SC FRAMEWORK

In practice, an SC system works similarly to a complex event processing (CEP) engine [18], where new tasks arrive at the SC-Server as an input stream. In a real-world SC scenario, the SC-Server finds out about a task and its properties only when the task is released. The complexity of many-to-many matching, in addition to the need for immediacy (e.g., Uber), renders the batched scheme impractical for real-world scenarios. Furthermore, with an online centralized SC-Server, scheduling multiple workers becomes the bottleneck and hence real-time assignment is not guaranteed. In this section we introduce Auction-SC, which has neither shortcoming and generates real-time online assignments by splitting the matching and scheduling responsibilities between the server and the workers, respectively. First we explain the auction framework and how tasks are dispatched to workers. Next, we discuss how workers compute their bids, and at the end we provide a cost analysis of the framework.

5.1 Task Dispatchment in Auction-SC

Auction methods have been used effectively for assignment problems in dynamic multi-agent environments [12, 13]. The main advantages of auction methods are their simplicity and the fact that they allow for a decentralized implementation. Auction-SC considers workers as bidders and tasks as goods, with the SC-Server playing the role of a central auctioneer. At a very high level, in Auction-SC, once a new task is submitted to the SC-Server (auctioneer), the server presents the task to the workers (bidders). Depending on the bidding rule, which is common among all workers, each worker computes its own bid and submits it to the server. The bidding process is performed as a sealed-bid auction, where workers submit bids simultaneously and no worker knows how much the other workers have bid. The SC-Server selects the worker with either the lowest or the highest bid (depending on the bidding rule) as the winner and matches the task with that worker.

Broadcasting every incoming task to all available workers incurs a large communication cost on the system. Auction-SC lowers the communication cost by sending an incoming task only to eligible workers, defined as:

DEFINITION 9 (ELIGIBLE WORKER).
An available worker w is said to be eligible for performing a newly released task t if and only if:

distance(w, t) ≤ w.d − t.r  ∧  distance(w, t) ≤ t.d − t.r

In other words, an available worker w is eligible for performing task t if it has enough time to reach the location of t before either t expires or w leaves the system. The SC-Server maintains a spatial index on the locations of the workers. With Auction-SC, we use a grid index since (1) the workers have to send updated locations to the server only if they change cells, and (2) the server does not need to know the exact location of a worker to be able to filter out non-eligible workers. A detailed analysis of the communication cost is presented at the end of this section.

Algorithm 4 OnlineTASC(W, t)
Input: W is the set of currently available workers and t is a task that has just been released
Output: Either w ∈ W as the worker task t should be assigned to, or null if no worker is selected
1: w_selected = null
2: Bids = ∅
3: for w ∈ W_t do
4:   bid = w.ComputeBid(t)
5:   Bids ← ⟨w, bid⟩
6: end for
7: w_selected = SelectBestBid(Bids)
8: return w_selected

Algorithm 4 outlines the process of assigning an incoming task t. W_t in line 3 is the set of eligible workers for task t. Notice that all the iterations of the for loop in Algorithm 4 (lines 3-6) run in parallel. The ComputeBid() method (line 4) that each worker executes depends on the implemented bidding rule. Similarly, the SelectBestBid() method (line 7) returns the worker with either the highest or the lowest bid. In case of a tie, the SelectBestBid() method randomly selects one worker among those with the optimal bid.
5.2 Worker's Bid Computation
With Auction-SC, every worker computes its bid using a predefined bidding rule. A worker's bid represents how good a match the worker is for the task. When computing a bid for a task, the workers have no knowledge about other tasks that might arrive in the future. Consequently, they have to make a greedy decision based on their current status.
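To make the dispatch flow concrete, the following is a minimal Python sketch of the eligibility check (Definition 9) and the auction loop of Algorithm 4. The class and field names (`Worker`, `deadline`) and the unit-speed travel model are illustrative assumptions, not part of the framework; in Auction-SC the bids would be computed by the workers in parallel rather than in a sequential loop on the server.

```python
import math

# Hypothetical worker record; field names are illustrative assumptions.
class Worker:
    def __init__(self, name, x, y, deadline):
        self.name = name
        self.x, self.y = x, y
        self.deadline = deadline  # time at which the worker leaves the system

def travel_time(w, task):
    # Assume unit speed, so travel time equals Euclidean distance.
    return math.hypot(w.x - task["x"], w.y - task["y"])

def eligible(w, task, now):
    # Definition 9: the worker must be able to reach the task's location
    # before the task expires and before the worker leaves the system.
    d = travel_time(w, task)
    return d <= w.deadline - now and d <= task["deadline"] - now

def online_tasc(workers, task, now, compute_bid, lower_is_better=True):
    # Algorithm 4: gather bids from eligible workers, pick the best bidder.
    # (In Auction-SC each eligible worker computes its bid in parallel.)
    bids = [(compute_bid(w, task), w) for w in workers if eligible(w, task, now)]
    if not bids:
        return None
    pick = min if lower_is_better else max
    return pick(bids, key=lambda b: b[0])[1]

# Example with the NN rule (bid = distance to the task, lowest bid wins):
workers = [Worker("w1", 0, 0, 100), Worker("w2", 10, 0, 100),
           Worker("w3", 0.4, 0, 0.3)]  # w3 leaves the system too soon
task = {"x": 1, "y": 0, "deadline": 100}
winner = online_tasc(workers, task, now=0, compute_bid=travel_time)
```

Swapping in a different `compute_bid` function (and the matching `lower_is_better` flag) yields the other bidding rules discussed below.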
First we introduce four simple bidding rules based on heuristics used for other problems that seem similar to task assignment in SC; hence we call them non-SC rules. These problems do not consider either the spatial or the temporal dynamism of SC. We use the non-SC rules in the experiments to show the importance of both the spatial and the temporal aspects of SC. The four non-SC rules used in our experiments are:
Random (Rnd): As a baseline approach, we consider a rule where every eligible worker submits a bid of value 1. The SC-Server selects the winner randomly from the set of eligible workers.
Ranking (Rnk): Based on the ideas in Online Bipartite Matching [19], the workers are ranked from 1 to n (the workers' order does not change over time). Each eligible worker submits a bid of value 1 and the SC-Server selects the eligible worker with the lowest rank as the winner. This heuristic considers neither the spatial nor the temporal aspects of SC.
Nearest Neighbor (NN): Similar to the Spatial Matching problem [20], for this bidding rule we give priority to the worker closest to the location of the task. This means a worker closer to the location of the incoming task is considered better suited to perform the task than a worker farther away. To compute a bid, each worker computes the distance between the location of the task and its own location and submits that distance as its bid. Once every worker has submitted its bid, the SC-Server chooses the worker with the minimum bid as the winner. This bidding rule only considers the spatial constraints of SC.
Most Free Time (MFT): In this bidding rule, based on the ideas in Online Scheduling [21], we give priority to workers that have more time before they leave the system. At each point in time, a worker computes its free time as the duration between the time it finishes its current schedule and the time it leaves the system.
Each worker submits a bid equal to its free time and the SC-Server selects the worker with the highest bid as the winner. This bidding rule only considers the temporal dynamism of SC.
In addition to the four non-SC rules, we propose two bidding rules that consider both the spatial and the temporal constraints of SC; hence we call them SC rules. The intuition behind the first bidding rule is that the less time a worker spends completing an incoming task, the higher the chances that it will be available in the future to perform more tasks. Our second SC rule tends to move workers to areas where a task is more likely to show up in the future. In the following, we describe how workers compute their bids and how the server selects the winner.
Best Insertion (BI): Intuitively, if a worker spends less time completing a task, it will likely have more time for performing other tasks. In BI, the server gives priority to workers that can better insert the incoming task into their schedules. Auction-SC considers the extra time each worker will need to complete the new task in addition to its current schedule.
Each worker starts by finding a potential optimal schedule in which the new task is added to its current schedule. For this process, the worker uses a branch and bound algorithm to check all possible orders of its uncompleted tasks in addition to the new task. Running an exhaustive search for a large number of tasks can be time consuming. However, in our experiments on both real-world and synthetic data, we observed that the number of uncompleted tasks for each worker at any point in time remains in a range where even the exhaustive search can be done in real time for a single worker. The reason is that, as time passes and new tasks arrive, the worker also completes other tasks and removes them from its schedule.
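A minimal sketch of the scheduling step behind BI, using plain exhaustive search over task orderings in place of the branch and bound algorithm; the unit-speed travel model, the (x, y, deadline) tuple layout, and instantaneous task execution are simplifying assumptions:

```python
import itertools
import math

def finish_time(start, ordered_tasks, now):
    """Completion time of visiting tasks (x, y, deadline) in the given order,
    or None if some deadline is missed. Tasks themselves take no time."""
    (x, y), t = start, now
    for tx, ty, deadline in ordered_tasks:
        t += math.hypot(tx - x, ty - y)  # unit-speed travel
        if t > deadline:
            return None
        x, y = tx, ty
    return t

def best_finish(start, tasks, now):
    """Exhaustively try all orderings; return the earliest feasible finish time."""
    times = [f for f in (finish_time(start, order, now)
                         for order in itertools.permutations(tasks))
             if f is not None]
    return min(times) if times else None

def bi_bid(start, current_tasks, new_task, now):
    """BI bid: the extra time (f2 - f1) needed to also complete new_task."""
    f1 = best_finish(start, current_tasks, now)
    f2 = best_finish(start, current_tasks + [new_task], now)
    if f1 is None or f2 is None:
        return None  # the worker cannot fit the new task
    return f2 - f1

# A worker at the origin with one pending task at (1, 0) bids on a task at (2, 0):
bid = bi_bid((0, 0), [(1, 0, 10.0)], (2, 0, 10.0), now=0)
```

A `None` bid corresponds to the worker declining the auction because no feasible schedule includes the new task.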
In cases where computing the bid takes too long, we can replace this computation with an approximate algorithm (e.g., the Nearest Insertion algorithm [22]). In our experiments, we show that running an approximate algorithm for the scheduling phase reduces the accuracy of the assignment by less than 5%.
In order to compute a bid, each worker computes the finish time of the potential optimal schedule in case it is matched with the new task (f_2). The worker also knows the finish time of its current schedule (f_1). For each worker, f_2 − f_1 is the extra time it requires to complete the new task in addition to the tasks in its current schedule. Therefore, the bid each worker submits to the SC-Server is equal to f_2 − f_1. After receiving the bids from every worker, the SC-Server assigns the task to the worker with the lowest bid.
Best Distribution (BD): BI does not consider the spatial distribution of tasks. It might be beneficial to assign a task located in a task-dense area to a worker with high remaining availability, even if the worker has to move significantly from its current location. The general idea behind this rule is to try to move workers to locations that have a higher chance of containing a task in the future. Ideally, we want the spatial distribution of the available workers (S_W) to be as close as possible to the overall spatial distribution of the tasks (S_T).
One can argue that knowing S_T contradicts the assumption that the SC-Server has no spatiotemporal knowledge about future tasks. Even if the SC-Server knows S_T, it does not mean it also knows the exact locations at which the tasks are going to be released. Nevertheless, with Auction-SC we do not assume that S_T is known a priori. Instead, we assume the SC-Server starts with an empty distribution and keeps updating it as new tasks arrive.
Figure 3: Two spatial distributions for tasks and workers ((a) Distribution 1, (b) Distribution 2)
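One way to maintain S_T and S_W online is as grids of cell weights, normalized into probabilities on demand: each arriving task adds one unit to its cell in S_T, and each worker adds its remaining availability to its cell in S_W. The sketch below is illustrative only; the cell width, the example coordinates, and the `GridDistribution` helper are assumptions.

```python
from collections import defaultdict

CELL = 10.0  # grid cell width (illustrative)

def cell_of(x, y):
    return (int(x // CELL), int(y // CELL))

class GridDistribution:
    """Cell weights, normalizable into a discrete probability distribution."""
    def __init__(self):
        self.weights = defaultdict(float)

    def add(self, x, y, weight=1.0):
        self.weights[cell_of(x, y)] += weight

    def probabilities(self):
        total = sum(self.weights.values())
        return {c: w / total for c, w in self.weights.items()} if total else {}

# S_T starts empty and gains one unit per arriving task:
S_T = GridDistribution()
for tx, ty in [(3, 4), (12, 4), (13, 5)]:  # example task locations
    S_T.add(tx, ty)

# S_W weights each worker's cell by its remaining availability (n - m):
S_W = GridDistribution()
for wx, wy, n_max, m_assigned in [(2, 2, 10, 4), (14, 3, 10, 9)]:
    S_W.add(wx, wy, weight=n_max - m_assigned)
```

Normalizing lazily keeps the per-event update to a single dictionary increment.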
For S_W and S_T we use a grid index similar to the one Auction-SC maintains for choosing eligible workers when a task arrives. For each event occurring inside a cell, we add to the weight of that cell. To compute S_T, for every task we add a value of 1 to the cell containing the task. As for S_W, we need to consider the availability of the worker. For example, if the maximum number of tasks worker w can perform is n, and it has already been assigned m tasks, we say the availability of worker w is n − m. In this case, when computing S_W, we add n − m units to the cell covering w. For both S_T and S_W, by normalizing the weights of the cells, we can compute the probability of an event occurring in each cell.
Having S_W and S_T, we need a metric to determine the closeness of these two distributions. Several methods have been used to compute the similarity between two distributions; among the more commonly used are the Kullback-Leibler divergence [23] and the Jensen-Shannon divergence [24]. The problem with these methods is that they do not take the distance between the cells into consideration. For example, Fig. 3 shows two spatial distributions for two tasks (red dots) and two workers (blue squares). Clearly, the distribution of workers in Fig. 3(b) is more similar to S_T than that in Fig. 3(a). However, with the Jensen-Shannon divergence metric, JSD(S_T, S_W1) is equal to JSD(S_T, S_W2). Using the Kullback-Leibler divergence, or any other metric that does not take the spatial relationship between cells into account, yields the same result. In this paper, we use the Earth Mover's Distance metric since it has the ability to incorporate the spatial aspect of the distributions when computing their similarity.
DEFINITION 10 (EARTH MOVER'S DISTANCE). The Earth Mover's Distance (EMD) is a measure of distance between two probability distributions over a region D.
If the distributions are interpreted as two different ways of piling up a certain amount of dirt over region D, the EMD is the minimum cost of turning one pile into the other, where the cost is assumed to be the amount of dirt moved times the distance by which it is moved [25].
If we want to compute the EMD between two distributions A and B, for each grid cell i, we call cell i a supplier iff P_A(i) > P_B(i) and a consumer iff P_A(i) < P_B(i). If P_A(i) = P_B(i), cell i is neither a supplier nor a consumer. For each supplier i, we say a_i = P_A(i) − P_B(i) is the total supply of i. Also, the total demand of consumer j is denoted b_j = P_B(j) − P_A(j). Now we can model the problem as a bipartite network flow problem where on one side we have the supplier nodes and on the other side we have the consumer nodes. The weight c_ij of the edge between supplier i and consumer j is the cost of moving one unit of mass from i to j. Consequently, finding the EMD reduces to finding the Minimum Cost Flow for this bipartite graph. The Minimum Cost Flow problem can be formalized as the following linear programming problem:

minimize   ∑_{i∈I} ∑_{j∈J} c_ij · f_ij
subject to:
  f_ij ≥ 0              for i ∈ I, j ∈ J   (1)
  ∑_{i∈I} f_ij = b_j    for j ∈ J          (2)
  ∑_{j∈J} f_ij ≤ a_i    for i ∈ I          (3)

We can solve this LP problem efficiently using the simplex method [26].
We assume the SC-Server maintains S_W and S_T and shares them with the eligible workers. First, each worker has to compute a potential schedule by inserting the incoming task into its current schedule. If the worker is able to find a new schedule, it can locally modify S_W according to how its location and availability change in the potential schedule, yielding S′_W. Subsequently, the worker computes the EMD between S′_W and S_T and submits the value as its own bid. The SC-Server selects the worker with the lowest bid as the winner.
5.3 Cost Analysis
We end the discussion of Auction-SC with a detailed analysis of the communication cost of this framework.
Communication cost can be viewed from two perspectives: response time and throughput. With Auction-SC, once a task arrives at the SC-Server, it is broadcast to a number of workers. Therefore, the SC-Server can send all messages (packets) in parallel at the same time. In return, all workers submit their bids in parallel as well. Considering current network transmission speeds, even in cellular networks, the response time of transmitting a single packet of data in Auction-SC is negligible.
The other aspect of communication cost is the throughput of the network. Especially at scale, the increase in the number of sent messages can saturate the network bandwidth and cause congestion. In a centralized architecture, tasks are not broadcast to workers and there is no bid submission; hence it seems that communication cost is not a factor in a centralized approach. However, due to the dynamism of SC, coordination between the workers and the centralized server is inevitable, and they need to communicate with each other. Here we show that, with regard to the number of messages transmitted between the workers and the SC-Server, there is not much difference between a centralized approach and Auction-SC.
As explained earlier, with Auction-SC we implement a grid index that keeps track of the locations of available workers. The SC-Server uses this grid index to identify eligible workers and broadcasts a new task only to them. A worker has to notify the server about its location only if its movement causes it to switch cells. We assume a worker switches cells α times on average, and denote the total number of workers by n. Consequently, the number of messages transmitted to notify the SC-Server about cell changes is α·n. Upon the arrival of a new task t, the SC-Server identifies all the cells within t.d distance of t's location and broadcasts the task only to workers in those cells.
Assuming there are on average n_e eligible workers per task, we can compute the number of transmitted messages as:

|M_Auction| = α·n + 2·|T|·n_e

On the other hand, in a centralized approach the SC-Server does all the computation and assigns the new task to the best worker by itself. In order to compute a potential new schedule for each worker, the server has to know the exact location of every worker. One idea is for the SC-Server to internally keep a spatial index on the exact locations of the workers and use it to retrieve those locations when necessary. The problem with such a spatial index is that even if only 10% of the workers move and update their locations frequently, it can take up to 20 seconds to update the spatial index [27], which is not acceptable in a real-time system. The other option is for the SC-Server to query the exact locations of the eligible workers when it wants to compute their potential schedules. To avoid querying all available workers, we assume the centralized server utilizes a grid index as well. Therefore, it can query only the eligible workers for their exact locations once a task arrives. Consequently, upon the arrival of each task and for each eligible worker, one message is sent from the server asking for the exact location and one message is sent back to the server with that location. Similarly, we can compute the total number of transmitted messages as:

|M_Centralized| = α·n + 2·|T|·n_e

As with Auction-SC, α·n is the total number of messages sent to the server due to workers changing cells. We can see that the number of messages transmitted with a centralized server is similar to that with Auction-SC. Thus, compared to a centralized architecture, Auction-SC does not increase the load on the network.
6. EXPERIMENTS
Unlike regular crowdsourcing, there is no publicly available platform for running real-world spatial crowdsourcing campaigns.
With platforms such as Amazon Mechanical Turk, workers do not perform spatial crowdsourcing tasks. For this reason, we simulated various spatial crowdsourcing campaigns on an Intel(R) Core(TM) 2 Duo 3.16 GHz PC with 4 GB memory running Microsoft Windows 7.
6.1 Dataset
Due to its commercial value, real-life SC systems such as Uber and TaskRabbit do not make their datasets available to the public. However, we were able to use a large collection of geo-tagged images from Flickr [28] as an input to Auction-SC. The dataset consists of ~15 million images with attributes such as location, time taken, time uploaded, the user who took the picture, etc.
We map each image to a spatial task, treating the task as taking a photo at a specific location. We set the deadline of the task to the time the image was taken and the duration of the task to the interval between the time the image was taken and the time it was uploaded. Each user in the dataset corresponds to a worker. The initial location of the worker is randomly selected. The worker is available during the interval between the first time he took a photo and the last time he uploaded one. We ran 5 separate instances, each with images from a different metropolitan area. Table 2 shows the total number of tasks (and workers) for each city.

City          # of Tasks   # of Workers
Los Angeles   219,332      19,081
New York      390,229      17,603
London        366,034      18,480
Paris         237,344      14,275
Beijing       23,335       1,598
Table 2: Number of tasks/workers for each city in the Flickr dataset

We also generated a synthetic dataset with a realistic streaming workload based on the methodology proposed in [29]. To generate a workload suitable for SC systems, we modeled three different sets of parameters:
Temporal Parameters: In [30], it is shown that in crowdsourcing environments, workers and tasks arrive following Poisson processes [31]. The default Poisson arrival rates for tasks and workers are λ_t = 20/min and λ_w = 3/min, respectively.
Subsequently, the durations of the tasks and workers were randomly sampled from the closed ranges [1, 4] hours and [1, 8] hours, respectively.
Spatial Parameters: Fig. 4 shows the spatial distribution of tasks from our real-world dataset in Los Angeles. As depicted, the tasks are not uniformly distributed in space. The spatial distribution is rather skewed, meaning that the density of the tasks is higher in certain areas. To model the same behavior in our synthetic workloads, we created 6 two-dimensional Gaussian clusters with randomly selected means and standard deviations. Eighty percent of the tasks are sampled within the clusters and the rest are uniformly distributed.
Static Parameters: In addition to the spatiotemporal parameters, we consider two other parameters. The default workload size of each experiment is 10K tasks. The task arrival rate and the number of tasks determine the duration of the simulation. Based on the duration of the simulation and the worker arrival rate, the total number of workers may vary. The maximum number of tasks a worker can perform, i.e., w_max, is a uniformly random number from the closed interval [8, 12].
Figure 4: Spatial distribution of tasks in Flickr
6.2 Online vs. Offline
As the first set of experiments, we compare the results of the real-time algorithms with the optimal solution computed using the offline clairvoyant algorithm explained in Section 4. Because of the high complexity of the offline algorithm, we are not able to run tests with large workloads, and thus we use workloads with 100 tasks. The experimental results in Fig. 5 show that the best real-time algorithms achieve roughly 60% of the optimal solution. Our results are consistent with studies that compute a competitive ratio [32] of 1 − 1/e for the online matching problem with random inputs [33].
6.3 Assignment Quality
In this section we evaluate the quality of the assignments using different real-time non-clairvoyant algorithms.
First, we compare them using the Flickr and the synthetic datasets. Subsequently, using the synthetic datasets, we show how the spatial and temporal settings of the problem can affect the real-time assignment quality.
Figure 5: Comparison of offline and real-time approaches
Figure 6: Assignment quality of real-time approaches ((a) Flickr, (b) Synthetic)
Fig. 6 compares the assignment quality of the different real-time algorithms. As we can see, both SC approaches outperform the best non-SC algorithm by almost 25%. The main reason, as explained in Section 5, is that the SC rules perform scheduling while matching tasks with workers. Furthermore, the BD approach outperforms BI by at most 10%. This is not surprising, as BD tends to "move" workers to areas where future tasks are more likely to appear, thus achieving a higher assignment rate in the long term. Among the non-SC approaches, NN outperforms the other rules, completing almost twice as many tasks. The reason MFT does not outperform the baseline approaches (Rnd and Rnk) can be explained by what we call a radical move. A radical move is when the SC-Server assigns a task to a worker that has to move a relatively long distance to reach the location of that task. Since MFT does not consider spatial proximity to the task, there is a high chance of ending up with assignments resulting in radical moves. With NN and BI, the general idea is to prevent radical moves as much as possible. With BD, radical moves do occur, but only when the worker moves to an area where there will be more tasks to complete.
In order to study the effect of the temporal parameters of SC, we ran several experiments using different pairs of task arrival rates (t_rate) and worker arrival rates (w_rate). In Fig. 7 we show the effect of increasing t_rate and w_rate on the quality of the assignment. The level of grayness corresponds to the percentage of completed tasks, with black and white representing 100% and 0%, respectively.
As we can see, with a small number of workers, as we increase the task arrival rate the percentage of completed tasks decreases, approaching 0% at the top-left corner of each plot. On the other hand, in Figs. 7(d) to 7(f), for NN, BI and BD, with a small number of incoming tasks, as we increase w_rate eventually all tasks are completed. Fig. 7 clearly shows that NN, BI and BD outperform Rnd, Rnk and MFT independent of t_rate and w_rate.
To better evaluate the leading real-time approaches, NN, BI and BD, in Fig. 8 we performed a pair-wise comparison by taking the differences of their task completion rates. For example, Fig. 8(a) shows the difference between BI and NN. We observe that these three approaches perform similarly in the two extreme cases discussed in Fig. 7, i.e., high task-low worker and low task-high worker. BI and BD outperform NN by up to 30% when the problem is more complex, i.e., outside the extreme cases. An interesting observation in Figs. 8(a) and 8(b) is that BI and BD outperform NN by a much larger margin at scale (higher t_rate and w_rate). The reason is that with higher t_rate and w_rate, more workers are moving around and more tasks come and leave, so in general the spatiotemporal dynamism of the system increases.
Figure 7: Assignment profile, varying worker/task arrival rates ((a) Rnd, (b) Rnk, (c) MFT, (d) NN, (e) BI, (f) BD)
Figure 8: Assignment difference, varying worker/task arrival rates ((a) BI vs. NN, (b) BD vs. NN, (c) BD vs. BI)
BI and BD cope with the dynamism by guaranteeing that a task gets assigned to a worker that can complete it. On the contrary, NN ignores the schedule of a worker during matching, and this becomes more important as there is more dynamism in the system.
We mentioned earlier that with the SC rules, the workers perform an exhaustive search to find out whether they can fit a new task into their schedule.
As workers accept new tasks, they also complete other tasks, so as we observed in our experiments, performing an exhaustive search did not cause any scalability issues. Nevertheless, one might want to replace the exhaustive search with a polynomial-time approximate algorithm. With ApproxBI, we use the insertion algorithm from [22] that runs in O(n^2).
Figure 9: Assignment difference of BI vs. ApproxBI
Fig. 9 shows that the change affects the quality of the assignment by less than 5%. The difference caused by ApproxBI is that workers that are eligible for a task using BI may not be able to fit the task into their schedules using ApproxBI, due to the approximation. As a result, using ApproxBI the server may not be able to assign some tasks even if they could be completed using BI. Fortunately, as shown in Fig. 9, this does not happen very often, regardless of t_rate and w_rate.
The next set of experiments examines the effect of the spatial distribution of tasks. We compared the quality of the final assignment for three different distributions. Even though real-world data usually follow a skewed spatial distribution (Section 6.1), the results of these experiments show that regardless of the distribution, SC rules outperform non-SC rules. With the first distribution, the locations of the tasks follow a spatial Poisson process [34]. The other two distributions are a uniform 2D distribution and a skewed distribution. The results in Fig. 10 show that SC rules generate assignments at least 20% better than non-SC rules. Also, we can see that with the Poisson and uniform distributions, there is not much difference between BI and BD. The reason is that both distributions generate tasks at completely random locations. Consequently, tasks are released in every area with the same probability and hence BD and BI become similar.
Figure 10: Assignment difference, varying distribution
The final set of experiments regarding assignment quality compares batched assignment with online assignment. For batched assignment we used the LALS algorithm from [6].² Earlier we explained that for some use cases (e.g., Uber), batched assignment does not even satisfy the application requirements. Nevertheless, the results in Fig. 11 show that even when non-real-time assignment is tolerable, the quality of batched assignment is not as good as that of online assignment. One reason is that the LALS algorithm performs the matching phase and then attempts to schedule tasks for their matched workers. All tasks that could not be scheduled for their matched workers go back to the matching phase, and the process continues until all tasks are scheduled or no worker remains to match with a task. When performing the matching, the schedule of the worker is not considered, and hence a task might end up getting matched to, and scheduled for, a worker that was not the best choice. This, in turn, can lower the chances of that worker being assigned a new task in the future. The second reason is that while a task is waiting at the server to be processed with the next batch, depending on the length of the batching interval, it loses some portion of its availability time, which in turn can lower the chances of the task fitting into a worker's schedule.
² The reported runtimes in [6] outperform those of [7].
Figure 11: Assignment difference of BI vs. LALS
6.4 Scalability
The last set of experiments focuses on measuring the scalability of two system architectures, i.e., centralized and Auction-SC (decentralized). We compare the scalability of the real-time Auction-SC algorithms with the equivalent implementation of the same algorithms on a single centralized SC-Server.
We can measure the scalability of SC systems by their throughput, i.e., the number of tasks processed per second, or equivalently, the processing time per task, shown in Fig. 12. Because of the decentralized architecture of Auction-SC, in Fig. 12 we see that the average processing time of a single task does not change as the arrival rate of workers increases. On the contrary, with the centralized architecture, the average processing time of a single task increases linearly as we increase the number of workers and is several orders of magnitude higher than with the Auction-SC approaches. Although BD consumes more time than the other real-time algorithms, it still takes at most 3 milliseconds per task.
Figure 12: Average processing time for a single task ((a) w_rate = 1, (b) w_rate = 2, (c) w_rate = 4, (d) w_rate = 8 workers/min)
For a CEP engine, it is also common to measure the queuing delay of events [35] once they arrive at the system as a metric for the scalability of the system. In Fig. 13 we compare the average queuing delay of tasks in the two architectures after running them for 1 hour. We can see that the centralized system suffers from queuing delays at fewer than 10 tasks/second. On the other hand, with Auction-SC, even for BD, we do not observe queuing delays for up to 500 tasks/second. Fig. 13(b) also shows that BD incurs a higher delay than BI but results in higher completion rates (Figs. 6 and 8). Users of Auction-SC can choose between BD and BI to balance their needs for assignment quality and efficiency.
Figure 13: Average queuing delay ((a) Centralized, (b) Auction-SC)
To summarize the results of our experiments, we showed that when SC bidding rules are used, the quality of the assignment is much higher than when a non-SC rule is used. The consideration of scheduling at the time a task is being matched is the main reason for the better overall assignments.
When running on a single centralized SC-Server, none of the bidding rules can scale. Furthermore, in a centralized architecture, the time required to process a single task increases linearly as more workers are added to the system. The scalability of a single centralized server suffers even more when SC bidding rules are used, due to their higher computational complexity. However, with Auction-SC we solve the scalability problem by splitting the scheduling and matching responsibilities between the workers and the SC-Server. Consequently, Auction-SC can afford to execute complex SC bidding rules, resulting in very high-quality assignments. Table 3 shows the summary of our experimental results.
7. CONCLUSION AND FUTURE WORK
In this paper, we studied the problem of real-time task assignment in spatial crowdsourcing. We showed that neither of the two current approaches for task assignment in spatial crowdsourcing, batched assignment and centralized online assignment, can scale, as either task matching or task scheduling will become the bottleneck. Therefore, we introduced an auction-based framework in which we split the matching and scheduling responsibilities between the SC-Server and the workers, respectively. We showed that by exploiting the spatiotemporal aspects of SC, with our proposed algorithms the workers are able to complete up to 30% more tasks compared to non-SC approaches. The decentralized architecture of our framework allows such high-quality assignments to be performed at scale. In this paper, we assumed each task can be performed instantaneously, e.g., taking a picture. Once that assumption is relaxed, we will face new challenges in scheduling the tasks.
                                  non-SC bidding rules                       SC bidding rules
                                  Rnd        Rnk        NN         MFT        BI         BD
Centralized   Scalability         Bad        Bad        Bad        Bad        Very Bad   Very Bad
              Assignment Quality  Very Bad   Very Bad   Bad        Very Bad   Very Good  Very Good
Auction-SC    Scalability         Very Good  Very Good  Very Good  Very Good  Good       Good
              Assignment Quality  Very Bad   Very Bad   Bad        Very Bad   Very Good  Very Good
Table 3: Summary of Experimental Results

We also assumed that each task requires the worker to travel to a single location. However, in other applications the worker may need to visit multiple locations for a single task; e.g., in the Uber application the worker has to pick up a passenger at one location and drop him off at a second location. We plan to extend our Auction-SC framework to incorporate these two features.
8. REFERENCES
[1] "Global mobile statistics 2014 part a: Mobile subscribers; handset market share; mobile operators," https://mobiforge.com/research-analysis/, accessed: 2015-09-30.
[2] L. Kazemi and C. Shahabi, "Geocrowd: Enabling query answering with spatial crowdsourcing," in Proceedings of the 20th International Conference on Advances in Geographic Information Systems, ser. SIGSPATIAL '12. New York, NY, USA: ACM, 2012, pp. 189–198.
[3] L. Kazemi, C. Shahabi, and L. Chen, "Geotrucrowd: Trustworthy query answering with spatial crowdsourcing," in Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ser. SIGSPATIAL '13. New York, NY, USA: ACM, 2013, pp. 314–323.
[4] A. Alfarrarjeh, T. Emrich, and C. Shahabi, "Scalable spatial crowdsourcing: A study of distributed algorithms," in Mobile Data Management (MDM), 2015 16th IEEE International Conference on, vol. 1, 2015, pp. 134–144.
[5] D. Deng, C. Shahabi, and U. Demiryurek, "Maximizing the number of worker's self-selected tasks in spatial crowdsourcing," in Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Orlando, Florida, 2013, pp. 324–333.
[6] D. Deng, C.
Shahabi, and L. Zhu, “Task matching and scheduling for multiple workers in spatial crowdsourcing,” in Proceedings of the 23rd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Bellevue, Washington, 2015. [7] C. Chen, S. Cheng, H. C. Lau, and A. Misra, “Towards city-scale mobile crowdsourcing: Task recommendations under trajectory uncertainties,” in Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015 , 2015, pp. 1113–1119. [8] Y . Li, M. Yiu, and W. Xu, “Oriented online route recommendation for spatial crowdsourcing task workers,” in Advances in Spatial and Temporal Databases. Springer International Publishing, 2015, vol. 9239, pp. 137–156. [9] P. Cheng, X. Lian, Z. Chen, R. Fu, L. Chen, J. Han, and J. Zhao, “Reliable diversity-based spatial crowdsourcing by moving workers,” Proc. VLDB Endow., vol. 8, no. 10, pp. 1022–1033. [10] H. To, G. Ghinita, and C. Shahabi, “A framework for protecting worker location privacy in spatial crowdsourcing,” Proc. VLDB Endow., vol. 7, no. 10, pp. 919–930, Jun. 2014. [11] O. Br¨ aysy and M. Gendreau, “Vehicle routing problem with time windows, part i: Route construction and local search algorithms,” Transportation Science, vol. 39, no. 1, pp. 104–118, Feb. 2005. [12] A. Mehta, A. Saberi, U. Vazirani, and V . Vazirani, “Adwords and generalized on-line matching,” in Foundations of Computer Science, 2005. FOCS 2005. 46th Annual IEEE Symposium on, Oct 2005, pp. 264–273. [13] M. Lagoudakis, M. Berhault, S. Koenig, P. Keskinocak, and A. Kleywegt, “Simple auctions with performance guarantees for multi-robot task allocation,” in Intelligent Robots and Systems, 2004. (IROS 2004). Proceedings. 2004 IEEE/RSJ International Conference on, vol. 1, Sept 2004, pp. 698–705 vol.1. [14] A. Ghosh and A. 
Sayedi, “Expressive auctions for externalities in online advertising,” in Proceedings of the 19th international conference on World wide web, 2010, pp. 371–380. [15] R. Lavi and C. Swamy, “Truthful and near-optimal mechanism design via linear programming,” in Foundations of Computer Science, 2005. FOCS 2005. 46th Annual IEEE Symposium on, Oct 2005, pp. 595–604. [16] M. Y . Kovalyov, C. Ng, and T. E. Cheng, “Fixed interval scheduling: Models, applications, computational complexity and algorithms,” European Journal of Operational Research, vol. 178, no. 2, pp. 331 – 342, 2007. [17] P. R. J. ¨ Osterg˚ ard, “A new algorithm for the maximum-weight clique problem,” Nordic J. of Computing, vol. 8, no. 4, pp. 424–436, Dec. 2001. [18] D. C. Luckham, The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2001. [19] R. M. Karp, U. V . Vazirani, and V . V . Vazirani, “An optimal algorithm for on-line bipartite matching,” in Proceedings of the 22nd Annual ACM Symposium on Theory of Computing (STOC), Baltimore, MD, 1990, pp. 352–358. [20] R. C.-W. Wong, Y . Tao, A. W.-C. Fu, and X. Xiao, “On efficient spatial matching,” in Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB), Vienna, Austria, 2007, pp. 579–590. [21] K. Lee and K. Lim, “Semi-online scheduling problems on a small number of machines,” Journal of Scheduling, vol. 16, no. 5, pp. 461–477, 2013. [22] D. Rosenkrantz, R. Stearns, and P. Lewis, “Approximate algorithms for the traveling salesperson problem,” in Switching and Automata Theory, IEEE Conference Record of 15th Annual Symposium on, Oct 1974, pp. 33–42. [23] S. Kullback and R. A. Leibler, “On information and sufficiency,” Ann. Math. Statist., vol. 22, no. 1, pp. 79–86, 1951. [24] J. Lin, “Divergence measures based on the shannon entropy,” Information Theory, IEEE Transactions on, vol. 37, no. 1, pp. 145–151, Jan 1991. [25] Y . 
Rubner, C. Tomasi, and L. Guibas, “A metric for distributions with applications to image databases,” in Computer Vision, 1998. Sixth International Conference on, Jan 1998, pp. 59–66. [26] G. B. Dantzig, “A history of scientific computing,” 1990, ch. Origins of the Simplex Method, pp. 141–151. [27] A. Akdogan, C. Shahabi, and U. Demiryurek, “Toss-it: A cloud-based throwaway spatial index structure for dynamic location data,” in Proceedings of the 2014 IEEE 15th International Conference on Mobile Data Management - Volume 01, ser. MDM ’14, Washington, DC, USA, 2014, pp. 249–258. [28] B. Thomee, D. A. Shamma, G. Friedland, B. Elizalde, K. Ni, D. Poland, D. Borth, and L.-J. Li, “The new data and new challenges in multimedia research,” arXiv preprint arXiv:1503.01817, 2015. [29] W. Tang, Y . Fu, L. Cherkasova, and A. Vahdat, “Modeling and generating realistic streaming media server workloads,” Computer Networks, vol. 51, no. 1, pp. 336 – 356, 2007. [30] S. Basu Roy, I. Lykourentzou, S. Thirumuruganathan, S. Amer-Yahia, and G. Das, “Task assignment optimization in knowledge-intensive crowdsourcing,” The VLDB Journal, pp. 1–25, 2015. [31] D. Stoyan, W. S. Kendall, and J. Mecke, Stochastic geometry and its applications, ser. Wiley series in probability and mathematical statisitics. Wiley, 1987. [32] D. D. Sleator and R. E. Tarjan, “Amortized efficiency of list update and paging rules,” Communications of the ACM, pp. 202–208, 1985. [33] G. Goel and A. Mehta, “Online budget matching in random input models with applications to adwords,” in Proceedings of the 19TH Annual ACM-SIAM symposium on Discrete algorithms, San Francisco, CA, 2008, pp. 982–991. [34] W. Weil, Ed., Stochastic Geometry: Lectures given at the C.I.M.E. Summer School held in Martina Franca, Italy, September 13–18, 2004, 2007, ch. Spatial Point Processes and their Applications, pp. 1–75. [35] E. Wu, Y . Diao, and S. 
Rizvi, “High-performance complex event processing over streams,” in Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD ’06, 2006, pp. 407–418.
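The division of labor described above, where each worker computes its own bid against its local schedule and the SC-Server only selects the highest bidder, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class names, the toy Manhattan-distance "best insertion" bidding rule, and the deadline check are all hypothetical.

```python
# Hypothetical sketch of the auction-based assignment loop: workers bid
# on an arriving task (scheduling stays on the worker side), and the
# SC-Server awards the task to the highest bidder (matching stays on the
# server side). The bidding rule below is an illustrative stand-in.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Task:
    location: Tuple[float, float]  # (x, y) of the task
    deadline: float                # latest acceptable detour (toy model)


@dataclass
class Worker:
    wid: str
    location: Tuple[float, float]
    schedule: List["Task"] = field(default_factory=list)

    def bid(self, task: Task) -> float:
        # Toy rule: bid higher when the extra travel cost of appending
        # the task to the end of the current schedule is lower; bid 0
        # when the task cannot be reached in time.
        last = self.schedule[-1].location if self.schedule else self.location
        detour = abs(last[0] - task.location[0]) + abs(last[1] - task.location[1])
        if detour > task.deadline:
            return 0.0
        return 1.0 / (1.0 + detour)


def assign(task: Task, workers: List[Worker]) -> Optional[Worker]:
    # SC-Server side: collect bids and award the task to the top bidder.
    best = max(workers, key=lambda w: w.bid(task), default=None)
    if best is None or best.bid(task) == 0.0:
        return None  # no worker can complete the task
    best.schedule.append(task)
    return best
```

In a real deployment the bids would be computed in parallel on the workers' devices, so the server-side cost per task stays a single maximum over the received bids rather than growing with per-worker scheduling work.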
Mohammad Asghari, Cyrus Shahabi, and Liyue Fan. "Auction-SC - An auction-based framework for real-time task assignment in spatial crowdsourcing." Computer Science Technical Reports (Los Angeles, California, USA: University of Southern California. Department of Computer Science) no. 966 (2016).