University of Southern California
Los Angeles

GeoCrowd: A Spatial Crowdsourcing System Implementation

A thesis submitted in partial satisfaction of the requirements for the degree Master of Science in Computer Science

by Giorgos Constantinou

2014

© Copyright by Giorgos Constantinou 2014

Abstract of the Thesis

GeoCrowd: A Spatial Crowdsourcing System Implementation
by Giorgos Constantinou
Master of Science in Computer Science
University of Southern California, Los Angeles, 2014
Professor Cyrus Shahabi, Chair

The increasing usage of smartphones, along with ubiquitous internet access, has brought to the research and market community a new type of crowdsourcing: spatial crowdsourcing. With spatial crowdsourcing, each task is related to a specific geographical location (e.g., latitude, longitude) and, in order to be executed, the assigned workers need to physically travel to the task's location.

In this thesis project, a novel spatial crowdsourcing system is introduced, coined GeoCrowd. GeoCrowd focuses on a specific mode of spatial crowdsourcing, named Server Assigned Tasks (SAT) mode. In this mode the server's goal is to solve the Maximum Task Assignment (MTA) problem (i.e., maximize the assignments of available workers to tasks), while satisfying their constraints. Next, the three core components of the system and their implementation details are presented: 1) a web application, 2) a mobile application on Android and 3) a background process that runs the assignment algorithm. The current algorithmic solution is based on a flow network representation which can be efficiently solved (i.e., in $O(VE^2)$) by computing its maximum flow with the Edmonds-Karp algorithm.

Subsequently, we present the integration of GeoCrowd in a multimedia platform, named MediaQ, which is used to collect crowdsourced mobile videos from the public.
As the popularity of the MediaQ platform expands to different universities, a parallel, distributed version of maximum flow is investigated to provide a scalable solution based on the MapReduce programming model.

Finally, we introduce new constraints to the MTA problem which make the problem NP-Hard. A novel hybrid algorithm is proposed that uses different heuristics to solve the MTA problem.

The thesis of Giorgos Constantinou is approved.
Prof. William G.J. Halfond
Prof. Gérard G. Medioni
Prof. Cyrus Shahabi, Committee Chair
University of Southern California, Los Angeles
2014

Table of Contents

1 Introduction
2 Background
    2.1 Terminology
    2.2 Complexity
3 System
    3.1 Overview
    3.2 Front-End Components
        3.2.1 Mobile Application
        3.2.2 Web Application Interfaces
    3.3 Back-End Components
        3.3.1 Web Services
        3.3.2 GeoCrowd Engine
        3.3.3 Database
    3.4 GeoCrowd Algorithm
4 Implementation
    4.1 MediaQ Integration
    4.2 Web Application Server
        4.2.1 Models
        4.2.2 Views
        4.2.3 Controllers
    4.3 Android Mobile Application
    4.4 Database
        4.4.1 DB Schema
    4.5 GeoCrowd Algorithm
    4.6 Challenges
5 Experiments
6 Extensions
    6.1 Distributed Max Flow Algorithm
        6.1.1 Scalability
    6.2 Time-Limited Workers
        6.2.1 Running Example
        6.2.2 Hybrid MTA Algorithm
7 Related Work
    7.1 Crowdsourcing
    7.2 Spatial Crowdsourcing
8 Conclusion
    8.1 Future Work
References

List of Figures

3.1 GeoCrowd system architecture.
3.2 Instance problem of maximum task assignment (MTA).
3.3 Reduction of MTA to the maximum flow problem.
4.1 MediaQ home page screenshot. Playing a crowdsourced mobile video uploaded with GeoCrowd.
4.2 Create a new task screenshot.
4.3 View list of created tasks.
4.4 View task details.
4.5 A list of Android mobile app screenshots: (a) Login, (b) Dashboard, (c) Task Inquiry, (d) Notification, (e) Assigned Tasks, (f) Task respond, (g) Video List, (h) Video Options.
6.1 Example of Workers with Time Limit.

List of Tables

4.1 Structure of table USERS
4.2 Structure of table USERS PROFILES
4.3 Structure of table TASKS
4.4 Structure of table TASKS INQUIRIES
4.5 Structure of table WORKERS ASSIGNMENTS
4.6 Structure of table WORKERS RESPONSE
5.1 GeoCrowd response time

CHAPTER 1
Introduction

Nowadays, due to technological advances in software and hardware, most smartphone devices have ubiquitous access to the Internet through cellular or wireless networks. The vast majority of these devices are equipped with a number of sensors (e.g., GPS, camera, gyroscope, etc.) which can be used to sense and collect real-time, high-quality data. In addition, recent statistics [5] report an increase of 42.3 percent (or 968 million units) in smartphone sales in 2013 compared to 2012. Moreover, in 2013 record-breaking smartphone sales exceeded the sales of feature phones by 53.6 percent.
The same report predicts that by the end of 2014 the number of Android smartphone devices will approach 1 billion.

Crowdsourcing has emerged as a new platform to provide services or solve problems by utilizing labor from a large group of people. An example of such a crowdsourcing platform is Amazon Mechanical Turk [1], which uses people's intelligence to perform tasks that usually cannot be solved by computers. A more recent application of crowdsourcing was triggered by the mysterious disappearance of Malaysia Airlines Flight MH370. DigitalGlobe [4] initiated a large-scale crowdsourcing campaign in an effort to locate the Boeing 777 jetliner through their Tomnod [9] crowdsourcing platform. People who join the campaign can explore a small map segment, divided into cells, and pinpoint any findings. The company reported that half a million users joined the campaign and that about 100,000 users were active each minute.

The continuous growth of smartphone usage and capabilities, along with the very promising crowdsourcing platform, opens a new kind of marketplace: spatial crowdsourcing. Spatial crowdsourcing was studied in [20, 21, 14]. Unlike crowdsourcing, spatial crowdsourcing requires workers to physically travel to the task's location in order to execute the task. In [20], a spatial crowdsourcing taxonomy was introduced, while the study focuses on a specific mode of spatial crowdsourcing, named SAT mode. In this mode the server has the big picture: all constraints of tasks and workers are known and the goal is to maximize the number of tasks assigned to workers. It was shown that this can be efficiently solved by reducing the MTA problem to the maximum flow problem. This project uses the results of [20] to build and test the GeoCrowd system.

In [18], Halim et al. provided a MapReduce-based algorithm that solves the maximum flow problem.
Using the methods introduced in [18], our GeoCrowd system can be extended to run on a distributed system that supports the MapReduce programming model, such as Hadoop [7].

In [14], Deng et al. focused on another mode of spatial crowdsourcing defined in [20], termed Worker Selected Tasks (WST). In this mode, the worker has a list of available tasks and can autonomously select a subset of them. However, new constraints were introduced to this problem: 1) each task has an expiration time and 2) the travel cost from the worker's current location to a task is considered. The goal is to solve the Maximum Task Scheduling (MTS) problem, i.e., maximize the number of tasks performed by the worker. Similar to [14], in this project an extension of the GeoCrowd algorithm was studied where each task has an expiration time constraint. Additionally, two more constraints are added: 1) a time limit constraint for each worker, which is the total time the worker is available to perform tasks, and 2) a perform time for each task, which is the time required to complete a task once the worker has arrived at its location. As in [14], the new constraints add extra complexity which makes the problem NP-Hard, as can be shown via a reduction involving a specialized version of the Travelling Salesman Problem (TSP). A new hybrid algorithm is proposed to solve the MTA problem, which incorporates algorithms from [14] in some cases and a new algorithmic approach in others.

The remainder of this project is organized as follows. Chapter 2 provides the preliminaries of our spatial crowdsourcing platform. Next, in Chapter 3 the GeoCrowd system and its underlying features are introduced. Chapter 4 discusses the implementation details of the GeoCrowd system. Chapter 5 presents the experimental results. In Chapter 6 we describe some extensions we consider adding to the current implementation. In Chapter 7 we review the related work and finally, Chapter 8 concludes the study and discusses some future directions.

CHAPTER 2
Background

In this chapter we define the terminology that will be used in the context of the SAT mode. We first provide the preliminaries for the implemented GeoCrowd system and then adjust them to incorporate the extensions described in Chapter 6. Finally, we discuss the complexity of the GeoCrowd algorithm.

2.1 Terminology

Some of the definitions below are taken from [20]. However, the implementation of the GeoCrowd system and its extensions requires additional attributes which are absent from [20].

Definition 1 (Requester). A requester $r$ is a natural person who posts spatial tasks. Each requester is defined by the tuple $r: \langle rid \rangle$. A requester $r$ has a unique id $rid$.

Definition 2 (Spatial Task). A spatial task $t$ is defined by the tuple $t: \langle rid, tid, l(x, y), s, e, k, q \rangle$. A task $t$ has a unique id $tid$, is created by a requester with id $rid$, should be performed $k$ times at location $l$ between start timestamp $s$ and end timestamp $e$, and is described by spatial query $q$.

Definition 3 (Task Query). The query $q$ is defined by the tuple $q: \langle h, d, mt \rangle$. A query $q$ has a header $h$ which briefly describes the query, a description $d$ which provides a more detailed description of the query, and a media type $mt$ which specifies the content to be submitted back to the requester.

Definition 4 (Worker). A worker $w$ is a natural person who is able to perform spatial tasks. Each worker $w$ is defined by the tuple $w: \langle wid, o \rangle$. A worker $w$ has a unique id $wid$ and is either online or offline depending on the flag $o$. Once a worker becomes online, he is ready to accept spatial tasks.

Definition 5 (Task Inquiry). A task inquiry is sent by an online worker $w$ who wants to perform spatial tasks. The task inquiry is defined by the tuple $ti: \langle wid, tiid, l(x, y), R\langle SW(x_1, y_1), NE(x_2, y_2) \rangle, MaxT \rangle$. A task inquiry $ti$ has a unique id $tiid$ and is created by a worker with id $wid$ at location $l$, who accepts at most $MaxT$ spatial tasks within the rectangular region $R$.
The region $R$ is defined by its South-West (SW) and North-East (NE) coordinates.

The following definitions add more constraints (and, by extension, complexity) to the GeoCrowd system. These extended definitions are discussed in Chapter 6.

Definition 6 (Extended Task Query). We extend Definition 3 of the task query by adding the performance time $p$: $q: \langle h, d, mt, p \rangle$. The performance time $p$ specifies the amount of time needed to execute the task after the worker arrives at the task's location.

Definition 7 (Time Limited Task Inquiry). We extend Definition 5 of the task inquiry by adding a temporal constraint $TL$: $ti: \langle wid, tiid, l, R\langle SW(x_1, y_1), NE(x_2, y_2) \rangle, MaxT, TL \rangle$. The time limit constraint $TL$ specifies the total amount of time the worker is willing to devote to executing tasks.

2.2 Complexity

The current implementation of the GeoCrowd system provably solves the MTA problem efficiently (i.e., it runs in polynomial time). As shown in [20] and in Section 3.4, the MTA problem can be solved by reducing it to the maximum flow problem. After constructing the network graph we can use any algorithm that solves the maximum flow problem. The GeoCrowd system employs the Edmonds-Karp algorithm [16], which solves the maximum flow problem efficiently (i.e., in $O(VE^2)$).

We plan to extend the current implementation by using a distributed algorithm based on the MapReduce programming model. In [18] a promising and scalable algorithm was designed to compute the maximum flow on MapReduce. It was shown that for graphs with millions of nodes and edges, but with a small diameter $D$ (as in our constructed network graph), the maximum flow can be computed in time that is near-linear in the graph size. In Chapter 6 we discuss this approach further.

CHAPTER 3
System

In this chapter, we provide an overview of the system architecture and briefly explain its major components, both on the front-end and back-end sides.

3.1 Overview

Figure 3.1: GeoCrowd system architecture.
An overview of the GeoCrowd system architecture is illustrated in Figure 3.1. In order to support all necessary operations (e.g., task publishing, assignment processing, etc.), the GeoCrowd system consists of two main components: 1) a smartphone app based on a mobile architecture, and 2) a server that manages all back-end and front-end functionalities (i.e., web services, user management, task assignment, interfaces). Both the mobile application and the server provide user interfaces for requesters and workers to allow the necessary system interaction. The two main components are detailed below.

3.2 Front-End Components

On the front-end, various user interfaces are exposed to workers and requesters to allow them to perform the basic operations. Workers interact with the GeoCrowd system through the mobile app, while requesters interact with the system through the web app interfaces.

3.2.1 Mobile Application

The mobile application is used by the workers to receive and execute tasks. The app includes a map-based interface to enter and post workers' task inquiries. The task inquiry data are posted to the GeoCrowd API for further processing. After a task inquiry is posted and validated, the worker immediately becomes online and is ready to be considered in the next task assignment cycle.

Additionally, the app implements a background notification service to continuously receive updates from the GeoCrowd server about assigned tasks. The user can then check the status of assigned tasks (i.e., incomplete, complete, expired, active) and interact with them.

3.2.2 Web Application Interfaces

The web application provides all interfaces to capture the requester's inputs that are needed before publishing a task to the GeoCrowd API. Similarly to the mobile application, it provides a multi-functional map-based interface, which allows requesters to publish tasks.
Specifically, requesters can define the task details, which include Title, Description, Location, Expiry Date, K and Media Type to be captured. In addition, it supports interfaces to monitor, delete and view the status of tasks per requester, and to accept or decline a worker's response.

3.3 Back-End Components

3.3.1 Web Services

The web services connect the front-end of the mobile and web apps with the back-end.

GeoCrowd API: The GeoCrowd API specifies how users should interact with the back-end. It can handle remote calls exposed by the API to send and receive GeoCrowd data (e.g., tasks, task inquiries, etc.). Data received from the API are filtered and validated by the Data Processing module. Subsequently the data are stored in the database.

Notification API: The Notification API is built on top of the GeoCrowd API and is used to push notifications to the newly task-assigned workers in real time. This module operates in conjunction with the background notification service described in Section 3.2.1.

3.3.2 GeoCrowd Engine

Task Assignment: The Task Assignment module runs the assignment algorithm periodically. It first retrieves all available workers and tasks from the database, builds the flow network and runs the Edmonds-Karp algorithm. After execution, the assigned worker-task pairs are stored in the database and the Notification API is triggered to inform all affected users.

Data Processing: The Data Processing module receives the requested data from the GeoCrowd API. After the user passes the authentication and authorization check, the data is examined for malicious content. The resulting filtered data are passed to the Query Interface.

Query Interface: The Query Interface provides the models that are designed to work with information in the database. Typically the models contain functions and access methods that help to retrieve, insert, and update information in the database.

3.3.3 Database

Data are safely stored in a database with spatial extensions for further processing.
Spatial indices are created to support spatial queries performed by the Query Interface and to speed up the retrieval time.

3.4 GeoCrowd Algorithm

The implementation of the spatial crowdsourcing algorithm, GeoCrowd, provides the mechanisms to support spatial tasks that are assigned to and executed by human workers. The current version is an implementation of the method described by Kazemi and Shahabi [20], in which the server is responsible for assigning workers to tasks; the algorithm is introduced below.

Requesters (users who are in need of labor to collect media content) can create spatial tasks and send them to the server. Each spatial task is defined by a task id $tid$ and a requester id $rid$, its geo-location $l$, the start time $s$, the end time $e$, the number of videos to be crowdsourced $k$ and the query $q$. A task is represented as per Definition 2 by the tuple $\langle rid, tid, l(x, y), s, e, k, q \rangle$. Workers (users who are willing to collect media content for requesters) can send task inquiries (i.e., spatial regions of workers' interests) to the server. Each task inquiry is defined according to Definition 5. The goal of the GeoCrowd algorithm is to assign as many tasks as possible to workers while respecting their constraints. For example, an instance of the Maximum Task Assignment (MTA) problem is depicted in Figure 3.2.

Figure 3.2: Instance problem of maximum task assignment (MTA).

Figure 3.2 shows three workers ($w_1$ to $w_3$) along with their constraints ($MaxT_1$ to $MaxT_3$ and $R_1$ to $R_3$) and the tasks ($t_1$ to $t_{10}$). In this scenario, it is clear that tasks $t_1$ and $t_3$ cannot be assigned to any of the workers, since they are outside of every spatial region $R$. In addition, worker $w_1$ can accept tasks $t_2$, $t_5$ and $t_7$ but can perform only two of them because of the $MaxT_1$ constraint.

In [20], it was proved that the MTA problem can be efficiently solved in polynomial time by reducing it to the maximum flow problem.
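
As a rough illustration of this reduction, the following Python sketch builds the flow network from worker regions and task locations and then runs a textbook Edmonds-Karp search. It is not the thesis's implementation: the function names, the dictionary-based input format and the sample coordinates are all assumptions made for this example. The edge rules follow the reduction of [20]: a source-to-worker edge with capacity MaxT, a worker-to-task edge with capacity 1 when the task lies inside the worker's region, and a task-to-destination edge with capacity k.

```python
from collections import defaultdict, deque

SRC, DEST = "src", "dest"  # hypothetical labels for the two extra vertices


def build_flow_network(workers, tasks):
    """Build residual capacities for the MTA-to-max-flow reduction.

    workers: {wid: (max_t, (sw_x, sw_y, ne_x, ne_y))}  (assumed input format)
    tasks:   {tid: ((x, y), k)}                        (assumed input format)
    """
    cap = defaultdict(lambda: defaultdict(int))
    for wid, (max_t, (x1, y1, x2, y2)) in workers.items():
        cap[SRC][("w", wid)] = max_t  # limits assignments per worker
        for tid, ((x, y), _k) in tasks.items():
            if x1 <= x <= x2 and y1 <= y <= y2:  # task inside region R
                cap[("w", wid)][("t", tid)] = 1  # a worker does a task once
    for tid, (_loc, k) in tasks.items():
        cap[("t", tid)][DEST] = k  # task should be performed k times
    return cap


def edmonds_karp(cap, s, t):
    """Return the max flow value; cap is updated in place to the residual graph."""
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        # Walk back from t to s, then push the bottleneck along the path.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
        total += bottleneck


def assign(workers, tasks):
    """Return (number of assignments, list of (wid, tid) pairs)."""
    cap = build_flow_network(workers, tasks)
    count = edmonds_karp(cap, SRC, DEST)
    # Residual capacity on a task->worker edge equals the flow worker->task.
    pairs = [(wid, tid) for wid in workers for tid in tasks
             if cap[("t", tid)].get(("w", wid), 0) > 0]
    return count, pairs


# Made-up instance: two workers, four tasks (t3 lies outside every region).
workers = {"w1": (2, (0, 0, 10, 10)), "w2": (1, (5, 5, 15, 15))}
tasks = {"t1": ((2, 2), 1), "t2": ((7, 7), 1),
         "t3": ((20, 20), 1), "t4": ((8, 3), 1)}
count, pairs = assign(workers, tasks)
```

In this made-up instance, `w1` is the only worker who can reach `t1` and `t4`, so a maximum assignment has to route `t2` to `w2`; the breadth-first search reaches that result by pushing flow back along a residual edge.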
Figure 3.3 shows how the above-mentioned instance can be reduced to the maximum flow problem. Each worker and task is represented as a vertex in a graph ($v_1$ to $v_3$ for $w_1$ to $w_3$ and $v_4$ to $v_{13}$ for $t_1$ to $t_{10}$). There is an edge between a worker and a task if and only if the task location is within the spatial region $R$ of the worker. The capacity of each edge between a worker and a task is limited to 1, since each worker should perform a specific task at most once. Two new vertices are added, i.e., a source (src) and a destination (dest). There is an edge between the source node and each worker node with a weight equal to the worker's $MaxT$ constraint, thus restricting the flow and, by extension, the number of assignments for this worker. Similarly, there is an edge from each task node to the destination node with a weight equal to $K$ (the number of times that the task is going to be crowdsourced). In Figure 3.3 all of these weights are equal to 1, assuming that each task will be performed once. In the current algorithm implementation, the system is not restricted to $K$ being equal to 1. After the graph construction, any algorithm that solves the maximum flow problem can be used. In our system, the Edmonds-Karp algorithm [16] is implemented.

Figure 3.3: Reduction of MTA to the maximum flow problem.

CHAPTER 4
Implementation

In this chapter we discuss the implementation details. In Section 4.1 we present the integration of GeoCrowd in a multimedia platform, named MediaQ. Section 4.2 and Section 4.3 explain the implementation of the web and mobile applications respectively. Thereafter, in Section 4.4 we provide the details behind the database implementation, the required schema and the supported queries. Finally, in Section 4.6 we discuss the challenges faced during the implementation.

4.1 MediaQ Integration

The MediaQ system was first introduced in [22].
The main purpose of MediaQ is to provide a media management system that can efficiently collect, organize, share, and search mobile multimedia content from people's mobile devices using automatically tagged geospatial metadata. The advantage of MediaQ compared to other multimedia systems is that it attaches metadata to individual video segments (down to the frame level) of the captured videos. Specifically, uploaded videos are automatically annotated with spatial (i.e., location and direction), temporal (i.e., time) and context-related (i.e., keywords and people) metadata.

The GeoCrowd system is integrated as a core module of MediaQ. Primarily, the spatial crowdsourcing supported by GeoCrowd is used in MediaQ to collect data efficiently and at scale in cases where media content is not available to users, either due to users' lack of interest in specific videos or due to other spatial and temporal limitations. Additionally, we can collect real spatiotemporal data and perform experiments in a real-world system environment.

Figure 4.1: MediaQ home page screenshot. Playing a crowdsourced mobile video uploaded with GeoCrowd.

Figure 4.1 shows a screenshot of the MediaQ home page. The video, along with its captured trajectory and field of view (FOV), was collected on demand by a public worker and uploaded via GeoCrowd. Various queries (i.e., Range, Region and Point) with advanced filters (i.e., time, direction, keyword and people) can be performed to search and retrieve the video segment of interest. In the remaining sections GeoCrowd is described within the context of MediaQ.

4.2 Web Application Server

The Web Application Server runs on top of Fedora 17 OS on a machine equipped with a 64-bit Intel Core i7-950 quad-core processor (footnote 1) running at a 3.06 GHz clock speed. The web application is hosted on an Apache v2.2.23 web server, is written in PHP v5.4.17 and runs on top of the CodeIgniter (CI) PHP framework (footnote 2) v2.1.4.

Footnote 1: Intel Core i7-950 processor: http://ark.intel.com/products/37150/Intel-Core-i7-950-Processor-8M-Cache-3_06-GHz-4_80-GTs-Intel-QPI.
Footnote 2: CodeIgniter PHP framework: http://ellislab.com/codeigniter.

CI is an open source web application framework for rapid development. It provides a rich set of libraries and abstractions for commonly needed tasks and follows the popular Model-View-Controller (MVC) development pattern. The main advantage of this pattern is that the presentation (i.e., the web page rendering) is separate from the PHP code. In the context of GeoCrowd, the three MVC components are detailed below.

Listing 4.1: Model method to post task

    <?php
    class Task_model extends CI_Model {
        ...
        function post_task($user_id, $data) {
            // Well-known text (WKT) for the task location: longitude, then latitude
            $point = "'POINT($data->lng $data->lat)'";
            $this->db->set('TaskId', gen_uuid());
            $this->db->set('Title', $data->title);
            $this->db->set('Description', $data->description);
            $this->db->set('StartDate', date("Y-m-d H:i:s"));
            $this->db->set('EndDate', $data->expiry);
            // Third argument false: do not escape the spatial function call
            $this->db->set('Location', "GeomFromText($point)", false);
            $this->db->set('K', $data->maxhits);
            $this->db->set('RequesterId', $user_id);
            $this->db->set('Type', $data->contenttype);
            if (!$this->db->insert('TASKS'))
                return FALSE;
            return TRUE;
        }
        ...
    ?>

4.2.1 Models

The Models represent the data structures. The model classes contain functions that are used to access the database and perform basic SQL statements (i.e., SELECT, UPDATE, DELETE, INSERT). Each controller that needs to access the database must load the appropriate model and use its methods. GeoCrowd implements two models:

GeoCrowd Model: This model contains all access methods to database tables that are related to GeoCrowd operations such as posting, deleting, or retrieving a task or task inquiry.
Moreover, it implements methods to retrieve a worker's assignments and responses, a requester's tasks and their status, etc. Finally, it provides methods that are exclusively used by the GeoCrowd task assignment background job. These methods contain queries that return the possible worker-task pairs that respect their constraints, together with other information (e.g., the remaining number of times a task needs to be crowdsourced). For example, the query contains a spatial function to check whether the task is within the worker's area of availability. The retrieved pairs are used to build the network flow graph, and then the assignment algorithm is executed. The pairs that are assigned are inserted into the database through another method of this model.

A simple code example for posting a new task is given in Listing 4.1. The method accepts two parameters: the user id of the requester and the posted task data, which are encapsulated in the $data variable. CodeIgniter uses a modified version of the Active Record Database Pattern which simplifies the process of creating a query. The generated SQL INSERT query for this example is shown in Listing 4.2. Note that the models do not validate the input parameters; the controller which calls this model method is responsible for checking the user's input. Nevertheless, the calls to Active Record Database methods, like $this->db->set(), produce safer queries by automatically escaping all values.

User Model: The user model, as the name suggests, contains user-related methods to access the user tables of the database (e.g., get user, change password, delete account, etc.).
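
To make the candidate-pair filtering described for the GeoCrowd model more concrete, here is a plain-Python sketch of the same logic. In the actual system this filtering is performed inside SQL queries that use a spatial function; the class names, field names (including the `done` counter) and sample values below are hypothetical and only mirror the constraints named in the text: the task lies inside the worker's region, the task is within its start and end timestamps, and it still needs to be crowdsourced.

```python
from dataclasses import dataclass


@dataclass
class Task:
    tid: str
    x: float
    y: float
    start: int      # start timestamp s
    end: int        # end timestamp e
    k: int          # number of times the task should be performed
    done: int = 0   # hypothetical count of responses already collected


@dataclass
class Inquiry:
    wid: str
    sw: tuple       # South-West corner (x1, y1) of region R
    ne: tuple       # North-East corner (x2, y2) of region R
    max_t: int      # MaxT: most tasks the worker accepts


def candidate_pairs(inquiries, tasks, now):
    """Return (wid, tid, remaining) for every feasible worker-task pair."""
    pairs = []
    for inq in inquiries:
        (x1, y1), (x2, y2) = inq.sw, inq.ne
        for task in tasks:
            active = task.start <= now <= task.end
            remaining = task.k - task.done
            inside = x1 <= task.x <= x2 and y1 <= task.y <= y2
            if active and remaining > 0 and inside:
                pairs.append((inq.wid, task.tid, remaining))
    return pairs


# Made-up data: only t1 is active, unfinished and inside w1's region.
inquiries = [Inquiry("w1", (0, 0), (10, 10), 2)]
tasks = [
    Task("t1", 5, 5, 0, 100, 3, done=1),   # feasible, 2 performances left
    Task("t2", 50, 5, 0, 100, 1),          # outside the region
    Task("t3", 1, 1, 0, 10, 1),            # already expired at now=50
    Task("t4", 2, 2, 0, 100, 1, done=1),   # nothing left to crowdsource
]
pairs = candidate_pairs(inquiries, tasks, now=50)
```

The `remaining` value returned with each pair corresponds to the capacity of the task's edge to the destination vertex when the flow network is built.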

Listing 4.2: Generated query of Listing 4.1

    INSERT INTO TASKS (TaskId, Title, Description, StartDate, EndDate,
                       Location, K, RequesterId, Type)
    VALUES ('4489 fa5b29e844',
            'Video of Tommy Trojan',
            '<p>Record a <b>3 minute</b> video of Tommy Trojan during the business hours</p>',
            '2014-03-12 00:40:29',
            '2015-03-25 00:38:07',
            GeomFromText('POINT(-118.4434479475 34.064187787675)'),
            3,
            '35 cfc15d9aab44',
            'video')

4.2.2 Views

The Views are used to construct the actual web page that is presented to the user. They can contain both HTML and PHP code. In CI, a view can also be a page fragment like a header or footer. A view is not accessed directly; an appropriate controller must be used to load the views and provide any dynamic data to use. Basically, the Views are the user interfaces of the web application.

To provide the end users with a responsive graphical front end we used state-of-the-art web technologies, including HTML5, CSS3, and JavaScript libraries such as jQuery and jQueryUI. To deal with spatial data (e.g., when creating a task, the requester needs to specify the exact location) we make use of the Google Maps JavaScript API V3. Finally, the Views are designed based on HTML5 Boilerplate (footnote 3), a fast, robust, and adaptable front-end template, and use the Bootstrap (footnote 4) front-end framework. GeoCrowd's most popular views are listed below:

Create Task View: The resulting web page is shown in Figure 4.2. It provides a map-based interface to specify the data for a new task. The requester needs to set the task's location and other task information, such as title, description, expiry date, type of data requested, etc.

Figure 4.2: Create a new task screenshot.

Tasks Details View: This view, depicted in Figure 4.3, provides the UI to view all tasks created by the logged-in user/requester.
The view lists all tasks in different colors based on their status: 1) green when completed (i.e., the task was crowdsourced successfully), 2) blue when pending (i.e., the task still needs to be crowdsourced) and 3) red when expired (i.e., the task end date was reached without completion). From this view, the user can see a brief description of the task, delete the task or choose to navigate to and check the complete status of the task.

Footnote 3: HTML5 Boilerplate: http://html5boilerplate.com/.
Footnote 4: Bootstrap: http://getbootstrap.com/.

Figure 4.3: View list of created tasks.

Task Progress View: The progress of a particular task is displayed in this view. Figure 4.4 shows the task that was created with the first view. The map-based interface pinpoints the task location and any responses crowdsourced by the assigned workers. Assigned workers are listed below the map and the requester can choose to accept or reject a response from a particular user.

Figure 4.4: View task details.

Note that the aforementioned views are just the core fragments of the actual HTML page that is loaded. GeoCrowd implements template views as well (e.g., the topbar, header, footer, JavaScript plugins, etc.) which are loaded before or after the views above as common fragments. This will become clearer after explaining how Controllers work.

4.2.3 Controllers

Controllers serve as an intermediary between the Models, the Views, and any other resources needed to process the HTTP request and generate a web page. They provide the connection between the web and mobile interfaces (i.e., views) and the database (i.e., models). Web Services are built as controllers. All HTTP requests are made to an appropriate controller, which is associated with a URI; the controller retrieves data from the database, passes data to the views and returns the rendered HTML page.

Listing 4.3: User Controller. User login method example

    class User extends CI_Controller {
        ...
        public function login($username = '') {
            // Load user model
            $this->load->model('user_model');
            $this->load->helper('form');
            $this->load->library('form_validation');
            // Validation rules apply
            if ($this->form_validation->run('login') === FALSE) {
                // Is a browser?
                if (!$this->agent->is_mobile_app()) {
                    $data['username'] = $username;
                    $data['title'] = APP_NAME;
                    // Load login page
                    $this->load->view('templates/header', $data);
                    $this->load->view('account/login_view', $data);
                    $this->load->view('templates/footer', $data);
                    $this->load->view('account/account_plugins', $data);
                }
                else
                    // Send JSON response
                    $this->json_response(FALSE);
            }
            else {
                $username = $this->input->post('username');
                // Log 'Login'
                log_message('info', date("Y-m-d H:i:s") . "\n\tUser logged in (session created): " . $username);
                // Is a browser?
                if (!$this->agent->is_mobile_app()) {
                    // Go to home
                    redirect('home', 'refresh');
                }
                else {
                    $this->json_response($this->session->userdata('logged_in')['UserId']);
                }
            }
        }
        ...

Listing 4.3 shows a segment of the user controller which is used to log in a user. Initially, the controller loads the user model, helper functions and the form validation library. The POST HTTP request is validated based on user-defined rules with a call to the form validation class ($this->form_validation->run('login')). The login web method, for example, checks that the username and password were indeed submitted, that they are of the correct type, that they contain only permitted characters (i.e., to protect from cross-site scripting) and that they are within a minimum and maximum length. If the validation succeeds, a callback function (not shown in Listing 4.3) is triggered to check the username and password against the database.
This callback function uses the user model to access the database as described above.

Controllers are used by both the web and the mobile application. For this reason the controller calls a user agent library function ($this->agent->is_mobile_app()). If the request was made by the web application, the response should be an HTML web page. This is done by loading the views mentioned above by calling $this->load->view(<viewname>, <data>). It accepts two parameters: 1) the view filename and 2) the data to pass to the view, thus producing dynamically generated pages. Multiple calls concatenate the views and produce a single web page. This way views can be combined in different controllers and reused (e.g., the topbar view is loaded for every page). If the request was made by the mobile app, the response is sent in JSON format.

The GeoCrowd controller works in a similar fashion. The controller exposes a web method for each GeoCrowd operation (e.g., post a task, send a task inquiry etc.). The controller loads and uses GeoCrowd model methods to query the database and returns the result as an HTML web page or a JSON response, depending on the device the user used to access the system.

4.3 Android Mobile Application

For the purpose of assigning spatial tasks to the crowd, an Android mobile application was developed. The application targets API Level 19 (KitKat) but is available for Android devices that run Android 4.0 (Ice Cream Sandwich/SDK 14) and above.

The mobile app consists of several user interfaces that are summarized in Figure 4.5.

Figure 4.5: A list of Android mobile app screenshots: (a) Login, (b) Dashboard, (c) Task Inquiry, (d) Notification, (e) Assigned Tasks, (f) Task respond, (g) Video List, (h) Video Options.

Figure 4.5a shows the log in screen. From there, the user can select to sign in, create an account, capture a video or retrieve a lost password.
Although the user is able to capture a video, he/she is not able to upload it without logging in. The reason is that all user-generated content is related to a user id, and without logging in this information is not available to the server. We provide this functionality so the user can still record a video even when no network connection is available. The user can upload the video to the GeoCrowd server afterwards.

After logging in, the available options are displayed in a dashboard layout shown in Figure 4.5b. A user who is ready to perform spatial tasks can send a task inquiry using the map-based interface. In Figure 4.5c the rectangular region refers to the spatial worker constraint R and defines the area where the user can accept spatial tasks. The app uses Google Cloud Messaging (GCM) for Android to allow push notifications to be sent to the device. Newly assigned workers are notified by a service running in the background, as shown in Figure 4.5d. The assigned tasks are displayed in a list (Figure 4.5e) that provides information about each assigned task.

GeoCrowd currently accepts two types of response formats: videos and text responses. Figure 4.5f shows an example of a video response. The worker can select one of the pre-recorded videos (Figure 4.5g) or record a new video. Because the response needs to be recorded at the task's location, we reject workers' responses that were taken 100 meters or more from the task's location.

4.4 Database

GeoCrowd data are stored in a MySQL 5.5.32 database with spatial extensions. Spatial indexes are created to support spatial queries performed by the Query Interface and to speed up retrieval.

4.4.1 DB Schema

The GeoCrowd database schema consists of several tables, which are listed below:

User tables. User information is stored in two tables, Table 4.1 and Table 4.2.
The reason is to separate security information, such as passwords, from other user information such as first and last name. The User Id is the primary key and is used to relate each user with TASKS and/or TASKS_INQUIRIES. We also keep data about user actions, such as last login, failed login attempts etc.

Table 4.1: Structure of table USERS

    Column                                   Type           Null   Default
    UserId                                   binary(16)     No
    Password                                 char(128)      No
    Email                                    varchar(128)   No
    LoweredEmail                             varchar(128)   No
    PasswordQuestion                         varchar(256)   Yes    NULL
    PasswordAnswer                           varchar(128)   Yes    NULL
    IsApproved                               bit(1)         No     b'1'
    IsLockedOut                              bit(1)         No     b'0'
    CreateDate                               datetime       No
    LastLoginDate                            datetime       No
    LastPasswordChangedDate                  datetime       No
    LastLockoutDate                          datetime       Yes    NULL
    FailedPasswordAttemptCount               varchar(45)    Yes    NULL
    FailedPasswordAttemptWindowStart         datetime       Yes    NULL
    FailedPasswordAnswerAttemptCount         varchar(45)    Yes    NULL
    FailedPasswordAnswerAttemptWindowStart   datetime       Yes    NULL
    Comment                                  text           Yes    NULL
    Salt                                     char(128)      No
    ResetPassword                            char(128)      Yes    NULL
    Active                                   tinyint(1)     No     1

Table 4.2: Structure of table USERS_PROFILES

    Column             Type           Null   Default
    UserId             binary(16)     No
    UserName           varchar(128)   No
    LoweredUserName    varchar(128)   No
    IsAnonymous        bit(1)         No     b'0'
    LastActivityDate   datetime       No
    PhoneNumber        varchar(45)    Yes    NULL
    ProfilePicture     varchar(45)    No     /images/people/no_image.png
    FirstName          varchar(45)    Yes    NULL
    LastName           varchar(45)    Yes    NULL
    UserLevel          int(11)        Yes    NULL

Tasks table. Task information is stored in another table, shown in Table 4.3. The Task Id acts as the primary key and Requester Id refers to the user who created the task. Note that the Location field is stored as a Geometry type and is spatially indexed. This field can be used as a parameter to spatial functions (e.g., point within region) that take advantage of the spatial indices to speed up the query. Confidence is not currently used by the system. This is part of future work in case we consider the worker's reputation when assigning him to a task.
The other fields are self-explanatory except the last three (InProgress, JobServer, PID). These three fields are used by the assignment job to indicate that the task is currently being assigned and to identify which machine and process id is running the assignment algorithm.

Table 4.3: Structure of table TASKS

    Column        Type           Null   Default
    TaskId        binary(16)     No
    RequesterId   binary(16)     No
    Title         varchar(100)   No
    Description   text           No
    StartDate     datetime       No
    EndDate       datetime       No
    Location      geometry       No
    Type          text           No
    IsCompleted   bit(1)         No     b'0'
    Status        text           Yes    NULL
    K             smallint(5)    No
    Confidence    tinyint(3)     No     60
    InProgress    bit(1)         No     b'0'
    JobServer     varchar(45)    Yes    NULL
    PID           int(10)        Yes    NULL

Task inquiries table. Each worker can send multiple task inquiries, which are stored in Table 4.4. Worker Id refers to the user who sent the task inquiry. A user can send many task inquiries but only the last one is considered active and will be used in the assignment process. The table stores information about the worker's constraints (i.e., Region and MaxT).

Table 4.4: Structure of table TASKS_INQUIRIES

    Column          Type         Null   Default
    TaskInquiryId   binary(16)   No
    WorkerId        binary(16)   No
    Location        geometry     No
    Region          geometry     No
    MaxT            tinyint(3)   No
    Date            datetime     No
    IsActive        bit(1)       No     b'1'

Worker assignments table. When the assignment algorithm is executed, the resulting worker-task pairs are stored in Table 4.5. This table keeps data about the progress of a task assignment, such as the assigned date. When the worker performs the task, the completion date is recorded and the assignment is flagged as completed. In that case the worker's response is recorded in Table 4.6.
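The rule stated above, that only a worker's most recent task inquiry is active, can be illustrated with a small sketch. It is written in Python purely for illustration and is not the thesis's PHP/MySQL implementation; the `TaskInquiry` class and the `mark_latest_active` function are hypothetical names that loosely mirror the columns of the task inquiries table.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TaskInquiry:
    # Hypothetical in-memory mirror of a task inquiry row.
    inquiry_id: str
    worker_id: str
    date: datetime
    max_t: int
    is_active: bool = True


def mark_latest_active(inquiries):
    """Flag only the newest inquiry of each worker as active and return those."""
    latest = {}
    for inq in inquiries:
        current = latest.get(inq.worker_id)
        if current is None or inq.date > current.date:
            latest[inq.worker_id] = inq
    for inq in inquiries:
        inq.is_active = latest[inq.worker_id] is inq
    return [inq for inq in inquiries if inq.is_active]
```

Older inquiries are only flagged as inactive rather than deleted, which matches the design in which assignments made for past inquiries are kept.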
Table 4.5: Structure of table WORKERS_ASSIGNMENTS

    Column          Type         Null   Default
    TaskId          binary(16)   No
    TaskInquiryId   binary(16)   No
    IsCompleted     bit(1)       No     b'0'
    AssignedDate    datetime     No
    CompletedDate   datetime     Yes    NULL
    IsNotified      bit(1)       No     b'0'
    IsRemoved       bit(1)       No     b'0'

Worker responses table. Table 4.6 keeps the responses submitted by workers. The primary key is the Response Id, and the pair Task Id and Task Inquiry Id refers to the primary key of Table 4.5. With this design, we allow each worker to submit multiple responses. The requester can select which responses to accept, and this is stored in the IsChosen field.

Table 4.6: Structure of table WORKERS_RESPONSE

    Column          Type         Null   Default
    ResponseId      binary(16)   No
    TaskId          binary(16)   No
    TaskInquiryId   binary(16)   No
    ResponseDate    datetime     No
    IsChosen        bit(1)       No     b'0'

The tables that store the uploaded video and text information are trivial and are omitted from this section. Worker responses refer to these tables through the ResponseId.

4.5 GeoCrowd Algorithm

In the current implementation a controller is used to solve the MTA problem periodically. We scheduled the UNIX/Linux cron daemon to run the assignment algorithm every minute. The crontab file can be edited at the UNIX/Linux shell prompt with the following command:

    # crontab -e

The controller can be set to be executed every minute by appending the following line to the crontab file:

    * * * * * /<path>/<to>/<controller> <method> <parameters>

For security purposes, this controller is not accessible to clients like other controllers. It can only be executed through the command line. Every minute the controller retrieves from the database, through the query interface (and by extension the models), all active tasks (those that have not expired and have not yet been crowdsourced K times) and all active workers (the task inquiries that have fewer assignments than MaxT).
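The assignment step that this periodic job performs can be sketched as a small max-flow computation. The following Python sketch is illustrative only and is not the thesis's controller code: `assign_tasks`, its arguments, and the node labels are assumptions made for the example. It builds a flow network (source to workers with capacity equal to the remaining MaxT, workers to tasks with capacity 1, tasks to sink with capacity equal to the remaining K) and runs a breadth-first augmenting path search in the style of Edmonds-Karp.

```python
from collections import defaultdict, deque


def assign_tasks(workers, tasks, can_perform):
    """workers: {worker_id: remaining MaxT}; tasks: {task_id: remaining K};
    can_perform: set of (worker_id, task_id) pairs allowed by the region R.
    Returns the (worker_id, task_id) pairs that carry flow."""
    src, dst = ("src",), ("dst",)
    cap = defaultdict(int)
    adj = defaultdict(set)

    def add_edge(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # reverse direction, needed for residual edges

    for w, max_t in workers.items():
        add_edge(src, ("w", w), max_t)
    for t, k in tasks.items():
        add_edge(("t", t), dst, k)
    for w, t in can_perform:
        add_edge(("w", w), ("t", t), 1)

    flow = defaultdict(int)
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {src: None}
        queue = deque([src])
        while queue and dst not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] - flow[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if dst not in parent:
            break
        # Collect the path and push the bottleneck amount along it.
        path, v = [], dst
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[e] - flow[e] for e in path)
        for u, v in path:
            flow[(u, v)] += bottleneck
            flow[(v, u)] -= bottleneck

    return sorted(p for p in can_perform if flow[(("w", p[0]), ("t", p[1]))] > 0)
```

In this sketch a worker who already holds assignments would be passed in with a reduced remaining MaxT, so the source edge shrinks accordingly, and a worker-task pair flagged as removed would simply be left out of `can_perform`.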
Spatial queries are performed to determine whether a worker is able to perform a task with respect to his spatial region constraint R. The graph is constructed as mentioned above and the Edmonds-Karp algorithm is run to solve the max flow problem. After the max flow problem is solved, the task assignments can be determined by checking which edges between workers and tasks carry flow. New task assignments are inserted into the database (see Table 4.5) and the notification API is informed so that it can propagate this information to the assigned workers.

Although we set up the controller to run every minute, no research has been performed to decide what the best interval is. This is part of our future work, after a large-scale experimental evaluation with real data.

4.6 Challenges

In this section we discuss some of the challenges faced during the system's implementation.

Multiple task inquiries. A worker may use the mobile app to send task inquiries. Each task inquiry contains the worker's constraints (region R and maximum tasks MaxT), and the system needs to handle the scenario in which a worker sends multiple task inquiries. For this scenario we considered two options: 1) treat each task inquiry as a new worker, which means the algorithm will assign tasks within R to the (same) worker until MaxT is reached, and 2) consider only the latest task inquiry sent by the worker. The system implements the latter. A worker most likely wants to update the constraints and use the latest ones rather than receive assignments from older task inquiries. A new problem that arises with this choice is when a worker is assigned to a task and then sends a new task inquiry (which becomes the most recent one). What happens to the existing task assignments of the previous task inquiry? In this scenario, we keep all task assignments of past task inquiries and make sure that the worker is not assigned to the same task again through the new task inquiry.

Delete a task. The requesters may want to delete a task.
The system allows this as long as the task was created within five minutes before the attempt to delete or when there are no task assignments on this task. The first rule ensures that the requester can delete a task which was accidentally created, and the second rule favors the workers who were already assigned. In a real spatial crowdsourcing system, a worker heading to the task will be disappointed if his task is deleted. As part of future work, we will provide some kind of reward to workers who perform tasks, as well as user ratings. Deleting a task will affect both attributes.

Remaining assignments. A worker may get assigned to tasks by different executions of the task assignment algorithm. Therefore, when constructing the network flow graph, the system must check the existing assignments and update the graph correspondingly. For example, if a worker is already assigned to two tasks and his task inquiry has the constraint MaxT=3, then the edge from the source node to this worker node should have capacity 1, as described in Section 3.4.

Deleting an assignment. A requester, instead of deleting the task, can delete the assignment of a specific worker. The system must keep some information about the deleted assignment so that subsequent executions of the task assignment algorithm do not assign the same worker to the same task. This is achieved by the IsRemoved field in Table 4.5.

Workers as requesters. A worker can be a requester and vice versa. However, a requester cannot be a worker for his own task.

CHAPTER 5

Experiments

We have tested the feature within the USC campus with real scenarios. Specifically, geospatial tasks were created through the MediaQ web application by selecting different locations around the USC campus and varying the task constraints (i.e., expiration date, number of videos to be crowdsourced, and task descriptions). To imitate multiple workers, we used the MediaQ mobile app with different user accounts to send multiple task inquiries.
The rectangular regions of the task inquiries were intentionally chosen to cover several cases (regions containing no task, one task or multiple tasks), while varying the MaxT parameter. The GeoCrowd algorithm was scheduled to solve an MTA instance every minute, and new worker assignments were inserted into the database. Thus, assigned workers were automatically notified of their task assignments within one minute. The results were correct in this campus-scale experiment with ten workers. However, this experiment was only a proof of concept.

We also performed a stress test on a larger scale using real-world data (i.e., tasks and task inquiries) from public users. The data set was obtained from Gowalla [6], a location-based social networking website where users share their locations by checking in. The results are reported in Table 5.1. The total response time includes the DB retrieval time, the network graph construction, the maximum flow execution time and the DB insertion time. The tasks were intentionally inserted in every worker's spatial region, thus maximizing the number of edges between workers and tasks. With |T| tasks and |W| workers the graph therefore has |T||W| + |T| + |W| edges (e.g., 10 x 10 + 10 + 10 = 120), which matches the edge counts in Table 5.1.

Table 5.1 shows that when both tasks and workers increase, the total response time approaches one minute. If the execution takes more than a minute, multiple processes of the task assignment job will run concurrently (since the algorithm is started every minute). In the long term this will cause a problem, as the resources of the system are limited.

Table 5.1: GeoCrowd response time

    # Tasks   # Workers   # Edges   Total Response Time (sec)
    0         0           0          0.005299
    1         1           3          0.105916
    10        10          120        0.762789
    50        50          2600      19.432897
    100       100         10200     53.246348

CHAPTER 6

Extensions

In this chapter we discuss some extensions we consider incorporating in the future. First, we present a parallel solution for computing the maximum flow. This can be directly applied to scale up the existing system implementation. Later, we discuss a new version of the MTA problem obtained by adding additional constraints.
The MTA problem then becomes NP-hard, which means heuristics must be used to approximate the solution.

6.1 Distributed Max Flow Algorithm

The MediaQ system (see Section 4.1) and its spatial crowdsourcing features were used within the USC campus to collect data from real users who acted as workers and requesters. The increasing popularity of our MediaQ system gained attention and research interest from several other universities, including Hong Kong University of Science and Technology (HKUST), National University of Singapore (NUS), Ludwig Maximilian University of Munich, Tsinghua University, Pusan National University (PNU) and King Abdullah University of Science and Technology (KAUST).

The focus of the GeoCrowd implementation so far was to assign as many tasks to workers as possible. Although the task assignment algorithm is fast, our centralized approach cannot scale when the number of tasks and workers increases. This becomes more evident with the expansion of the platform to universities across the world.

In [18] a promising and scalable algorithm was designed to solve the maximum flow problem on MapReduce [13] (MR). MR is a simple programming model, introduced by Google together with an associated implementation, for processing and generating large data sets in parallel on a large number of commodity machines. Hadoop [7] is an open source implementation of the MR framework.

6.1.1 Scalability

In [18], Halim et al. study an optimized and distributed version of the Ford-Fulkerson algorithm [17] that can be applied to the MR framework. It was shown that a direct conversion of the Ford-Fulkerson algorithm to the MR framework is not scalable, as the number of rounds of MR executions depends on the resulting maximum flow. To optimize the algorithm further they study different variants which can speed up the average execution time.

6.2 Time-Limited Workers

So far we have investigated the impact of two worker constraints, i.e. the maximum number of assigned tasks MaxT and the spatial region R where a worker is available to perform tasks. In this section we introduce another type of constraint. We first introduce a temporal constraint on workers: the time limit (TL). TL refers to the maximum amount of time the worker is willing to spend to complete the tasks assigned to him. Additionally, a performance time p is added to each task, which defines the time required to complete the task after arriving at the task's location. Although we defined a task with an expiration time e, this constraint was not used while constructing the network graph. It was only used to exclude expired tasks from the assignment process and to present their status on the user interface (see Section 4.2.2).

The exact definitions of the extended task inquiries and tasks are given in 2. In this section we consider all the aforementioned constraints. We prove that the MTA problem becomes NP-hard by reduction from the Maximum Task Scheduling (MTS) problem presented in [14]. In [14], a specialized version of the Travelling Salesman Problem (TSP) was reduced to the MTS problem, which was thus proved to be NP-hard. The new task assignment problem depends heavily on the order in which the worker visits the tasks. Thus, a solution to the problem must provide a sequence of tasks to visit for each worker.

Proof (Sketch). First we need to show that the modified MTA problem is in NP. Given a sequence of tasks for each worker, we can decide in polynomial time the total number of completed tasks by calculating the arrival time at each task and checking whether the deadline (considering the performance time) was met and the time limit was not reached.

We now show that MTS ≤_P MTA. Consider an instance of MTS. For each task in MTS, we create a task in MTA, positioned at the same location and having the same expiration time. Since tasks in MTS do not require a performance time p and are crowdsourced only once, we set p to zero and k to one for all tasks.
Since the MTS deals with just one worker w, we construct a single worker w in MTA. We set the worker's constraints as follows: the rectangular region R is set to cover all tasks in MTS (since we know where each task is located, we can extend R to cover all of them), and since the time limit TL does not exist in MTS, we set TL to be infinite, giving the worker enough time to visit all tasks (if this is possible given the tasks' constraints). Solving the MTA outputs the maximum number of task assignments, in sequence, for this particular worker. This also serves as the result for the MTS. This construction completes our proof.

6.2.1 Running Example

Figure 6.1 illustrates an instance of the new MTA problem. It includes 4 workers and 9 tasks. The location l and time limit TL constraints are shown on top of each worker icon, while the region R is drawn in the same color as the worker. For instance, the spatial region R of w1 covers t1, t2 and t3. The tasks, drawn as diamonds, are placed on the 2D grid. The d constraint refers to the deadline of the task, i.e. the difference t_i.e - t_i.p (expiration time minus performance time). This means that workers who arrive after time t_i.d will not be able to perform the task on time. The travel cost on the grid is calculated using the Manhattan distance. For example, w1 needs 10 time units to reach t2.

The goal is to assign as many tasks as possible to the workers, while respecting their constraints. The problem becomes challenging because a worker may need to visit many tasks. Clearly, each worker needs a tour that visits the tasks in some order. What makes the problem even more challenging is that each worker competes with the others to reach as many tasks as possible before they expire.

6.2.2 Hybrid MTA Algorithm

A hybrid algorithm (Algorithm 1) is proposed which solves the MTA problem approximately. The algorithm accepts as input a set of available tasks and a set of available workers along with their constraints.
Lines 1-14 initialize the algorithm. Line 1 initializes three counters which keep the number of assignments, the number of tasks that remained unassigned and the number of pruned tasks, respectively. Assigned tasks are going to be performed by workers. The unassigned counter is the number of tasks that do not have any candidate workers. For example, in Figure 6.1 t7 has no chance of being assigned to a worker since it is outside every spatial region. The pruned counter is the total number of tasks that could not be assigned because of the worker's constraints, the task's constraints, or a combination of both.

Figure 6.1: Example of Workers with Time Limit.

Each worker keeps three lists: one for candidate tasks, one for assigned tasks in visiting order and one for tasks that were candidates but were not assigned to him. Each task keeps a priority queue of candidate workers and two lists that keep track of assigned workers and of workers who initially were candidates but were not assigned to the task. Lines 2-9 initialize the lists and the queue. Lines 10-14 iterate over all workers and tasks and fill the initial candidate list of each worker and the candidate priority queue of each task.

Note the call to the method toPrune() at line 12, which is given by Algorithm 2. This method accepts three parameters: a worker, a task and a flag that specifies whether or not to use the spatial region filter. The method returns true if one of the constraints is violated. Line 1 checks whether the task, given as input, has already reached its maximum number of executions k. In that case, the candidate worker w cannot be assigned to this task and the method returns true. Line 3 checks whether the worker has reached the MaxT limit. Recall that this limit refers to the maximum number of tasks the worker is willing to perform. Next, if the flag was set to true (line 5), line 6 checks whether the task is within the spatial region constraint R of the worker.
Then, at line 8, the method checks whether the worker's time limit constraint TL is exceeded. The condition is w.TL < travel_cost(w, t) + t.p. The travel cost is the time required to reach t from the worker's last location. This can be either the worker's initial location or the location of the last task assigned to him. Finally, line 10 checks whether the task's end time e is exceeded. The condition is t.e - t.p < travel_cost(w, t) + w.time. Here t.e - t.p is the deadline t.d, as depicted in the example of Figure 6.1. This is the time by which the worker needs to reach the task in order to complete it, given that he needs t.p time to execute it. The travel cost is calculated in the same way as described above. w.time refers to the time elapsed so far. It is zero before the algorithm has assigned anything to the worker; otherwise it is the total travel time plus the total execution time of all tasks assigned to the worker so far, in the order of assignment.

Now let us return to Algorithm 1. At line 15, we take advantage of the research done on the MTS problem by Deng et al. [14]. When a worker does not have any competing tasks, like w4 in Figure 6.1, we can independently run the algorithms given in [14]. In [14], Deng et al. develop algorithms that solve the MTS problem exactly and approximately. Here, we propose to use a threshold: if the number of tasks for this worker is less than the threshold, we can use the dynamic programming algorithm proposed in [14]. Otherwise, we can either use the algorithm provided below or use the heuristics proposed in [14]. Finding an appropriate threshold is part of future work.

However, for competing workers (i.e., workers who must compete with others for candidate tasks, such as w1 - w3), the MTA cannot use any of the algorithms proposed in [14]. Here we propose an approximate algorithm to solve the remaining problem. All remaining tasks (those not already handled for non-competing workers) are sorted by the number of candidate workers in their priority queue, breaking ties by the deadline of each task, t.d (t.e - t.p). This is what line 17 does. For example, in Figure 6.1 the tasks are sorted as follows: t7, t1, t2, t5, t6, t4, t3. t7 is first because it has zero available workers. t1, t2, t5 and t6 have one candidate worker whereas t4 and t3 have two, so the former appear earlier in the list. The tie breaker is used among t1, t2, t5 and t6: t1 appears before t2 because it has an earlier deadline (10 vs 11), and so on. The intuition here is that tasks with fewer candidate workers (e.g., only one) have less chance of being assigned, so we prioritize them. The tie breaker on the deadline is used in the hope that workers will reach the tasks that expire soon first.

Line 18 starts an infinite loop which is broken when the condition at line 38 is satisfied, i.e. when all tasks have been removed from the sorted list. We iterate over the tasks at line 19 and poll the nearest candidate worker from the priority queue of t_i (line 20). The priority queue of each task keeps the candidate workers ordered by distance, nearest first. We then apply two tie breakers in case two workers have the same distance from the task. The first one gives priority to the worker with the fewest assigned tasks and the second to the worker with the most unassigned tasks. The intuition here is that if a worker is near the task, the travel cost is smaller and the chances that he can later perform other tasks increase. The first tie breaker favors a worker who has fewer assignments than the others, and the second prioritizes an unfortunate worker whose unassigned list keeps growing. Line 21 checks whether we actually have an available worker. For example, t7 does not have one, so it is removed from the sorted list (line 22).
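The two orderings just described, the task sort at line 17 and the candidate worker priority used at line 20, can be written as sort keys. This is a minimal Python sketch under stated assumptions: tasks and workers are plain dictionaries, and the field names (`candidates`, `assigned`, `unassigned`, `x`, `y`, `e`, `p`) are hypothetical rather than taken from the thesis implementation.

```python
def task_sort_key(task):
    """Fewest candidate workers first; earlier deadline d = e - p breaks ties."""
    return (len(task["candidates"]), task["e"] - task["p"])


def worker_priority_key(worker, task):
    """Nearest worker first (Manhattan distance), then fewest assigned tasks,
    then most unassigned tasks."""
    distance = abs(worker["x"] - task["x"]) + abs(worker["y"] - task["y"])
    return (distance, len(worker["assigned"]), -len(worker["unassigned"]))


def next_candidate_worker(task, workers_by_id):
    """Return the best candidate worker for the task, or None if it has none."""
    if not task["candidates"]:
        return None
    candidates = (workers_by_id[w] for w in task["candidates"])
    return min(candidates, key=lambda w: worker_priority_key(w, task))
```

Because the keys depend on each worker's current location and list sizes, the ordering has to be recomputed after every assignment or pruning step, which is the role played by the priority queue updates in Algorithm 1.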
If any of the constraints is violated (line 25), we remove the task from the worker's candidate list and put it in the unassigned list. Symmetrically, we insert the worker into the task's unassigned list and update the priority queues (line 29) of all tasks that include this worker. This is required because the task was inserted into the worker's unassigned list, whose size is the tie breaker we used above. If the assignment is not pruned, lines 31-35 are executed. The task is added to the assigned tasks of the worker (line 31) and, symmetrically, the worker is added to the assigned workers of the task (line 32). The current location of the worker is updated (line 33), and the travel cost and the task's performance time are reflected by updating the elapsed time (line 34). The dynamic nature of the tasks' priority queues requires the update at line 35.

Line 36 checks whether the task's priority queue of candidate workers is empty and, if so, the task is removed from the sorted list. Finally, the algorithm exits at line 40 and returns the assigned tasks per worker.
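The pruning test described in this section can be sketched in a few lines. The sketch below is written in Python for illustration and is not the thesis implementation; the `Task` and `Worker` classes and their field names are assumptions. It follows the checks in the order given in the text and uses the Manhattan distance from the running example as travel cost.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    x: int
    y: int
    e: int      # expiration time
    p: int      # performance time
    k: int = 1  # number of times the task must be performed
    assigned_workers: list = field(default_factory=list)


@dataclass
class Worker:
    x: int          # current location (initial, or that of the last assigned task)
    y: int
    region: tuple   # spatial region R as (x_min, y_min, x_max, y_max)
    max_t: int      # MaxT
    tl: int         # time limit TL
    time: int = 0   # time elapsed so far
    assigned_tasks: list = field(default_factory=list)


def travel_cost(w, t):
    """Manhattan distance between the worker's current location and the task."""
    return abs(w.x - t.x) + abs(w.y - t.y)


def to_prune(w, t, use_region_filter):
    """Return True if assigning task t to worker w violates a constraint."""
    if len(t.assigned_workers) >= t.k:      # task already performed k times
        return True
    if len(w.assigned_tasks) >= w.max_t:    # worker reached MaxT
        return True
    if use_region_filter:
        x_min, y_min, x_max, y_max = w.region
        if not (x_min <= t.x <= x_max and y_min <= t.y <= y_max):
            return True
    cost = travel_cost(w, t)
    if w.tl < cost + t.p:                   # time limit TL exceeded
        return True
    if t.e - t.p < cost + w.time:           # deadline d = e - p missed
        return True
    return False
```

The region flag exists because the region check is only needed when the candidate lists are first built; later calls from the main loop can skip it.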
Algorithm 1 Hybrid MTA algorithm(T, W)
Input: T the set of available tasks, W the set of available workers
Output: For each worker a valid sequence of tasks to visit

     1: assigned = 0, unassigned = 0, pruned = 0
     2: for all w_i ∈ W do
     3:     w_i.cand_T ← ∅
     4:     w_i.assigned_T ← ∅
     5:     w_i.unassigned_T ← ∅
     6: for all t_i ∈ T do
     7:     t_i.cand_W_pq ← ∅
     8:     t_i.assigned_W ← ∅
     9:     t_i.unassigned_W ← ∅
    10: for all w_i ∈ W do
    11:     for all t_i ∈ T do
    12:         if !toPrune(w_i, t_i, true) then
    13:             w_i.cand_T ← w_i.cand_T ∪ {t_i}
    14:             t_i.cand_W_pq ← t_i.cand_W_pq ∪ {w_i}
    15: for all not competing workers w_i ∈ W do
    16:     Use the algorithms from [14]
    17: T_sortedList = sort(T, t_i.cand_W_pq)
    18: while true do
    19:     for all t_i ∈ T_sortedList do
    20:         nn_worker = getNextCandidateWorker(t_i.cand_W_pq)
    21:         if !nn_worker then
    22:             T_sortedList ← T_sortedList - {t_i}
    23:             unassigned++
    24:             continue
    25:         if toPrune(nn_worker, t_i, false) then
    26:             pruned++
    27:             nn_worker.unassigned_T ← nn_worker.unassigned_T ∪ {t_i}
    28:             t_i.unassigned_W ← t_i.unassigned_W ∪ {nn_worker}
    29:             updatePriorityQueues(nn_worker)
    30:         else
    31:             nn_worker.assigned_T ← nn_worker.assigned_T ∪ {t_i}
    32:             t_i.assigned_W ← t_i.assigned_W ∪ {nn_worker}
    33:             updateLocation(nn_worker)
    34:             updateTimeElapsed(nn_worker)
    35:             updatePriorityQueues(nn_worker)
    36:         if empty(t_i.cand_W_pq) then
    37:             T_sortedList ← T_sortedList - {t_i}
    38:     if empty(T_sortedList) then
    39:         break
    40: return W.assigned_T

Algorithm 2 toPrune(w, t, withinRegionFilterFlag)
Input: worker w, task t, flag withinRegionFilterFlag
Output: true if the task cannot be assigned to the worker, otherwise false

     1: if reachK(t) then
     2:     return true
     3: if reachMaxT(w) then
     4:     return true
     5: if withinRegionFilterFlag then
     6:     if !taskWithinWorkersRegion(w, t) then
     7:         return true
     8: if exceedsWorkersTime(w, t) then
     9:     return true
    10: if exceedsTasksTime(w, t) then
    11:     return true
    12: return false

CHAPTER 7

Related Work

In this chapter, we discuss the related work in the areas of crowdsourcing and spatial crowdsourcing.
7.1 Crowdsourcing

Recently, crowdsourcing has gained much attention from both the research and the industry community. A survey on crowdsourcing systems can be found in [15]. Popular examples of such crowdsourcing platforms are Amazon Mechanical Turk [1] and CrowdFlower [3]. In these systems, tasks are publicly posted by companies or individuals and workers can execute them online. As mentioned earlier, a more recent application of crowdsourcing was the online campaign to help the search for the mysteriously disappeared Malaysia Airlines Flight MH370 [4, 9]. Crowdsourcing applications have also been used in image search [26], natural language annotations [25], video annotations [1, 12], image annotations [24], architecture [2] etc.

7.2 Spatial Crowdsourcing

Despite all the studies in the area of crowdsourcing, only a few have studied spatial crowdsourcing [11, 20, 21, 14]. In [20], under the SAT mode, the MTA problem was efficiently solved by reducing it to the maximum flow problem. In [21], the study of [20] was extended to include a confidence level for each spatial task, i.e. an answer to a task is accepted only when the confidence level of the task is satisfied. To accomplish this, each worker was associated with a reputation score. In [14], Deng et al. focused on another mode of spatial crowdsourcing, originally defined in [20], termed Worker Selected Tasks (WST). In this mode, the worker has a list of available tasks and can autonomously select a subset of them. The goal is to solve the Maximum Task Scheduling (MTS) problem, i.e. to maximize the number of tasks performed by the worker.

An example of a spatial crowdsourcing system is TaskRabbit [8]. Similar to GeoCrowd, TaskRabbit allows requesters to describe and post a task. Requesters can either choose their workers (or TaskRabbits) themselves or let the company pick workers for them.
Although we do not know the details of TaskRabbit's implementation, GeoCrowd differs from TaskRabbit in that its assignment process is completely automatic and considers several constraints on both workers and tasks, which makes it more generic.

Another spatial crowdsourcing example is Uber [10]. Uber is a mobile application which allows people to act as drivers and riders. Riders can request a ride (task) and the system assigns drivers (workers) to pick up the customer (like a taxi service). GeoCrowd differs from Uber because Uber is specialized only in the assignment of drivers to riders. Its assignment process is simpler than GeoCrowd's, since it considers only the nearest available drivers and their reputation. Also, the assignment is always limited to a one-to-one relationship.

In this project the system implementation uses the algorithms introduced in [20]. However, none of these existing studies and systems consider the time limit constraint of workers or the competing nature of the MTA problem discussed in Chapter 6. Moreover, the spatial crowdsourcing system architecture overcomes the limitations of several participatory sensing systems. These systems [19, 23] mainly focus on a single campaign and try to address challenges related to this specific campaign. With the GeoCrowd system we want to provide a generic spatial crowdsourcing system that can be used by many campaigns simultaneously.

CHAPTER 8
Conclusion

In this thesis project, a robust spatial crowdsourcing system was developed. We believe that GeoCrowd is the first generic spatial crowdsourcing system, as we consider workers' and tasks' constraints during the assignment process. The three core components of the system were analyzed. In our experiments we showed that the GeoCrowd system can perform well with real data.

8.1 Future Work

In Chapter 6 we propose two future extensions of the GeoCrowd system. The MapReduce implementation is directly applicable to the existing system.
Currently, the implementation has been tested only on a single-node machine. As future work we aim to set up a cluster and run experiments on Hadoop to test the scalability of the system. Moreover, although the extended MTA algorithm discussed in Chapter 6 was implemented, it is not yet incorporated in GeoCrowd. After integrating it with GeoCrowd, we plan to run experiments with real data to evaluate its performance. In addition, privacy, reputation and reward were not taken into consideration in the current system. Incorporating them in the future will help attract new users, as users are concerned about their location privacy and the quality of the crowdsourced answers.

References

[1] Amazon mechanical turk. http://www.mturk.com.
[2] Arcbazar. http://www.arcbazar.com/.
[3] Crowd flower. http://www.crowdflower.com/.
[4] Digital globe. http://www.digitalglobe.com/.
[5] Gartner's report on smartphone sales. http://www.gartner.com/newsroom/id/2665715.
[6] Gowalla. http://snap.stanford.edu/data/loc-gowalla.html.
[7] Hadoop. http://hadoop.apache.org/.
[8] Task rabbit. https://www.taskrabbit.com/.
[9] Tomnod. http://www.tomnod.com.
[10] Uber. https://www.uber.com/.
[11] Florian Alt, Alireza Sahami Shirazi, Albrecht Schmidt, Urs Kramer, and Zahid Nawaz. Location-based crowdsourcing: Extending crowdsourcing to the real world. In Proceedings of the 6th Nordic Conference on Human-Computer Interaction: Extending Boundaries, NordiCHI '10, pages 13–22, New York, NY, USA, 2010. ACM.
[12] Kuan-Ta Chen, Chen-Chi Wu, Yu-Chun Chang, and Chin-Laung Lei. A crowdsourceable qoe evaluation framework for multimedia content. In Proceedings of the 17th ACM International Conference on Multimedia, MM '09, pages 491–500, New York, NY, USA, 2009. ACM.
[13] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6, OSDI'04, pages 10–10, Berkeley, CA, USA, 2004. USENIX Association.
[14] Dingxiong Deng, Cyrus Shahabi, and Ugur Demiryurek. Maximizing the number of worker's self-selected tasks in spatial crowdsourcing. In Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL '13, pages 314–323, New York, NY, USA, 2013. ACM.
[15] Anhai Doan, Raghu Ramakrishnan, and Alon Y. Halevy. Crowdsourcing systems on the world-wide web. Commun. ACM, 54(4):86–96, April 2011.
[16] Jack Edmonds and Richard M. Karp. Theoretical improvements in algorithmic efficiency for network flow problems. J. ACM, 19(2):248–264, April 1972.
[17] L. R. Ford and D. R. Fulkerson. Maximal Flow through a Network. Canadian Journal of Mathematics, 8:399–404.
[18] Felix Halim, Roland H. C. Yap, and Yongzheng Wu. A mapreduce-based maximum-flow algorithm for large small-world network graphs. In ICDCS, pages 192–202, 2011.
[19] Bret Hull, Vladimir Bychkovsky, Yang Zhang, Kevin Chen, Michel Goraczko, Allen Miu, Eugene Shih, Hari Balakrishnan, and Samuel Madden. Cartel: a distributed mobile sensor computing system. In 4th ACM SenSys, pages 125–138, 2006.
[20] Leyla Kazemi and Cyrus Shahabi. Geocrowd: Enabling query answering with spatial crowdsourcing. In Proceedings of the 20th International Conference on Advances in Geographic Information Systems, SIGSPATIAL '12, pages 189–198, New York, NY, USA, 2012. ACM.
[21] Leyla Kazemi, Cyrus Shahabi, and Lei Chen. Geotrucrowd: Trustworthy query answering with spatial crowdsourcing. In Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL'13, pages 314–323, New York, NY, USA, 2013. ACM.
[22] Seon Ho Kim, Ying Lu, Giorgos Constantinou, Cyrus Shahabi, Guanfeng Wang, and Roger Zimmermann. Mediaq: Mobile multimedia management system. In Proceedings of the 5th ACM Multimedia Systems Conference, MMSys '14, New York, NY, USA, 2014. ACM.
[23] Prashanth Mohan, Venkata N. Padmanabhan, and Ramachandran Ramjee. Nericell: Rich monitoring of road and traffic conditions using mobile smartphones. In Proceedings of the 6th ACM Conference on Embedded Network Sensor Systems, SenSys '08, pages 323–336, New York, NY, USA, 2008. ACM.
[24] Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. Collecting image annotations using amazon's mechanical turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, CSLDAMT '10, pages 139–147, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.
[25] Rion Snow, Brendan O'Connor, Daniel Jurafsky, and Andrew Y. Ng. Cheap and fast - but is it good?: Evaluating non-expert annotations for natural language tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '08, pages 254–263, Stroudsburg, PA, USA, 2008. Association for Computational Linguistics.
[26] Tingxin Yan, Vikas Kumar, and Deepak Ganesan. Crowdsearch: Exploiting crowds for accurate real-time image search on mobile phones. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services, MobiSys '10, pages 77–90, New York, NY, USA, 2010. ACM.
Asset Metadata
Creator: Constantinou, Giorgos (author)
Core Title: GeoCrowd: a spatial crowdsourcing system implementation
School: Viterbi School of Engineering
Degree: Master of Science
Degree Program: Computer Science
Publication Date: 04/29/2014
Defense Date: 03/31/2014
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: crowdsourcing, mobile multimedia, OAI-PMH Harvest, spatial crowdsourcing, spatial task assignment
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Shahabi, Cyrus (committee chair), Halfond, William G. J. (committee member), Medioni, Gérard G. (committee member)
Creator Email: gconstan@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-405541
Unique identifier: UC11296422
Identifier: etd-Constantin-2442.pdf (filename), usctheses-c3-405541 (legacy record id)
Legacy Identifier: etd-Constantin-2442.pdf
Dmrecord: 405541
Document Type: Thesis
Rights: Constantinou, Giorgos
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA