Dynamic pricing and task assignment in real-time spatial crowdsourcing platforms
(USC Thesis Other)
DYNAMIC PRICING AND TASK ASSIGNMENT IN REAL-TIME SPATIAL CROWDSOURCING PLATFORMS
by
Mohammad Asghari

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE)

December 2018
Copyright 2018 Mohammad Asghari

Abstract

A new platform, termed spatial crowdsourcing (SC), is emerging that enables a requester to commission workers to physically travel to some specified locations to perform a set of spatial tasks (i.e., tasks related to a geographical location and time) by paying the workers a certain fee for each completed task. Similar to any other marketplace, a spatial crowdsourcing platform's success relies on the participation of requesters and workers. Consequently, the fee for each task should be high enough for workers to participate in the platform but not so high as to discourage the requesters. Furthermore, for spatial crowdsourcing to scale to thousands of workers and tasks, it should be able to efficiently assign tasks to workers, which in turn consists of both matching tasks to workers and computing a schedule for each worker.

In this dissertation, we discuss two key problems in spatial crowdsourcing. Assuming that the goal of a spatial crowdsourcing platform is to ultimately increase its revenue, we first focus on the problem of dynamically pricing tasks in spatial crowdsourcing. Current approaches consider the platform's current supply and demand at the location where the task is posted and price the task accordingly. Instead, we introduce a pricing method that also considers the network's future demand and can increase the platform's revenue while lowering the prices as compared to state-of-the-art algorithms. Next, we consider the task assignment problem in spatial crowdsourcing. As mentioned, this consists of both matching tasks to workers and computing a schedule for each worker.
The existing approaches for task assignment in spatial crowdsourcing cannot scale, as either task matching or task scheduling will become a bottleneck. Instead, we propose an on-line assignment approach utilizing an auction-based framework where workers bid on every arriving task and the server determines the highest bidder. This splits the assignment responsibility between the workers (for scheduling) and the server (for matching) and thus eliminates both bottlenecks. Finally, we apply this auction framework to a commercial ridesharing platform as a real-world application of an SC system. We utilize the framework for maximizing the platform provider's profit while satisfying both the workers' and requesters' monetary constraints.

Acknowledgements

I am very grateful to my advisor Cyrus Shahabi for piquing my interest in spatial data management through the class I took with him during my M.Sc. studies and later throughout my Ph.D. at USC. Cyrus's patience, enthusiasm and immense knowledge helped me develop during the doctoral program and complete my dissertation. His eye for identifying difficult yet practical problems was nothing short of exemplary for me. I am sincerely thankful to him for always being there with his wisdom when I needed help.

I would like to thank my dissertation committee members, Dr. Antonio Ortega and Dr. Craig Knoblock, for providing additional guidance and suggestions for completing my dissertation. I would also like to extend my appreciation to Dr. Milind Tambe and Dr. Hamid Nazerzadeh for being part of my qualifying exam committee and for their feedback and comments on my proposal topic.

I had the great opportunity to work with many great peers from college to graduate school. These include classmates from my undergraduate years, who always motivated me to be at my best to succeed, and members of the University of Southern California's Information Lab, who were by my side every step of the way and on the endless nights in the office.
I am eternally grateful to my parents, without whom I would not be the person I am today; to my brother, whom I have always looked up to; and to my sister, for being the bundle of joy in our lives. Last but not least, I thank my wife, Sara. Words cannot express my gratitude for her love and support throughout the past 7 years. At my lowest times during these years, she was the one who gave me the inspiration I needed to continue, and I cannot imagine going through the doctoral program without her.

Table of Contents

Abstract
Acknowledgements
List Of Tables
List Of Figures
Chapter 1 Introduction
1.0.1 Motivation
1.0.2 Thesis Statement
1.0.3 Dynamic Pricing in Spatial Crowdsourcing
1.0.4 Task Assignment in Spatial Crowdsourcing
1.0.5 Real-time Ridesharing
Chapter 2 Related Work
2.1 Dynamic Pricing
2.2 Task Assignment in Spatial Crowdsourcing
2.3 Real-time Ridesharing
Chapter 3 Dynamic Pricing
3.1 Problem Definition
3.1.1 Model
3.1.2 The Revenue Maximization Problem
3.2 ADAPT-Pricing
3.2.1 Baseline: Local Optimization
3.2.2 P-Pricing: Predicting Demand at Origin
3.2.3 POD-Pricing: Predicting Demand at Origin & Destination
3.3 Experiments
3.3.1 Dataset
3.3.2 Experimental Methodology
3.3.3 Pricing Method Comparison
Chapter 4 Task Assignment in Spatial Crowdsourcing
4.1 Preliminaries
4.1.1 Problem Definition
4.1.2 Complexity Analysis
4.2 Auction-SC Framework
4.2.1 Task Dispatching in Auction-SC
4.2.2 Worker's Bid Computation
4.2.3 Scheduling
4.3 Performance Evaluation
4.3.1 Spatial Crowdsourcing Dataset
4.3.2 Experimental Methodology
4.3.3 Experimental Results
Chapter 5 Price-aware Ridesharing
5.1 Problem Definition and Preliminaries
5.1.1 Basic Concepts
5.1.2 Complexity Analysis
5.1.3 Dispatch Requests
5.1.4 Bid Computation
5.2 Competitive Bidding
5.3 The Latent Space Transition Model
5.3.1 Network Demand Model
5.3.2 Parameter Inference
5.4 The SPARP Mechanism
5.4.1 Pricing Model
5.4.2 Payments
5.5 Performance Evaluation
5.5.1 Ridesharing Dataset
5.5.2 Experimental Methodology
5.5.3 Comparing Different Pricing Models
5.5.4 Accuracy of the LSTM Prediction Model
5.5.5 Payment Mechanism Comparison
Chapter 6 Conclusion and Future Work
References
Appendix A Proof of Theorems, Lemmas and Propositions
A.1 Proof of Proposition 3.1
A.2 Proof of Proposition 3.2
A.3 Proof of Theorem 3.1
A.4 Proof of Theorem 3.2
A.5 Proof of Lemma 3.1
A.6 Proof of Lemma 3.2
A.7 Proof of Proposition 3.3
A.8 Proof of Theorem 4.1
A.9 Proof of Theorem 5.1
A.10 Proof of Theorem 5.2
A.11 Proof of Theorem 5.3

List Of Tables

3.1 Parameters for Pricing Method Comparison
3.2 Overall Results of Method Comparison
3.3 Overall Results with Real-world Travel Times
4.1 Number of tasks/worker for each city in real dataset
5.1 Parameters for Algorithm Comparison
5.2 Effects of Untruthful Bidding
5.3 Effects of Prediction Accuracy

List Of Figures

3.1 Supply and Demand at Origin
3.2 Supply and Demand at Destination
3.3 Example of Network's Transition
3.4 Revenue Increase
3.5 Service Rate Comparison
3.6 Price Discount
3.7 Prediction Accuracy
3.8 Supply/Demand Ratio
3.9 Revenue Increase with Real-world Travel Times
4.1 Task Dispatching in Auction-SC
4.2 Valid Schedule Tree (VST) Example
4.3 Example of Cutoff Times in a VST
4.4 Spatial Distribution of Tasks in Gowalla
4.5 Assignment Rate of Real-Time Approaches
4.6 Pairwise Difference in Assignment Rates
4.7 Average Processing Time for a Single Task
4.8 Real-world Scalability Requirements
5.1 Simple Scenario
5.2 Comparing Revenue of the Algorithms
5.3 Fairness of Pricing Models
5.4 Comparing Service Rate of the Algorithms
5.5 Comparing Response Time of the Algorithms
5.6 Effect of Applying an Arbitrary Pricing Model
5.7 Effect of Profiles
5.8 Comparing Precision of LSTM Vs LORE
5.9 LSTM Vs LORE - Varying Cell & Time Slot Size
5.10 Comparing Revenue of Different Mechanisms
5.11 Comparing Relative Revenue of SPA & SPARP Vs FPACB
5.12 Comparing Relative Revenue of SPA & SPARP Vs FPACB
A.1 Adversary Generated Input

Chapter 1 Introduction

1.0.1 Motivation

Smartphones are ubiquitous: we are witnessing an astonishing growth in mobile phone subscriptions. The International Telecommunication Union estimates there are nearly 7 billion mobile subscriptions worldwide [1]. Meanwhile, the mobile phones' sensors (e.g., cameras) are advancing and the network bandwidth is constantly increasing. Consequently, every person with a mobile phone can now act as a multi-modal sensor, collecting and sharing various types of high-fidelity spatiotemporal data instantaneously (e.g., picture, video, audio, location, time, speed, direction, and acceleration). Exploiting this large crowd of potential workers and their mobility, a new mechanism for efficient and scalable crowd recruitment has emerged: Spatial Crowdsourcing (SC) [42]. In spatial crowdsourcing, each task is requested at a certain location (i.e., origin).
Spatial crowdsourcing requires workers (e.g., willing individuals) to perform a task by physically traveling to the origin of the task at particular times. In general, the worker might need to travel to another location (i.e., destination) in order to complete a task (e.g., parcel delivery). Spatial crowdsourcing is exploited in numerous industries (e.g., Uber, TaskRabbit, Waze, Gigwalk, InstaCart) and has applications in numerous domains such as citizen-journalism, tourism, intelligence, disaster response and urban planning. With spatial crowdsourcing, a requester submits a set of spatiotemporal tasks to a spatial crowdsourcing server (SC-Server). Subsequently, the SC-Server has to select a worker to perform each task.

1.0.2 Thesis Statement

A key element in the success of any SC system is users who are willing to participate. On one hand, requesters want their tasks to be performed quickly and at a reasonable price. On the other hand, an SC system requires a pool of workers who are willing to dedicate their time in exchange for a reward. Furthermore, the SC-Server has to be able to process hundreds of tasks in the presence of thousands of workers in real-time. More formally, our goal is to show that the revenue and assignment rate in spatial crowdsourcing platforms can be improved by dynamically pricing tasks and assigning tasks in real-time.

1.0.3 Dynamic Pricing in Spatial Crowdsourcing

The popularity of spatial crowdsourcing platforms such as Lyft, Uber and Didi has grown them into multi-million dollar markets. Platform providers do not own the resources (i.e., cars) in these markets and hence cannot directly control the supply (i.e., workers). They can only encourage workers to participate in the market by setting appropriate prices. However, the set prices not only impact the availability of the workers, they also impact the willingness of the requesters to use the platform.
Consequently, finding the optimal price to balance the supply and demand has a crucial impact on the performance of the market.

While spatial crowdsourcing platforms can simultaneously have multiple objectives, it goes without saying that revenue maximization is always among the primary goals. Because of the economic value of their pricing approaches, these companies do not publish the details. However, in the case of ridesharing we know that they use a real-time dynamic algorithm that modifies the prices in different regions depending on the current supply and demand in each region [5, 2]. The set prices control the supply and demand, which in turn impact the total number of serviced requests at each time. Furthermore, the future geographical distribution of the workers depends on the destinations of the requests they are currently assigned to service. Consequently, how the platform provider sets prices in different regions impacts the spatial distribution of the workers in the future. As a result, locally optimizing the prices at the current time does not necessarily lead to an overall higher revenue.

Recently, many studies have focused on dynamic pricing based on the market's future demand (i.e., predictive pricing) [10, 19, 33]. In the context of ridesharing markets, the problem is studied under certain demand patterns [10]. However, we know from real-world data that the demand pattern of the network can vary during a day [4]. For example, a region with a high demand in the morning rush hour might not necessarily have a high demand in the afternoon. Furthermore, demand prediction is a complex problem and can be erroneous [38, 46, 72]. Consequently, any pricing method that depends on the future demand of the network must be resilient to inaccuracies in demand prediction.

We present A Dynamic And Predictive Technique for pricing (ADAPT-Pricing) where, in addition to the network's current supply and demand, we also consider the predicted future demand.
Requesters are more encouraged to use spatial crowdsourcing platforms when the prices are low. Therefore, in regions with an abundance of supply, lowering the prices can result in higher demand, which in turn increases the number of completed requests. With predictive pricing (P-Pricing) the goal is to increase the revenue of the platform by increasing the number of completed requests originating and ending in lower and higher demand regions, respectively.

Changing the fee of every task based only on its origin location will affect all the requests originating from that location regardless of their destinations. In other words, not all the requests resulting from P-Pricing would end up in high demand regions in the future. Consequently, the platform must set prices not just based on the origin of the request, but based on the origin-destination (OD) pair for each task. However, if tasks in the same region have different prices, workers are more attracted to perform higher priced tasks. Consequently, it is important that the platform providers prevent a shortage of supply for tasks with lower prices. Thus, we improve the performance of P-Pricing with a predictive origin-destination based pricing (POD-Pricing) method. The goal is to further increase the revenue by better controlling which requests get serviced and where the workers end up in the future.

We evaluate ADAPT-Pricing using a real-world dataset from New York City [4] and compare it with a baseline approach that optimizes the prices at each location based only on the current supply and demand at that location (origin). We show that with P-Pricing and POD-Pricing, the generated revenue within each time period increases by up to 5% and 15%, respectively. In addition, we show that the task prices set by our approaches are on average 5% cheaper, i.e., the revenue growth is due to the increase in the number of completed tasks, not higher prices.
Furthermore, we show that our pricing methods outperform the baseline approach even with up to 25% error in demand prediction, showing the resilience of our ADAPT-Pricing techniques.

1.0.4 Task Assignment in Spatial Crowdsourcing

Earlier studies on task assignment in spatial crowdsourcing [42, 48, 28, 17, 20, 31, 34] can be classified based on two basic characteristics of the problem: (1) whether the problem matches a task with a worker and (2) whether the problem schedules matched tasks for the workers. Early studies in SC [42, 31, 20] used a scheduling-oblivious-matching (SOM) approach where tasks are matched with workers without considering the workers' schedules. Assuming the tasks have already been matched with workers, other studies [27, 48] study the problem of scheduling the tasks that have been assigned to a worker. They show that there is no guarantee that the worker could schedule all of its matched tasks.

We argue that with spatial crowdsourcing, it is not enough to only match a task with a worker. An SC-Server must consider the schedule of every worker when matching a task to workers and only consider those workers who are able to fit the task in their schedule. In this paper, we define the task assignment problem in SC as consisting of two phases, a matching phase and a scheduling phase, which need to happen in tandem. Neither of these phases should be ignored; otherwise, the resulting solution is infeasible for real-world applications.

More recent studies consider both matching and scheduling in spatial crowdsourcing [28, 17, 34]. These studies utilize a batched assignment scheme, where the assignment is delayed for a period of time (i.e., batching time interval) during which all the arrived tasks are batched to be matched and scheduled during the next time interval. Once the tasks are batched and processed together, the matching phase becomes complex because many tasks need to be matched to many workers.
This in turn adds to the running time and increases the batching time interval. A long batching time interval (e.g., 10 minutes) has two main disadvantages. First, the duration of the batching time interval should be subtracted from the tasks' deadlines, leaving each task with less available time to be scheduled. Second, the batched scheme can no longer generate real-time assignments. While there might be cases where a real-time assignment is not required, there are many other real-world applications where real-time assignments are a necessity. For example, an Uber user requesting a ride does not want to wait for 10 minutes to find out whether a driver is available.

Contrary to batched assignment, in on-line assignment a task is assigned to a worker as soon as it arrives at the SC-Server. This requires the server to perform matching and scheduling in real-time. With on-line assignment, at each point of time the SC-Server is processing only one task and hence the matching phase becomes a one-to-many matching where there are multiple workers and only one task. Consequently, the complex many-to-many matching phase of batched assignment is reduced to only selecting the best worker that can fit the task in its schedule. Even though matching is fast with on-line assignment, the server must still perform scheduling for multiple workers. Therefore, the scheduling phase becomes the bottleneck in on-line assignment. As shown in [48], scheduling for a single worker can be performed fast. However, an on-line monolithic [65] SC-Server (monolithic-SC), where the server has to schedule a single task for all workers, is not capable of processing tasks in real-time. For example, in New York City, during rush hours, there are as many as 10+ ride requests per second [4]. Through experiments, we show that monolithic-SC is not able to support such throughput in real-time. Towards this end, we propose Auction-SC: an auction-based framework for real-time task matching and scheduling.
Similar to monolithic-SC, Auction-SC is an on-line framework where tasks get matched with workers as soon as they become available and hence the matching phase is fast. Moreover, we overcome the scheduling bottleneck by distributing it among the workers. With Auction-SC, the server broadcasts a task to the workers upon the task's arrival. Each worker (see footnote 1) submits a bid for that task based on its current schedule and location. To compute its bid, each worker has to consider only its own schedule, so the bid computation phase can be done in real-time. Once every worker submits its bid to the SC-Server, matching the task to a worker reduces to selecting the highest bid.

We introduce a branch-and-bound scheduling algorithm where, for each new task, the worker performs an exhaustive search to find out whether it can fit the incoming task into its schedule. We show that at each point of time the number of remaining tasks for each worker (the number of tasks that the worker has scheduled and not yet completed) is in a range where even the branch-and-bound algorithm can be completed in real-time. However, in our experiments, we show that even replacing the branch-and-bound algorithm with a polynomial time approximate algorithm will not affect the quality of the assignment significantly.

Footnote 1: Hereafter, we use the term "worker" interchangeably to refer to both the human worker and the software running on her mobile device unless a clear distinction is needed.

Our experiments on both real-world and synthetically generated data show that, compared to Auction-SC, both the SOM and batched approaches result in a much lower assignment rate as a result of ignoring scheduling while matching and of long batching intervals, respectively. Subsequently, we show that when matching and scheduling are performed in tandem, neither the batched scheme nor the online monolithic SC-Server can process more than 5 tasks per second.
However, with the auction-based framework, the throughput of the system increases by orders of magnitude.

1.0.5 Real-time Ridesharing

Ridesharing is one of the most well-known applications of spatial crowdsourcing, where drivers and passengers are the workers and requesters in an SC system. Real-time ridesharing, as an alternative transportation service, alleviates traffic congestion and decreases auto emissions. With the emergence of many commercial platforms (e.g., Uber and Lyft), which automatically match drivers and riders on-the-fly, real-time ridesharing is becoming more and more popular. According to [6], millions of trips have been taken on UberPool since its launch in August 2014, and thousands of passengers take it five times a week during commuting hours.

Ridesharing platforms, as an example of commercial SC systems, introduce new challenges. From a business point of view, the platform provider (e.g., Uber) seeks to maximize its own profit. However, higher profits should not come at the cost of either charging passengers more or paying drivers less to the point where participation and retention are compromised by a lack of monetary incentive for either party. Consequently, the design of a fair pricing model becomes an essential business strategy. This is particularly important in cases of carpooling where riders share their ride with other riders. Even though carpooling reduces the riders' cost, it incurs extra distance (i.e., detour) for riders. While each rider's fare should be discounted as a function of the length of the detour, the driver should be rewarded more as the total travel distance is increased due to all detours. Furthermore, different users (i.e., riders and drivers) might value their time differently. Therefore, a fair pricing model should allow both riders and drivers to express, for a certain amount of detour, how much discount or compensation they expect.
Finally, in addition to a fair pricing scheme for riders and drivers, the model should account for the provider's revenue as well.

The majority of previous studies [54, 24, 15, 56] focus on improving the efficiency of on-the-fly assignment with the objective of minimizing the total travel distance of drivers. In particular, in existing studies a new request is assigned to a driver who can fit the request in his schedule with the least amount of increase in the total traveled distance. However, minimizing drivers' total travel distance is not always equivalent to overall shorter trips for riders. Consequently, when assigning a new request, the driver who would incur the minimum increase in total travel distance is not necessarily the most cost-effective option. To illustrate, suppose driver a has two passengers on board and driver b has only one. To serve an incoming request, a's incurred detour is 2 miles while for b the detour is 3 miles. Even though a's detour is shorter, the platform owner has to compensate both passengers of a for 2 miles (a total of 4 miles), while in the case of b it has to compensate only one passenger for a total of 3 miles. In addition, from the riders' perspective, in the first scenario two riders incur an extra detour while in the second only one rider does.

Few studies [50, 51] consider a pricing model by defining monetary incentives for riders and drivers. In [50], a pricing model is introduced where instead of being compensated, a rider can potentially end up being penalized for longer detours by paying a higher fare. Ma et al. [51] overcome the unfairness issue in [50] to some extent. Even though a new rider can also incur a detour in his trip, their model only compensates riders who are already on board. In addition, since this model is targeted at a different application, its notion of revenue fails to provide any incentive for the platform provider.
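The driver a versus driver b comparison above can be reproduced with a few lines of arithmetic. The following is only an illustrative sketch, not the pricing model of this dissertation: the class and function names are hypothetical, and it assumes that compensation is simply proportional to the detour miles incurred by each passenger already on board (the new rider's own detour is ignored, as in the example).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A driver who could serve the incoming request (hypothetical structure)."""
    name: str
    passengers_on_board: int
    detour_miles: float  # extra distance the driver travels to serve the request

    @property
    def compensated_miles(self) -> float:
        # Assumption: every on-board passenger is compensated for the full detour.
        return self.passengers_on_board * self.detour_miles


def least_detour(candidates):
    """Distance-based choice: the driver with the smallest detour."""
    return min(candidates, key=lambda c: c.detour_miles)


def least_compensation(candidates):
    """Price-aware choice: the driver whose passengers need the least compensation."""
    return min(candidates, key=lambda c: c.compensated_miles)


# The two drivers from the example in the text.
candidates = [
    Candidate("a", passengers_on_board=2, detour_miles=2.0),
    Candidate("b", passengers_on_board=1, detour_miles=3.0),
]

for c in candidates:
    print(c.name, c.detour_miles, c.compensated_miles)
print("least detour:", least_detour(candidates).name)
print("least compensation:", least_compensation(candidates).name)
```

Under these assumptions the two selection rules disagree: the distance-based rule picks driver a (2-mile detour), while the compensation-based rule picks driver b (3 compensated miles instead of 4).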
To address the aforementioned challenges, in this paper we introduce an Auction-based Price-Aware Real-time (APART) ridesharing framework. APART is built on top of Auction-SC as its underlying framework. We propose a general and versatile pricing model that allows both riders and drivers to set their monetary expectations for participating in ridesharing based on their predefined profiles. Specifically, each rider's profile defines the expected discount ratio for the detours incurred by ridesharing. For example, one rider can express that he is willing to accept a 10 mile detour for a 30% discount. On the other hand, each driver's profile defines the expected cost in terms of his total travel distance and time. The model also accounts for the revenue of the platform provider. Consequently, our objective is to maximize the revenue of the ridesharing framework while satisfying various temporal and monetary constraints of all users. APART is price-aware because a new request is assigned to the driver who generates the highest profit. Since our pricing model is designed to compensate riders for detours, the most profitable choice is also the one where riders incur the least amount of detour, and hence better service quality. Finally, APART also maximizes the revenue of the provider by increasing the service rate (throughput) in the system through engaging available drivers more effectively to serve more requests.

Our experiments on a large scale New York City taxi dataset show that APART is scalable and efficient, capable of processing hundreds of tasks per second in the presence of thousands of drivers. By comparing our framework with the state-of-the-art approaches [37], we show that our framework can simultaneously match up to 10% more riders to drivers (i.e., higher service rate), while the total travel distance of riders is 20% less (i.e., better service quality). Hence, our framework can generate more profit than other approaches with an even better service quality.
On the other hand, we show that in a framework where riders are assigned to drivers with the least increase in the driver's travel distance, up to 25% of the requests are not assigned to the most profitable driver.

Chapter 2 Related Work

In this chapter, we review the studies related to our work.

2.1 Dynamic Pricing

Pricing tasks in crowdsourcing markets [63, 55, 11] and (mobile) crowd sensing applications [47, 39, 67] has been studied extensively. However, with regular crowdsourcing markets, the location aspect of tasks and workers in SC does not exist and thus the approaches are not applicable to SC markets. Even with crowd sensing, where the tasks do have a specific location, the workers do not need to move towards the tasks and the goal is to assign tasks to workers who are already in the acceptable area around a task.

There also exist studies that have specifically focused on pricing in SC markets [68, 40, 61, 35, 53]. Guo et al. [35] designed a platform that provides historical data for the region of the task, and the requesters can use this information to better set prices for their tasks. In [40, 61] the workers initially specify their expected reward and the platform assigns tasks to workers with the objective of maximizing social welfare. Yang et al. [68] propose a mechanism similar to those of [40, 61], but instead of maximizing social welfare, the objective is to maximize the server's utility. In all these studies the prices are fixed for the tasks and rewards are awarded to workers to maximize either social welfare or the server's utility. In [53] there is a fixed budget which gets spent uniformly over time. Depending on a worker's historical contributions, the platform decides how much to pay different workers.

Dynamic pricing in commercial sharing platforms (e.g., Airbnb and Lyft) also relates to our work, as these platforms are some of the most popular applications of spatial crowdsourcing.
There is a rich literature on the behavior of users in commercial sharing platforms. Recent studies have shown that the expected income largely impacts the participation of resource providers (i.e., drivers) [36, 30] and that cost is the most influential factor in their participation [29]. On the other hand, an empirical study on passengers shows that suddenly increasing prices (e.g., Uber's surge pricing) greatly decreases the network's demand [17]. We use the findings of these studies to define our model in Section 3.1.

Most studies on dynamic pricing in ridesharing markets focus on surge pricing as a means to increase the participation of drivers in locations with high demand. Without surge pricing, drivers can end up driving longer distances to pick up passengers because there are no available drivers close to the passenger, which results in inefficiency in the market [16]. On the other hand, it is shown that when surge pricing is in effect, drivers are less likely to leave the platform and end their services [18]. Furthermore, with regard to revenue maximization, surge pricing outperforms fixed pricing [9, 14]. These studies only consider the implications of surge pricing in a single region at a particular time. They do not consider the effects on the drivers' movement in the network, which determines the future supply in different regions. While [10] does consider the spatial structure of the network and future supply in setting prices, it does not account for fluctuations in future demand.

2.2 Task Assignment in Spatial Crowdsourcing

Many studies in spatial crowdsourcing research focus on the task assignment problem [42, 27, 48, 17, 28, 20, 31, 34]. Kazemi and Shahabi [42], Cheng et al. [20] and Fonteles et al. [31] formulate task assignment in spatial crowdsourcing as a matching problem. In [42, 31] the primary objective is to maximize the number of matched tasks, while in [20] the objective is to minimize the distance between the tasks and their matched workers.
Furthermore, none of these studies considers the schedule of a worker when matching tasks and workers. Deng et al. [27] and Li et al. [48] study the problem of scheduling tasks that have already been assigned to a worker. While in [27] all matched tasks are available at the time scheduling is performed, [48] looks at the online version of the same problem, where the scheduling algorithm is run immediately after a new task is matched with the worker. More recent studies have considered both matching and scheduling using the batched scheme [17, 28, 34]. Chen et al. [17] formulate the problem as an Integer Linear Program where the objective is to minimize the total traveled distance of the workers. In [28], to process each batch, the algorithm first performs matching and then tries to schedule the matched tasks. For those tasks that did not get scheduled, another round of matching and scheduling is performed, and this continues until either all tasks are scheduled or no eligible worker remains for any unscheduled task. In [34], Guo et al. propose greedy-enhanced genetic algorithms for matching and scheduling time-sensitive and time-insensitive tasks separately.

Our work is also related to some combinatorial optimization problems such as the Vehicle Routing Problem (VRP) [13]. The general setting of VRP is to serve a number of customers with a fleet of vehicles, and the objective is to minimize the total travel cost of those vehicles. Compared with VRP, in spatial crowdsourcing our objective is to maximize the number of completed tasks, whereas VRP aims to minimize the total travel time. In addition, the spatial workers in our problem setting are not located at one or several fixed depots, and each worker can show up at any unique location. In our setting the spatial tasks are also not guaranteed to be completed by the workers.

2.3 Real-time Ridesharing

There are mainly two categories of ridesharing: static and dynamic ridesharing.
Most existing studies [32, 59, 23] focus on static ridesharing, where all riders and drivers are known a priori and thus trips are prearranged. Furuhata et al. [32] provide a comprehensive survey of the different types of ridesharing regarding their formulations, optimizations and key computational challenges. Santi et al. [59] propose a graph-based approach to quantify the potential of ridesharing using New York's taxi data, and Cici et al. [23] evaluate the potential of carpooling using mobile datasets from four cities. In addition, the ridesharing problem can be treated as a special class of the dial-a-ride problem (DARP) [25], or of the dynamic vehicle routing problem (VRP) [26, 48] in operational research, which is proven to be NP-hard. All these studies assume that the riders' and drivers' statuses are known in advance, and hence can afford a high computation cost, which is not the case in real-time ridesharing.

With the emergence of many ridesharing mobile applications (e.g., Uber and Lyft), real-time ridesharing [50, 51, 37, 54, 24, 15, 56] has recently attracted more research interest. Ma et al. [50, 51] proposed a ridesharing dispatch system named "T-share" to serve rider requests on the fly with the objective of reducing drivers' total travel distance. Their work focuses on maintaining a spatial-temporal index to retrieve the candidate drivers. On the other hand, Huang [37] proposed a kinetic tree scheduling algorithm to dynamically match trip requests to drivers with minimum incurred travel distance. Ota et al. [54] introduced a data-driven simulation framework that enables the analysis of ridesharing by using New York's taxi dataset. Santos et al. [60] propose a ridesharing system to maximize the number of matched requests. The majority of these studies aim to minimize the total travel distance of drivers; however, we show that this does not necessarily mean a shorter travel distance for the riders.
Compared with these works, we discuss the conflicting interests of riders, drivers and platform providers. We propose a general and versatile pricing model, and our objective is to maximize the total profit of the platform provider. We show that by maximizing the overall profit, our framework achieves a higher service rate and quality. Finally, we introduce a decentralized auction-based framework to support scalable and real-time scheduling, which differs from existing centralized scheduling frameworks.

Several mechanisms have been introduced to promote ridesharing in traditional peer-to-peer carpooling platforms [41, 43, 21, 70, 62]. These mechanisms assume passengers have valuations for each driver and assign passengers to drivers based on these valuations. Most previous studies assume all the information regarding passengers and drivers is known a priori and process the ride requests in batches [41, 43, 21, 70]. These mechanisms do not work in on-line environments and cannot provide an immediate response to ride requests, which is a key requirement of the current commercial ridesharing platforms. One exception is the mechanism proposed in [62], which does generate on-line assignments. However, it assumes the drivers are autonomous and comply with any assignment made by the platform, and thus the drivers' incentives are ignored. Furthermore, none of these mechanisms considers the platform provider's revenue in its pricing model.

Chapter 3
Dynamic Pricing

3.1 Problem Definition

3.1.1 Model

We assume the entire geographical space consists of $n$ different equidistant regions. Also, we consider an infinite time horizon discretized into equisized time periods. Within each time period $t$, we assume that for each region $i$ there is a potential of $R_i^t$ task requests. In general, every request $r$ has a start location (i.e., origin region) and an end location (i.e., destination region). The origin and destination regions can be the same and are not necessarily different.
We assume there exists a function $f_r(p)$ which gives the probability of requesters being willing to pay price $p$ for their task. We define the network's demand at location $i$ at time $t$ as:

\[ D_i^t(p) = R_i^t \, (1 - F_r(p)) \tag{3.1} \]

where $F_r(\cdot)$ is the cumulative distribution function (cdf) of $f_r(\cdot)$.

Similarly, during each time period $t$, there is a potential of $W_i^t$ workers available at location $i$. The probability of workers participating in the platform for a certain price $p$ is denoted by $f_w(p)$. We define the network's supply at location $i$ at time $t$ as:

\[ S_i^t(p) = W_i^t \, F_w(p) \tag{3.2} \]

where $F_w(\cdot)$ is the cdf of $f_w(\cdot)$. Intuitively, increasing the price of rides increases the chance of drivers participating in the platform and decreases the chance of passengers participating. More formally we assume:

Assumption 3.1. $F_r(\cdot)$ and $F_w(\cdot)$ are continuous and strictly increasing. In addition, similar to [30], we assume $F_r(\cdot)$ is strictly convex.

Assumption 3.2. $1 - F_r(0) = 1$ and $F_w(0) = 0$, and there exists a finite price $p_{max}$ such that $1 - F_r(p_{max}) = 0$ and $F_w(p_{max}) = 1$.

Proposition 3.1. Assuming $R_i^t > 0$ and $W_i^t > 0$, there exists a price $p_c$ at which $D_i^t(p_c) = S_i^t(p_c)$. We call $p_c$ the market clearing price. (Footnote 1: Hereafter, unless mentioned otherwise, we assume $R_i^t > 0$ and $W_i^t > 0$.)

Proof. Proofs of all Theorems, Lemmas and Propositions are presented in the Appendix.

The total number of assigned tasks at location $i$ at time $t$ can be computed as:

\[ T_i^t(p) = \min\{ D_i^t(p),\; S_i^t(p) \} \tag{3.3} \]

Proposition 3.2. If for location $i$ at time $t$ there exists a market clearing price, then $p_c = \arg\max_p T_i^t(p)$.

We assume all tasks assigned at time period $t$ will end at time period $t+1$. Considering that once a task is completed, the assigned worker ends up at the destination region of the task, we can say the tasks assigned during each time period affect the supply of the network at different locations in the following time period.
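The demand, supply and assignment functions of Eqs. (3.1) to (3.3) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: it uses the cdf $p^2/100$ with $p_{max} = 10$ (the setting reported in Section 3.3.2) for both requesters and workers, and the function names and the bisection search are hypothetical choices rather than part of the model.

```python
P_MAX = 10.0  # assumed finite maximum price (Assumption 3.2)


def cdf(p):
    """Assumed cdf for requesters and workers: p^2 / 100 on [0, P_MAX]."""
    p = min(max(p, 0.0), P_MAX)
    return (p / P_MAX) ** 2


def demand(p, requests):
    """Eq. (3.1): potential requests times the fraction still willing to pay p."""
    return requests * (1.0 - cdf(p))


def supply(p, workers):
    """Eq. (3.2): potential workers times the fraction willing to serve at p."""
    return workers * cdf(p)


def assigned(p, requests, workers):
    """Eq. (3.3): the number of assigned tasks is the smaller of demand and supply."""
    return min(demand(p, requests), supply(p, workers))


def market_clearing_price(requests, workers, tol=1e-9):
    """Bisection on supply(p) - demand(p), which is increasing in p."""
    lo, hi = 0.0, P_MAX
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if supply(mid, workers) < demand(mid, requests):
            lo = mid  # supply still short of demand: the clearing price is higher
        else:
            hi = mid
    return (lo + hi) / 2.0


p_c = market_clearing_price(requests=100, workers=250)
print(round(p_c, 3), round(assigned(p_c, 100, 250), 2))
```

For this particular cdf the clearing price also has the closed form $p_c = 10\sqrt{R/(R+W)}$, which the bisection result can be checked against.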
We assume for each time period there exists a transition matrix $\tau^t$ such that the $ij$-th entry of the matrix, $\tau_{ij}^t$, gives the fraction of task requests at location $i$ with destination location $j$ at time $t$. We compute the number of potential workers at each location at time $t+1$ as:

\[ W_j^{t+1} = \Big( \sum_i \tau_{ij}^t \, T_i^t(p_i^t) \Big) + \big( W_j^t - T_j^t(p_j^t) \big) + \delta_j^{t+1} \tag{3.4} \]

The first term in Eq. (3.4) gives the workers that serviced a task in time period $t$ and ended up at location $j$ in $t+1$. The second term refers to those workers who were already at location $j$ at $t$, did not service any request, and remained at location $j$ at $t+1$. Finally, in each time period, a number of workers can enter or leave the system; $\delta_j^{t+1}$ is the number of workers that logged in to the platform minus those who left.

3.1.2 The Revenue Maximization Problem

In our model, we assume requesters pay a fee equal to the price set by the platform provider ($p$). The provider keeps a portion ($\gamma$) of the fee as its own service fee and pays the remainder to the worker. Consequently, for a task with price $p$, the share of the platform provider will be $\gamma p$. The revenue generated at time $t$ for location $i$ is:

\[ Rev_i^t(p_i^t) = T_i^t(p_i^t) \, \gamma \, p_i^t \tag{3.5} \]

where $p_i^t$ is the price set by the platform for rides originating at location $i$ at time $t$. Furthermore, the platform's total revenue would be:

\[ TotalRev = \sum_t \sum_i Rev_i^t(p_i^t) \tag{3.6} \]

The Revenue Maximization Problem is to determine the optimal $p_i^t$'s in order to maximize the generated revenue (Eq. (3.6)).

Earlier we mentioned that all $n$ regions in the geographical space have the same pairwise distance to each other. We end this section with a discussion on the practicality of this assumption. The assumption of equidistant regions has two consequences that are relevant to our problem. First, when every pair of regions has the same distance to each other, all the requests will take the same duration to complete and hence, every task starting at time $t$ will end at time $t+1$.
Second, all requests originating from the same region will have the same price (unless we utilize POD-Pricing and consider the destination of the task when setting the prices). In the following, we discuss the impact of these outcomes on the practicality of our pricing model.

As explained earlier, the basis of ADAPT-Pricing is setting prices for different requests such that in the following time periods there is enough supply to support the demand in different regions. While demand patterns in a location do change, the changes do not happen rapidly (similar to rush hour traffic, which does not change rapidly, and thus most navigation applications, e.g., Waze, use the traffic pattern at the time of departure). Suppose our model assumes a certain location will be in high demand in the next time period and adjusts the prices so that more workers are available in that location in the next time period. Even if some workers take more than one time period to reach their destination, by the time they arrive, the demand is most likely still high. In a real-world setting, the assumption of all requests having the same duration affects the solution only in the edge cases of the demand pattern, where the relative demand at a certain location makes a sudden shift. Furthermore, as we discuss in Sections 3.2.2 and 3.2.3, assuming all requests have similar travel times reduces the complexity of the optimization problems in ADAPT-Pricing by orders of magnitude. Finally, a main goal of this work is to compare the results of our pricing model with existing studies (e.g., [10]) which use a similar assumption. However, to confirm the practicality of our model, we performed some experiments based on real-world travel times between different regions, and the difference from the results with similar travel times is only 1% (more details are presented in Section 3.3).
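The supply dynamics that this argument relies on, Eq. (3.4), can be sketched as follows. The sketch assumes the one-period travel time discussed above; the function name, the argument names and the three-location numbers are hypothetical and chosen only for illustration.

```python
def next_potential_workers(workers, assigned, transition, net_arrivals):
    """One-step supply update in the spirit of Eq. (3.4).

    workers[j]       potential workers at location j in the current period
    assigned[i]      tasks serviced from origin i in the current period
    transition[i][j] fraction of requests at i whose destination is j
    net_arrivals[j]  workers who log in minus workers who leave at j
    """
    n = len(workers)
    result = []
    for j in range(n):
        # Workers who serviced a task and finish it at location j.
        arriving = sum(transition[i][j] * assigned[i] for i in range(n))
        # Workers already at j who did not service any request.
        idle = workers[j] - assigned[j]
        result.append(arriving + idle + net_arrivals[j])
    return result


# Hypothetical example: 10 trips leave location 0, half to location 1
# and half to location 2; nobody logs in or out.
workers = [20.0, 5.0, 5.0]
assigned = [10.0, 0.0, 0.0]
transition = [[0.0, 0.5, 0.5],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]]
net_arrivals = [0.0, 0.0, 0.0]
print(next_potential_workers(workers, assigned, transition, net_arrivals))
# prints [10.0, 10.0, 10.0]
```

Because each row of the transition matrix sums to 1 and no worker enters or leaves in this example, the total number of potential workers is conserved.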
With regard to tasks having similar prices, if requests have different durations, our model can use the same logic to compute the "price per unit of time/length", which gives requests with different durations different prices. In fact, in the experiments we performed on real-world travel times, this is how we set prices for different tasks.

3.2 ADAPT-Pricing

In this section, we present A Dynamic And Predictive Technique for pricing (ADAPT-Pricing). First, we show a baseline approach for dynamic pricing in which the platform only considers the current supply and demand at the origin. Next, we discuss the predictive aspect of ADAPT-Pricing, where both the current and future demand of the network are accounted for. We end the section by further optimizing ADAPT-Pricing, setting prices based not only on a task's origin but on the origin-destination pair of each task.

3.2.1 Baseline: Local Optimization

Ridesharing applications (e.g., Uber, Lyft, etc.) use a real-time dynamic algorithm to determine the price of rides based on the network's supply and demand. This algorithm considers the current supply and demand in the network, and when demand is higher than the number of available drivers (i.e., workers), the prices are increased to encourage more drivers to participate in the platform [2, 5]. In this section, we discuss how the optimal price for location $i$ at time $t$ ($p_i^{t*}$) can be computed considering only $D_i^t(\cdot)$ and $S_i^t(\cdot)$.

Figure 3.1(a) shows how the network's supply and demand (y-axis) change as a function of the price (x-axis) based on the model in Section 3.1. As depicted, at $p = 0$ we have $S(p) = 0$ and $D(p) > 0$ (Assumption 3.2). As $p$ increases, $S(p)$ and $D(p)$ keep increasing and decreasing, respectively (Assumption 3.1).
Finally, where $p = p_c$ the network's supply and demand are equal (Proposition 3.1).

[Figure 3.1: Supply and Demand at Origin. Panels (a) and (b) plot $S(p)$ and $D(p)$ against the price $p$; panel (a) marks $p_c$, and panel (b) marks $p_c$ and $p^*$.]

During each time period, for each of the locations, the network's supply and demand can be modeled similar to Fig. 3.1(a). Since the platform only considers the supply and demand at the origin location of the ride, the revenue maximization problem reduces to locally maximizing $Rev_i^t(\cdot)$ from Eq. (3.5) for every location $i$ at time period $t$. (Footnote 2: Hereafter, for simplicity, when the region and time period can be inferred from the context, indices $i$ and $t$ are dropped.)

Theorem 3.1. The optimal price $p^*$ which maximizes $Rev(p)$ from Eq. (3.5) is always greater than or equal to $p_c$.

For a price $p \ge p_c$, we have $S(p) \ge D(p)$ and hence, the network's demand becomes the dominant factor in deciding the number of serviced trips (Eq. (3.3)). Therefore, with $\gamma$ denoting the platform's share of the fee (Section 3.1.2):

\[ \forall p \ge p_c: \quad Rev(p) = D(p) \, \gamma \, p \tag{3.7} \]

Assuming $p_d$ is the price that maximizes $D(p)\,p$, Theorem 3.2 gives the optimal price for maximizing $Rev(p)$ in Eq. (3.5):

Theorem 3.2. The optimal price $p^*$ for maximizing the revenue at each location at each time is:
(i) $p^* = p_c$ if $p_d \le p_c$
(ii) $p^* = p_d$ if $p_d > p_c$

When only considering the current demand and supply of the network, the platform provider sets the price at location $i$ at time $t$ to the optimal price $p_i^{t*}$. Consequently, the total revenue generated for the platform is:

\[ TotalRev = \sum_t \sum_i Rev_i^t(p_i^{t*}) \tag{3.8} \]

3.2.2 P-Pricing: Predicting Demand at Origin

In recent years, there has been an increasing effort to predict the demand of ridesharing networks, both from academia [38] and from industry [46, 72]. In this section, we focus on utilizing the network's future demand in order to maximize its overall revenue.

With the baseline approach (Section 3.2.1), at any location where $p_c < p^*$, for any price $p$ where $p_c \le p < p^*$, we have $T(p) > T(p^*)$ (Fig. 3.1(b)).
That is, even though setting the price to $p$ will lower the generated revenue (Theorem 3.2), it will increase the number of serviced requests. Furthermore, we know from Eq. (3.4) that the number of serviced requests at the current time affects how the workers are distributed among different locations in the following time period. The basic idea of predictive pricing (P-Pricing) is to increase the overall revenue by setting the price at some locations lower than the optimal price in order to increase the number of serviced requests and thus potentially have more workers at locations with high demand in the future.

Before we formalize the optimization problem for P-Pricing, we need to discuss how the increased number of serviced requests in the current time period affects the network's supply in the following time period. Figure 3.2 shows how supply changes at the destination regions. We assume $S_1(p)$ gives the network's supply if the prices of tasks at every location are set to $p^*$ in the previous time period (Fig. 3.2(a)). In Fig. 3.2(b) we consider a second scenario where the number of serviced requests in the previous time period is increased at certain locations (by lowering the prices) and, as a result, in the current time period we end up with more potential workers at the destinations of those added requests. $S_2(p)$ in Fig. 3.2(b) shows the supply at the destination locations in the second scenario.

[Figure 3.2: Supply and Demand at Destination. Panel (a) plots $S_1(p)$ and $D(p)$ against the price $p$ and marks $p_{c_1}$; panel (b) adds $S_2(p)$ and marks $p_{c_1}$ and $p_{c_2}$.]

Lemma 3.1. If $p_{c_1}$ and $p_{c_2}$ are the market clearing prices for $S_1(\cdot)$ and $S_2(\cdot)$, respectively, and $p_d = \arg\max_p \{ D(p)\,p \}$, the generated revenue from $S_2(\cdot)$ is larger than that of $S_1(\cdot)$ if $p_{c_2} < p_{c_1} \wedge p_d < p_{c_1} \wedge p_d \le p_{c_2}$.

Lemma 3.2. Assuming $p_{c_2} = p_d$, adding more potential workers does not increase the generated revenue.

Compared to setting all the prices to $p^*$, lowering the prices at time $t$ reduces the generated revenue at time $t$.
However, it also increases the number of serviced requests at certain locations, which can add more supply to high-demand regions in the future and thus increase the generated revenue at $t+1$. The question that needs to be answered is: "How much and at which locations should we sacrifice revenue in order to increase the generated revenue in the following time period?" To answer this question, we first have to compute the revenue decrease in the current time period as a function of the added serviced requests (due to lowering the prices) and the corresponding revenue increase in the following time period as a function of the added potential workers. Throughout, $\gamma$ denotes the platform's share of the fee (Section 3.1.2). We start with computing the revenue decrease in the current time period:

\[ RevDec(\Delta T) = T(p^*)\,\gamma\,p^* - (T(p^*) + \Delta T)\,\gamma\,(p^* - \Delta p) = \gamma \big( T(p^*)\,\Delta p + \Delta T\,\Delta p - \Delta T\,p^* \big) \tag{3.9} \]

where,

\[ \Delta p = p^* - D^{-1}(T(p^*) + \Delta T) \tag{3.10} \]
\[ p < p^* \le p_d \tag{3.11} \]

Similarly, we can compute the revenue increase as a function of the number of added potential workers as:

\[ RevInc = T(p_{c_2})\,\gamma\,p_{c_2} - T(p_{c_1})\,\gamma\,p_{c_1} \tag{3.12} \]

Since $p_{c_1}$ and $p_{c_2}$ are the market clearing prices for $S_1(\cdot)$ and $S_2(\cdot)$, respectively, we can re-write Eq. (3.12) as:

\[ RevInc = S_2(p_{c_2})\,\gamma\,p_{c_2} - S_1(p_{c_1})\,\gamma\,p_{c_1} \tag{3.13} \]

Assuming $S_1(p) = W F_w(p)$, we can compute the revenue increase as a function of the added potential workers ($\Delta W$) as:

\[ RevInc(\Delta W) = (W + \Delta W)\,F_w(p_{c_2})\,\gamma\,p_{c_2} - W\,F_w(p_{c_1})\,\gamma\,p_{c_1} \tag{3.14} \]

where $p_{c_2} \ge p_d$. We can find the optimal increase in the number of serviced requests in the current time period by solving the following optimization problem:

\[ \begin{aligned} \text{maximize} \quad & \sum_j RevInc(\Delta W_j^{t+1}) - \sum_i RevDec(\Delta T_i^t) \\ \text{subject to} \quad & \Delta T^t \cdot \tau^t = \Delta W^{t+1} \\ & \Delta T_i^t \ge 0 \quad \forall i \\ & \Delta W_j^{t+1} \ge 0 \quad \forall j \end{aligned} \tag{3.15} \]

where,
• $\tau^t$ is the transition matrix at time period $t$.
• $\Delta T^t = \langle \Delta T_1^t, \Delta T_2^t, \ldots, \Delta T_n^t \rangle$
• $\Delta W^{t+1} = \langle \Delta W_1^{t+1}, \Delta W_2^{t+1}, \ldots, \Delta W_n^{t+1} \rangle$

After finding $\Delta T^t$, we can set the prices for the current time period at different locations using Eq. (3.10). With P-Pricing, at time $t$, even though we consider the potential revenue increase at time $t+1$, we only set the prices for time $t$.
Prices for rides at $t+1$ are set at $t+1$, when P-Pricing can partially sacrifice the potential revenue increase computed before in order to gain even more revenue in the future. Ideally, one would extend the optimization problem in Eq. (3.15) to perform a global optimization over all time periods. However, many demand prediction studies [22, 71] use Markov models where only the demand of one time period ahead is predicted each time. As a result, in addition to the complexity of a global optimization, the current practices in predicting future demand only allow us to perform a one-period-ahead optimization. With regard to the complexity of solving the optimization problem in Eq. (3.15), based on Assumption 3.1, RevDec is always convex and RevInc is concave. Consequently, the optimization problem becomes convex and can be solved efficiently (in time proportional to $n^3$, where $n$ is the number of regions).

3.2.3 POD-Pricing: Predicting Demand at Origin & Destination

P-Pricing sets the same price for all tasks originating from the same location, regardless of their destination. As a result, not all the workers servicing the increased requests will end up at a high-demand location in the following time period. Figure 3.3 shows a simple scenario with three locations. Suppose we know that in the following time period $l_2$ will have a high demand while the demands at $l_1$ and $l_3$ will be low. Based on P-Pricing, we decide to lower the prices at $l_1$ so that 10 more requests will be serviced, hoping that this will add more potential workers at $l_2$ in the following time period. However, this will also lower the prices for rides going from $l_1$ to $l_3$. Consequently, out of the 10 added trips, on average 5 will end up at $l_2$ (the transition fraction is $\tau_{12} = 0.5$) and the other 5 will end up at $l_3$. Ideally, we should only lower the prices for those rides going to $l_2$, so that all 10 workers servicing the added requests will end up at $l_2$ in the following time period.
[Figure 3.3: Example of Network's Transition]

In this section we show how we can set prices for tasks based not only on their origin location, but on their origin-destination pair. With predictive origin-destination pricing (POD-Pricing), once the platform sets different prices for tasks originating from the same location, all the potential workers at that location will try to service the tasks with higher prices. However, as shown in the example from Fig. 3.3, trips to high-demand locations will have lower prices. To overcome this problem, for a task request $r$ at location $i$, rather than notifying every worker at $i$, we only notify a subset of workers based on the destination of $r$. In other words, we divide the available workers at each location into disjoint subsets and use the workers of each subset to service rides for a specific destination. We assume $\beta_{ij}$ gives the fraction of workers at location $i$ that the platform decides to allocate to service the tasks with destination $j$, such that $\sum_j \beta_{ij} = 1$. Then we can re-write Eq. (3.3) and Eq. (3.5) as:

\[ T_{ij}^t(\beta_{ij}^t, p_{ij}^t) = \min\{ D_{ij}^t(p_{ij}^t),\; S_{ij}^t(p_{ij}^t) \} = \min\{ \tau_{ij}^t R_i^t\,(1 - F_r(p_{ij}^t)),\; \beta_{ij}^t W_i^t\,F_w(p_{ij}^t) \} \tag{3.16} \]

\[ Rev_i^t(B_i^t, P_i^t) = \sum_j T_{ij}^t(\beta_{ij}^t, p_{ij}^t)\,\gamma\,p_{ij}^t \tag{3.17} \]

where,
• $B_i^t = \langle \beta_{i1}^t, \beta_{i2}^t, \ldots, \beta_{in}^t \rangle$.
• $P_i^t = \langle p_{i1}^t, p_{i2}^t, \ldots, p_{in}^t \rangle$.

Here $\tau_{ij}^t$ is the fraction of requests at location $i$ with destination $j$ (Section 3.1.1) and $\gamma$ is the platform's share of the fee (Section 3.1.2).

Proposition 3.3. The generated revenue from Eq. (3.17) is at least as much as that of Eq. (3.5).

Similar to P-Pricing, at each point in time we optimize the prices only for the current time period. In addition to the prices, we also need to find the optimal $\beta_{ij}$'s. Consequently, the optimization problem can be written as:

\[ \begin{aligned} \text{maximize} \quad & \sum_j RevInc(\Delta V_j^{t+1}) - \sum_i \sum_j RevDec(\beta_{ij}^t, \Delta T_{ij}^t) \\ \text{subject to} \quad & \sum_i \Delta T_{ij}^t = \Delta V_j^{t+1} \quad \forall j \\ & \sum_j \beta_{ij}^t = 1 \quad \forall i \\ & \Delta T_{ij}^t \ge 0 \quad \forall i, j \\ & \Delta V_j^{t+1} \ge 0 \quad \forall j \\ & \beta_{ij}^t \ge 0 \quad \forall i, j \end{aligned} \tag{3.18} \]

3.3 Experiments

3.3.1 Dataset

We evaluate our methods using one month (May 2013) of New York City's taxi dataset [4], which contains around 500,000 trips per day.
Each trip in the dataset has a pick-up latitude/longitude, a drop-off latitude/longitude and a request time. We consider 1-hour-long time periods in our experiments and map each ride to a time period based on its request time. Furthermore, we consider each neighborhood in the city [3] as a unique location and map each ride's pick-up and drop-off points to an origin and a destination location, respectively.

3.3.2 Experimental Methodology

We measure the overall revenue generated by each ADAPT-Pricing method (Section 3.2). Additionally, we compare the average price set for the rides by each approach. We assume the total number of potential drivers follows the same pattern as the total number of ride requests during different time periods. In other words, the ratio of the total number of potential drivers to the total number of ride requests, denoted by $\rho$, is approximately the same throughout a day. Furthermore, the drivers are uniformly distributed among the different locations in the first time period. For the following time periods, the distribution of the drivers can be computed using Eq. (3.4).

As discussed in Section 3.2, with both P-Pricing and POD-Pricing, the optimization problem requires the future demand of the network at different locations. In our experiments, we measure the sensitivity of ADAPT-Pricing to the accuracy of the predicted future demand. We know the exact number of potential ride requests at each region $i$ at any time $t$ ($R_i^t$) from our dataset. An accuracy of 100% means that the exact $R_i^t$'s are used to compute the demand (Eq. (3.1)), which in turn is used in the optimization problems in Eqs. (3.15) and (3.18). Furthermore, in the experiments where the accuracy is set to $\alpha$, assuming that $R_i^t$ is the exact number of ride requests at region $i$ at time $t$, we compute the predicted number of ride requests at region $i$ at time $t$, $\hat{R}_i^t$, as:

\[ \Delta_i^t = (1 - \alpha)\,R_i^t \]
\[ \hat{R}_i^t = Random(R_i^t - \Delta_i^t,\; R_i^t + \Delta_i^t) \]

where $Random(a, b)$ chooses a random number in the range $[a, b]$.
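The perturbation above can be sketched in Python as follows. The text only says that a random number is chosen in the range, so the uniform distribution used here is an assumption, and the function name `predicted_requests` is hypothetical.

```python
import random


def predicted_requests(actual, accuracy, rng=random):
    """Perturb the true request count for a prediction accuracy in [0, 1].

    Assumes the random draw is uniform over [actual - delta, actual + delta].
    """
    delta = (1.0 - accuracy) * actual  # half-width of the error band
    return rng.uniform(actual - delta, actual + delta)


rng = random.Random(2013)  # seeded only to make the sketch reproducible
for alpha in (1.0, 0.8, 0.5):
    print(alpha, predicted_requests(400, alpha, rng))
```

With an accuracy of 1.0 the band has zero width and the exact count is returned; with 0.8 and 400 actual requests the prediction falls between 320 and 480.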
Table 3.1 shows the different values we used for the parameters in our experiments.

    Parameter             Max-Min (increment)   Default
    Prediction Accuracy   100%-50% (5%)         100%
    $\rho$                5.0-0.5 (0.5)         2.5

    Table 3.1: Parameters for Pricing Method Comparison

In addition, $p_{max}$ (from Assumption 3.2), $F_r(\cdot)$ and $F_w(\cdot)$ are set as:

\[ p_{max} = \$10 \]
\[ F_r(p) = F_w(p) = \frac{p^2}{100} \]

We run our experiments for every day in May 2013 and report the average values over all days.

3.3.3 Pricing Method Comparison

Our first set of results compares the generated revenue and the average price of the rides for all three pricing methods (Table 3.2). The revenue increases and price discounts in Table 3.2 are in comparison to the baseline approach. As shown, with P-Pricing and POD-Pricing, we observe a 2.8% and a 10.3% increase in revenue, respectively. Furthermore, the prices are lowered, which confirms that the revenue increase is the result of an increased number of trips and not of higher prices.

                     Baseline   P-Pricing   POD-Pricing
    Revenue          2013341    2069606     2220335
    Rev. Increase    -          2.8%        10.3%
    Avg. Price       6.38       6.05        6.08
    Price Discount   -          5.26%       4.6%

    Table 3.2: Overall Results of Method Comparison

The main idea of ADAPT-Pricing is to partially sacrifice revenue in earlier time periods if it results in higher gains in the future. Fig. 3.4 compares the revenue increase of P-Pricing and POD-Pricing relative to the baseline approach in each time period. As depicted, in the first few time periods (until 6:00am), neither of the pricing methods does much better than the baseline, and in some cases they can in fact be slightly worse than the baseline. However, in later time periods both methods outperform the baseline approach.
[Figure 3.4: Revenue Increase. Revenue increase (%) per time period for P-Pricing and POD-Pricing.]

In Section 3.2 we explained that the main idea behind our pricing model is to sacrifice revenue in the current time period in order to increase the service rate (i.e., the percentage of completed tasks) and send more workers to locations with higher demand in the future. In Fig. 3.5(a) we compare the service rate of P-Pricing and POD-Pricing with that of the baseline approach. It is worth mentioning that in the model we described, the requesters' willingness to pay (i.e., $f_r(p)$) and the workers' willingness to provide service (i.e., $f_w(p)$) affect the service rate. The results depicted in Fig. 3.5(a) compare the approaches based on how $f_r(p)$ and $f_w(p)$ are defined in Section 3.3.2. Modifying $f_r(p)$ and $f_w(p)$ such that requesters are willing to pay more and workers are willing to provide service for a lower price would increase the service rate across all approaches. To better compare the service rates among the different approaches, Fig. 3.5(b) depicts the relative increase in the service rate for P-Pricing and POD-Pricing as compared to the baseline approach when $f_r(p)$ and $f_w(p)$ are fixed among all approaches. As observed, in most time periods, P-Pricing and POD-Pricing increase the service rate by up to 5% and 15%, respectively.

[Figure 3.5: Service Rate Comparison. (a) Service Rate: service rate (%) per time period for Baseline, P-Pricing and POD-Pricing. (b) Service Rate Increase: increased service (%) per time period for P-Pricing and POD-Pricing.]

Fig. 3.6(a) depicts the average price discounts and the percentage of rides that received a discount in different time periods. As observed, more than 80% of rides, on average, received a discount of around 4%. While Fig. 3.6(a) shows the average price discount across all regions, Fig. 3.6(b) shows the price discounts at different origin locations (x-axis) for two time periods. As observed, in many regions the discount is higher than 10%.
Our pricing methods depend on predicting the network's demand in the near future. Accurately predicting the demand is a complex task by itself. In the next set of experi- ments, we analyzed the dependency of our pricing methods on the accuracy of the demand 36 1 2 3 4 5 6 0 5 10 15 20 25 50 75 100 Price Discount (%) Affected Regions (%) Time Period Price Discount Affected Regions (a) Price Discount per Time Period 0 5 10 15 20 25 30 35 Price Discount (%) Origin Locations 14:00PM 18:00PM (b) Price Discount per Region Figure 3.6: Price Discount prediction (Fig. 3.7). As depicted, with POD-Pricing, even with an 80% accuracy, we still observe of 5% increase in revenue. One interesting observation is that P-Pricing is more resilient to prediction errors as compared to POD-Pricing. When we make a bad decision based on erroneous demand prediction, some of the negative eects are masked by the fact that with P-Pricing the drivers do not always take those rides that we want them to take (as explained in Section 3.2.3). -20 -10 0 10 20 100% 90% 80% 70% 60% 50% Revenue Increase (%) α P POD Figure 3.7: Prediction Accuracy 37 We also we evaluated the performance of our pricing methods for various supply/demand ratios (referred to as )(Fig. 3.8). The parameter is based on the total number of po- tential drivers and the total number of ride requests in the entire network. The supply to demand ratio in each location is not necessarily the same. For smaller values of , due to limited supply, most locations will have a relatively higher market clearing price. According to Theorem 3.2, it is likely that the optimal price in the baseline approach is very close to the market clearing price. As a result, lowering the price would no longer increase the number of trips and hence, predictive pricing will not have a huge advan- tage. 
On the other hand, for larger values of $\rho$, due to the abundance of drivers, most high-demand locations will always have enough potential drivers regardless of whether the demand is predicted or not. Consequently, after the optimal point, the advantage of predictive pricing shrinks as $\rho$ increases.

Figure 3.8: Supply/Demand Ratio

In our model, we assume all regions have the same distance to each other. At the end of Section 3.1 we provide a supporting argument for this assumption from both a practical and an analytical point of view. To further support the arguments in Section 3.1, in our last set of experiments, we show that the effect of this assumption is negligible in real-world scenarios. We compute the price for each origin/destination pair using the same method as before. However, for the real-world scenario, we no longer assume that all trips are completed within one time period and instead consider the real travel time for each trip. Furthermore, in the real-world scenario, we consider the computed price as the price per unit of time for each trip. Consequently, the actual fare for each ride is the computed price multiplied by the travel time of the trip. As observed in Table 3.3, even when considering real-world travel times, POD-Pricing increases the generated revenue by 9%.

                 Baseline    POD-Pricing
Revenue          6043028     6591735
Rev. Increase                9.08%

Table 3.3: Overall Results with Real-world Travel Times

Figure 3.9: Revenue Increase with Real-world Travel Times

Furthermore, Fig. 3.9 depicts the revenue increase within each time period for both real-world travel times (RW-POD) and one-time-period travel times (M-POD) when using POD-Pricing. As depicted, in most time periods the revenue increase remains the same when considering real-world travel times.
As discussed in Section 3.1, the only case where our model does not perform optimally with real-world travel times is when there is a sudden change in the demand patterns of the regions. It can be observed in Fig. 3.9 that in the early afternoon and late night, when the potential ride requests increase and decrease respectively, the revenue increase with RW-POD drops by a few points.

Chapter 4
Task Assignment in Spatial Crowdsourcing

4.1 Preliminaries

In this section, we define the terminology used in the paper and provide a formal definition of the problem under consideration. In addition, we analyze the complexity of the problem and its hardness.

4.1.1 Problem Definition

We start by defining some terminology in order to formally define the task assignment problem in spatial crowdsourcing.

Definition 4.1 (Spatial Task). A spatial task $t$, shown as $\langle l, [r, d] \rangle$, is a task to be performed at location $l$ with geographical coordinates, i.e., latitude and longitude. The task becomes available at $r$ (release time) and expires at $d$ (deadline).

It should be pointed out that in a spatial crowdsourcing environment, a spatial task $t$ can be executed only if a worker is at location $t.l$. For example, if the query is to report the traffic situation at a specific location, someone has to actually be present at the location to be able to report the traffic. Hereafter, whenever we use task we are referring to a spatial task. Now, we formally define a worker.

Definition 4.2 (Worker). A worker $w$, shown as $\langle l, T, max, [s, e] \rangle$, is any entity, e.g., a person, willing to perform spatial tasks. We show the current location of the worker by $w.l$. Each worker has a list of tasks assigned to it, $w.T$, and a maximum number of tasks it is willing to perform, $w.max$. Also, $w.s$ and $w.e$ show the availability of the worker such that the worker is available during the time interval $(w.s, w.e]$.

Throughout the paper, we assume every worker moves one unit of length per unit of time.
Therefore, we can assume that $distance(a, b)$ is also the time required to move from point $a$ to point $b$.

Definition 4.3 (Schedule). A schedule $\sigma$ is an ordered list of tasks shown as $\langle t_1, \ldots, t_n \rangle$ where $n$ is the number of tasks in $\sigma$. We show the $i$-th task in $\sigma$ with $\sigma_i$.

Definition 4.4 (Valid Schedule). Schedule $\sigma$ is a valid schedule for worker $w$, if and only if:

$$\forall i,\ 1 \le i \le n: \quad \sum_{j=1}^{i} distance(\sigma_{j-1}, \sigma_j) \le \sigma_i.d - t_0$$

where $\sigma_0$ and $t_0$ represent the current location of $w$ and the current time, respectively.

At each point in time, a worker $w$ is associated with a valid schedule ($\sigma_w$) and completes the tasks based on their order in $\sigma_w$.

Definition 4.5 (Matching). Assuming we have a set of workers $W$ and a set of tasks $T$, we call $M \subseteq W \times T$ a matching if for each $t \in T$ there is at most one $w \in W$ such that $(w, t) \in M$. We call $(w, t) \in M$ a match and say $t$ has been matched to $w$. For each matching $M$, we define the value (benefit) of $M$ as:

$$Value(M) = |M|$$

Definition 4.6 (Valid Matching). A matching $M$ is valid if and only if, for every worker $w$, there exists a valid schedule $\sigma_w$, such that $(w, t_i) \in M \implies t_i \in \sigma_w$.

Now we can formally define the Task Assignment in Spatial Crowdsourcing (TASC) problem as follows:

Definition 4.7 (Task Assignment in SC). Given a set of workers $W$, a set of spatial tasks $T$ and a cost function $d : (W \cup T) \times T \to \mathbb{R}$ where $d(a, b)$ is the distance between $a$ and $b$, the goal of the TASC$\langle W, T, d \rangle$ problem is to find a valid matching $M$ with maximum value.

It is important to note that with task assignment in SC, the goal is to find a valid matching. This means that in addition to finding a matching between tasks and workers, the SC-Server also has to find a schedule for each worker to perform the tasks. Throughout this paper we use the terms matching phase and scheduling phase to refer to these two aspects of task assignment in SC.

In a real-life scenario, the SC-Server only finds out about the exact properties of tasks once they are submitted. Similarly, the server does not know when future workers will become available.
Consequently, a real-world SC-Server can either process every single task as soon as it becomes available (on-line) or periodically wait for a specific duration and process all the tasks that have been submitted during that time (batched). In this paper, we study the OnlineTASC problem, where the server processes an incoming task as soon as it is submitted by the requester.

4.1.2 Complexity Analysis

Previous studies have shown that the TASC problem is NP-Hard [28]. However, the focus of this study is the OnlineTASC problem and thus, in this section we briefly discuss the complexity of OnlineTASC.

In order to analyze on-line algorithms, where each request is processed without knowing the future, we use a method named competitive analysis [64]. With this method, the performance of an on-line algorithm is compared to the performance of an optimal off-line algorithm that has knowledge of future events (clairvoyant). For the TASC problem, we measure the performance of each algorithm based on the number of assigned tasks. Assuming we show the performance of algorithm $A$ on input $I$ as $|A(I)|$, we can define the competitive ratio of an algorithm as:

Definition 4.8 (Competitive Ratio). For an on-line algorithm $A$, we say $A$ is $c$-competitive for some $c > 0$, if and only if:

$$c = \min_{I \in \mathcal{I}} \frac{|A(I)|}{|A^*(I)|}$$

where $A^*$ is the optimal off-line algorithm and $\mathcal{I}$ is the set of all possible inputs.

Now we can prove the following theorem regarding the complexity of OnlineTASC.

Theorem 4.1. There does not exist a deterministic on-line algorithm for the OnlineTASC problem that is $c$-competitive ($c > 0$).

Theorem 4.1 shows that for any deterministic on-line algorithm for OnlineTASC there exists a worst-case scenario where the competitive ratio is arbitrarily small and hence, there is no theoretical bound for the OnlineTASC problem. However, in the following sections we propose algorithms for OnlineTASC that generate close to optimal results when applied to both real-world and synthetic data.
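The definitions of Section 4.1.1 can be made concrete with a short sketch. The Python below is only an illustration under stated assumptions: the `Task` class, the function names and the use of planar Euclidean distance at unit speed are hypothetical choices, not part of the original framework. Like Definition 4.4, the check looks only at deadlines and does not model waiting for a release time.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Task:
    """Hypothetical container for a spatial task <l, [r, d]>."""
    x: float
    y: float
    release: float
    deadline: float


def distance(a, b):
    """Euclidean distance; with unit speed this is also the travel time."""
    return hypot(a[0] - b[0], a[1] - b[1])


def is_valid_schedule(start, now, schedule):
    """Definition 4.4: the cumulative travel time up to the i-th task
    must not exceed that task's deadline minus the current time."""
    elapsed = 0.0
    position = start
    for task in schedule:
        elapsed += distance(position, (task.x, task.y))
        if elapsed > task.deadline - now:
            return False
        position = (task.x, task.y)
    return True


def is_matching(pairs):
    """Definition 4.5: every task appears in at most one (worker, task) pair."""
    tasks = [task_id for _, task_id in pairs]
    return len(tasks) == len(set(tasks))


def matching_value(pairs):
    """Value(M) = |M|."""
    return len(pairs)
```

With a worker at the origin, a task three units away with deadline 5 followed by a task six units away with deadline 10 forms a valid schedule, while visiting them in the opposite order does not.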
4.2 Auction-SC Framework

In practice, an SC system works similarly to a complex event processing (CEP) engine [49] where new tasks arrive at the SC-Server as an input stream. In a real-world SC scenario, the SC-Server will find out about a task and its properties only when it is released. The complexity of the many-to-many matching, in addition to the need for immediacy (e.g., Uber), renders the batch scheme impractical for real-world scenarios. Furthermore, in on-line monolithic-SC, scheduling multiple workers becomes the bottleneck and hence, real-time assignment is not guaranteed. For example, in New York City, during rush hours, there are as many as 10+ ride requests per second [4]. Through experiments, we show that neither the batched scheme nor on-line monolithic-SC is able to process such throughput in real time.

In this section we introduce Auction-SC, which has neither shortcoming and generates real-time on-line assignments by splitting the matching and scheduling responsibilities between the server and the workers, respectively. This allows Auction-SC to scale orders of magnitude higher than competing approaches. First we explain the auction framework and how tasks are dispatched to workers. Then, we discuss how workers compute and submit their bids and how the server assigns tasks to workers.

4.2.1 Task Dispatching in Auction-SC

Auction methods have been effectively used for assignment problems in dynamic multi-agent environments [52, 45]. The main advantages of auction methods are their simplicity and the fact that they allow for a decentralized implementation. Auction-SC considers workers as bidders and tasks as goods. Furthermore, the SC-Server plays the role of the auctioneer in Auction-SC. Figure 4.1 explains how tasks are dispatched to workers at a very high level:

(1) Everything starts with a requester submitting a new spatial task to the SC-Server.
(2) Once the SC-Server receives a new task, it notifies the available workers about the new task.
(3) Each worker independently computes his bid by only considering the new task and his current schedule and submits the bid to the SC-Server.
(4) Once all the bids are received, the SC-Server assigns the task to the worker with the optimal bid.

Broadcasting every incoming task to all available workers incurs a large communication cost on the system. Auction-SC lowers the communication cost by only sending an incoming task to the eligible workers, which are defined as:

Definition 4.9 (Eligible Worker). An available worker $w$ is said to be eligible for performing a newly released task $t$, if and only if:

$$distance(w, t) \le w.e - t.r \ \wedge\ distance(w, t) \le t.d - t.r$$

This means an available worker $w$ is eligible for performing task $t$ if it has enough time to reach the location of $t$ before either $t$ expires or $w$ leaves the system.

Ideally, $distance(w, t)$ in Definition 4.9 is equal to the road network distance between $w.l$ and $t.l$. However, computing the network distance for all workers is time consuming and hence, in Auction-SC, we use the Euclidean distance between $w.l$ and $t.l$ as a lower bound on their network distance. Furthermore, maintaining the exact location of every worker with a spatial index (e.g., R-Tree) while the workers' locations are constantly changing is too costly and cannot be done in real time [7]. Consequently, with Auction-SC, the SC-Server maintains a spatial grid on the location of the workers and only keeps track of each worker's cell. If we assume that worker $w$ is in cell $c_i$ of the spatial grid, then $distance(w, t)$ is computed as:

$$distance(w, t) = distance(p^*, t) \quad \text{s.t.} \quad p^* = \arg\min_{p \in c_i} \{distance(p, t)\}$$

In other words, if any point in cell $c_i$ satisfies the constraints of Definition 4.9, all the workers in $c_i$ are considered eligible workers.
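The grid-based eligibility test can be sketched as follows. This is a minimal illustration, assuming axis-aligned rectangular cells given as `(xmin, ymin, xmax, ymax)`, planar Euclidean distance at unit speed, and that the server knows each worker's availability end time; the function names and data layout are hypothetical.

```python
from math import hypot


def min_distance_to_cell(cell, point):
    """Shortest Euclidean distance from `point` to any point of a
    rectangular cell (xmin, ymin, xmax, ymax); zero if the point is inside."""
    xmin, ymin, xmax, ymax = cell
    dx = max(xmin - point[0], 0.0, point[0] - xmax)
    dy = max(ymin - point[1], 0.0, point[1] - ymax)
    return hypot(dx, dy)


def is_eligible(dist, worker_end, task_release, task_deadline):
    """Definition 4.9: the worker can reach the task before the task
    expires and before the worker leaves the system."""
    return (dist <= worker_end - task_release
            and dist <= task_deadline - task_release)


def eligible_workers(cells, workers_by_cell, task_loc, release, deadline):
    """Return ids of workers whose cell passes the lower-bound distance test.

    cells: dict cell_id -> (xmin, ymin, xmax, ymax)
    workers_by_cell: dict cell_id -> list of (worker_id, worker_end_time)
    """
    result = []
    for cell_id, rect in cells.items():
        dist = min_distance_to_cell(rect, task_loc)
        for worker_id, worker_end in workers_by_cell.get(cell_id, []):
            if is_eligible(dist, worker_end, release, deadline):
                result.append(worker_id)
    return result
```

Because the cell distance is a lower bound, a worker reported as eligible may still turn out to be too far away once its exact location is used; such a worker can simply submit an infinite bid.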
Therefore, upon arrival of a task $t$, the server first finds the cells that are within $t.d$ distance of the task (shaded cells in Figure 4.1) and broadcasts the new task only to workers within those cells.

Figure 4.1: Task Dispatching in Auction-SC

Once the eligible workers are notified about a new task, each worker computes its own bid according to the bidding rule, which is common among all workers, and submits the bid to the server. As mentioned earlier, the bid computation is done by software installed on the worker's cell phone. As a result, the process is the same for all workers and, more importantly, it does not require any interaction with the human worker. The bidding process is performed similarly to a sealed-bid auction where workers simultaneously submit bids and no worker knows how much the other workers have bid. The SC-Server selects the worker with the optimal bid as the winner and matches the task with that worker.

Algorithm 1 outlines the process of assigning an incoming task $t$. $W_t$ in line 3 is the set of eligible workers for task $t$. While there are workers who have not submitted their bids yet (lines 4-6), the server keeps accepting bids and adds every received bid to the set $Bids$ (line 5). Once all eligible workers submit their bids, the SelectBestBid() method (line 7) returns the worker with the optimal bid. In case of a tie, the SelectBestBid() method randomly selects one worker among the ones with the optimal bid.

Algorithm 1 OnlineTASC($W$, $t$)
Input: $W$ is the set of currently available workers and $t$ is a task that has just been released
Output: Either $w \in W$ as the worker task $t$ should be assigned to, or null if no worker is selected
1: $w_{selected}$ = null
2: $Bids = \emptyset$
3: Broadcast $t$ to every $w \in W_t$
4: while $w \in W_t$ has not submitted bid do
5:     $Bids \leftarrow \langle w, bid \rangle$
6: end while
7: $w_{selected}$ = SelectBestBid($Bids$)
8: return $w_{selected}$

We end this section with a brief discussion on the communication cost in Auction-SC.
As a result of task dispatching and bid submission in Auction-SC, it may seem that the communication cost in this framework will limit its advantages over existing approaches, i.e., batched assignment and monolithic-SC. However, regardless of the approach, there is always going to be some communication cost. In both the batched and on-line monolithic-SC approaches, the SC-Server is responsible for performing scheduling. To perform scheduling, the server requires the exact location of the workers, which means the workers have to constantly communicate with the server to update their locations. At the very least, each time a new task arrives, the server needs to communicate with the eligible workers to get their exact locations. On the other hand, the server in Auction-SC does not require the exact location of the workers as it is not responsible for scheduling the workers. Instead, the server in Auction-SC incurs some communication cost for dispatching tasks and receiving bids. Nevertheless, in our experiments, we do consider the communication cost in Auction-SC, and since other approaches do not discuss their communication model, we ignore the communication cost in other approaches. We still show that Auction-SC scales better than both the batched and the on-line monolithic-SC approaches.

4.2.2 Worker's Bid Computation

With Auction-SC, every worker computes his bid using a predefined bidding rule. A worker's bid represents how good it is for him to be matched with the task. When computing a bid for a task, the workers have no knowledge about other tasks that might arrive in the future. Consequently, they have to make a greedy decision based on their current status. Existing studies have used various heuristics in order to match tasks with workers, e.g., spatial region [20], nearest neighbor [31, 42, 34] and earliest expiring task [31]. All these heuristics are applied in a scheduling-oblivious-matching (SOM) approach where the schedule of the worker is ignored.
In this section we introduce a new heuristic where a task is assigned to the worker who can insert it into its schedule better than any other worker; hence, it is called Best Insertion (BI). Intuitively, if a worker spends less time to complete a task, it will likely have more time for performing other tasks. In BI, the server gives priority to workers that can better insert the incoming task into their schedules. Auction-SC considers the extra time each worker will need to complete the new task in addition to its current schedule. Algorithm 2 shows how each worker computes its bid. The GetFinishTime() method on lines 1 and 3 returns the time at which the input schedule gets completed. The key step in computing the bid is for the worker to find the optimal schedule consisting of the tasks in $\sigma_w$ in addition to the new incoming task ($t$). The FindBestSchedule() method in line 2 does this using a dynamic programming approach explained in Section 4.2.3.

Algorithm 2 ComputeBid($w$, $t$)
Input: $w$ is the worker computing the bid and $t$ is the incoming task.
Output: $bid$ as the value of worker $w$'s bid for task $t$. If $w$ cannot complete task $t$, $bid$ is set to $\infty$
1: $finish$ = GetFinishTime($\sigma_w$)
2: $\sigma'$ = FindBestSchedule($w$, $t$)
3: $finish'$ = GetFinishTime($\sigma'$)
4: $bid = finish' - finish$
5: return $bid$

4.2.3 Scheduling

For worker $w$ to compute his bid on task $t$, he has to find the optimal valid schedule that adds $t$ to $\sigma_w$. This requires checking all permutations of the set of tasks $\sigma_w \cup \{t\}$. The optimal schedule is a permutation $\sigma'$ of the tasks such that (1) $\sigma'$ is a valid schedule and (2) $\sigma'$ has the earliest completion time among all permutations that yield a valid schedule.

Scheduling a large set of tasks through an exhaustive search can take a long time. However, in an SC system, as workers bid on and get assigned to new tasks, they also complete the tasks that have already been assigned to them.
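The bidding rule of Algorithm 2 can be illustrated with a deliberately naive sketch that enumerates every permutation. This is not the DPS/VST method of this section, only an exhaustive baseline under stated assumptions: tasks are hypothetical `(location, deadline)` tuples, travel time is Euclidean distance at unit speed, and the worker's current schedule is assumed to be valid.

```python
from itertools import permutations
from math import hypot, inf


def travel(a, b):
    """Travel time between two points at unit speed."""
    return hypot(a[0] - b[0], a[1] - b[1])


def finish_time(start, now, schedule):
    """Completion time of `schedule`, or inf if any deadline is missed.
    Each task is a (location, deadline) tuple."""
    clock, position = now, start
    for location, deadline in schedule:
        clock += travel(position, location)
        if clock > deadline:
            return inf
        position = location
    return clock


def find_best_schedule(start, now, tasks):
    """Exhaustive search for the valid permutation that finishes earliest."""
    best, best_finish = None, inf
    for candidate in permutations(tasks):
        finish = finish_time(start, now, candidate)
        if finish < best_finish:
            best, best_finish = list(candidate), finish
    return best, best_finish


def compute_bid(start, now, current_schedule, new_task):
    """Extra completion time caused by the new task; inf if it cannot fit."""
    current_finish = finish_time(start, now, current_schedule)
    _, new_finish = find_best_schedule(
        start, now, list(current_schedule) + [new_task])
    if new_finish == inf:
        return inf
    return new_finish - current_finish
```

For a worker at the origin already heading to a task at (4, 0), a new task at (2, 0) lies on the way and yields a bid of 0, whereas a task at (0, 3) yields a bid of 4 because the best order (new task first, then the existing one) finishes at time 8 instead of 4.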
Consequently, in most cases, the set of tasks that a worker has to schedule is small enough that even an exhaustive search using a branch-and-bound technique is fast.

In this section, we introduce a dynamic programming approach for scheduling (DPS) that, in practice, does not require checking all permutations of the set of tasks every time. The key idea of DPS is that if any task is removed from a valid schedule, the remaining schedule is still valid. Consequently, when worker $w$ intends to add a new task $t$ to its schedule, he only needs to consider those permutations of tasks in $\sigma_w$ that result in a valid schedule. For example, a set of 3 tasks $T = \{t_1, t_2, t_3\}$ has $3! = 6$ different permutations. Let us assume that only 3 out of the 6 permutations yield a valid schedule for $T$ and are shown as $\sigma_1$, $\sigma_2$ and $\sigma_3$. To schedule a new task $t_4$, we only need to check if $t_4$ can be inserted into $\{\sigma_i\}_{i=1}^{3}$ without reordering the tasks already in $\sigma_i$. To achieve this, in addition to $\sigma_w$, every worker also keeps track of other valid schedules that are not necessarily optimal.

For each worker, we utilize a Valid Schedule Tree (VST) data structure to keep track of all valid schedules. A path from the root to any leaf in a VST corresponds to a valid schedule. Figure 4.2 shows an example of a VST and how a new task can be added to it.

Figure 4.2: Valid Schedule Tree (VST) Example. Panels (a), (b), (c)

Figure 4.2(a) shows a VST with 2 tasks. The root of the tree is always the current location of the worker. In Figure 4.2(a) we assume both $a_1 = \langle t_1, t_2 \rangle$ and $a_2 = \langle t_2, t_1 \rangle$ are valid schedules. Once $t_3$ arrives, it can be added at 3 different positions in both $a_1$ and $a_2$ and hence, 6 permutations have to be checked (Figure 4.2(b)). Assuming the dark nodes in Figure 4.2(b) result in invalid schedules, they are not added to the VST. Figure 4.2(c) shows the updated VST after adding $t_3$. In general, a new task can either be added on an existing edge or after a leaf of a VST.
Consequently, if a new task $t_4$ arrives, based on the VST in Figure 4.2(c), only 11 options have to be checked (out of potentially $4! = 24$ permutations with 4 tasks).

Algorithm 3 FindBestSchedule($w$, $t$)
Input: $w$ is the worker and $t$ is the incoming task.
Output: $\sigma'$ as the optimal schedule for worker $w$ after inserting $t$. If $w$ cannot add $t$, the return value is null
1: $vst'$ = InsertTask($vst$, $t$)
2: if $vst'$ == null then
3:     return null
4: end if
5: $\sigma'$ = ShortestSchedule($vst'$)
6: return $\sigma'$

Algorithm 3 shows the process of finding the best schedule for inserting a new task $t$. The InsertTask() method in line 1 generates a new VST for $w$ ($vst'$) by inserting the new task into $w$'s current VST ($vst$).¹ Once $vst'$ is generated, the ShortestSchedule() method in line 5 runs DFS on $vst'$ starting at the root and returns the schedule with the earliest finish time ($\sigma'$).

¹ Details of the InsertTask() method and other algorithms on the VST data structure can be found in the appendix.

To further improve the performance of the scheduling process, for every node $n$ in a VST we assign a cutoff time ($c_n$), which gives the latest time worker $w$ can arrive at node $n$. If worker $w$ arrives at $n$ after $c_n$, then every schedule $\sigma$ where $n \in \sigma$ will become invalid. For example, in Figure 4.3(a), if the worker arrives at node $n_1$ at time $c_{n_1} + \epsilon$, then both schedules $\sigma_1 = \langle t_1, t_2, t_3 \rangle$ and $\sigma_2 = \langle t_1, t_3, t_2 \rangle$ would become invalid. Earlier we explained that new tasks can either be added on an existing edge of a VST or after a leaf node. Figure 4.3(b) shows the process of adding $t_4$ on edge $e$ in Figure 4.3(a). Task $t_4$ can be added on edge $e$ only if:

$$d(w, t_4) + d(t_4, t_1) \le c_{n_1} - t_0$$

where $t_0$ is the current time and $d(a, b)$ gives the time it takes to go from point $a$ to point $b$.

Figure 4.3: Example of Cutoff Times in a VST. Panels (a), (b)

The cutoff time for a leaf node $n_l$ is equal to the deadline of the task corresponding to $n_l$.
For every other node, the cutoff time is equal to the smaller value between the deadline of the task corresponding to that node and the maximum cutoff time of its children (Equation (4.1)).

$$c_n = \begin{cases} deadline(n) & \text{if } children(n) = \emptyset \\ \min\{deadline(n),\ \max\{c_{n'} \mid n' \in children(n)\}\} & \text{if } children(n) \neq \emptyset \end{cases} \qquad (4.1)$$

We end this section with a brief discussion on the complexity of the scheduling algorithm. Even though in theory the number of valid schedules can grow exponentially as we add more tasks, in our experiments on both real-world and synthetic data we observed that in practice this does not happen. In fact, our observations show that with only 3-4 tasks in a worker's schedule, at most 1 valid schedule exists. In cases where running Algorithm 3 takes too long due to the exponential growth of the VST, we can keep only the current optimal schedule and consider adding the new task without reordering that schedule (i.e., the Nearest Insertion algorithm [57]). In our experiments, we show that this change reduces the assignment rate (i.e., the percentage of tasks that get completed) by less than 5%.

4.3 Performance Evaluation

4.3.1 Spatial Crowdsourcing Dataset

We evaluate our algorithms using real check-in data from Foursquare and Gowalla, which we convert to spatial tasks and workers in our system. We consider a check-in as a spatial task performed at the location where the check-in happened. For each location, we consider all check-ins within a two-hour duration. For each task, we set the release time and deadline to the first and last check-in times within the two-hour duration. We consider each user as a spatial worker with start and end times equal to the user's first and last check-in during a day. We select the initial location of a worker as a random point within the bounding box of all checked-in locations of the corresponding user.
We also measure the travel time as the Euclidean distance between two points divided by an average speed of 60 km/h. We use the data from 5 metropolitan areas: New York, Los Angeles, Paris, London and Beijing. Table 4.1 shows the total number of tasks (and workers) for each city.

Figure 4.4: Spatial Distribution of Tasks in Gowalla

               Gowalla                 Foursquare
               # Tasks    # Workers    # Tasks    # Workers
Los Angeles    197,353    4,126        185,061    9,136
New York       118,406    3,987        577,124    17,367
London         60,180     2,294        186,755    9,711
Paris          18,932     1,829        105,998    6,095
Beijing        3,638      699          21,013     1,075

Table 4.1: Number of tasks/workers for each city in the real dataset

We also generate synthetic datasets with a realistic streaming workload using [66]. To generate a workload suitable for SC systems, we modeled three different sets of parameters:

Temporal Parameters: We assume workers and tasks arrive following Poisson processes. In our experiments, the default Poisson arrival rates for tasks and workers are $\lambda_t = 20$/min and $\lambda_w = 3$/min, respectively. Subsequently, the durations of the tasks and workers were randomly sampled from the closed ranges of $[1, 4]$ hours and $[1, 8]$ hours, respectively.

Spatial Parameters: Fig. 4.4 shows the spatial distribution of tasks from our real-world dataset in Los Angeles. As depicted, the tasks are not uniformly distributed in space. The spatial distribution is rather skewed, meaning that the density of the tasks in certain areas is higher. To model the same behavior with our synthetic workloads, we created 6 two-dimensional Gaussian clusters with randomly selected means and standard deviations. Eighty percent of the tasks are sampled within the clusters and the rest are uniformly distributed.

Static Parameters: In addition to the spatiotemporal parameters, we consider two other parameters. The default workload size of each experiment is 50K tasks. The task arrival rate and the number of tasks determine the duration of the simulation.
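A workload with the temporal and spatial characteristics described above can be sketched with the Python standard library. This is an illustrative generator only, not the tool of [66]: the function names, the square `extent` of the space, the dictionary layout and the use of a single standard deviation per cluster are assumptions made for the example.

```python
import random


def poisson_arrivals(rate_per_min, count, rng):
    """Arrival times (minutes) of a Poisson process: inter-arrival gaps
    are exponentially distributed with mean 1 / rate."""
    clock, times = 0.0, []
    for _ in range(count):
        clock += rng.expovariate(rate_per_min)
        times.append(clock)
    return times


def sample_location(clusters, rng, clustered_fraction=0.8, extent=100.0):
    """With probability `clustered_fraction` draw from a random Gaussian
    cluster ((mean_x, mean_y), std); otherwise draw uniformly."""
    if rng.random() < clustered_fraction:
        (mean_x, mean_y), std = rng.choice(clusters)
        return rng.gauss(mean_x, std), rng.gauss(mean_y, std)
    return rng.uniform(0.0, extent), rng.uniform(0.0, extent)


def generate_tasks(count, rate_per_min, clusters, rng):
    """Tasks with Poisson releases and durations uniform in [1, 4] hours."""
    tasks = []
    for release in poisson_arrivals(rate_per_min, count, rng):
        duration = rng.uniform(60.0, 240.0)  # minutes
        tasks.append({
            "loc": sample_location(clusters, rng),
            "release": release,
            "deadline": release + duration,
        })
    return tasks


if __name__ == "__main__":
    rng = random.Random(42)
    # Six clusters with randomly chosen means and standard deviations.
    clusters = [((rng.uniform(0, 100), rng.uniform(0, 100)), rng.uniform(1, 10))
                for _ in range(6)]
    workload = generate_tasks(1000, 20.0, clusters, rng)
    print(len(workload), "tasks over", round(workload[-1]["release"], 1), "minutes")
```

Workers could be generated the same way with a rate of 3 per minute and durations drawn from [1, 8] hours.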
Based on the duration of the simulation and the workers' arrival rate, the total number of workers may vary. The maximum number of tasks a worker can perform, i.e., $w.max$, is a uniformly random number from the closed interval $[8, 12]$.

4.3.2 Experimental Methodology

We compared the results of our framework (AUC) with three other approaches: NN (i.e., nearest neighbor) as a scheduling-oblivious-matching (SOM) approach, BCHD (i.e., batched) and MONO (i.e., on-line monolithic-SC).

Recall that in the SOM approach, tasks are assigned to workers without considering the workers' schedules. In other words, the server assigns tasks to workers based on some heuristic, and once a task is assigned to a worker, the worker attempts to add the task to its schedule. If it succeeds, the task gets completed; otherwise the task is dropped and will not get completed. We tried various heuristics, and among those the nearest neighbor generated the best results; hence, we only include the NN algorithm in our comparisons with other approaches.

Our implementation of BCHD is based on the algorithms in [28]. In our implementation of BCHD, we set an initial batching interval of 1 second. The first batch consists of the tasks that arrived in the first second. All tasks that arrive while the first batch is being processed are queued. Once the first batch is processed, the queued tasks form the second batch and are processed by the server. This process repeats itself as new tasks arrive.

The MONO algorithm is implemented based on the on-line monolithic-SC scheme. MONO is similar to AUC as both process a task as soon as it arrives at the server. However, unlike AUC, in MONO the server is responsible for both the scheduling and matching phases of the process.

In our experiments, we evaluate two different aspects of our framework. First, we compare the assignment rate of the proposed algorithms, i.e., the percentage of tasks that are completed.
We compare the assignment rate of AUC with those of BCHD and NN. The reason we do not include the results of MONO is that this algorithm uses the same heuristics as AUC. Consequently, with regard to the assignment rate, the results of MONO are similar to those of AUC. Next, we focus on the scalability of Auction-SC. We compare the average processing time of a single task in AUC to those of BCHD, NN and MONO.

4.3.3 Experimental Results

We evaluate the assignment rate of AUC and compare it with those of BCHD and NN. First, we compare the algorithms using our real-world and synthetic datasets. Subsequently, using the synthetic datasets, we show how varying the spatial and temporal parameters of the problem can affect the assignment rate.

Figure 4.5: Assignment Rate of Real-Time Approaches. (a) Gowalla; (b) Foursquare; (c) Synthetic

First we compare the assignment rate of the different algorithms. Fig. 4.5 shows that AUC outperforms NN by almost 25%. The main reason is that AUC performs the scheduling and matching phases in tandem. In contrast, NN is an SOM approach where the schedule of the worker is not considered when a task is matched with him. Consequently, it is likely that a task gets assigned to a worker who is not able to schedule it and thus, the task does not get completed. Furthermore, AUC outperforms BCHD by a similar margin. The first reason is that when BCHD performs the matching phase, the schedule of the worker is not considered and hence, a task might end up getting matched to and scheduled for a worker that was not the best worker. This in turn can lower the chances of that worker being assigned a new task in the future.
The second reason is that while a task is waiting at the server to get processed with the next batch, depending on the length of the batching interval, it loses some portion of its available time before its deadline, which in turn can lower the chances of the task fitting a worker's schedule.

Figure 4.6: Pairwise Difference in Assignment Rates. (a) AUC vs. NN; (b) AUC vs. BCHD

In order to study the effect of the temporal parameters of SC, we ran several experiments using different pairs of task arrival rates ($t_{rate}$) and worker arrival rates ($w_{rate}$). To better evaluate the difference between alternative algorithms, in Figs. 4.6(a) and 4.6(b) we performed a pairwise comparison of the algorithms by taking the difference between their task completion rates. For example, Fig. 4.6(a) shows the difference between AUC and NN. We observe that all approaches perform similarly at the two extreme cases, i.e., high task-low worker and low task-high worker. AUC outperforms NN and BCHD by up to 20% when the problem is more complex, i.e., outside the extreme cases.

Figure 4.7: Average Processing Time for a Single Task. (a) MONO; (b) AUC; (c) BCHD

The last set of experiments focuses on comparing the scalability of AUC with that of BCHD and MONO. We can measure the scalability of an SC system by the average time required to process a task, which in turn indicates the throughput of the system. Fig. 4.7 illustrates the average processing time of a single task in the different algorithms for different worker arrival rates. As shown in Fig.
4.7, as the arrival rate of workers increases, the average processing time of a single task in AUC does not change since AUC utilizes the workers for scheduling an incoming task. In contrast, with MONO and BCHD, in addition to performing matching, the server also performs scheduling for all the workers. Consequently, with these algorithms, the average processing time of a single task grows as we increase the number of workers and is several orders of magnitude higher than that of AUC.

To provide a more practical perspective, in Fig. 4.8 we compare the scalability of the different approaches given the current minimum requirements of a ridesharing application in New York City [4]. As shown, while MONO and BCHD cannot satisfy the current requirements, AUC can scale much higher than what is currently needed.

Figure 4.8: Real-world Scalability Requirements

Chapter 5
Price-aware Ridesharing

5.1 Problem Definition and Preliminaries

In this section, we define the terminology and formally define the problem under consideration.

5.1.1 Basic Concepts

The road network is represented as a graph $G(V, E)$, where each node represents an intersection and each edge represents a road segment. Each edge $(i, j) \in E$ ($i, j \in V$) is associated with a weight $c(i, j)$, which is a travel cost (either time or distance) from $i$ to $j$. The shortest path cost $\delta(s, t)$ is defined as the cost of a minimal cost path connecting $s$ and $t$.

Definition 5.1 (Ride Request). A ride request $r$ can be represented as $\langle s_r, e_r, w_r, \epsilon_r, \pi_r \rangle$, consisting of a starting point $s_r \in V$ and an end point $e_r \in V$. Each request also specifies $w_r$ as the maximum time the rider can wait after making a request, and $\epsilon_r \, \delta(s_r, e_r)$ is the maximum detour the rider can tolerate. In addition, a rider's profile $\pi_r : \mathbb{R}^+ \to [0, 1]$ specifies the relative discount in exchange for an incurred detour of $\Delta \in \mathbb{R}^+$. We show the set of all requests with $R$.
Upon the acceptance of a request, APART assigns it to a driver.

Definition 5.2 (Driver). A driver $d$ is represented as $\langle \mathcal{R}_d, n_d, \pi_d \rangle$, where $\mathcal{R}_d \subseteq \mathcal{R}$ is the set of ride requests assigned to $d$, and $n_d$ is the maximum number of requests $d$ can accept at any point in time. A driver also has a profile $\pi_d : \mathbb{R}^+ \to \$$ (see footnote 1), which specifies the monetary cost of $d$ providing service to its assigned requests for a duration $\omega \in \mathbb{R}^+$.

Footnote 1: In this paper, we show monetary values with $.

Definition 5.3 (Schedule). A schedule $\sigma_d$ for driver $d$ on the set of requests $\mathcal{R}_d$ with $n$ requests is an ordered sequence $\langle x_1, \dots, x_{2n} \rangle$ of pick-up and drop-off points of the requests in $\mathcal{R}_d$, where for each $r \in \mathcal{R}_d$, $s_r$ precedes $e_r$ in $\sigma_d$. Schedule $\sigma_d$ is feasible for driver $d$ if it satisfies the following conditions:
• The riders' waiting time constraint: for any request $r \in \mathcal{R}_d$, the waiting time from the time the request is made until $d$ arrives at $s_r$ should be less than $w_r$.
• The driver's capacity constraint: the number of riders in the vehicle cannot exceed the total capacity $n_d$ ($|\mathcal{R}_d| \le n_d$).
• Detour constraint: the actual distance of every rider's trip ($\hat{\delta}_r(s_r, e_r)$, or simply $\hat{\delta}_r$) should be less than $(1 + \epsilon_r)\,\delta(s_r, e_r)$.
• The driver's and all riders' (the new rider and those already in the vehicle) monetary constraints (see Section 5.4.1).

The driver follows the sequence of picking up and dropping off riders. For every two nodes $x_a, x_b \in \sigma_d$ where $x_a$ precedes $x_b$, we denote the cost of traveling from $x_a$ to $x_b$ in schedule $\sigma_d$ by $\delta_d(x_a, x_b)$. Furthermore, the schedule changes over time as riders are serviced (picked up or dropped off) and new requests are added to the schedule. In fact, adding a new request to a schedule can re-order some requests that already exist in the schedule. For example, when a new request $r_3$ arrives, the initial schedule $\langle s_1, s_2, e_1, e_2 \rangle$ can be reordered to $\langle s_1, s_2, s_3, e_2, e_3, e_1 \rangle$, where rider 1 is dropped off after rider 2.

Definition 5.4 (Matching).
Assuming we have a set of drivers $\mathcal{D}$ and a set of requests $\mathcal{R}$, we call $M \subseteq \mathcal{D} \times \mathcal{R}$ a matching if for each $r \in \mathcal{R}$ there is at most one $d \in \mathcal{D}$ such that $(d, r) \in M$. We call $(d, r) \in M$ a match. In a matching $M$, for every driver $d$ there exists a feasible schedule $\sigma_d$ such that $(d, r) \in M \implies r \in \mathcal{R}_d$ (or simply $r \in \sigma_d$).

In Section 5.4.1, we define a generic pricing model where, given a driver and its schedule, the pricing model computes the final fare each rider has to pay, the income of the driver and the ridesharing platform's profit. Subsequently, we can define the ridesharing problem as follows:

Definition 5.5 (Ridesharing Problem). Given a set of ride requests $\mathcal{R}$ and a set of drivers $\mathcal{D}$, the goal of the ridesharing problem is to find a matching $M$ between $\mathcal{R}$ and $\mathcal{D}$ such that the revenue of $M$ is maximized.

5.1.2 Complexity Analysis

The ridesharing problem is NP-hard since the Vehicle Routing Problem (VRP) [26] is reducible to the ridesharing problem in polynomial time. A globally optimal solution to the ridesharing problem can be achieved only when a clairvoyant exists that knows in advance which requests are going to be submitted to the framework and at what time, and which drivers are going to be available. However, in this paper we study the on-line version of the problem, i.e., the framework has no knowledge of future requests, and incoming requests have to be matched with drivers as soon as they are submitted to the framework. The optimality of on-line algorithms is usually analyzed using the competitive ratio [64], i.e., an algorithm $A$ is called $c$-competitive for a constant $c > 0$ if and only if, for any input $I$, the result of $A(I)$ is at most $c$ times worse than the globally optimal solution. In the following we show that no on-line algorithm can achieve a good competitive ratio for the ridesharing problem.

Theorem 5.1. There does not exist a deterministic on-line algorithm for the ridesharing problem that is $c$-competitive ($c > 0$).
5.1.3 Dispatch Requests

APART considers drivers as bidders and ride requests as goods. The server plays the role of a central auctioneer in APART. Figure 5.1 explains, at a very high level, how ride requests are dispatched to drivers: (1) Everything starts with a passenger submitting a new ride request to the server. (2) Once the server receives a new request, it notifies the available drivers in the vicinity of the pick-up location about the new request. (3) Each driver independently computes his bid by finding the optimal schedule that can fit the new request into his current schedule. The bidding process is performed as a sealed-bid auction where drivers simultaneously submit bids and no driver knows how much the other drivers have bid. (4) Once all the bids are received, the server assigns the passenger to the driver with the optimal bid.

[Figure 5.1: Simple Scenario]

Algorithm 4 Dispatch($\mathcal{D}$, $r$, startTime)
Input: $\mathcal{D}$ is the set of currently available drivers, $r$ is a new request and startTime is the current time
Output: $d \in \mathcal{D}$ as the driver that request $r$ is assigned to
1: $d_{opt} \leftarrow$ null
2: $Bids_r \leftarrow \emptyset$
3: $\mathcal{D}_r \leftarrow$ EligibleDrivers($r$)
4: for $d \in \mathcal{D}_r$ do
5:   $bid_d^r \leftarrow$ ComputeBid($d$, $\sigma_d$, $r$, startTime)
6:   $Bids_r \leftarrow Bids_r \cup \{bid_d^r\}$
7: end for
8: $d_{opt} \leftarrow \arg\max_d \{bid_d^r \mid bid_d^r \in Bids_r\}$
9: return $d_{opt}$

Algorithm 4 outlines the process of assigning an incoming request $r$, where $\mathcal{D}_r$ is the set of eligible drivers for request $r$ (line 3). For each candidate driver $d$, the ComputeBid method (line 5) is executed to perform scheduling and compute $d$'s bid (Section 5.1.4). Subsequently, the platform chooses the driver with the highest bid. In case of a tie in line 8, the algorithm randomly selects one driver among the ones with the highest bid. Notice that in practice all the iterations of the for loop in Algorithm 4 (lines 4-7) run in parallel.

5.1.4 Bid Computation

Once a driver is notified of a new request, it has to compute a bid.
The bid each driver generates reflects the profit the system can gain if the request is assigned to that driver. Once the ridesharing application on the driver's phone receives the request, it generates a bid and submits the bid to the server. When a new request is assigned to a driver, he is notified with an updated schedule. This means that the human driver's interaction with APART is limited to configuring and reporting his profile on the ridesharing application.

Algorithm 5 ComputeBid($d$, $\sigma_d$, $r$, startTime)
Input: $d$ is a driver with schedule $\sigma_d$, $r$ is a new request and startTime is the current time
Output: the additional profit that $d$ can generate by accepting $r$
1: $src \leftarrow \{s_{r'} \mid r' \in \sigma_d\}$
2: $src \leftarrow src \cup \{s_r\}$
3: $newProfit, \sigma \leftarrow$ FindBestSchedule($d$, $\sigma_d$, $r$, startTime)
4: $oldProfit \leftarrow$ GetProfit($d$, $\sigma_d$, startTime)
5: $additionProfit \leftarrow newProfit - oldProfit$
6: return $additionProfit$

Algorithm 5 outlines the bid computation process. First, the algorithm calls FindBestSchedule (line 3), which finds the best valid schedule and its corresponding profit. Because each driver's bid is the additional profit that the new request can generate for the platform, the algorithm calculates $oldProfit$ for $d$'s original schedule using the GetProfit method (line 4). Finally, the additional profit that $d$ can generate by accepting $r$ is the difference between $newProfit$ and $oldProfit$.

The FindBestSchedule method uses a branch-and-bound technique to search for the optimal schedule that can fit the new request into a driver's existing schedule. The GetProfit method takes a valid schedule ($\sigma$) and a driver $d$ as input, computes the total fare collected from the passengers serviced with $\sigma$ and the total cost of $d$ performing $\sigma$, and returns the difference between the collected fare and the cost as the profit of the schedule. In Section 5.4 we explain how the fare and cost of rides are computed. Further details of the FindBestSchedule and GetProfit methods are out of the scope of this paper and can be found in [8].
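The interplay of Algorithms 4 and 5 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the Driver class and its fields current_profit and best_profit_with are hypothetical stand-ins for GetProfit and FindBestSchedule (whose details are in [8]), and a driver whose schedule cannot fit the request is modeled by returning None.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Driver:
    name: str
    # Stand-in for GetProfit(d, sigma_d, startTime) on the current schedule.
    current_profit: float
    # Stand-in for FindBestSchedule: profit of the best schedule that
    # includes the request, or None if no feasible schedule exists.
    best_profit_with: Callable[[str], Optional[float]]


def compute_bid(driver: Driver, request: str) -> Optional[float]:
    """Bid = additional profit from accepting the request (Algorithm 5)."""
    new_profit = driver.best_profit_with(request)
    if new_profit is None:
        return None
    return new_profit - driver.current_profit


def dispatch(drivers: Sequence[Driver], request: str, rng=random) -> Optional[str]:
    """Assign the request to the highest bidder (Algorithm 4).

    Ties are broken uniformly at random; returns None if nobody can bid.
    """
    bids = {}
    for d in drivers:  # in practice these iterations run in parallel
        bid = compute_bid(d, request)
        if bid is not None:
            bids[d.name] = bid
    if not bids:
        return None
    top = max(bids.values())
    return rng.choice([name for name, b in bids.items() if b == top])


fleet = [
    Driver("A", 10.0, lambda r: 14.0),  # bid 4
    Driver("B", 5.0, lambda r: 12.0),   # bid 7
    Driver("C", 8.0, lambda r: None),   # cannot fit the request
]
print(dispatch(fleet, "r1"))
```

Keeping the scheduling inside compute_bid mirrors the design choice of APART: the server only compares bids, while each driver's device carries the cost of searching for its own best schedule.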
Once drivers submit their bids, the server selects the driver with the highest bid and assigns the new request to that driver.

5.2 Competitive Bidding

With APART, a driver's income is as much as his reported cost. This is called the first-price auction scheme. Here, we explain how bidders can compute their bids in a first-price scenario to manipulate the system and increase their own income. We assume the bidders know how many other bidders are participating in the auction and also know the distribution of their bids, but not the exact value of everyone else's bids. We denote the probability density function and the cumulative distribution function of the bids by $f(\cdot)$ and $F(\cdot)$, respectively. Also, for every bidder $i$ with valuation $v_i$, we assume there exists a strategy function $s_i(\cdot)$ that bidder $i$ applies to its valuation to compute what bid to submit. We are interested in finding the optimal $s_i(\cdot)$ such that bidder $i$'s expected utility $E[u_i]$ is maximized. Before we continue, we make two assumptions:
1. The strategy function $s_i(\cdot)$ for every bidder is strictly increasing. In other words, if $v_1 < v_2$ then $s_i(v_1) < s_i(v_2)$.
2. We restrict our search to symmetric equilibria (i.e., all bidders use the same equilibrium strategy).

We show everything from the point of view of bidder 1. However, since we are considering only symmetric equilibria, the computation is the same for all other bidders. We start by defining the expected utility of bidder 1 as:

\[ E[u_1] = (v_1 - s_1(v_1)) \cdot \mathrm{Prob}[win_1] \tag{5.1} \]

where $\mathrm{Prob}[win_1]$ is the probability of bidder 1 winning. To compute $\mathrm{Prob}[win_1]$, we first consider a single bidder $i$. For bidder 1 to win over bidder $i$ we need to have $s_i(v_i) < s_1(v_1)$:

\[ \mathrm{Prob}[s_i(v_i) < s_1(v_1)] = \mathrm{Prob}\left[v_i < s_i^{-1}(s_1(v_1))\right] = F\left(s_i^{-1}(s_1(v_1))\right) = F(v_1) \]

The last equation holds because all bidders use the same strategy function and thus $s_i(x) = s_j(x)$ for every bidder $i$ and $j$.
Bidder 1 wins if her bid is higher than those of all other $n - 1$ bidders. Since every bidder $i$'s bid ($i \neq 1$) is independent of the other bidders, we can say:

\[ \mathrm{Prob}[win_1] = \left(\mathrm{Prob}[s_i(v_i) < s_1(v_1)]\right)^{n-1} = F(v_1)^{n-1} \]

Now we can rewrite Equation (5.1) as:

\[ E[u_1] = (v_1 - s_1(v_1)) \cdot F(v_1)^{n-1} \tag{5.2} \]

We can maximize $E[u_1]$ by differentiating Equation (5.2) w.r.t. $b_1$ and setting the result equal to zero, where $b_1$ is bidder 1's bid; in other words, $b_1 = s_1(v_1)$ and $v_1 = s_1^{-1}(b_1)$:

\[ \frac{\partial}{\partial b_1} E[u_1] = 0 \quad\Longrightarrow\quad \frac{\partial}{\partial b_1}\left[(v_1 - b_1) \cdot F\left(s_1^{-1}(b_1)\right)^{n-1}\right] = 0 \]

Using the Chain Rule and the Product Rule we get:

\[ (v_1 - b_1) \cdot \frac{\partial F\left(s_1^{-1}(b_1)\right)^{n-1}}{\partial F\left(s_1^{-1}(b_1)\right)} \cdot \frac{\partial F\left(s_1^{-1}(b_1)\right)}{\partial s_1^{-1}(b_1)} \cdot \frac{\partial s_1^{-1}(b_1)}{\partial b_1} = F\left(s_1^{-1}(b_1)\right)^{n-1} \]

We know that the derivative of the cumulative distribution function $F(\cdot)$ is the probability density function $f(\cdot)$. Also:

\[ \frac{\partial}{\partial b_1} s_1^{-1}(b_1) = \frac{1}{s_1'\left(s_1^{-1}(b_1)\right)} \]

Then we will have:

\[ (v_1 - b_1) \cdot (n-1)\, F\left(s_1^{-1}(b_1)\right)^{n-2} \cdot \frac{f\left(s_1^{-1}(b_1)\right)}{s_1'\left(s_1^{-1}(b_1)\right)} = F\left(s_1^{-1}(b_1)\right)^{n-1} \tag{5.3} \]

\[ (v_1 - b_1) \cdot (n-1) \cdot \frac{f\left(s_1^{-1}(b_1)\right)}{s_1'\left(s_1^{-1}(b_1)\right)} = F\left(s_1^{-1}(b_1)\right) \tag{5.4} \]

Knowing that $s_1^{-1}(b_1) = v_1$, we can rewrite Equation (5.4) as:

\[ (v_1 - b_1) \cdot (n-1) \cdot f(v_1) \cdot \frac{1}{s_1'(v_1)} = F(v_1) \tag{5.5} \]

We can set $b_1 = s_1(v_1)$ and rearrange Equation (5.5) to get:

\[ s_1'(v_1) = \frac{(n-1) \cdot f(v_1) \cdot (v_1 - s_1(v_1))}{F(v_1)} \tag{5.6} \]

Solving the differential equation in Equation (5.6) yields the optimal strategy function $s_1(\cdot)$ for bidder 1, which tells her what to bid for a valuation of $v_1$. To solve Equation (5.6), we need to know $f(\cdot)$ and $F(\cdot)$ (i.e., the probability density function and cumulative distribution function of the bids). For example, assuming the bids are uniformly distributed in the range $[0, v_{max}]$ we will have:

\[ \forall x \in [0, v_{max}]: \quad f(x) = \frac{1}{v_{max}} \quad\text{and}\quad F(x) = \frac{x}{v_{max}} \]

Using these values for $f(\cdot)$ and $F(\cdot)$ in Equation (5.6) we get:

\[ s_1(v_1) = \frac{n-1}{n} \cdot v_1 \tag{5.7} \]

which gives the optimal bidding strategy if every driver's valuation is a uniform random variable in $[0, v_{max}]$.
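Equation (5.7) can be checked numerically. The sketch below is an illustration under the same uniform assumption: it supposes that the other $n - 1$ bidders follow $s(x) = \frac{n-1}{n} x$ with valuations uniform in $[0, v_{max}]$, and then searches a grid of candidate bids for the one that maximizes bidder 1's expected utility. The function names and the grid search are assumptions of this sketch and are not part of the derivation above.

```python
def equilibrium_bid(v, n):
    """Closed-form bid of Equation (5.7); requires n >= 2 bidders."""
    return (n - 1) / n * v


def expected_utility(b, v, n, v_max):
    """Expected utility of bidding b with valuation v.

    Assumes the other n - 1 bidders use s(x) = (n - 1) / n * x with
    valuations uniform in [0, v_max], so one rival is outbid with
    probability F(s^{-1}(b)) = n * b / ((n - 1) * v_max), capped at 1.
    """
    win_one = min(1.0, max(0.0, n * b / ((n - 1) * v_max)))
    return (v - b) * win_one ** (n - 1)


def best_response(v, n, v_max, steps=10_000):
    """Grid search over bids in [0, v] for the utility-maximizing bid."""
    grid = (v * i / steps for i in range(steps + 1))
    return max(grid, key=lambda b: expected_utility(b, v, n, v_max))


v, n, v_max = 8.0, 4, 10.0
print(equilibrium_bid(v, n), best_response(v, n, v_max))
```

If the two printed values agree up to the grid resolution, no unilateral deviation from Equation (5.7) pays off for this valuation, which is what a symmetric equilibrium requires.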
5.3 The Latent Space Transition Model

In Section 5.2, we showed how the drivers can take advantage of the platform if they know how many bids are being submitted and the distribution of the bids. In this section, we show how the drivers can estimate the number of bids based on historical data. We assume that for each location $p$ and time $t$, there is a potential of $\lambda_p^t$ ride requests submitted to the system. A transition matrix $A^t$, called the network demand at time $t$, shows the fraction of riders moving from one location to another at each point in time. The $pq$-th entry of $A^t$, denoted by $\alpha_{pq}^t$, gives the fraction of riders going from location $p$ to location $q$ at time $t$. This means that the number of riders requesting a ride from $p$ to $q$ at time $t$ is given by $\lambda_p^t \cdot \alpha_{pq}^t$. We can compute the number of available drivers at location $p$ at the start of time $t$ as:

\[ v_p^t = \sum_q \alpha_{qp}^{t-1} \cdot \min\left(v_q^{t-1}, \lambda_q^{t-1}\right) + \gamma_p^t \tag{5.8} \]

where $\gamma_p^t$ is the number of drivers who enter the system at time $t$ in location $p$. To predict the number of drivers that enter the platform (i.e., $\gamma_p^t$) we use the approach in [22], where a grid-based Gaussian mixture model is introduced to predict the number of passengers for taxi bookings at a specific time and location.

The other key component in estimating the number of available drivers at each location (Equation (5.8)) is learning the network demand matrix at each point in time. In the remainder of this section, we introduce a Latent Space Transition Model (LSTM) where we model the demand network as a Latent Dirichlet Allocation (LDA) [12] model. We first explain how we can model the network demand matrix as an LDA and subsequently explain how we can learn the parameters of the model using historical data.

5.3.1 Network Demand Model

An LDA is a generative probabilistic model for collections of discrete data, in which each item of a collection is modeled as a finite mixture over an underlying set of topics.
Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities [12]. For example, LDA is widely used in natural language processing, where the observations are words in documents. LDA then assumes each document is a mixture of a small number of topics and tends to find patterns that probabilistically associate the words in the document with topics.

With regard to ridesharing environments, various factors (hereafter, topics) such as locality (i.e., source and destination neighborhoods), time and weather can influence the network demand. Row $p$ of the matrix $A^t$ (i.e., $A_p^t$) gives the probability distribution over the destinations of requests submitted at time $t$ in location $p$. Therefore, we can think of $A_p^t$ as a collection of destinations where the different topics impact the probability of each location being the destination of those specific requests. For example, a high demand for rides from a business district in a city to residential neighborhoods can be attributed to the locality topic.

In the remainder of this section we assume that the entire geographical space is discretized into smaller regions using a grid index. Similarly, we assume the temporal space is discretized into equal-length time slots. For a set of ride request data $\tau_1, \tau_2, \dots, \tau_d$ in the form of (source, destination, time), we define a spatial document as:

Definition 5.6 (Spatial Document). For every source region $p$ and time $t$, we define the Spatial Document $X_p^t$ as a vector of length $w$ such that $x_{pq}^t$, for $1 \le q \le w$, records the number of requests originating in location $p$ for destination $q$ at time $t$.

We assume each spatial document is a $k$-component Multinomial Mixture Model (MMM). Consequently, each spatial document is modeled as an independent draw from the probability mass function:

\[ p(x_i) = \sum_{j=1}^{k} \sum_{p=1}^{w} x_{ip}\, \theta_{ij}\, \beta_{jp} \tag{5.9} \]

where,
• $\beta_j \in \mathbb{R}^w$, $w$ is the total number of regions and $\beta_{jp}$ is the unknown probability of selecting region $p$ from topic $j$.
• $\theta_{ij}$ is the unknown probability of selecting topic $j$ for document $i$, such that $\sum_{j=1}^{k} \theta_{ij} = 1$.

Consequently, the generative model for each destination region in spatial document $i$ with latent variables $(\theta_i, \beta_{1:k})$ can be written as:
1. We draw latent topic indicators $z_{ip} \overset{iid}{\sim} \mathrm{Mult}(\theta_i)$.
2. For each topic $z_{ip}$, we can draw $x$ from the topic's multinomial distribution: $x \mid z_{ip} \overset{ind}{\sim} \mathrm{Mult}(\beta_{z_{ip}})$.

5.3.2 Parameter Inference

In order to learn the parameters of the model introduced in Section 5.3.1, we utilize the expectation maximization (EM) algorithm. The EM algorithm iteratively finds the maximum likelihood estimates of the parameters of a statistical model that depends on unobserved latent variables. Each iteration of the EM algorithm consists of two steps. The expectation (E) step estimates the distribution of each latent variable $z$ conditional on the observed data, $p(z_{1:n,1:w} \mid x_{1:n,1:w}, \theta_{1:n}, \beta_{1:k})$. Subsequently, the maximization (M) step finds the corresponding parameters that maximize the expected log-likelihood w.r.t. the latent variables estimated in the E step.

Before we explain each step in more detail, we need the complete log-likelihood function for the model introduced in Section 5.3.1. The log-likelihood for our model can be computed as:

\[ L_{\theta,\beta} = \log p(x_{1:n,1:w}, z_{1:n,1:w}, \theta_{1:n}, \beta_{1:k}) = \sum_{i=1}^{n} \sum_{p=1}^{w} \log p(x_{ip}, z_{ip}, \theta_i, \beta_{1:k}) \tag{5.10} \]
\[ \phantom{L_{\theta,\beta}} = \sum_{i=1}^{n} \sum_{p=1}^{w} \log \sum_{j=1}^{k} \mathbb{I}\{z_{ip} = j\}\, x_{ip}\, \theta_{i z_{ip}}\, \beta_{z_{ip} p} \]

where $\mathbb{I}$ is the indicator function.

5.3.2.1 E-Step

In this step, we estimate the conditional distribution of each $z_{ip}$ given $x_{ip}$, i.e., $p(z_{ip} = j \mid x_{ip}, \theta_i, \beta_{1:k})$. By Bayes's rule:

\[ p(z_{ip} = j \mid x_{ip}, \theta_i, \beta_{1:k}) \propto p(z_{ip} = j, \theta_i)\; p(x_{ip} \mid z_{ip}, \beta_{1:k}) \tag{5.11} \]

By plugging in the densities based on Equation (5.9) we get:

\[ p(z_{ip} = j \mid x_{ip}, \theta_i, \beta_{1:k}) = \frac{\theta_{ij}\, \beta_{jp}}{\sum_{l=1}^{k} \theta_{il}\, \beta_{lp}} \tag{5.12} \]

For simplicity, we denote this estimated conditional distribution by $\phi_{ipj}$.

5.3.2.2 M-Step

This step maximizes the expected complete log-likelihood.
Given the updated latent variables from the E-Step, it can be derived that the following settings maximize the expected log-likelihood of our model:

\[ \theta_{ij} = \frac{\sum_{p=1}^{w} x_{ip}\, \phi_{ipj}}{\sum_{p=1}^{w} x_{ip}} \qquad \beta_{jp} = \frac{\sum_{i=1}^{n} x_{ip}\, \phi_{ipj}}{\sum_{q=1}^{w} \sum_{i=1}^{n} x_{iq}\, \phi_{iqj}} \tag{5.13} \]

Algorithm 6 shows the overall process of inferring the parameters of the LSTM, i.e., the statistical model introduced in Section 5.3.1. In Lines 2 and 5, Dir() represents the Dirichlet distribution and is used to initialize the values of $\theta$ and $\beta$. We use the Dirichlet distribution since it is the conjugate prior for the Multinomial distribution. In Line 7 the log-likelihood of the data is computed with the initial values of $\theta$ and $\beta$. The while loop (Lines 9-17) iteratively performs the E-Step and M-Step to update $\theta$, $\beta$ and the auxiliary variable $\phi$. After each iteration, it computes the log-likelihood with the updated parameters, and if the improvement of the log-likelihood is less than a predefined $\epsilon$, it terminates the algorithm and returns the final parameters.

Algorithm 6 EM($X_{1:n}$, $\eta_1$, $\eta_2$)
Input: $X_{1:n}$ are $n$ spatial documents, $\eta_1$ and $\eta_2$ are vectors of positive reals with size $k$ and $w$ respectively
Output: $\theta$, $\beta$ as the parameters of the Multinomial Mixture Model
1: for $i = 1 : n$ do
2:   $\theta_i \leftarrow \mathrm{Dir}(\eta_1)$
3: end for
4: for $j = 1 : k$ do
5:   $\beta_j \leftarrow \mathrm{Dir}(\eta_2)$
6: end for
7: $L^0_{\theta,\beta} \leftarrow$ compute likelihood from Equation (5.10)
8: converge $\leftarrow$ false, $m \leftarrow 1$
9: while (!converge) do
10:   update $\phi$ from Equation (5.12)
11:   update $\theta$ and $\beta$ from Equation (5.13)
12:   $L^m_{\theta,\beta} \leftarrow$ compute likelihood from Equation (5.10)
13:   if ($L^m_{\theta,\beta} - L^{m-1}_{\theta,\beta} < \epsilon$) then
14:     converge $\leftarrow$ true
15:   end if
16:   $m \leftarrow m + 1$
17: end while
18: return $\theta$, $\beta$

Each spatial document $X$ corresponds to one row of the transition matrix $A$. Assuming the corresponding spatial document of $A_p^t$ is $X_i$, each cell of matrix $A$ can be derived as follows:

\[ \alpha_{pq}^t = \sum_{j=1}^{k} \theta_{ij}\, \beta_{jq} \tag{5.14} \]

where index $i$ in $\theta_{ij}$ refers to the spatial document $X_p^t$.
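The updates in Equations (5.12) to (5.14) can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in our experiments, and it rests on a few assumptions that are not stated above: symmetric Dirichlet(1) initialization, a convergence test on the observed-data log-likelihood $\sum_i \sum_p x_{ip} \log \sum_j \theta_{ij} \beta_{jp}$ in place of Equation (5.10), a cap on the number of iterations, and uniform responsibilities when a cell has zero mixture mass. All function names are hypothetical.

```python
import math
import random


def dirichlet(alpha, rng):
    """Draw one sample from Dir(alpha) via normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


def log_likelihood(X, theta, beta):
    """Observed-data log-likelihood (an assumption of this sketch)."""
    ll = 0.0
    for i, row in enumerate(X):
        for p, count in enumerate(row):
            if count:
                mix = sum(theta[i][j] * beta[j][p] for j in range(len(beta)))
                ll += count * math.log(mix)
    return ll


def em(X, k, eps=1e-6, max_iter=500, seed=0):
    """Fit theta (n x k) and beta (k x w) to spatial documents X (n x w)."""
    rng = random.Random(seed)
    n, w = len(X), len(X[0])
    theta = [dirichlet([1.0] * k, rng) for _ in range(n)]
    beta = [dirichlet([1.0] * w, rng) for _ in range(k)]
    prev = log_likelihood(X, theta, beta)
    for _ in range(max_iter):
        # E-step, Equation (5.12): responsibilities phi[i][p][j].
        phi = []
        for i in range(n):
            doc = []
            for p in range(w):
                weights = [theta[i][j] * beta[j][p] for j in range(k)]
                total = sum(weights)
                doc.append([x / total for x in weights] if total > 0
                           else [1.0 / k] * k)
            phi.append(doc)
        # M-step, Equation (5.13).
        for i in range(n):
            row_total = sum(X[i])
            if row_total > 0:
                theta[i] = [sum(X[i][p] * phi[i][p][j] for p in range(w))
                            / row_total for j in range(k)]
        for j in range(k):
            num = [sum(X[i][p] * phi[i][p][j] for i in range(n))
                   for p in range(w)]
            denom = sum(num)
            if denom > 0:
                beta[j] = [x / denom for x in num]
        cur = log_likelihood(X, theta, beta)
        if cur - prev < eps:
            break
        prev = cur
    return theta, beta


def transition_row(theta_i, beta):
    """Equation (5.14): one row of the network demand matrix."""
    return [sum(theta_i[j] * beta[j][q] for j in range(len(beta)))
            for q in range(len(beta[0]))]


# Three hypothetical spatial documents over w = 3 destination regions.
docs = [[8, 2, 0], [7, 3, 0], [0, 1, 9]]
theta, beta = em(docs, k=2)
print(transition_row(theta[0], beta))
```

Each row returned by transition_row sums to one, so it can be read directly as the destination distribution $A_p^t$ of the corresponding source region and time slot.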
5.4 The SPARP Mechanism

5.4.1 Pricing Model

Every request $r$ has a default fare based on the duration of the shortest trip, $\delta(s_r, e_r)$ (for convenience, we denote this by $\delta_r$). We define a function $F : \mathbb{R}^+ \to \$$ such that $F(\delta_r)$ is the default fare of a ride. In addition, we denote the actual route between the two end points of a ride by $\hat{\delta}_r$ and define the detour of a ride as $\Delta\delta_r = \hat{\delta}_r - \delta_r$. As explained in Definition 5.1, each request is associated with a profile as a tool for the rider to specify how much discount he expects to receive in return for a certain amount of detour on his trip. Subsequently, for a request $r$ with shortest trip $\delta_r$, detour $\Delta\delta_r$ and a profile $\pi_r$, the final fare is represented as:

\[ fare(r) = F(\delta_r) \cdot \pi_r(\Delta\delta_r) \tag{5.15} \]

Every driver has a profile which allows them to set their minimum expectations for participating in the platform. A driver's profile represents the cost of the driver participating in the platform. The profile can depend on any number of parameters, e.g., the length of the trip, the number of passengers, etc. Without loss of generality, we assume a driver's profile only depends on the duration for which the driver provides service. Drivers can misreport their actual profiles if it is in their best interest. We denote driver $d$'s reported profile by $\hat{\pi}_d \in \Pi$, where $\Pi$ is the set of all possible profiles.

At any point in time, each driver has a schedule. Therefore, for every driver $d$, the cost of servicing schedule $\sigma_d$ based on the reported profile $\hat{\pi}_d$ is:

\[ cost(\sigma_d, \hat{\pi}_d) = \int_{start_d}^{end_d} \mathbb{I}\left[\sigma_d^t \neq \langle\rangle\right] \cdot \hat{\pi}_d(t)\, dt \tag{5.16} \]

where $\mathbb{I}$ is the indicator function and $\sigma_d^t$ is the driver's schedule at time $t$. In addition, $start_d$ and $end_d$ are the first pick-up time and last drop-off time of $\sigma_d$. For every request $r$ assigned to a driver, depending on the driver's reported profile ($\hat{\pi}_d$), the driver has to pay a profit to the platform provider ($\rho_r$).
We can define driver $d$'s payments to the platform and income for servicing schedule $\sigma_d$ as:

\[ payment(\sigma_d, \hat{\pi}_d) = \sum_{r \in \sigma_d} \rho_r \tag{5.17} \]

\[ income(\sigma_d, \hat{\pi}_d) = \sum_{r \in \sigma_d} fare(r) - payment(\sigma_d, \hat{\pi}_d) \tag{5.18} \]

For every driver $d$, we define the utility as the difference between his income and cost for servicing schedule $\sigma_d$:

\[ u(\sigma_d, \hat{\pi}_d, \pi_d) = income(\sigma_d, \hat{\pi}_d) - cost(\sigma_d, \pi_d) \tag{5.19} \]

If a driver does not participate in the ridesharing platform, both the cost and the income are zero. We say the platform is individually rational if no driver receives a negative utility by participating in the platform. Another crucial aspect of the framework is preventing the drivers from strategically manipulating the platform by misreporting their profiles, which is known as truthfulness.

Definition 5.7 (Truthfulness). A platform is truthful if and only if for every driver $d$, $\forall \hat{\pi}_d \in \Pi \wedge \hat{\pi}_d \neq \pi_d$, $u(\sigma_d, \hat{\pi}_d, \pi_d) \le u(\sigma_d, \pi_d, \pi_d)$.

That is, a platform is truthful if the dominant (most profitable) strategy for drivers is to report their profiles truthfully. The overall goal of the platform is to maximize the revenue of the platform provider.

Definition 5.8 (Revenue). Given the matching $M(\mathcal{D}, \mathcal{R})$ between the set of drivers $\mathcal{D}$ and requests $\mathcal{R}$, the revenue of the platform provider is

\[ revenue(M(\mathcal{D}, \mathcal{R})) = \sum_{d \in \mathcal{D}} payment(\sigma_d, \hat{\pi}_d) \tag{5.20} \]

We call a framework budget balanced if $revenue(M(\mathcal{D}, \mathcal{R})) \ge 0$. Otherwise, we say the framework runs a deficit.

5.4.2 Payments

In Section 5.4.1 we mentioned that for every request $r$ assigned to a driver, the driver has to pay $\rho_r$ to the platform provider. In this section we explain what $\rho_r$ should be so that the platform is:
1. individually rational; i.e., the drivers do not end up with a negative utility.
2. truthful; i.e., the drivers cannot manipulate the framework by misreporting their profiles.

Based on Algorithm 5, the value of the bid driver $d$ submits to the server for request $r$ ($bid_d^r$) is equal to the additional profit driver $d$ generates by accepting a new request $r$.

Theorem 5.2.
If for every request $r$ assigned to driver $d$, $\rho_r \le bid_d^r$, then the ridesharing platform is individually rational.

Intuitively, by adopting a first-price auction scheme (i.e., for every request $r$, $\rho_r = bid_{d_{opt}}^r$, where $d_{opt}$ is the driver with the highest bid), the platform can maximize its revenue while remaining individually rational. However, computing the profit of a schedule depends on the driver's reported profile. Consequently, a driver's reported profile can eventually affect the driver's bid. The disadvantage of setting $\rho_r = bid_{d_{opt}}^r$ is that drivers would have an incentive to lower their bids by misreporting their profiles and hence, the framework would not be truthful.

One of the key features of the second-price auction scheme is truthfulness. Theorem 5.3 shows that by adopting a second-price auction scheme in APART, the drivers do not have an incentive to misreport their profiles.

Theorem 5.3. If for every request $r$ assigned to driver $d$, $\rho_r$ is equivalent to the value of the second highest bid, then the platform is truthful.

With $\rho_r > 0$ for every request $r$, we can guarantee that the platform is budget balanced.

Lemma 5.1. If for every request $r$, $\rho_r > 0$, then the ridesharing platform is budget balanced.

Asking the drivers to pay an amount equivalent to the second highest bid takes away any incentive for the drivers to misreport their profiles. However, if there is only one driver who can fit the request in his current schedule, there will be no second highest bid and the driver will not pay anything to the platform. To avoid situations like this, the platform can set a reserved price for every request.

Definition 5.9 (Reserved Price). For every request $r$, the reserved price $bid_{server}^r$ is a hidden minimum price the platform sets for the payment it expects from the winning driver.

The server treats the reserved price as another bid.
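The payment rule of the second-price auction with a reserved price can be sketched as below. This is a minimal illustration, assuming the bids arrive as a dictionary from driver to bid; the function name and the handling of a bid that exactly equals the reserved price (treated here as not beating it) are assumptions of the sketch, and ties between drivers are resolved by dictionary order rather than randomly.

```python
def second_price_with_reserve(bids, reserve):
    """Return (winner, payment) for one request, or (None, 0.0).

    bids: dict mapping a driver id to its bid for the request.
    reserve: the server's hidden reserved price, treated as one more bid.
    The request stays unassigned unless some driver bids above the reserve;
    otherwise the winner pays the second highest value among all bids,
    the reserved price included.
    """
    if not bids:
        return None, 0.0
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, top = ranked[0]
    if top <= reserve:
        return None, 0.0
    runner_up = ranked[1][1] if len(ranked) > 1 else float("-inf")
    return winner, max(runner_up, reserve)


print(second_price_with_reserve({"a": 7, "b": 4}, reserve=3))  # pays b's bid
print(second_price_with_reserve({"a": 7}, reserve=3))          # pays reserve
print(second_price_with_reserve({"a": 2}, reserve=3))          # unassigned
```

Because the winner's payment never depends on the winner's own bid, lowering a bid can only change whether the driver wins, not how much he pays, which is the intuition behind Theorem 5.3.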
If there is no other bid higher than the reserved price, the server does not assign the request to any driver. Otherwise, the reserved price guarantees the second highest bid is not 0. With APART, for every request, the reserved price is the difference between the request's default fare and the cost of the most expensive driver servicing that request:

\[ \forall r:\quad bid_{server}^r = F(\delta_r) - \pi^{*}(\delta_r) \quad\text{and}\quad \pi^{*} = \arg\max_{\pi} \{\pi(w_r) \mid \pi \in \Pi\} \]

where $\Pi$ is the set of all possible profiles for the drivers.

5.5 Performance Evaluation

5.5.1 Ridesharing Dataset

We evaluate our algorithms using one month (May, 2013) of New York City's taxi dataset [4], which contains 39437 drivers and around 500,000 trips per day. Each ride in the dataset has a pick-up latitude/longitude, a drop-off latitude/longitude and a request time. We extracted the road network of New York City from Open Street Map (OSM), which is represented as an undirected graph with 55,957 vertices and 78,597 edges. Subsequently, we mapped the source and destination of each trip to the road network. Similar to [37], we maintain a cache for shortest paths between vertices, which means that the shortest path can be found in constant time. Initially, each driver is randomly located on one vertex of the road network. When the vehicle is serving rider requests, we assume it is following the schedule and moving constantly towards the destination.

5.5.2 Experimental Methodology

We compared the results of our framework (APART) with two other approaches: TREE (i.e., Kinetic tree [37]) from academia and NN (i.e., Nearest Neighbor) from industry. Our implementation of TREE is based on the algorithms in [37]. Since TREE [37] does not provide any pricing model, once a ride is completed, we compute its incurred detour and use Equations (5.15) and (5.17) to compute the platform's revenue. Also, to make the comparison fair, before assigning a request to a driver we perform profile matching to ensure the provider does not end up losing money.
If the profiles are not compatible, we select the next driver with the shortest increase in travel distance.

The NN algorithm is implemented based on the current approach adopted by major ridesharing platforms such as Uber. To the best of our knowledge, these platforms find the first nearest driver to the pick-up location of a new request. If the driver is able to fit the new request in its schedule without violating any constraints, he accepts the request. Otherwise the request is rejected and the algorithm tries to assign the request to the next nearest driver. This continues until a driver accepts the request, or every driver rejects it, in which case the request is dropped.

In addition to comparing the generated revenue of APART with those of TREE and NN, we also compared their service rate, measured as the percentage of requests that were completed, and the response time for matching a request with a driver.

Furthermore, to evaluate LSTM's prediction precision, we compare its performance in predicting the transition model to that of LORE [69]. LORE models transition probabilities based on previous locations using Additive Markov Chains. In our evaluation we used the k-fold cross validation (k=4) method to divide our data into test and train sets, and for each fold compared the models built with LSTM and LORE with the actual transition probabilities for the test data. To compare the transition probabilities of the test data to those from the output of the model, we used the Kullback-Leibler divergence (KLD) metric [44]. The reason for choosing KLD is that, unlike other widely used metrics (e.g., the Jensen-Shannon divergence [58]), KLD is not symmetric and is best used when measuring how a probability distribution (model output) diverges from an expected probability distribution (test data).
With KLD, we measure the divergence of probability distribution $Q$ from $P$ as:

\[ \mathrm{KLD}(P \,\|\, Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)} \]

Based on the definition of KLD, the lower the KLD score is, the closer $Q$ is to $P$, where a KLD score of 0 means that $Q$ and $P$ are equal.

Subsequently, we compare the proposed second-price auction with reserved price scheme (SPARP) with the regular second-price auction scheme (SPA) and the first-price auction scheme where drivers are not truthful and bid competitively (FPACB). We compare the generated revenue under the different pricing mechanisms and also evaluate the effects of bidding competitively on the workers' utility in a first-price auction scheme.

Table 5.1 shows the different values we used for various parameters in our experiments (default values are shown in bold).

Table 5.1: Parameters for Algorithm Comparison
Parameter               Values
Grid Size (km)          1, 2, 3, 4, 5
Time Slot Size (hour)   1, 2, 3, 4, 5, 6
Max Wait Time (min)     3, 6, 9, 12, 15, 20
# of Drivers            1000, 2000, 5000, 10000, 20000
Max Passengers          2, 3, 4, 5, 6
Max Allowed Detour      25%, 50%, 75%, 100%

We configure the pricing model as:

\[ F(\delta_r) = 2 \cdot \delta_r \]
\[ \forall r:\quad \pi_r(\Delta\delta_r) = 1 - 2 \cdot \Delta\delta_r \]
\[ \forall d:\quad \pi_d(l) = 1.5 \cdot l \]

5.5.2.1 Experimental Results

As mentioned, the main objective of APART is to maximize the ridesharing platform's revenue. In this experiment, we compare the generated revenue of each algorithm. Towards that end, we apply the pricing model explained in Section 5.4.1. Here, we want to evaluate the effect of varying the parameters on the revenue and compare the different algorithms. In the following experiments, we apply different pricing models to the algorithms and compare revenue under the different pricing models.
[Figure 5.2: Comparing Revenue of the Algorithms. Panels: (a) Maximum Wait Time; (b) Number of Drivers; (c) Maximum Passengers; (d) Maximum Allowed Detour. Series: APART, NN, TREE. Vertical axis: revenue ($ millions).]

Fig. 5.2 shows that regardless of the values of the different parameters, APART generates more revenue than the other approaches. The main reason for the higher revenue is that APART is designed to make a price-aware assignment, i.e., to assign the request to the driver that generates the most profit. On the other hand, the TREE and NN algorithms were not designed to maximize revenue. As explained in Section 5.4.1, the pricing models that are used in APART are designed such that the higher profits are not gained by scamming the riders.

Next, we evaluate the effect of the pricing model to show the importance of designing a fair pricing model. We utilize the three approaches with the model in [50] and show how some riders may suffer by participating in ridesharing. Subsequently, we perform some experiments utilizing the pricing model in [51] and show that as a result of price-aware assignments, regardless of the model, APART generates more revenue for the platform provider.

Fig. 5.3 shows the result of utilizing the pricing model in [50]. Based on this pricing model, the driver's income is:

\[ c \cdot d_1 + (1 + \mu) \cdot c \cdot d_2 \]

where $d_1$ is the distance over which the driver had only one rider on-board, $d_2$ is the total distance over which the driver had more than one rider on-board and $c$ is some predefined constant. $\mu$ takes a value between 0 and 1 and determines the increase in the driver's income for serving more than one rider. As we show in Fig. 5.3(a), by participating in ridesharing, the majority of riders save money (pay less as compared to riding alone). Fig.
5.3(a) supports the claim in [50] that on average riders will save money. However, Fig. 5.3(b) shows that regardless of what algorithm is used, up to 10% of riders pay more by participating in ridesharing which is not acceptable. The reason is that, riders have to pay even for detours. Even 88 though riders split the fare on detours, if detours are suciently long, even carpooling riders loose money. 70 75 80 85 90 95 100 0 0.5 0.8 Service rate (%) APART NN TREE (a) Rider's that Saved Money 0 2 4 6 8 10 12 0 0.5 0.8 Service rate (%) APART NN TREE (b) Rider's that Lost Money Figure 5.3: Fairness of Pricing Models 5.5.2.2 Service Rate (APART Vs. TREE and NN) In this set of experiments we compare the service rate of the three approaches. As shown in Fig. 5.4, all algorithms generate high service rates when the constraints are relaxed or there is high resource availability. However, under tight constraints or limited resources, APART outperforms the other two approaches by up to 20%. In the previous section we showed how APART copes with the dynamism in the system better than the other two approaches. 5.5.2.3 Response Time (APART Vs. TREE and NN) Similar to [50, 37], APART instantly processes a request once it is submitted. In order to evaluate the scalability of our framework, our next set of experiments evaluate the response time of processing a single request. Fig. 5.5(b) shows that when more drivers are added, the scalability of TREE suers as it has to perform scheduling for a larger number of vehicles. 
On the other hand, due to the distributed nature of APART's auction-based approach, each driver does scheduling for itself, and adding drivers does not affect the overall response time of APART as much. In Fig. 5.5(c) we observe that although APART's response time does not go beyond 5ms, TREE handles the increase in maximum passengers better due to the Kinetic Tree structure implementation [37]. The reason for NN's poor performance is that it has to perform the scheduling computation sequentially, for possibly multiple drivers. Finally, in Fig. 5.5(a) and Fig. 5.5(d) we see that for relaxed constraints, the response time of TREE grows to up to 4 times that of APART. The main reason is that the Kinetic Tree structure keeps track of all valid orders of the requests that are assigned to a driver. As we relax the constraints, the number of feasible permutations of the requests increases, which makes the Kinetic Tree larger and its updates more expensive. This in turn increases the response time. Fig. 5.5 shows that, unlike the other two approaches, APART's scalability does not suffer when the different parameters of the framework are varied.

[Figure 5.4: Comparing Service Rate of the Algorithms (APART, NN, TREE). Panels: (a) Maximum Wait Time, (b) Number of Drivers, (c) Maximum Passengers, (d) Maximum Allowed Detour. Vertical axis: Service rate (%).]

[Figure 5.5: Comparing Response Time of the Algorithms (APART, NN, TREE). Panels: (a) Maximum Wait Time, (b) Number of Drivers, (c) Maximum Passengers, (d) Maximum Allowed Detour. Vertical axis: Response time (ms).]

5.5.3 Comparing Different Pricing Models

In the next set of experiments, we apply the model in [51] and evaluate the performance of APART and TREE. In this model, riders get compensated for any detour incurred in their trip. The amount of compensation is based on the new rider's fare and the length of a rider's detour compared with the detour of the other riders on the vehicle. Because the algorithm in [51] is similar to TREE, we only compare APART with TREE. Fig. 5.6(a) shows that APART provides a slightly higher service rate than TREE. However, by assigning the riders to the most profitable drivers, APART ends up generating 10% more revenue.

[Figure 5.6: Effect of Applying an Arbitrary Pricing Model (APART, TREE). Panels: (a) Service Rate, (b) Revenue ($ thousands).]

In Section 5.4, we mentioned that by setting their profiles, users can configure APART to make assignments the way they find desirable. In the last set of our experiments, we use two different configurations to represent the riders' profiles. First, we set the riders' profiles to $f_T(d_r) = \frac{1}{d_r + 1}$. Such a profile is suitable for a rider who wants to minimize his detour and is willing to share a ride only if the detour is short. Since the rider sets Tight constraints, we denote this profile by $f_T$. In the second iteration, we set the profile of the riders to $f_R(d_r) = 1 - \left(\frac{d_r}{\mathrm{max}}\right)$.
This profile is more Relaxed (hence, $f_R$) and it is expected that more riders share a trip. Fig. 5.7 shows the result of utilizing APART and TREE with $f_T$ and $f_R$. Since TREE does not make price-aware assignments, the results in both iterations were the same. However, as we observe with APART T, almost 10% fewer riders ended up sharing a ride, while on average they only observed a 6-7% increase in their trips. On the other hand, with APART R, almost every rider shares a ride and the average increase in their trip was almost 20%. An interesting observation in Fig. 5.7 is that with APART R, more riders share a ride compared to TREE while their average detour was still less.

[Figure 5.7: Effect of Profiles. Shared rides (%) and Avg Detour (%) for APART T, SP, and APART R.]

In conclusion, APART is agnostic of the price model and is able to generate more profit. In addition, APART supports different types of riders' expectations by adjusting the profiles.

5.5.4 Accuracy of the LSTM Prediction Model

In the next set of experiments, we compared the prediction precision of LSTM to that of LORE. Figures 5.8(a) and 5.8(b) depict the KLD scores of LSTM and LORE for varying time slots and cell sizes, respectively. The reason LSTM predicts transition probabilities better than LORE is that different topics can capture transition patterns that are observed in the training data. To better illustrate the improvements of LSTM over LORE, in Figure 5.9 we show a pairwise comparison between the KLD scores of LSTM and LORE. Each bar shows the percentage of improvement we observed for each (Cell Size, Time Slot Size) pair. As the number of regions and time slots increases (i.e., smaller cell and time slot sizes), more patterns can be observed in the data, yielding better accuracy for LSTM.
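The KLD score used in this comparison can be illustrated with a minimal sketch. It assumes that each model outputs, for every source cell, a discrete probability distribution over destination cells, and that the score is KL(observed || predicted) averaged over the source cells [44]. The direction of the divergence, the smoothing constant `eps`, and all function and variable names are assumptions made for illustration; they are not taken from the experiments above.

```python
import math


def kl_divergence(observed, predicted, eps=1e-12):
    """KL(observed || predicted) for two discrete distributions.

    Both arguments are sequences of probabilities over the same cells.
    `eps` (an assumed smoothing constant) avoids division by zero when
    the model assigns zero probability to an observed transition.
    """
    if len(observed) != len(predicted):
        raise ValueError("distributions must cover the same cells")
    total = 0.0
    for p, q in zip(observed, predicted):
        if p > 0.0:  # terms with p == 0 contribute nothing
            total += p * math.log(p / max(q, eps))
    return total


def average_kld(observed_rows, predicted_rows):
    """Mean KL divergence over all source cells (one row per cell)."""
    scores = [kl_divergence(o, p) for o, p in zip(observed_rows, predicted_rows)]
    return sum(scores) / len(scores)


# Hypothetical transition probabilities for two source cells.
observed = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
model_a = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]        # close to the data
model_b = [[0.34, 0.33, 0.33], [0.33, 0.34, 0.33]]  # nearly uniform

score_a = average_kld(observed, model_a)
score_b = average_kld(observed, model_b)
```

A lower average score means the predicted transition probabilities are closer to the observed ones, so in this toy setting `model_a` would be ranked ahead of `model_b`.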
[Figure 5.8: Comparing Precision of LSTM Vs LORE. Panels: (a) Varying Time Slot Size (h), (b) Varying Cell Size (km). Vertical axis: Average KLD (log scale).]

[Figure 5.9: LSTM Vs LORE - Varying Cell & Time Slot Size]

5.5.5 Payment Mechanism Comparison

In the next set of experiments, we compare the generated revenue of each pricing mechanism. To this end, we compute the fares and costs of the drivers based on Equations (5.15) and (5.16). We compare the generated revenue under different scenarios by varying the parameters in Table 5.1.

As depicted in Figure 5.10, SPA generates less revenue than FPACB since the platform only receives a profit for each request that is equal to the second highest bid. However, by introducing the reserved price in SPARP, we manage the losses in SPA. Furthermore, because the drivers do not bid truthfully in FPACB, in most cases SPARP generates slightly more revenue than FPACB. To better show the difference between the revenue of each mechanism under different configurations, in Figure 5.11 we show the relative difference of both SPA and SPARP when compared to FPACB.

[Figure 5.10: Comparing Revenue of Different Mechanisms (FPACB, SPA, SPARP). Panels: (a) Maximum Wait Time, (b) Number of Drivers, (c) Maximum Passengers, (d) Maximum Allowed Detour. Vertical axis: Revenue ($ Millions).]

In our next experiments, we changed the ratio of the drivers who bid untruthfully. As illustrated in Figure 5.12, when no driver tries to manipulate the framework, FPACB generates more profit as compared to both SPA and SPARP (in this case there is no competitive bidding so FPACB works as the normal first-price auction scheme).
However, as the percentage of untruthful drivers increases, the platform makes less revenue and when every driver is bidding competitively in FPACB, we notice that SPARP generates almost 10% more revenue.

[Figure 5.11: Comparing Relative Revenue of SPA & SPARP Vs FPACB. Panels: (a) Maximum Wait Time, (b) Number of Drivers, (c) Maximum Passengers, (d) Maximum Allowed Detour. Vertical axis: Revenue Difference (%).]

In Section 5.2 we proved that in theory, drivers can increase their utility by bidding untruthfully. To show this in practice, we observe in Table 5.2 that when drivers bid untruthfully, their average and median income per mile increase by 20-25%. At the same time, bidding untruthfully does not cost them many requests. We can see in Table 5.2 that the number of assigned requests for untruthful drivers is on average only about 1 request less than for truthful bidders.

The last set of our experiments focuses on how the accuracy of the prediction model can affect the utility of the drivers. For this experiment we compared the drivers' utilities when they predicted the number of drivers using LSTM with their utilities when they used LORE.
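The revenue ordering among FPACB, SPA and SPARP reported in this section can be illustrated with a small sketch. It assumes that each bid is the profit a driver offers to the platform for one request, that the platform collects the highest bid under a first-price rule and the second highest bid under a second-price rule, and that a reserve price acts as a floor on what the platform collects. Competitive (untruthful) bidding is modeled by a hypothetical uniform shading factor. The function names and all numbers are illustrative assumptions rather than the mechanisms' actual definitions.

```python
def first_price_revenue(bids):
    """Platform collects the highest bid (first-price rule)."""
    return max(bids) if bids else 0.0


def second_price_revenue(bids):
    """Platform collects the second highest bid (second-price rule)."""
    if len(bids) < 2:
        return 0.0
    return sorted(bids, reverse=True)[1]


def second_price_with_reserve_revenue(bids, reserve):
    """Second-price rule with a reserve price.

    The request is only sold if the highest bid meets the reserve; the
    platform then collects the larger of the second highest bid and the
    reserve.
    """
    if not bids or max(bids) < reserve:
        return 0.0
    runner_up = sorted(bids, reverse=True)[1] if len(bids) > 1 else 0.0
    return max(runner_up, reserve)


def shade(bids, factor):
    """Model competitive bidding: every driver offers a fraction of its value."""
    return [b * factor for b in bids]


# Hypothetical true valuations of three drivers for one request.
values = [10.0, 6.0, 4.0]

truthful_first_price = first_price_revenue(values)              # 10.0
competitive_first_price = first_price_revenue(shade(values, 0.75))  # 7.5
second_price = second_price_revenue(values)                     # 6.0
second_price_reserve = second_price_with_reserve_revenue(values, 8.0)  # 8.0
```

In this toy instance the first-price rule earns the most when everyone bids truthfully, the plain second-price rule earns the least, and the reserve price lifts the second-price revenue above the first-price revenue once bids are shaded, which mirrors the qualitative trend described above.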
[Figure 5.12: Comparing Relative Revenue of SPA & SPARP Vs FPACB. Panels: (a) Untruthful Drivers, Revenue ($ Millions) for FPACB, SPA, SPARP; (b) Untruthful Drivers, Revenue Difference (%) for SPA, SPARP.]

Untruthful Drivers (%)                        25%     50%     75%
Truthful    Assigned Requests (Average)     14.33   14.19   13.04
            Assigned Requests (Median)          8       6       9
            Income per Mile (Average)       $1.00   $1.00   $1.00
            Income per Mile (Median)        $1.00   $1.00   $1.00
Untruthful  Assigned Requests (Average)     13.51   13.49   12.67
            Assigned Requests (Median)          7       6       8
            Income per Mile (Average)       $1.24   $1.25   $1.24
            Income per Mile (Median)        $1.21   $1.23   $1.23

Table 5.2: Effects of Untruthful Bidding

To better explain the observed results, we distinguish between the cases where LORE overestimates and where it underestimates the number of drivers. As depicted in Table 5.3, when drivers overestimate, it causes them to take less risk in bidding. However, while this does not increase their chances of winning an auction, it does affect their average income per mile. With LSTM the drivers earn almost 25% more by bidding competitively, whereas when they overestimate the number of drivers, they only gain close to 15% more than a driver who bids truthfully. On the other hand, when the drivers underestimate, their income per mile is up to 10% higher as compared to using LSTM. However, this is the result of bidding much lower than their true valuation and as such they do not win many auctions. As shown in Table 5.3, when drivers underestimate, they win about 50% fewer auctions than when they predict more accurately.

Untruthful Drivers (%)                        25%     50%     75%
LSTM        Assigned Requests (Average)     13.51   13.49   12.67
            Assigned Requests (Median)          7       6       8
            Income per Mile (Average)       $1.24   $1.25   $1.24
            Income per Mile (Median)        $1.21   $1.23   $1.23
Over Est.   Assigned Requests (Average)     14.01   13.98   12.91
            Assigned Requests (Median)          7       6       8
            Income per Mile (Average)       $1.14   $1.14   $1.15
            Income per Mile (Median)        $1.11   $1.12   $1.11
Under Est.  Assigned Requests (Average)      7.51    7.39    8.81
            Assigned Requests (Median)          5       4       5
            Income per Mile (Average)       $1.34   $1.33   $1.37
            Income per Mile (Median)        $1.32   $1.31   $1.32

Table 5.3: Effects of Prediction Accuracy

Chapter 6
Conclusion and Future Work

In this dissertation we studied two of the main problems that arise in any spatial crowdsourcing platform: pricing and task assignment. For both problems, the objective was to maximize the platform's revenue and the assignment rate (i.e., the number of completed tasks).

With pricing, we showed how current methods use the system's current demand and supply in each location to price the tasks within a spatial region. We then presented ADAPT-Pricing, a dynamic pricing method for spatial crowdsourcing markets which (1) utilizes the platform's predicted future demand and (2) sets prices based on the origin-destination pairs of tasks rather than on the tasks' origins alone. We introduced P-Pricing and POD-Pricing as two aspects of ADAPT-Pricing and showed that, compared to a baseline approach, they increase the generated revenue in each time period by up to 5% and 15%, respectively, while reducing the price of the rides by 5%.

We then focused on a simple case of task assignment in spatial crowdsourcing platforms where every task has the same reward and the only goal is to maximize the number of completed tasks. We showed that it is not enough to only match a task with a worker; the matched tasks also have to be scheduled for each worker. We introduced an auction-based framework in which we split the matching and scheduling responsibilities between the spatial crowdsourcing server and workers, respectively.
We compared our framework with three alternative approaches: the scheduling-oblivious-matching (SOM) approach, where matching and scheduling are performed separately; the batched approach, where the server periodically performs matching and scheduling on a batch of tasks; and the on-line monolithic approach, where tasks get processed one at a time but the server performs scheduling for all workers. We showed that by exploiting the spatiotemporal aspects of spatial crowdsourcing, with our proposed algorithms, the workers are able to complete up to 30% more tasks as compared to the SOM and batched approaches, while scaling orders of magnitude higher than both the batched and on-line monolithic methods.

Finally, we showed how our auction-based framework can be applied to a ridesharing network as one of the most commonly used spatial crowdsourcing platforms. We introduced a pricing model for ridesharing networks where, given the initial price of a ride (i.e., the output of our dynamic pricing algorithms), the platform can decide the fare each passenger has to pay and the income each driver receives. We showed that our pricing mechanism is (1) individually rational, (2) truthful and (3) budget balanced.

Currently, most of the commonly used applications that fit in the broad spectrum of spatial crowdsourcing involve tasks that start and end in different geographical locations, e.g., picking up a passenger from point A and dropping him off at point B, or picking up groceries from a store and delivering them to the customer's home. Accordingly, some of our models were designed based on the fact that completing a task involves workers moving from a starting point to an ending point. However, there are many other applications of spatial crowdsourcing with instantaneous tasks that do not require moving between an origin and a destination to complete a task. For example, a requester might ask available workers for a picture of a famous statue.
Even though our models are guaranteed to not perform worse than alternative approaches, it is not clear whether we would observe the same level of improvement for such tasks. As more data from spatial crowdsourcing platforms with instantaneous tasks become publicly available, our models can be further tested and, if required, improved based on behaviors that are specific to these types of platforms.

Furthermore, in designing ADAPT-Pricing, we optimize prices one period at a time. We explained that this choice is partially related to the fact that most of the current demand prediction algorithms use a Markov assumption where the near future demand is predicted based on the current state of the network. As demand prediction algorithms improve and are able to predict further into the future, we will not be limited to only looking one time period ahead, and an interesting direction would be to study how this additional information can further improve our dynamic pricing methods.

References

[1] Global mobile statistics 2014 part a: Mobile subscribers; handset market share; mobile operators. https://mobiforge.com/research-analysis/. Accessed: 2015-09-30.
[2] How surge pricing works. https://www.uber.com/drive/partner-app/how-surge-works/. Accessed: 2018-01-15.
[3] New york city neighborhoods. http://www1.nyc.gov/site/planning/data-maps/open-data.page#other. Accessed: 2017-11-30.
[4] New york city taxi trips. http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml. Accessed: 2017-03-30.
[5] Prime time for drivers. https://help.lyft.com/hc/en-us/articles/115012926467. Accessed: 2018-01-15.
[6] Uberpool. https://newsroom.uber.com/us-california/its-a-beautiful-pool-day-in-the-neighborhood/. Accessed: 2016-06-13.
[7] Afsin Akdogan, Cyrus Shahabi, and Ugur Demiryurek. Toss-it: A cloud-based throwaway spatial index structure for dynamic location data. In Proceedings of the 2014 IEEE 15th International Conference on Mobile Data Management - Volume 01, MDM '14, pages 249-258, Washington, DC, USA, 2014.
[8] Mohammad Asghari, Dingxiong Deng, Cyrus Shahabi, Ugur Demiryurek, and Yaguang Li. Price-aware real-time ride-sharing at scale: An auction-based approach. In Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS '16, pages 3:1-3:10, 2016.
[9] Siddhartha Banerjee, Ramesh Johari, and Carlos Riquelme. Pricing in ride-sharing platforms: A queueing-theoretic approach. In Proceedings of the Sixteenth ACM Conference on Economics and Computation, EC '15, pages 639-639, 2015.
[10] Kostas Bimpikis, Ozan Candogan, and Daniela Saban. Spatial pricing in ride-sharing networks. 2016.
[11] Arpita Biswas, Shweta Jain, Debmalya Mandal, and Y. Narahari. A truthful budget feasible multi-armed bandit mechanism for crowdsourcing time critical tasks. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, AAMAS '15, pages 1101-1109, 2015.
[12] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent dirichlet allocation. J. Mach. Learn. Res., 3:993-1022, March 2003.
[13] Olli Bräysy and Michel Gendreau. Vehicle routing problem with time windows, part i: Route construction and local search algorithms. Transportation Science, 39(1):104-118, February 2005.
[14] Gérard P. Cachon, Kaitlin M. Daniels, and Ruben Lobel. The role of surge pricing on a service platform with self-scheduling capacity. Manufacturing & Service Operations Management, 19(3):368-384, 2017.
[15] B. Cao, L. Alarabi, M. F. Mokbel, and A. Basalamah. Sharek: A scalable dynamic ride sharing system. In 2015 16th IEEE International Conference on Mobile Data Management, volume 1, pages 4-13, June 2015.
[16] Juan Castillo, Daniel T. Knoepfle, and E. Glen Weyl. Surge pricing solves the wild goose chase. 2017.
[17] Cen Chen, Shih-Fen Cheng, Hoong Chuin Lau, and Archan Misra. Towards city-scale mobile crowdsourcing: Task recommendations under trajectory uncertainties. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pages 1113-1119, 2015.
[18] M. Keith Chen. Dynamic pricing in a labor market: Surge pricing and flexible work on the uber platform. In Proceedings of the 2016 ACM Conference on Economics and Computation, EC '16, pages 455-455, 2016.
[19] Yiwei Chen and Ming Hu. Pricing and matching with forward-looking buyers and sellers. 2017.
[20] P. Cheng, X. Lian, L. Chen, J. Han, and J. Zhao. Task assignment on multi-skill oriented spatial crowdsourcing. IEEE Transactions on Knowledge and Data Engineering, 28(8):2201-2215, Aug 2016.
[21] Shih-Fen Cheng, Duc Thien Nguyen, and Hoong Chuin Lau. Mechanisms for arranging ride sharing and fare splitting for last-mile travel demands. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS '14, pages 1505-1506, Richland, SC, 2014. International Foundation for Autonomous Agents and Multiagent Systems.
[22] Meng-Fen Chiang, Tuan-Anh Hoang, and Ee-Peng Lim. Where are the passengers?: A grid-based gaussian mixture model for taxi bookings. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL '15, pages 32:1-32:10, 2015.
[23] Blerim Cici, Athina Markopoulou, Enrique Frias-Martinez, and Nikolaos Laoutaris. Assessing the potential of ride-sharing using mobile and social data: a tale of four cities. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pages 201-211. ACM, 2014.
[24] Blerim Cici, Athina Markopoulou, and Nikolaos Laoutaris. Designing an on-line ride-sharing system. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS '15, pages 60:1-60:4, 2015.
[25] Jean-François Cordeau and Gilbert Laporte. The dial-a-ride problem: models and algorithms. Annals of Operations Research, 153(1):29-46, 2007.
[26] G. B. Dantzig and J. H. Ramser. The truck dispatching problem. Manage. Sci., 6(1):80-91, October 1959.
[27] Dingxiong Deng, Cyrus Shahabi, and Ugur Demiryurek. Maximizing the number of worker's self-selected tasks in spatial crowdsourcing. In Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 324-333, Orlando, Florida, 2013.
[28] Dingxiong Deng, Cyrus Shahabi, and Linhong Zhu. Task matching and scheduling for multiple workers in spatial crowdsourcing. In Proceedings of the 23rd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Bellevue, Washington, 2015.
[29] Liran Einav, Chiara Farronato, and Jonathan Levin. Peer-to-peer markets. Technical report, National Bureau of Economic Research, August 2015.
[30] Zhixuan Fang, Longbo Huang, and Adam Wierman. Prices and subsidies in the sharing economy. In Proceedings of the 26th International Conference on World Wide Web, WWW '17, pages 53-62, 2017.
[31] André Sales Fonteles, Sylvain Bouveret, and Jérôme Gensel. Heuristics for task recommendation in spatiotemporal crowdsourcing systems. In Proceedings of the 13th International Conference on Advances in Mobile Computing and Multimedia, MoMM 2015, pages 1-5, 2015.
[32] Masabumi Furuhata, Maged Dessouky, Fernando Ordóñez, Marc-Etienne Brunet, Xiaoqing Wang, and Sven Koenig. Ridesharing: The state-of-the-art and future directions. Transportation Research Part B: Methodological, 57:28-46, 2013.
[33] Harish Guda and Upender Subramanian. Strategic surge pricing on on-demand service platforms. 2017.
[34] B. Guo, Y. Liu, W. Wu, Z. Yu, and Q. Han. Activecrowd: A framework for optimized multitask allocation in mobile crowdsensing systems. IEEE Transactions on Human-Machine Systems, PP(99):1-12, 2016.
[35] Bin Guo, Huihui Chen, Zhiwen Yu, Wenqian Nan, Xing Xie, Daqing Zhang, and Xingshe Zhou. Taskme: Toward a dynamic and quality-enhanced incentive mechanism for mobile crowd sensing. International Journal of Human-Computer Studies, 102:14-26, 2017. Special Issue on Mobile and Situated Crowdsourcing.
[36] Juho Hamari, Mimmi Sjöklint, and Antti Ukkonen. The sharing economy: Why people participate in collaborative consumption. J. Assoc. Inf. Sci. Technol., 67(9):2047-2059, September 2016.
[37] Yan Huang, Favyen Bastani, Ruoming Jin, and Xiaoyang Sean Wang. Large scale real-time ridesharing with service guarantee on road networks. Proceedings of the VLDB Endowment, 7(14):2017-2028, 2014.
[38] Ramón Iglesias, Federico Rossi, Kevin Wang, David Hallac, Jure Leskovec, and Marco Pavone. Data-driven model predictive control of autonomous mobility-on-demand systems. In Proceedings of 2018 IEEE International Conference on Robotics and Automation, ICRA '18, 2017.
[39] L. G. Jaimes, I. Vergara-Laurens, and M. A. Labrador. A location-based incentive mechanism for participatory sensing systems with budget constraints. In 2012 IEEE International Conference on Pervasive Computing and Communications, PerCom'12, pages 103-108, March 2012.
[40] Haiming Jin, Lu Su, Danyang Chen, Klara Nahrstedt, and Jinhui Xu. Quality of information aware incentive mechanisms for mobile crowd sensing systems. In Proceedings of the 16th ACM International Symposium on Mobile Ad Hoc Networking and Computing, MobiHoc '15, pages 167-176, 2015.
[41] Ece Kamar and Eric Horvitz. Collaboration and shared plans in the open world: Studies of ridesharing. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, IJCAI'09, pages 187-194, San Francisco, CA, USA, 2009. Morgan Kaufmann Publishers Inc.
[42] Leyla Kazemi and Cyrus Shahabi. Geocrowd: Enabling query answering with spatial crowdsourcing. In Proceedings of the 20th International Conference on Advances in Geographic Information Systems, SIGSPATIAL '12, pages 189-198, New York, NY, USA, 2012. ACM.
[43] Alexander Kleiner, Bernhard Nebel, and Vittorio Amos Ziparo. A mechanism for dynamic ride sharing based on parallel auctions. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume One, IJCAI'11, pages 266-272. AAAI Press, 2011.
[44] S. Kullback and R. A. Leibler. On information and sufficiency. Ann. Math. Statist., 22(1):79-86, 1951.
[45] M.G. Lagoudakis, M. Berhault, S. Koenig, P. Keskinocak, and A.J. Kleywegt. Simple auctions with performance guarantees for multi-robot task allocation. In Intelligent Robots and Systems, 2004. (IROS 2004). Proceedings. 2004 IEEE/RSJ International Conference on, volume 1, pages 698-705 vol.1, Sept 2004.
[46] Nikolay Laptev, Jason Yosinski, Li Erran Li, and Slawek Smyl. Time-series extreme event forecasting with neural networks at uber. In Proceedings of the 34th International Conference on Machine Learning, ICML '17, 2017.
[47] Juong-Sik Lee and Baik Hoh. Dynamic pricing incentive for participatory sensing. Pervasive and Mobile Computing, 6(6):693-708, 2010. Special Issue PerCom 2010.
[48] Yu Li, ManLung Yiu, and Wenjian Xu. Oriented online route recommendation for spatial crowdsourcing task workers. In Advances in Spatial and Temporal Databases, volume 9239, pages 137-156. Springer International Publishing, 2015.
[49] David C. Luckham. The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2001.
[50] Shuo Ma, Yu Zheng, and Ouri Wolfson. T-share: A large-scale dynamic taxi ridesharing service. In Data Engineering (ICDE), 2013 IEEE 29th International Conference on, pages 410-421. IEEE, 2013.
[51] Shuo Ma, Yu Zheng, and Ouri Wolfson. Real-time city-scale taxi ridesharing. IEEE Transactions on Knowledge and Data Engineering, 27(7):1782-1795, 2015.
[52] A. Mehta, A. Saberi, U. Vazirani, and V. Vazirani. Adwords and generalized on-line matching. In Foundations of Computer Science, 2005. FOCS 2005. 46th Annual IEEE Symposium on, pages 264-273, Oct 2005.
[53] Panagiota Micholia, Merkouris Karaliopoulos, Iordanis Koutsopoulos, Luca Maria Aiello, Gianmarco De Francisci Morales, and Daniele Quercia. Incentivizing social media users for mobile crowdsourcing. International Journal of Human-Computer Studies, 102:4-13, 2017. Special Issue on Mobile and Situated Crowdsourcing.
[54] M. Ota, H. Vo, C. Silva, and J. Freire. A scalable approach for data-driven taxi ride-sharing simulation. In Big Data (Big Data), 2015 IEEE International Conference on, pages 888-897, Oct 2015.
[55] Hyunjung Park and Jennifer Widom. Crowdfill: Collecting structured data from the crowd. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD '14, pages 577-588, 2014.
[56] Dominik Pelzer, Jiajian Xiao, Daniel Zehe, Michael H Lees, Alois C Knoll, and Heiko Aydt. A partition-based match making algorithm for dynamic ridesharing. IEEE Transactions on Intelligent Transportation Systems, 16(5):2587-2598, 2015.
[57] D.J. Rosenkrantz, R.E. Stearns, and P.M. Lewis. Approximate algorithms for the traveling salesperson problem. In Switching and Automata Theory, IEEE Conference Record of 15th Annual Symposium on, pages 33-42, Oct 1974.
[58] Y. Rubner, C. Tomasi, and L.J. Guibas. A metric for distributions with applications to image databases. In Computer Vision, 1998. Sixth International Conference on, pages 59-66, Jan 1998.
[59] Paolo Santi, Giovanni Resta, Michael Szell, Stanislav Sobolevsky, Steven H Strogatz, and Carlo Ratti. Quantifying the benefits of vehicle pooling with shareability networks. Proceedings of the National Academy of Sciences, 111(37):13290-13294, 2014.
[60] Douglas O. Santos and Eduardo C. Xavier. Dynamic taxi and ridesharing: A framework and heuristics for the optimization problem. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI '13, pages 2885-2891, 2013.
[61] H. Shah-Mansouri and V. W. S. Wong. Profit maximization in mobile crowdsourcing: A truthful auction mechanism. In 2015 IEEE International Conference on Communications (ICC), pages 3216-3221, June 2015.
[62] Wen Shen, Cristina V. Lopes, and Jacob W. Crandall. An online mechanism for ridesharing in autonomous mobility-on-demand systems. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI'16, pages 475-481. AAAI Press, 2016.
[63] Yaron Singer and Manas Mittal. Pricing mechanisms for crowdsourcing markets. In Proceedings of the 22nd International Conference on World Wide Web, WWW '13, pages 1157-1166, 2013.
[64] Daniel D. Sleator and Robert E. Tarjan. Amortized efficiency of list update and paging rules. Communications of the ACM, pages 202-208, 1985.
[65] Rod Stephens. Beginning Software Engineering. Wrox Press Ltd., Birmingham, UK, 1st edition, 2015.
[66] H. To, M. Asghari, D. Deng, and C. Shahabi. Scawg: A toolbox for generating synthetic workload for spatial crowdsourcing. In 2016 IEEE International Conference on Pervasive Computing and Communication Workshops (PerCom Workshops), pages 1-6, March 2016.
[67] H. To, L. Fan, L. Tran, and C. Shahabi. Real-time task assignment in hyperlocal spatial crowdsourcing under budget constraints. In 2016 IEEE International Conference on Pervasive Computing and Communications, PerCom'16, pages 1-8, March 2016.
[68] Dejun Yang, Guoliang Xue, Xi Fang, and Jian Tang. Crowdsourcing to smartphones: Incentive mechanism design for mobile phone sensing. In Proceedings of the 18th Annual International Conference on Mobile Computing and Networking, Mobicom'12, pages 173-184, 2012.
[69] Jia-Dong Zhang, Chi-Yin Chow, and Yanhua Li. Lore: Exploiting sequential influence for location recommendations. In Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL '14, pages 103-112, 2014.
[70] Dengji Zhao, Sarvapali D Ramchurn, and Nicholas R Jennings. Incentive design for ridesharing with uncertainty. In arXiv preprint arXiv:1505.01617, 2015.
[71] K. Zhao, D. Khryashchev, J. Freire, C. Silva, and H. Vo. Predicting taxi demand at high spatial resolution: Approaching the limit of predictability. In 2016 IEEE International Conference on Big Data, BigData '16, pages 833-842, Dec 2016.
[72] Lingxue Zhu and Nikolay Laptev. Deep and confident prediction for time series at uber. In 2017 IEEE International Conference on Data Mining Workshops, ICDMW '17, pages 103-110, 2017.

Appendix A
Proof of Theorems, Lemmas and Propositions

A.1 Proof of Proposition 3.1

Proposition A.1. Assuming $R_i^t > 0$ and $V_i^t > 0$, there exists a price $p^c$ at which $D_i^t(p^c) = S_i^t(p^c)$.

Proof. We know from Assumption 3.2 that $S(0) < D(0)$ and $S(p_{max}) > D(p_{max})$. Furthermore, both $S(p)$ and $D(p)$ are continuous (Assumption 3.1). Consequently, at some point $p^c$ in the range $(0, p_{max})$ we have $S(p^c) = D(p^c)$.

A.2 Proof of Proposition 3.2

Proposition A.2. If for location $i$ at time $t$ there exists a market clearing price, then $p^c = \arg\max_p T_i^t(p)$.

Proof. According to Assumption 3.1, $F_r(\cdot)$ is continuous and strictly decreasing with regard to $p$. Since $R_i^t \geq 0$ for any $i$ and $t$, we can imply that $D_i^t(\cdot)$ is continuous and strictly decreasing.
Therefore:

\[ \forall p < p_c:\quad D(p) > D(p_c) \tag{A.1} \]
\[ \forall p > p_c:\quad D(p) < D(p_c) \tag{A.2} \]

Similarly, we can show $S_i^t(\cdot)$ is continuous and strictly increasing and hence:

\[ \forall p < p_c:\quad S(p) < S(p_c) \tag{A.3} \]
\[ \forall p > p_c:\quad S(p) > S(p_c) \tag{A.4} \]

Considering that $D(p_c) = S(p_c)$ we can say:

\[ \forall p < p_c:\quad S(p) < D(p) \tag{A.5} \]
\[ \forall p > p_c:\quad S(p) > D(p) \tag{A.6} \]

Based on Eq. (3.3) we have:

\[ \forall p < p_c:\quad T(p) = S(p) \tag{A.7} \]
\[ \forall p > p_c:\quad T(p) = D(p) \tag{A.8} \]
\[ \text{if } p = p_c:\quad T(p) = D(p) = S(p) \tag{A.9} \]

Therefore, for $p < p_c$, $T(p)$ is strictly increasing and for $p > p_c$, $T(p)$ is strictly decreasing. Hence:

\[ p_c = \arg\max_p \{T(p)\} \tag{A.10} \]

A.3 Proof of Theorem 3.1

Theorem A.1. The optimal price $p^*$ which maximizes $Rev(p)$ from Eq. (3.5) is always greater than or equal to $p_c$.

Proof. $T(p)$ is continuous and strictly increasing for $p < p_c$ and continuous and strictly decreasing for $p > p_c$ (refer to the proof of Proposition 3.2). We also know from Assumption 3.2 that $T(0) = S(0) = 0$ and $T(p_{max}) = D(p_{max}) = 0$. Consequently, for every $p_1 < p_c$, there exists a $p_2 > p_c$ such that $T(p_1) = T(p_2)$. Since $p_1 < p_2$, we can infer that $T(p_1)\,p_1 < T(p_2)\,p_2$. Therefore:

\[ p_c \leq \arg\max_p \{Rev(p)\} \]

A.4 Proof of Theorem 3.2

Theorem A.2. The optimal price $p^*$ for maximizing the revenue at each location at each time is:
(i) $p^* = p_c$ if $p_d \leq p_c$
(ii) $p^* = p_d$ if $p_d > p_c$

Proof.
Case (i): For all $p \geq p_c$:

\[ T(p) = D(p) \]
\[ Rev(p) = D(p)\,p \]

Based on Assumption 3.1, we can say $D(p)$ is strictly concave in the range $[0, p_{max}]$. Furthermore, $p_d \leq p_c$ and hence, for all $p \geq p_c$:

\[ Rev(p_c) \geq Rev(p) \tag{A.11} \]

On the other hand, if $p < p_c$ we know from Theorem 3.1 that there exists $p' > p_c$ such that:

\[ Rev(p) < Rev(p') \tag{A.12} \]

Since $p' > p_c$, combining Eq. (A.11) and Eq. (A.12) we get:

\[ Rev(p) < Rev(p') \leq Rev(p_c) \]

Therefore, if $p_d \leq p_c$ then for all $p \in [0, p_{max}]$ we have:

\[ Rev(p) \leq Rev(p_c) \]

Case (ii): We know from Theorem 3.1 that $p^* \geq p_c$. Also, for $p > p_c$ we have $T(p) = D(p)$. Furthermore, $D(p)$ is strictly concave, which makes $D(p)\,p$ concave. Consequently, since $p_d = \arg\max_p \{D(p)\,p\}$, it is safe to say $p^* = p_d$.
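The two cases of Theorem 3.2 can be checked numerically on a toy market. The sketch below is only an illustration under stated assumptions, not the dissertation's implementation: the linear acceptance curves $F_w(p) = p / p_{max}$ and $F_r(p) = 1 - p / p_{max}$, the function names, and the grid search are all hypothetical choices made for this example. With these curves, $p_c = R\,p_{max} / (V + R)$ and $p_d = p_{max} / 2$, so the theorem predicts an optimal price of $\max(p_c, p_d)$.

```python
def make_market(V, R, p_max):
    """Build supply, demand and revenue functions for one location and time.

    Assumption (not from the source): linear acceptance curves
    F_w(p) = p / p_max for workers and F_r(p) = 1 - p / p_max for riders.
    """
    def supply(p):
        return V * (p / p_max)

    def demand(p):
        return R * (1.0 - p / p_max)

    def revenue(p):
        # T(p) = min{S(p), D(p)} trips are served, each paying p.
        return min(supply(p), demand(p)) * p

    return supply, demand, revenue


def optimal_price_closed_form(V, R, p_max):
    """Price predicted by Theorem 3.2 for the assumed linear curves."""
    p_c = R * p_max / (V + R)  # solves S(p) = D(p)
    p_d = p_max / 2.0          # maximizes D(p) * p when D is linear
    return max(p_c, p_d)


def optimal_price_grid(V, R, p_max, steps=1000):
    """Brute-force the revenue-maximizing price on an evenly spaced grid."""
    _, _, revenue = make_market(V, R, p_max)
    prices = [p_max * i / steps for i in range(steps + 1)]
    return max(prices, key=revenue)
```

For instance, with scarce supply (`V=10, R=30, p_max=10`) the grid search should settle on the market clearing price, while with abundant supply (`V=30, R=10, p_max=10`) it should settle on $p_d$ instead.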
A.5 Proof of Lemma 3.1

Lemma A.1. If $p_{c_1}$ and $p_{c_2}$ are the market clearing prices for $S_1(\cdot)$ and $S_2(\cdot)$, respectively, and $p_d = \arg\max_p \{D(p)\,p\}$, the generated revenue from $S_2(\cdot)$ is larger than that of $S_1(\cdot)$ if $p_{c_2} < p_{c_1} \wedge p_d < p_{c_1} \wedge p_d \leq p_{c_2}$.

Proof. Since $p_d < p_{c_1}$, we know from Theorem 3.2 that $p_1^* = p_{c_1}$, where $p_1^*$ is the optimal price that maximizes the revenue for $S_1(p)$ and $D(p)$. Similarly, we can say $p_2^* = p_{c_2}$, where $p_2^*$ maximizes the revenue for $S_2(p)$ and $D(p)$.

On the other hand, we know $D(p)\,p$ is concave in the range $[0, p_{max}]$ and since $p_d < p_{c_2} < p_{c_1}$:

\[ D(p_{c_2})\,p_{c_2} > D(p_{c_1})\,p_{c_1} \]
\[ T(p_{c_2})\,p_{c_2} > T(p_{c_1})\,p_{c_1} \]
\[ Rev(p_{c_2}) > Rev(p_{c_1}) \]

Therefore, the generated revenue from $S_2(\cdot)$ is greater than that of $S_1(\cdot)$.

A.6 Proof of Lemma 3.2

Lemma A.2. Assuming $p_{c_2} = p_d$, adding more potential drivers does not increase the generated revenue.

Proof. We know from Theorem 3.2 that if $p_{c_2} = p_d$, then $p_2^* = p_d$, where $p_2^*$ is the optimal price that maximizes the revenue for $S_2(p)$ and $D(p)$. Assuming we denote the new supply after adding more potential drivers by $S_3(p)$, we can find the new market clearing price, $p_{c_3}$, where $p_{c_3} < p_{c_2}$. Consequently, $p_{c_3} < p_d$. We know from Theorem 3.2 that if $p_{c_3} < p_d$ then $p_3^* = p_d$, where $p_3^*$ is the price that maximizes the revenue for $S_3(p)$ and $D(p)$. Therefore, by adding more potential drivers the optimal price remains at $p_d$ and thus, there will be no increase in the revenue.

A.7 Proof of Proposition 3.3

Proposition A.3. The generated revenue from Eq. (3.17) is at least as much as that of Eq. (3.5).

Proof. Using Eq.
(3.17), the total revenue from rides originating at location $i$ can be computed as:

\[
\begin{aligned}
Rev_i^t(B_i^t, P_i^t) &= \sum_j T_{ij}^t(\beta_{ij}^t, p_{ij}^t)\, p_{ij}^t \\
&= \sum_j \min\{\beta_{ij}^t V_i^t F_w(p_{ij}^t),\ \alpha_{ij}^t R_i^t F_r(p_{ij}^t)\}\, p_{ij}^t
\end{aligned}
\]

Assuming that, regardless of its destination, every ride has the same price $p_i^t$ and we set $\beta_{ij}^t = \alpha_{ij}^t$, we get:

\[
\begin{aligned}
Rev_i^t(B_i^t, P_i^t) &= \sum_j \min\{\alpha_{ij}^t V_i^t F_w(p_i^t),\ \alpha_{ij}^t R_i^t F_r(p_i^t)\}\, p_i^t \\
&= \sum_j \alpha_{ij}^t \min\{V_i^t F_w(p_i^t),\ R_i^t F_r(p_i^t)\}\, p_i^t \\
&= \min\{V_i^t F_w(p_i^t),\ R_i^t F_r(p_i^t)\}\, p_i^t \sum_j \alpha_{ij}^t \\
&= \min\{S_i^t(p_i^t),\ D_i^t(p_i^t)\}\, p_i^t \\
&= T_i^t(p_i^t)\, p_i^t \\
&= Rev_i^t(p_i^t)
\end{aligned}
\]

Therefore, the generated revenue from Eq. (3.17) is at least as much as that of Eq. (3.5).

A.8 Proof of Theorem 4.1

Theorem. There does not exist a deterministic online algorithm for the OnlineTASC problem that is $c$-competitive ($c > 0$).

Proof. Suppose there exists an algorithm $A$ that is $c$-competitive for some $c > 0$. To prove no such algorithm exists, all we need to do is prove there is at least one possible input $I$ for which $\frac{|A(I)|}{|A^*(I)|}$ is unboundedly small. For analyzing the competitive ratio of a deterministic online algorithm $A$, it is assumed that there exists an adversary which knows every decision $A$ makes and creates an input knowing what decisions $A$ is going to make. Here we show how an adversary can generate an input for which the competitive ratio of any algorithm is unboundedly small.

For simplicity, we only consider points on the x-axis and assume there is only one worker at point $x = 0$ in the beginning. The input starts with $t_1$ such that $t_1 = \langle 5, [0, 5] \rangle$ (Figure A.1(a); a task at point 5 with release time 0 and deadline 5). The algorithm can make two choices for the worker: (1) move towards $t_1$ or (2) stay still (in theory it can also make the worker move away from $t_1$, which in the context of this proof would be similar to case (2)).
If choice 1 is selected, the adversary can generate the input such that at time $t = 2$, tasks $t_2, \ldots, t_n$ are all submitted with the exact same properties $\langle -4, [2, 7] \rangle$ (Figure A.1(b)). Considering that at the release time of $t_2, \ldots, t_n$ the worker is at point $x = 2$, it does not have enough time to get to $t_2, \ldots, t_n$ before their deadline. However, an optimal offline algorithm would have known about $t_2, \ldots, t_n$ in advance and would have ignored $t_1$ in order to be able to complete $n - 1$ tasks instead. In other words, $|A| = 1$ whereas $|A^*| = n - 1$, and the ratio can be made unboundedly small by increasing $n$. Therefore, we have contradicted the assumption that $A$ is $c$-competitive. A similar argument can be made if choice 2 is selected by the algorithm, by releasing tasks $t_2, \ldots, t_n$ with properties $\langle 7, [2, 7] \rangle$ (Figure A.1(c)).

Figure A.1: Adversary Generated Input. (a) $t = 0$; (b) $t = 2$, case 1; (c) $t = 2$, case 2.

A.9 Proof of Theorem 5.1

Theorem. There does not exist a deterministic online algorithm for the ridesharing problem that is $c$-competitive ($c > 0$).

Proof. Suppose there exists an algorithm $A$ that is $c$-competitive. For $A$ to be $c$-competitive, it should be at most $c$ times worse than the optimal solution for every input. Consequently, to show no $c$-competitive algorithm exists, we only need to show one input for which $A$ does not have a competitive ratio of $c$. We assume there exists an adversary which knows every decision $A$ makes and consider the input generated by this adversary.

For simplicity, we assume there is only one driver at point $(0, 0)$. The input starts with $r_1$ with a pick-up location at $(w, 0)$ and $r_2$ with a pick-up location at $(-w, 0)$ (we assume all requests have a maximum wait time of $w$). The algorithm can make three choices for the driver: (1) move towards $r_1$, (2) move towards $r_2$ and (3) stay still.
If choice 1 is selected, the adversary can generate the input such that at time $t = 1$, $n$ more requests are submitted with pick-up location at $(-w - 1, 0)$ and drop-off locations similar to $r_2$. Similar arguments can be made if choice 2 or 3 is selected by the algorithm. A globally optimal solution can complete $n + 1$ requests while $A$ can complete at most one request. By adding more drivers far away in a similar situation, the adversary can make $A$'s solution unboundedly worse than the optimal solution. Therefore, we have contradicted the assumption that $A$ is $c$-competitive.

A.10 Proof of Theorem 5.2

Theorem. If for every request $r$ assigned to driver $d$, $\rho_r \leq \mathrm{bid}^r_d$, then the ridesharing platform is individually rational.

Proof. From Equation (5.17) we know:

\[
\mathrm{income}(\sigma_d, \hat{\sigma}_d) = \sum_{r \in \sigma_d} \mathrm{fare}(r) - \sum_{r \in \sigma_d} \rho_r \geq \sum_{r \in \sigma_d} \mathrm{fare}(r) - \sum_{r \in \sigma_d} \mathrm{bid}^r_d
\]

For every request $r$ and driver $d$, $\mathrm{bid}^r_d$ is the difference between the profit $d$ can make after accepting $r$ ($\hat{\mu}^d_r$) and the profit $d$ can make before accepting $r$ ($\mu^d_r$). Therefore:

\[
\mathrm{income}(\sigma_d, \hat{\sigma}_d) \geq \sum_{r \in \sigma_d} \mathrm{fare}(r) - \sum_{r \in \sigma_d} \left( \hat{\mu}^d_r - \mu^d_r \right)
\]

Assuming $\langle r_1, r_2, \ldots, r_n \rangle$ is the order in which driver $d$ accepts the requests, we can say $\mu^d_{r_i} = \hat{\mu}^d_{r_{i-1}}$ and hence:

\[
\mathrm{income}(\sigma_d, \hat{\sigma}_d) \geq \sum_{r \in \sigma_d} \mathrm{fare}(r) - \left( \hat{\mu}^d_{r_n} - \mu^d_{r_1} \right)
\]

Before accepting the first request, the driver cannot generate any profit (i.e., $\mu^d_{r_1} = 0$). Furthermore, the profit each driver generates for the platform is equal to the difference between the total collected fares and the cost of that driver:

\[
\hat{\mu}^d_{r_n} = \sum_{r \in \sigma_d} \mathrm{fare}(r) - \mathrm{cost}(\sigma_d, \hat{\sigma}_d)
\]

Therefore:

\[
\mathrm{income}(\sigma_d, \hat{\sigma}_d) \geq \sum_{r \in \sigma_d} \mathrm{fare}(r) - \left( \sum_{r \in \sigma_d} \mathrm{fare}(r) - \mathrm{cost}(\sigma_d, \hat{\sigma}_d) \right) = \mathrm{cost}(\sigma_d, \hat{\sigma}_d)
\]

In other words, the driver's income is always at least as much as his costs, which implies the utility is always non-negative and the platform is individually rational.

A.11 Proof of Theorem 5.3

Theorem. If for every request $r$ assigned to driver $d$, $\rho_r$ is equal to the value of the second highest bid, then the platform is truthful.

Proof.
We assume $d_2$ is the driver with the second highest bid and $\mathrm{bid}^r_{d_2}$ is his corresponding bid. We show that the winning driver $d_{opt}$ cannot increase his utility by misreporting his profile and either increasing or decreasing his bid. We denote $d_{opt}$'s bid based on his actual profile by $\mathrm{bid}^r_{d_{opt}}$ and his bid based on a misreported profile by $\widehat{\mathrm{bid}}^r_{d_{opt}}$.

Case 1: $\mathrm{bid}^r_{d_2} < \mathrm{bid}^r_{d_{opt}} < \widehat{\mathrm{bid}}^r_{d_{opt}}$. In this case $d_{opt}$ will have the highest bid, so he will be selected as the winner and his payment will be $\mathrm{bid}^r_{d_2}$.

Case 2: $\mathrm{bid}^r_{d_2} < \widehat{\mathrm{bid}}^r_{d_{opt}} < \mathrm{bid}^r_{d_{opt}}$. Similar to the first case, here $d_{opt}$ will have the highest bid and will be assigned the request. The second highest bid is still from $d_2$ and hence, $d_{opt}$ will still pay $\mathrm{bid}^r_{d_2}$ to the platform provider.

Case 3: $\widehat{\mathrm{bid}}^r_{d_{opt}} < \mathrm{bid}^r_{d_2} < \mathrm{bid}^r_{d_{opt}}$. In this case, $d_{opt}$ will no longer have the highest bid and $r$ will not be assigned to him. Therefore, both his cost and payment will be 0 and his utility will not increase.

Consequently, in all three cases, $d_{opt}$ cannot increase his utility by misreporting his profile and thus, the framework is truthful.
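The three cases in the proof of Theorem 5.3 can be replayed with a small second-price auction. This is a minimal sketch under simplifying assumptions rather than the platform's mechanism as implemented in the dissertation: the function names are hypothetical, and the winner's utility is modeled as his true valuation (taken to equal his truthful bid) minus the second highest bid, which is a simplification of the utility defined in Chapter 5.

```python
def run_auction(bids):
    """Highest bid wins; the winner pays the second highest bid.

    `bids` maps a driver name to that driver's reported bid.
    Returns (winner, payment).
    """
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    return winner, bids[runner_up]


def utility(driver, true_value, reported_bid, other_bids):
    """Utility of `driver` when reporting `reported_bid`.

    Assumption (hypothetical model): a winner gains his true valuation
    minus the payment; a driver who does not win gains nothing.
    """
    bids = dict(other_bids)
    bids[driver] = reported_bid
    winner, payment = run_auction(bids)
    return true_value - payment if winner == driver else 0.0


if __name__ == "__main__":
    others = {"d2": 6.0, "d3": 4.0}
    for reported in (20.0, 10.0, 7.0, 5.0):
        # Overbidding or mild underbidding leaves the utility unchanged;
        # bidding below d2 loses the request.
        print(reported, utility("d_opt", 10.0, reported, others))
```

Reporting 20.0 or 7.0 corresponds to Cases 1 and 2 (same payment, same utility as the truthful bid of 10.0), and reporting 5.0 corresponds to Case 3 (the request is lost), so no misreport in this toy setting beats truthful bidding.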
Asset Metadata
Creator: Asghari, Mohammad (author)
Core Title: Dynamic pricing and task assignment in real-time spatial crowdsourcing platforms
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Publication Date: 12/12/2018
Defense Date: 08/22/2018
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: dynamic pricing, OAI-PMH Harvest, Pricing, revenue maximization, ridesharing, ride-sharing, spatial crowdsourcing, task assignment
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Shahabi, Cyrus (committee chair), Knoblock, Craig (committee member), Ortega, Antonio (committee member)
Creator Email: masghari@usc.edu, masghari10@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-114372
Unique identifier: UC11675537
Identifier: etd-AsghariMoh-7013.pdf (filename), usctheses-c89-114372 (legacy record id)
Legacy Identifier: etd-AsghariMoh-7013.pdf
Dmrecord: 114372
Document Type: Dissertation
Rights: Asghari, Mohammad
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA